**Fruit and Meta-Analysis**

The fruit in my house weigh on average 93 grams. I know this because I weighed them. The process of doing so is a good analogy for meta-analysis, though a lot less painful.

I bet you find the value of 93 grams rather uninformative. It reflects what my family likes to eat: say, bananas more than kiwis and strawberries more than blackberries. In fact, even though I went through the effort of gathering the fruit, setting fruit criteria (for the record, I excluded a cucumber because cucumbers, while bearing seeds, just don't taste like fruit), and weighing them, dare I say this mean doesn't mean anything. And this is the critique Julia Haaf (@JuliaHaaf), Joe Hilgard (@JoeHilgard), Clint Davis-Stober (@ClintinS) and I provide for meta-analytic means in our just-accepted Psychological Methods paper. Just because you can compute a sample mean doesn't mean it is automatically helpful.

Means are most meaningful to me when they measure the central tendency of some naturally interesting random process. For example, if I were studying environmental impacts on children's growth in various communities, the mean (and quantiles) of height and weight for children of given ages would certainly be meaningful. Even though the sample of kids in a community is diverse in wealth, race, and so on, the mean helps in understanding, say, environmental factors such as a local pesticide factory.

In meta-analysis, the mean is over something haphazard: whatever paradigms happen to be trendy for certain questions. The collection of studies is more like the collection of fruit in my house. And just as the fruit mean reflects my family's preferences about fruit as much as any biological variation among seeded plant things, the meta-analytic mean reflects the sociology of researchers (how they decide what data to collect) as much as the phenomenon under study.

**Do All Studies Truly?**

In our recent paper, we dispense with the meta-analytic mean. It simply is not a target of scientific inference for us. Instead, we ask a different question: "Do all studies truly...?" To set the stage, we note that most findings have a canonical direction. For example, we might think that playing violent video games increases rather than decreases subsequent aggressive behavior. *Increases* here is the canonical direction, and we can call it the positive effect. If we gather a collection of studies on the effects of video-game violence, do all of them truly have an effect in this positive direction? That is, do all truly increase aggression, or do some truly increase and others truly decrease it?

Next, let's focus on *truly*. Truly for a study refers to what would happen in the large-sample limit of many people. In any finite sample for any study, we might observe a negative-direction effect from sampling noise alone, but the main question is about the true values. Restated: how plausible is it that all studies have a true positive effect even though some might have negative sample effects? Using Julia's and my previous work, we show how to compute this plausibility across a collection of studies.

**So What?**

Let's say "yes": it is plausible that all studies in a collection truly have effects in a common direction, say, violent video games do indeed increase aggression. What is implied is far more constraining than a statement about the meta-analytic mean. It is about robustness. Whatever the causes of variation in the data set, the main finding is robust to them. It is not just that the average shows the effect; plausibly every study does. What a strong statement to make when it holds!

Now, let's take the opposite possibility: "no," it is not plausible that all studies truly have effects in a common direction. With high probability, some have true effects in the opposite direction. The upshot is a rich puzzle. Which studies go one way and which go the other? Why? What are the moderators?

In our view, then, the very first meta-analytic question is "do all studies truly...?" The answer will surely shape what we do next.

**Can You Do It Too?**

Maybe, maybe not. The actual steps are not that difficult. One needs to perform a Bayesian analysis and gather the posterior samples. The models are pretty straightforward and easy to implement in the BayesFactor package, Stan, or JAGS. Then, to compute the plausibility for the "do all studies truly" question, one counts how many posterior samples fall in certain ranges. So, if you can gather MCMC posterior samples for a model and count, you are in good shape.
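To make the counting step concrete, here is a minimal sketch in Python. Everything in it is a stand-in: the matrix of "posterior samples" is simulated rather than drawn from a fitted BayesFactor, Stan, or JAGS model, and the names (`theta`, `n_studies`, etc.) are hypothetical. The point is only the last few lines: the plausibility that all studies truly have a positive effect is the proportion of MCMC iterations in which every study-level true effect is positive.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior: 4000 MCMC iterations for 8 study-level true
# effects. In a real analysis these draws would come from a hierarchical
# model fit to the collection of studies; here they are simulated.
n_iter, n_studies = 4000, 8
theta = rng.normal(loc=0.3, scale=0.15, size=(n_iter, n_studies))

# For each iteration, ask: are ALL study-level true effects positive?
all_positive = np.all(theta > 0, axis=1)

# The plausibility that every study truly has a positive effect is the
# proportion of iterations in which that holds.
plausibility = all_positive.mean()
print(f"P(all studies truly positive | data) = {plausibility:.3f}")
```

Note that this is a joint statement across studies, not eight separate marginal ones: an iteration counts only if every study's effect is positive in that same draw, which is what makes the "do all studies truly" question stricter than asking about each study on its own.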

We realize that some people may be drawn to the question but repelled by the lack of an automatic solution. Julia and I have unrealized dreams of automating the process. But in the meantime, if you have a cool data set and an interest in the does-every-study-truly question, let us know.

Rouder, J. N., Haaf, J. M., Davis-Stober, C., & Hilgard, J. (in press). Beyond overall effects: A Bayesian approach to finding constraints across a collection of studies in meta-analysis. *Psychological Methods.*