The Twitter Fight About Numbers of Participants

About a month back, there was an amusing Twitter fight about how many participants one needs in a within-subject design. Cognitive and perceptual types tend to use just a few participants but run many replicates per participant. For example, we tend to run about 30 people with 100 trials per condition. Social psychologists tend to run between-subject designs with one or a handful of trials per condition, but they tend to run experiments with many more than 30 people. The Twitter discussion was a critique of the small sample sizes used in cognitive experiments.
In this post, I ask how wise the cognitive psychologists are by examining the ramifications of these small numbers of participants. This examination is informed by a dominance principle, a reasonable conjecture about how people differ from one another. I show why the cognitive psychologists are right---why small numbers of people suffice even to detect small effects.
Consider a small priming effect, say an average 30 ms difference between primed and unprimed conditions. There are a few sources of variation in such an experiment:
The variability within a person and condition across trials. What happens when the same person responds repeatedly in the same condition? In response time measures, this variation is usually large, say a standard deviation of 300 ms. We can call this within-cell variability.
The variability across people. Regardless of condition, some people just flat out respond faster than others. Let's suppose we knew exactly how fast each person was, that is, we had many repetitions in a condition. In my experience, across-people variability is a tad less than within-person-condition variability. Let's take the standard deviation at 200 ms. We can call this across-people variability.
The variability in the effect across people. Suppose we knew exactly how fast each person was in both conditions. The difference is the true effect, and we should assume that this true effect varies. Not everyone is going to have an exact 30 ms priming effect. Some people are going to have a 20 ms effect, others are going to have a 40 ms effect. How big is the variability of the effect across people? Getting a handle on this variability is critical because it is the limiting factor in within-subject designs. And this is where the dominance principle comes in.
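The three sources of variability above can be written as a simple data-generating model. Here is a minimal sketch in Python; the 800 ms grand-mean baseline is my own illustrative choice, and the 20 ms effect sd anticipates the dominance-sized variability discussed below:

```python
import numpy as np

rng = np.random.default_rng(1)

n_people, n_trials = 30, 100
sd_within, sd_people, sd_effect = 300.0, 200.0, 20.0  # ms, from the text
mean_effect = 30.0                                    # ms, average priming effect

# Across-people variability: each person's overall baseline speed.
baseline = 800 + rng.normal(0, sd_people, n_people)   # 800 ms grand mean is illustrative

# Variability of the effect across people: each person's true priming effect.
# A plain normal is used here for simplicity; the dominance principle discussed
# below would restrict these true effects to be positive.
effect = rng.normal(mean_effect, sd_effect, n_people)

# Within-cell variability: trial-to-trial noise around each person's true speed.
unprimed = baseline[:, None] + rng.normal(0, sd_within, (n_people, n_trials))
primed = (baseline - effect)[:, None] + rng.normal(0, sd_within, (n_people, n_trials))
```

The observed grand-mean difference between the two conditions recovers something close to the 30 ms true effect, but any single person's observed difference is noisy because the 300 ms within-cell sd dwarfs the effect.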
A Dominance Principle
The dominance principle here is that nobody has a true reversal of the priming effect. We might see a reversal in observed data, but this reversal is only because of sample noise. Had we enough trials per person per condition, we would see that everyone has at least a positive priming effect. The responses are unambiguously quicker in the primed condition---they dominate.
The Figure below shows the dominance principle in action. Shown are two distributions of true effects across people---an exponential and a truncated normal. The dominance principle stipulates that the true effect is in the same direction for everyone, that is, there is no mass below zero. And if there is no mass below zero and the average is 30 ms, then the distributions cannot be too variable. Indeed, the two shown distributions have a mean of 30 ms, and the standard deviations for these exponential and truncated normal distributions are 30 ms and 20 ms, respectively. This variability is far less than the 300 ms of within-cell variability or the 200 ms of across-people variability. The effect size across people, 30 ms divided by these standard deviations, is actually quite large. It is 1 and 1.5 respectively for the shown distributions.
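The claim that dominance caps the variability is easy to check numerically. Below is a small sketch: the exponential's sd necessarily equals its mean, and a normal truncated at zero is simulated here by rejection sampling (my parameters are illustrative; after truncation the realized mean sits slightly above the underlying 30 ms):

```python
import numpy as np

rng = np.random.default_rng(2)

# Exponential distribution of true effects: a mean of 30 ms forces an sd of 30 ms.
theta = rng.exponential(scale=30.0, size=200_000)
print(round(theta.mean(), 1), round(theta.std(), 1))  # both near 30

# Truncated normal via rejection sampling (illustrative parameters; truncating
# at zero nudges the realized mean a bit above the underlying 30 ms).
draws = rng.normal(30.0, 20.0, 200_000)
trunc = draws[draws > 0]  # dominance: keep only positive true effects
```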
You may see the wisdom in the dominance principle or you may be skeptical. If you are skeptical (and why not), then hang on. I am first going to explore the ramifications of the principle, and then I am going to show it is probably ok.
Between and Within Subject Designs
The overall variability in a between subject design is the sum of the variabilities, and it is determined in large part by the much larger within-cell and across-people variabilities. This is why it might be hard to see a 30 ms priming effect in a typical between subject design. The effect size is somewhere south of .1.
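The "south of .1" figure follows directly from the numbers above. A single between-subject observation mixes all three sources of variability, so the effect size is the 30 ms effect divided by their combined sd:

```python
import math

mean_effect = 30.0
sd_within, sd_people = 300.0, 200.0
sd_effect = 20.0  # sd of true effects across people, dominance-sized

# One observation per person mixes all three variance components.
sd_total = math.sqrt(sd_within**2 + sd_people**2 + sd_effect**2)
d = mean_effect / sd_total
print(round(d, 3))  # about 0.083 -- "south of .1"
```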
The overall variability in a within subject design depends on the number of trials per participant. In these designs, we calculate each person's mean effect. This difference has two properties: first, we effectively subtract out across-participant variability; second, the within-cell variability decreases with the number of trials per participant. If this number is large, then the overall variability is limited by the variability in the effect across people. As we stated above, due to the dominance principle, this variability is small, say about the size of the effect under consideration. Therefore, as we increase the number of observations per person, we can expect effect sizes of 1 or even bigger.
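The two properties above can be made concrete. Each person's mean difference carries trial noise through both condition means (hence the factor of 2) plus the true effect variability; the baseline drops out of the subtraction. A small sketch, using a 20 ms effect sd as the dominance-sized value:

```python
import math

mean_effect, sd_within, sd_effect = 30.0, 300.0, 20.0

def within_subject_d(n_trials: int) -> float:
    """Effect size for a person's mean difference (primed vs. unprimed).

    Across-people baseline variability subtracts out; trial noise enters
    through both condition means and shrinks with the number of trials.
    """
    sd_diff = math.sqrt(2 * sd_within**2 / n_trials + sd_effect**2)
    return mean_effect / sd_diff

for n in (10, 100, 1000):
    print(n, round(within_subject_d(n), 2))
```

With 100 trials per condition the effect size is already around .64, and as trials grow it approaches the limit of 30/20 = 1.5 set by the effect variability alone.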
Simulating Power for Within Subject Designs
Simulations seem to convince people of points perhaps even more than math. So here are mine to show off the power of within-subject designs under the dominance principle. I used the 300 ms within-cell and 200 ms across-people variabilities and sampled 100 observations per person per condition. Each person had a true positive effect, and these effects were sampled from a truncated normal distribution with an overall mean of \( \mu \). Here are the power results for several sample sizes (numbers of people) and values of the average effect \( \mu \).
The news is quite good. Although a 10 ms effect cannot be resolved with fewer than a hundred participants, the power for larger effects is reasonable. For example, the power to resolve a 30 ms effect with 30 participants is .93! Indeed, cognitive psychologists know that even small effects can be successfully resolved with limited participants in massively-repeated within-subjects designs. It's why we do it routinely.
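A simulation in this spirit is easy to write down. The sketch below is not the exact simulation reported above---I rejection-sample the truncated normal from an underlying normal(\( \mu \), 20 ms), so after truncation the realized mean sits slightly above \( \mu \), and the critical t value for df = 29 is hardcoded---but it lands in the same neighborhood for 30 people and a 30 ms effect:

```python
import numpy as np

rng = np.random.default_rng(3)

def power_within(n_people, mu, n_trials=100, sd_within=300.0, sd_theta=20.0,
                 n_sims=2000, t_crit=2.045):
    """Monte Carlo power for a within-subject design with a one-sample t test
    on per-person mean differences. t_crit = 2.045 is the two-sided .05
    critical value for df = 29; with a positive true effect only the upper
    tail matters in practice."""
    hits = 0
    for _ in range(n_sims):
        # Dominance: rejection-sample positive true effects for each person.
        theta = np.empty(0)
        while theta.size < n_people:
            draw = rng.normal(mu, sd_theta, 2 * n_people)
            theta = np.concatenate([theta, draw[draw > 0]])
        theta = theta[:n_people]
        # Each person's observed mean difference over n_trials trials/condition;
        # trial noise enters through both condition means (factor of 2).
        noise_sd = np.sqrt(2 * sd_within**2 / n_trials)
        obs = theta + rng.normal(0, noise_sd, n_people)
        t = obs.mean() / (obs.std(ddof=1) / np.sqrt(n_people))
        hits += t > t_crit
    return hits / n_sims

print(power_within(30, 30.0))  # comfortably above .9 under these assumptions
```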
The bottom line message is that if one assumes the dominance principle, then the power of within-subject designs is surprisingly high. Of course, without dominance all bets are off. Power remains a function of the variability of the effect across people, which must be specified.
Logic and Defense of the Dominance Principle
You may be skeptical of the dominance principle. I suspect, however, that you will need to assert it.
1. The size of an effect is difficult to interpret without the dominance principle. Let's suppose that the dominance principle is massively violated. In what sense is the mean effect useful or interpretable? For example, suppose one has a 30 ms average effect with 60% of people having a true positive effect and 40% of people having a true negative priming effect. The value of the 30 ms seems unimportant. What is critically important in this case is why the effect differs in direction across people. A good start is exploring what person variables are associated with positive versus negative priming.
2. The dominance principle is testable. All you have to do is collect a few thousand trials per person to beat the 300 ms within-cell variability. If you want, say, 10 ms resolution per person, just collect 1000 observations per person. I have done it on several occasions, collecting as many as 8,000 trials per person in some cases (see Ratcliff and Rouder, 1998, Psych Sci). I cannot recall a violation, though I have no formal analysis....yet. The key is making sure you do not confound within-participant variability, which is often large, with between-participant variability. You need a lot of trials per individual to deconfound these sources. If you know of a dominance violation, then please pass the info along.
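The trial counts quoted above follow from the standard error of a person's condition mean, which shrinks as the square root of the number of trials. A quick back-of-the-envelope sketch, framed per condition mean:

```python
import math

sd_within = 300.0  # ms, trial-to-trial sd within a person and condition

def trials_for_resolution(target_se_ms: float) -> int:
    """Trials per person per condition so that the standard error of a
    person's condition mean falls to target_se_ms."""
    return math.ceil((sd_within / target_se_ms) ** 2)

print(trials_for_resolution(10))  # 900 trials for roughly 10 ms resolution
```

Around 900 trials per condition brings the per-person standard error to about 10 ms, consistent with the round figure of 1000 observations above.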
Odds are you are not going to collect enough data to test for dominance. And odds are that you are going to want to interpret the average effect size across people as meaningful. And to do so, in my view, you will therefore need to assume dominance! And this strikes me as a good thing. Dominance is reasonable in most contexts, strengthens the interpretation of effects, and leads to high power even with small sample sizes in within-subject designs.