Monday, August 27, 2018

Are there human universals in task performance? How might we know?

Science has traditionally proceeded by understanding constraint among variables. In a series of new papers, Julia Haaf and I ask whether there are human universals in perception, cognition, and performance. Here is an outline of our development.

What Do We Mean By "Human Universal"?


Let's take the Stroop task as an example. Are there any human universals in the Stroop task? In this task, people name the color of congruent color words (e.g., the word RED displayed in red) faster than incongruent ones (e.g., the word GREEN displayed in red). We will call this a positive Stroop effect, and, from many, many experiments, we know that on average, people have positive Stroop effects. But what about each individual? A good candidate for a human universal is that each individual shows a true positive effect, and, conversely, that no individual has a true negative Stroop effect in which incongruent words are named faster than congruent ones. We call this the "Does Everyone" question: does everyone have a true nonnegative Stroop effect? We propose a universal order constraint on true performance.
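Concretely, the per-person effect is just a difference of condition means. Here is a minimal sketch in Python; the file name and column names (id, cond, rt) are hypothetical placeholders, not the actual data set discussed below.

    import pandas as pd

    # Hypothetical trial-level data: one row per trial, with columns
    # id (participant), cond ("congruent" or "incongruent"), rt (seconds).
    trials = pd.read_csv("stroop_trials.csv")

    # Mean RT per person per condition; the Stroop effect is
    # incongruent minus congruent, so positive is the usual direction.
    means = trials.groupby(["id", "cond"])["rt"].mean().unstack()
    effect = means["incongruent"] - means["congruent"]

    print(effect.describe())
    print((effect < 0).sum(), "of", len(effect), "observed effects are negative")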



The above figure shows observed Stroop effects across many individuals (data courtesy of Claudia von Bastian). As can be seen, 8 of 121 people have negative observed Stroop effects. But that doesn't mean the "Does Everyone" condition is violated. The negative-going observations might be due to sampling noise. To help calibrate, we added individual 95% CIs. Just by looking at the figure, it seems plausible from these CIs that indeed, everybody Stroops.
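For readers who want to draw this kind of plot for their own data, here is a sketch that adds per-person 95% CIs (Welch intervals) and orders individuals by effect size. It continues the hypothetical layout from the snippet above; it is not the code behind the figure.

    import numpy as np
    import pandas as pd
    from scipy import stats
    import matplotlib.pyplot as plt

    trials = pd.read_csv("stroop_trials.csv")  # hypothetical file, as above

    rows = []
    for pid, d in trials.groupby("id"):
        inc = d.loc[d["cond"] == "incongruent", "rt"]
        con = d.loc[d["cond"] == "congruent", "rt"]
        eff = inc.mean() - con.mean()
        vi, vc = inc.var(ddof=1) / len(inc), con.var(ddof=1) / len(con)
        se = np.sqrt(vi + vc)
        # Welch-Satterthwaite degrees of freedom for the difference
        df = se**4 / (vi**2 / (len(inc) - 1) + vc**2 / (len(con) - 1))
        half = stats.t.ppf(0.975, df) * se  # half-width of the 95% CI
        rows.append((pid, eff, half))

    res = pd.DataFrame(rows, columns=["id", "effect", "half"]).sort_values("effect")
    x = np.arange(len(res))
    plt.errorbar(x, res["effect"], yerr=res["half"], fmt="o", ms=3, lw=1)
    plt.axhline(0, linestyle="--")
    plt.xlabel("Individuals (ordered by effect)")
    plt.ylabel("Stroop effect (s)")
    plt.show()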

"True effect" in this context means in the limit of many trials, or as the CIs become vanishingly small.  The question is "if you had a really, really large number of trials for each person in the congruent and incongruent conditions, then would each and every individual have a positive effect.

Is The "Does Everyone" Question Interesting?

We would like to hear your opinion here.  In exchange, we offer our own.  

We think the "does everyone" question is fascinating.   We can think of some domains in which it seemingly holds, say Stroop or perception (nobody identifies quite dim words more quickly than modestly brighter ones).  Another domain is priming---it is hard to imagine that there are people who respond faster DOCTOR following PLATE than DOCTOR following NURSE.     And then there are other domains where it is assuredly violated including handedness (some people truly do throw a ball farther with their left hand) and preference (as strange as it may seem, some people truly prefer sweetened tea to unsweetened tea).    Where are the bounds?  Why?

The "does everyone" question seemingly has theoretical ramifications too.  An affirmative answer means that the processes underlying the task may be common across all people---that is, we may all access semantic meaning from text the same way.  A negative answer means that there may be variability in the processes and strategies people bring to bear.

Answering The "Does Everyone" Question


The "Does Everyone" question is surprisingly hard to answer.  It is more than just classifying each individual as positive, zero, or negative.  It is a statement about the global configuration of effect.   We have seen cases where we can say with high confidence that at least one person has a negative true effect without being able to say with high confidence who these people are.  This happens when there are too many people with slightly negative effects.  We have been working hard over the last few years to develop the methodology to answer the question.  For the plot above, yes, there is evidence from our developments that everybody Stroops positive.
  

Our Requests:

  • Please comment. Honestly, we wrote the papers below, but as far as we can tell, they have yet to be noticed.
  • Is the "Does Everyone" question interesting to you?  Perhaps not.  Perhaps you know the answer for your domain.  
  • Can you think of a cool domain where whether everyone does something the same way is really theoretically important? We are looking for collaborators! (You can email jrouder@uci.edu or answer publicly.)

Some of Our Papers on the Topic:

  1. Haaf & Rouder (2017). Developing constraint in Bayesian mixed models. Psychological Methods, 22, p. 779-. In this poorly titled paper, we introduce the "Does Everyone" question and provide a Bayes factor approach for answering it. We apply the approach to several data sets with simple inhibition tasks (Stroop, Simon, flanker). The bottom line is that when we see overall effects, we often see evidence that everyone has the effect in the same direction.
  2. Haaf & Rouder (in press). Some do and some don't? Accounting for variability of individual difference structures. Psychonomic Bulletin & Review. We also include a mixture model in which some people have identically no effect while others have an order-constrained effect (a rough sketch of this mixture follows the list).
  3. Thiele, Haaf, & Rouder (2017). Is there variation across individuals in processing? Bayesian analysis for systems factorial technology. Journal of Mathematical Psychology, 81, 40-54. A neat application to systems factorial technology (SFT). SFT is a technique for telling whether people process different stimulus dimensions serially, in parallel, or coactively by looking at the direction of a specific interaction contrast. We ask whether all people have the same direction of the interaction contrast.
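As promised in item 2, here is a rough sketch of the mixture idea in our own notation; it is a paraphrase, not the paper's exact specification:

    \theta_i \sim (1 - \rho)\,\delta_0 \;+\; \rho\,\text{Normal}_+(\nu, \eta^2)

Here delta_0 is a point mass at zero and rho is the proportion of people who truly have an effect. "Some do and some don't" corresponds to 0 < rho < 1; the everyone-positive model above is the special case rho = 1.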

2 comments:

Guillaume Rousselet said...

Hi,
yes the "does everyone" question is fascinating! Individual differences are interesting in their own way and indeed have implications for both theories and for planning experiments (Rouder & Haaf, AMPPS, 2018, which you should list on this page too). The question can also be asked at a lower level of analysis, within participants, by quantifying how distributions differ, and if these differences are consistent across participants. This relates to your work on delta plots and stochastic dominance applied to reaction times. To try to combine the two levels of analysis, I'm planning a project with an honours student, in which we will quantify participants' patterns of reaction time differences using shift functions, and determine proportions of participants with similar patterns.
I've summarised some of these ideas here:
https://www.biorxiv.org/content/early/2017/03/27/121079
More than happy to collaborate with you on this project.

Donald Williams said...

Hi Jeff:
I think this is very important to consider, and something I do often in my own work, where I like to see how many of the individual effects were in the same direction, etc. I do think that people have for too long used the p-value as the only source of information, and in doing so, have actually hidden many important things that should inform inference. For example, in clinical work, if there is "significant" variation in the treatment effect, I would like to know what this actually means. This information is not provided by reporting a random-effects variance and a p-value (or a Bayes factor), and it could be that the treatment actually harmed some people.

While I do sometimes use Bayes factors, I sometimes wonder what exactly is really gained beyond looking at the posterior. As a "toy" example, say we compare three hypotheses: N(-.2, 0.05), N(0, 0.05), and N(.2, 0.05), where we find support for the small positive effect. What exactly has been gained compared to simply looking at the credible interval (or even a confidence interval, for that matter)? I think this applies here (or in a linked paper): I am just as happy with the % of people as I am with the Bayes factor. In fact, I think I would actually just prefer, for each person, the posterior probability in the direction of the posterior mean. This could be a result of my leaning, of late, more towards local descriptions rather than inference to the "population."

That said, I do think this is very (very) important work, if for nothing else, showing that our models provide so much more information than a p-value, BF, etc. This also applies to MLMs fitted with ML, REML, etc., and I think we would all be better off visualizing the consistency of the individual (shrunken) effects. So the importance, to me, is not the Bayesian innovation but providing more information for inference (even if it is descriptive).