Thursday, May 24, 2018

Do you study individual differences? A Challenge

Can you solve the following problem, which I think is hard, fun, and important?  I cannot.

The problem is that of characterizing individual differences for individuals performing cognitive tasks.  Each task has a baseline and an experimental condition, and the difference, the effect, is the target of interest.   Each person performs a great number of trials in each condition in each task, and the outcome on each trial is quite variable (necessitating the multiple trials).  There are a number of tasks, and the goal is to estimate the correlation matrix among the task effects.
That is, if a person has a large effect in Task 1, are they more likely to have a large effect in Task 2?

Let's try an experiment with 200 people, 6 tasks, and 150 replicates per task per condition with simulated data.  When you factor in the two conditions, there are 360,000 observations in total.  Our goal is to estimate the 15 unique correlation coefficients in the 6-by-6 correlation matrix.  Note we have what seems to be a lot of data, 360K observations, for just 15 critical parameters.   Seems easy.
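
For anyone who wants to check the bookkeeping, here is a small sketch of the arithmetic.  The names n.obs and n.cor are mine and are used only for illustration.

```r
I <- 200   # people
J <- 6     # tasks
K <- 2     # conditions
L <- 150   # replicates per task per condition

n.obs <- I * J * K * L   # total number of observations
n.cor <- choose(J, 2)    # unique off-diagonal entries in a J-by-J correlation matrix

c(observations = n.obs, correlations = n.cor)
```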

Unfortunately, the problem is harder than one might think, at least for the settings I consider realistic for priming and context tasks.

Here is code to make what I consider realistic data:

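library(mvtnorm)

set.seed(123)
I=200 #ppl
J=6 #tasks
K=2 #conditions
L=150 # reps
N=I*J*K*L
# one index entry per observation: subject, task, condition
sub=rep(1:I,each=J*K*L)
task=rep(rep(1:J,each=K*L),I)
cond=rep(rep(1:K,each=L),I*J)
subtask=cbind(sub,task)

# true correlation matrix among the task effects
myCor=diag(J)
myCor[1,2]=.8
myCor[3,4]=.8
myCor[5,6]=.8
myCor[1,3]=.4
myCor[1,4]=.4
myCor[2,3]=.4
myCor[2,4]=.4
# copy the upper triangle into the lower triangle to make it symmetric
myCor[lower.tri(myCor)]  <- t(myCor)[lower.tri(myCor)]
# covariance of true effects: SD of .02 sec per task
myVar=.02^2*myCor

# baseline per person per task: mean 1 sec, SD .2 sec, uncorrelated across tasks
t.alpha=rmvnorm(I,rep(1,J),diag(J)*.2^2)
# true effect per person per task: mean .06 sec, covariance myVar
t.mu=rep(.06,J)
t.theta=rmvnorm(I,t.mu,sigma=myVar)
# trial-to-trial noise variance: SD of .25 sec
t.s2=.25^2
# cell mean: baseline, plus the effect in condition 2 (subtask indexes [person, task])
t.cell=t.alpha[subtask]+(cond-1)*t.theta[subtask]
rt=rnorm(N,t.cell,sqrt(t.s2))
dat=data.frame(sub,task,cond,rt)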

Once you create the data, you are trying to estimate the correlation matrix of t.theta, the true effects per person per task.

cor(t.theta)

You will notice that Tasks 1 and 2 are highly correlated, Tasks 3 and 4 are highly correlated, and Tasks 5 and 6 are highly correlated.  And there is moderate correlation across Tasks 1 and 3, 1 and 4, 2 and 3, and 2 and 4.  The rest are in the weeds.  Can you estimate that pattern?

If you just take means as estimators, you are swamped by measurement error.  The tight correlation among the pairs of tasks is greatly attenuated.  Here is the code:

mrt=tapply(dat$rt,list(dat$sub,dat$task,dat$cond),mean)
sample.effect=mrt[,,2]-mrt[,,1]
cor(sample.effect)
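
To see why the attenuation is so severe, here is a rough back-of-the-envelope sketch using the simulation settings above.  It assumes the usual result that, with independent noise across tasks, the expected correlation of the sample effects is about the true correlation times the reliability of the sample effects.  The names var.th, var.err, reliability, true.cor, and expected.cor are mine, for illustration only.

```r
# Values copied from the simulation settings
L      <- 150      # replicates per task per condition
s2     <- 0.25^2   # trial-level noise variance (t.s2)
var.th <- 0.02^2   # true variance of a person's effect (diagonal of myVar)

# A sample effect is the difference of two condition means,
# each of which has sampling variance s2/L
var.err <- 2 * s2 / L

# Share of the sample-effect variance that is true signal
# (the same for every task with these settings)
reliability <- var.th / (var.th + var.err)

# Approximate expected observed correlation = true correlation * reliability
true.cor     <- c(high = 0.8, moderate = 0.4)
expected.cor <- true.cor * reliability

round(reliability, 3)
round(expected.cor, 3)
```

With these settings only about a third of the variance in the sample effects is signal, so a true correlation of .8 is expected to show up at roughly .26 and a true correlation of .4 at roughly .13.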


I guess I am wondering if there is any way to recover the correlations with acceptable precision.    Perhaps they are forever lost to measurement noise.  I certainly cannot with my home-baked, roll-your-own Bayesian models.   If I tune the priors, I can get high correlations where I am supposed to, but the other ones are too variable to be useful.  So either the problem is not so tractable or my models/methods are inferior.  I can share what I did if you wish.

So, can you recover the 15 correlations with acceptable precision?  I appreciate your help and insight.

Best,
Jeff