Tuesday, January 12, 2016

Bayesian Modeling to Improve The Spatial and Temporal Resolution in fMRI: Part 1.

My colleagues Marco Ferreira, Yuan Cheng, Jeff Johnson (@JeffMRI)and I have been thinking about how to improve the spatial and temporal resolution of fMRI with big-data type Bayesian models.  This post describes our current approach.

Although plain-old vanilla gradient-echo BOLD totally rules, the physiological mechanisms underlying it are complex and only coarsely understood.  BOLD does not have ideal spatial resolution.  And it has truly horrid temporal resolution.  Other functional MR methods have been developed (e.g., ASL, VASO), but the appeal of BOLD is its superior signal-to-noise ratio (Menon, 2012, NeuroImage).  We think the spatial and temporal resolutions of BOLD can be increased by a bit of sophisticated modeling focused on isolating the controversial initial-dip component.

A Nano-Sized Review

Before we begin, however, here is a a nano-sized review of the usual approach.  The main technique is convolution, and we convolve a biphasic response function with stimuli.  The left panel shows shows a typical biphasic response function, the center panel shows stimulus-presentation times for six stimulus presentations, and the right panel shows the convolution, which is the predicted signal.  The last step is to regress the BOLD signal onto this predicted signal.  Voxels that correlate well are colored yellow, orange, and red, and a pretty map is drawn.

BOLD is a complex and only coarsely understood physiological signal that reflects many variables (van Zijl et al., 2012, NeuroImage).   The prevailing wisdom is that BOLD reflects in large part the pooling of oxygenated blood in venous structures and this pooling  can be millimeters away from the source.  Moreover, the BOLD respond is painfully slow.  It does not start for 3 seconds, takes 6 to 8 seconds to peak, and then has a reversal and an elongated recovery period.  Ugghhh.

The Initial Dip

The initial dip is considered the mythical unicorn of fMRI.  It is a small, negative deflection in the first few seconds that is thought to perhaps reflect the immediate deoxygenation of blood in the capillary beds before the vascular response.  Given this mechanism, it assuredly would better localize the spatial and temporal locations of activation.  The problem is that the initial dip is rarely seen in 3T fMRI settings.  Evidence for it seems to come only from a few labs (Hu and Yacoub, 2012, NeuroImage).

HRF Variability

The overwhelming majority of researchers use a single HRF for all voxels.  In fact, most use a stock HRF in SPM or AFNI or BrainVoyager.   Voxels differ from each other in one parameter or in one way---how well they correlate to the predicted convolution of the stock HRF with the stimulus presentation times.

Yet, there is known variability in the shape of the HRF across many factors including cortical layers, factors that affect blood flow such as caffeine intake, and even across random factors such as people.  Moreover, different voxels contain different proportions of veins, arteries, and capillaries.  Assuredly, the HRF will vary from voxel-to-voxel.  Perhaps the initial dip will be averaged out in voxels with large veinous contributions and only be present in voxels with sufficient capillary contributions.  Searching for the initial dip may be enhanced by a voxel-by-voxel approach where some active voxels have the dip and others do not.

Our Contributions To See the Initial Dip  

1. A parametric HRF with initial-dip components.  The following figure shows our HRF, which is the addition of the initial dip, the main component, and the undershoot.  We estimate a time constant (where the peak is) and a dispersion (width) of each component.  We also estimate the size of the initial dip and undershoot relative to the main component.  The total is 8 parameters.

2. We estimate a separate HRF for each and every voxel.  This way we can draw maps of initial dip sizes and time constants in addition to overall activity.  Woolrich et al., (2004, IEEE Med Imaging) provides a similar voxel-by-voxel approach with initial dip measurements.

3. We use modern Bayesian model selection with nonlocal priors to select active voxels.  I am learning about Johnson & Rossell's (2012,  J Am Stat Assoc)  pMOM priors which stand in contrast to the more familiar JZS, local priors. These priors are especially useful in large-dimension models such as those that account for activity across a large number of voxels.

4. We changed acquisition to maximize the chances of observing and timing the initial dip as follows:  I. We used a slow-event related design where trials were 20 sec.  This choice allows us to better separate the initial dip of one trial from the undershoot of the preceding one.  II.  We used a finely-jittered presentation schedule.  The SOA between trial start and TR was uniformly distributed in units of 16.7ms.  This fine jittering allows for much more precise estimation of HRF parameters.  III. We acquired sequentially rather than interleaved so that spatially adjacent slices were also temporally adjacent.  With this choice, interpolation between slices in motion correction and normalization provides only minimal distortion of temporal information.


The below figure is from a single participant who was presented a flashing checkerboard cue and then an imperative of "L" or "R" to indicate a left or right button press, respectively.  We looked at any voxels correlated with stimulus presentation timings and these voxels include those that respond to the checkerboard, the imperative, or just stray musings like the thought that a trial is occurring.  There are 8 parameters for every active voxel, and to explore the variation across these 8, we used K-means clustering.  In the movie, below, I review the characteristics of each group.  Then I show how these groups are distributed across the brain.  The movie is not too long.



As we see it, there is a surprising amount of structure.  Voxels without undershoot are primarily medial and occipital while those without without initial dips are lateral and relatively dorsal.  Clearly, there is much spatial clustering of the types.   We view this degree of structure as enticing us to keep going with the development.  Your thoughts?


Keith Schneider said...

I don't think that this will improve temporal resolution (which is about 100 ms, because although the HRF needs to be deconvolved from the stimulus, it is actually fairly linear, such that the HRF will be delayed 100 ms if the stimulus is delayed 100 ms. But, there is the idea that the initial dip is spatially more specific since it reflects local neural activity rather than the tidal wave of oxygenated blood that follows. The initial dip was controversial at first. There was a joke that it was spatially localized to Minnesota, because only Ravi Menon (who was there at the time) could find it. It would be interesting to ask Ravi now what he thinks about the initial dip. I haven't seen anyone able to successfully take advantage of it in the last 20 years.

Jeff Rouder said...

Hi Keith, Thanks for the comments. I am pretty familiar with the joke, and thought about including it. I guess we will see what we can learn. Best, Jeff

Sam Schwarzkopf said...

Thanks this is really interesting. In my own experience I have never actually seen this massive variability in HRF shapes, which surprised me given how often this is mentioned. I do see if between people but within a person's occipital cortex (haven't looked at other sensory modalities) it always seemed pretty constant. I always wondered whether this is because we take only grey matter voxels (2.3mm isotropic resoution and approximately from the middle of cortex). This should at least reduce the contribution from large blood vessels, sinuses etc?

Of course, there may be something about how I'm doing it that makes that difference. Also, my comparisons so far were always just looking at the voxels from different ROIs (say, comparing V1, V2, V3, MT, or taking the whole occipital). To construct our HRFs I also preselect voxels that seem to show some visual response. I don't think this should massively skew things but who knows.

Jeff Rouder said...

Thanks Sam. I am surprised and a bit worried by the massive amount of initial dip. It seems too large. Except for its presence or absence, there isnt that much variability here either. I think the big difference between our research and previous ones is on the acquisition side. If your slices are interleaved rather than sequential, you basically kill the temporal structure when you do motion correction or any normalization preprocessing. Plus, we find the fine jittering to be essential. I dont know of any studies with such fine jittering (16.7ms intervals). And the long-duration trials are really helpful for disentangling the initial dip from one presentation relative to the undershoot from the previous one. Best, Jeff

tal said...

I would be wary of interpreting what you're seeing as evidence of the initial dip. As you note, there's enormous variability in the BOLD signal across brain regions and individuals, not to mention that the temporal structure of the design, jittering, etc., can all affect time course estimates in unpredictable ways. One thing I would suggest doing as a complementary approach, if you haven't already, is modeling your data with staggered stick functions (i.e., finite-impulse responses) to estimate the time course in a less biased way. In my experience this is very helpful in calibrating one's understanding of how the HRF behaves. I've seen many, many cases where fitting with a canonical HRF appears to produce robust activation increases, but when you estimate the time course with FIR, it's obvious that something funky is going on, and the HRF just happens to correlate non-negligibly with a key part of the time course.

One counter-argument would be that fMRI data are inherently noisy, so just because a response doesn't look canonical when estimated imprecisely doesn't mean the model is inappropriate. I think that's a reasonable argument (not saying it's yours, but one I've heard fairly often), but it basically amounts to putting all the weight on the prior. I.e., my guess is that most fMRI researchers will look at these results and say, "yeah, well, go collect a lot more trials from a lot more participants in multiple experiments, use fine-grained jittering, and show us FIR estimates where it's clear to the eye that there's an initial dip--and then we'll believe you." The fact that adding an initial dip-like parameter to your HRF helps capture a bunch more variance is not, in my view, likely to convince very many people that what you're seeing is a genuine initial dip, unless they already believe such a thing exists and is reasonably robust.

On a different note, I confess to being a bit skeptical that jittering in increments of 17 ms would dramatically help improve temporal precision. I might buy that in a design where you had hundreds and hundreds of rapidly interleaved trials for participants (so that efficiency is already high), but it seems rather unlikely in a slow event-related design where you have a limited number of observations at each point in the time course. At the very least, I'd want to see FIR estimates with error bars around each timepoint. (You could also do a rough test of how much difference it makes by rounding event onsets to the nearest, say, 250 ms, and seeing if your parameter estimates change much).

That all said, I think this is really neat work. The fact that almost everyone routinely applies some variant of the "canonical" transfer function has always made me a bit uneasy, as there are plenty of cases where we have reason to think we're missing out on important phenomena because of misspecification (e.g., I once argued that non-standard BOLD responses in white matter may account for the general belief that there's no signal there). Looking forward to reading the paper when it comes out.

Jeff Rouder said...

Hi Tal,

Thanks for writing.

We chose a hi-resolution jittered approach and used regression to perform functional estimation. The alternative is to use a few fixed points, say a sample every 2 seconds, and bin up, take a mean, put an error bar around it. Yet, the modern statistical functional estimation literature stresses the increased efficiency in the regression approach, either parametric or not. We could use a nonparametric regression approach, say with B-splines, though we havent developed the method yet. So, is your critique with regression vs. choosing a few levels so one could "see" what is going on (I have a problem with that one), or is it parameteric vs. nonparametric modeling (this is fine, it is a legit question and there are many ways to skin a cat). I think we could do the same at 250ms jitter, but I would not want to go much coarser than that. We did 60hz just because we could. If I could do it continuously, I would.


Tal Yarkoni said...

In general, I'm A-okay with regression; it's clearly much more efficient than binning and averaging (in this particular case, there may not be much of a difference, since the event-related design is slow). My objection is basically that there are many ways to parameterize and interpret the HRF, and the fact that you have a parameter that captures something that's happening early on doesn't mean it's really the long-sought initial dip. I suspect that if you were to add parameters like "absurdly late rise" or "spike around 1.2 seconds after trial onset", you would also identify brain regions that happen to show that particular profile--but you wouldn't want to ascribe any particular meaning to those parameters.

I'm not entirely sure what your objection is to "seeing" what's going on via FIR estimation. You're already estimating quite a few parameters with your current parameterization, and you're plotting one of them and concluding the initial dip is (surprisingly) large because it looks bigger than you would expect. I share your concern about the magnitude of that parameter, and my point is that one way to calibrate one's intuitions about whether it makes sense to think of this as a reflection of the initial dip (rather than uninteresting but systematic variation in HRF shapes across the brain) is to fit an alternative model that doesn't carry the same theoretical assumptions. Meaning, if you fit an FIR model, and it turns out that the estimates you get for the very earliest timepoint are not dramatically different in magnitude from other later estimates that you don't expect (e.g., very late in the timecourse, past the expected peak of the canonical HRF), then you might not want to conclude that you're really seeing the initial dip per se. Brain regions do funny things, and there is a lot of systematic variance that isn't generally captured by the canonical HRF, but also isn't of any intrinsic biological interest.