Wednesday, March 8, 2017

Please Help / Real Criminal Case

Please help.  This is a real-life case of a likely false conviction where your input can help.  A man is spending life in jail without parole for a murder he likely did not commit.

Background:

  • In 1969, Jane Mixer, a law student, was murdered.  The case went cold.  
  • The case was reopened 33 years later when crime-scene evidence was submitted for DNA analysis.  
  • The DNA yielded two matches; both matches were to samples analyzed in the same lab and at the same time as the crime-scene evidence.  All three samples were analyzed in late 2001 and early 2002.
  • One match was to John Ruelas.  Mr. Ruelas was 4 years old in 1969 and was excluded as a suspect.
  • The other match was to Gary Leiterman.  Mr. Leiterman was 26 at the time.  He was convicted in 2005 and is serving life without parole.  His appeal was denied in 2007.
  • There is no doubt that Mr. Leiterman's DNA was deposited on the crime-scene sample.  The match statistic is 176 trillion to 1.
  • The question is whether the DNA was deposited at the crime scene in 1969 or whether there was a cross-contamination event in the lab in 2002.

A Very Easy and Helpful PowerPoint:

  • This case comes from John Wixted, a psychologist at UCSD.
  • He has made a detailed and convincing PowerPoint presentation, available from John's website.
  • John has helped persuade the Innocence Clinic at the University of Michigan to investigate the Leiterman case.
  • John and I are convinced this is an injustice.  We are working pro bono.

Our Job:

  • Our job is to make an educated assessment of Mr. Leiterman's guilt or innocence.  Such an assessment would greatly help the Innocence Clinic decide whether there is sufficient evidence to appeal.
  • The jury heard that the DNA match was trillions-to-1 and that there was only a very tiny chance of cross-contamination.  Yet we know these are the wrong conditional probabilities to compute.
  • Consider the two hypotheses above: that Leiterman's DNA was deposited at the crime scene or, alternatively, that it was deposited in the lab through cross-contamination.  Conditional on the match, compute the posterior probability of each.
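The task in the last bullet can be written as a single application of Bayes' rule.  Writing S for a scene deposit in 1969, C for lab cross-contamination, and M for the observed match (my notation, not anything from the trial record):

```latex
\Pr(S \mid M) \;=\;
  \frac{\Pr(M \mid S)\,\Pr(S)}
       {\Pr(M \mid S)\,\Pr(S) \;+\; \Pr(M \mid C)\,\Pr(C)}
```

The priors and likelihoods come from the specifications listed under "My Analysis" below.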

My Analysis:

I have done my own analyses and typeset them.  But the reasoning is tricky, and I would like some backup; it is just too important to mess up.  Can you try your own analysis?  Then we can decide what is best.

You will need more information.  I used the following specifications; write me if you want more:

  • John and I assumed 2.5 million people were possible suspects in 1969.  It is a good guess based on population estimates for the Detroit metro area.
  • The lab processes 12,000 samples a year.  The period in which the DNA overlapped can be assumed to be 6 months; that is, 6,000 other samples could have been cross-contaminated with the Mixer or Leiterman samples.
  • The known rate of DNA cross-contamination is 1 in 1,500.  That is, each time the lab does a mouth swab from one person, it ends up with two or more DNA profiles with probability 1/1500.  We assume this rate also holds for unknowable cross-contamination, such as that in processing a crime-scene sample.
  • The probability of getting usable DNA from a 33-year-old sample is 1/2.
  • Need other facts?  Just ask in the comments.
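As a cross-check on the reasoning (this is my own sketch, not the typeset analysis), here is one way the posterior might be computed from the specifications above.  The factorization of the likelihoods is an assumption, and the Ruelas match, which is direct evidence that contamination occurred in this batch, is not factored in.

```python
# Sketch of the posterior computation under the specifications above.
# The structure of the likelihoods is an assumption, not the authors'
# typeset analysis.

N_SUSPECTS = 2_500_000   # possible depositors in 1969 (Detroit metro guess)
N_OVERLAP  = 6_000       # samples processed in the 6-month overlap window
P_CONTAM   = 1 / 1_500   # per-sample cross-contamination rate
P_USABLE   = 1 / 2       # chance a 33-year-old sample yields usable DNA

# P(match & scene deposit): Leiterman is the one depositor out of
# 2.5M candidates, and the old sample had to yield usable DNA.
p_match_scene = (1 / N_SUSPECTS) * P_USABLE

# P(match & lab contamination): the crime-scene sample was
# contaminated (1/1500), and of the ~6,000 concurrent samples the
# contaminant happened to be Leiterman's.
p_match_contam = P_CONTAM * (1 / N_OVERLAP)

# Posterior probability the DNA was deposited at the scene,
# conditional on the observed match being produced one of these two ways.
posterior_scene = p_match_scene / (p_match_scene + p_match_contam)

print(f"P(scene deposit | match) = {posterior_scene:.3f}")
```

Under this particular factorization the posterior probability of a scene deposit is about 0.64, odds of roughly 9 to 5, which is nowhere near the trillions-to-one figure the jury heard.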

Jeff's answer is at GitHub, https://github.com/rouderj/leiterman

Thank you,
Jeff Rouder
John Wixted

Tuesday, January 3, 2017

Why Is It So Hard To Organize My Lab?

It is clear I need to pay more attention to the organization of my lab.  Organization is a challenge for me; it causes much apprehension and seems to be a chronic need in all aspects of my life.  Let's focus on the lab.

Parameters:

1. Minimizing mistakes.  There is no upside in analyzing the wrong data set, using the wrong parameters, including the wrong figure, or reporting the wrong statistics.  These mistakes are, in my view, unacceptable in science.  Minimizing them is the highest priority.

2. Knowing what we did.  Sometime in the future, way in the future, we or someone else will revisit what we did.  Will we be able to figure out what happened?  I'd like to plan on the time scale of decades rather than months or years.

3. Planning for human fallibility.  Some people think science is for those who are meticulous.  Then count me out.  I am messy, careless, and chronically clueless.  A good organizational system anticipates human mistakes.

4. Easy to learn.  I collaborate with a lot of people.  The organizational structure should be fairly intuitive and self-explanatory.

What we do:

1. Data acquisition and curation.  I think we have this wired.  We use a born-open data model where data are collected, logged, versioned, and uploaded nightly to GitHub automatically.  We also automatically populate local MySQL tables with information on subjects and sessions, and have additional tables for experiments, experimenters, computers, and IRB info.  We even have an adverse-events table to record and address any flaws in the organizational system.  The basic unit of organization is the dataset, and it works well.

2. Outputs.  We have the usual outputs: papers, talks, grant proposals, dissertations, etc.  Some are collaborative; some are individual; some are important; some go nowhere.  The basic unit here is pretty obvious---we know exactly where each paper, talk, dissertation, etc., begins and ends.

3. Value-added endeavors.  A value-added endeavor (VAE) is a small unit of intellectual contribution.  It could be a proof, a simulation, a specific analysis, or (on occasion) a verbal argument.  VAEs, as important as they are, are ill-defined in size and scope.  And it is sometimes unclear (perhaps arbitrary) where one ends and another begins.
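Item 1 above describes our born-open pipeline only in outline.  A minimal sketch of what the nightly upload step might look like is below; the function name, paths, remote, and branch are hypothetical, and this is not our actual script.

```python
# Hypothetical sketch of a born-open nightly upload: stage, commit,
# and (optionally) push everything in a data repository.
import datetime
import subprocess


def nightly_upload(repo_dir, push=False, remote="origin", branch="main"):
    """Commit any newly collected data in repo_dir; push if requested."""
    def git(*args):
        subprocess.run(["git", *args], cwd=repo_dir, check=True,
                       capture_output=True, text=True)

    git("add", "-A")
    # `git diff --cached --quiet` exits nonzero when changes are staged,
    # so we commit only when there is actually new data.
    staged = subprocess.run(["git", "diff", "--cached", "--quiet"],
                            cwd=repo_dir)
    if staged.returncode != 0:
        stamp = datetime.date.today().isoformat()
        git("commit", "-m", f"born-open nightly upload {stamp}")
    if push:
        git("push", remote, branch)
```

A cron entry (or any nightly scheduler) would call this once per day with `push=True`, giving a timestamped, versioned public record of each day's data.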

The Current System, The Good:

Perhaps the strongest element of my lab's organization is that we use really good tools for open and high-integrity science.  Pretty much everything is script-based, and scripts are in many ways self-documenting, especially when compared to menu-driven alternatives.  Our analyses are done in R, our papers in LaTeX and Markdown, and the two are integrated with R Markdown and knitr.  Moreover, we use a local git server and curate all development in repositories.

The Current System, The Bad and Ugly:

We use projects as our basic organizational unit.  Projects are basically repositories on our local git server containing collections of files.  But what a project encompasses and how it is organized is ad hoc, disordered, unstandardized, and idiosyncratic.  Here are the issues:

1. There is no natural relation between projects and the three things we do: acquiring and curating data, producing outputs, and producing VAEs.  One VAE might serve several different papers; likewise, one dataset might serve several different papers.  Papers and talks usually encompass several different experiments and VAEs.

2. Projects have no systematic relation to VAEs, outputs, or datasets.  This is why I am unhappy.  Does a project mean one paper?  Does it mean one analysis?  One development?  A collection of related papers?  A paper plus all supporting talks and the dissertation?  We have done all of these.

Help:


What do you do?  Are there good standards?  What should be the basic organizational unit?  Stay with projects?  I am thinking about a strict output model where every output is a repository and serves as the main organizing unit.  The problem is what to do about VAEs that span several outputs.  Say I have an analysis or graph that is common to a paper, a dissertation, and a talk.  I don't think I want this VAE repeated in three places.  I don't want symbolic links or hard-coded paths because they make it difficult to publicly archive.  That is why projects were so handy.  VAEs themselves are too small and too ill-defined to be organizing units.  Ideas?