A new study by Reid et al claims to demonstrate a biological marker for the presence of depression. First we have the boring criticisms, like “32 is not a real sample size,” “shotgunning 20 RNA markers and noticing which ones were increased in depressed patients and decreased after treatment is painting the target after you shot the gun,” and “you’re comparing treatment-group re-draws to control-group baseline draws,” but anyone could make those. The authors make several of those points themselves. And there are some statistical criticisms that pretty much invalidate the whole thing.* What I find interesting is that even if the results are correct, they may not be useful.
If you look at the table comparing the marker rates in depressed and non-depressed patients, there are 9 markers that differ in a statistically significant way. The problem is that they’re still not very far apart. What you would ideally like to see in a diagnostic test is the following:

[graph: two cleanly separated bell curves, one for healthy people and one for depressed people]
because then it is easy to translate a test score into a health status. But the markers in this study are more like:

[graph: two heavily overlapping bell curves]
Which means that if you know someone is depressed you can generate a pretty good guess at their marker score, but there’s a wide range of scores where knowing the marker score doesn’t tell you whether they’re depressed. That makes it pretty useless as a screening test.
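To put rough numbers on that: suppose (hypothetically; the paper doesn’t report it this way) that healthy and depressed marker scores are normal distributions only one standard deviation apart, and you flag anyone below the midpoint cutoff. A quick sketch:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Hypothetical numbers: healthy scores ~ N(0, 1), depressed ~ N(-1, 1),
# i.e. the group means sit only one standard deviation apart.
# Flag anyone whose score falls below the midpoint cutoff of -0.5.
cutoff = -0.5
sensitivity = normal_cdf(cutoff, mu=-1.0)      # depressed correctly flagged
specificity = 1 - normal_cdf(cutoff, mu=0.0)   # healthy correctly cleared

print(round(sensitivity, 2), round(specificity, 2))  # 0.69 0.69
```

So even before worrying about base rates, a one-standard-deviation separation means the test is wrong about 30% of the time in each direction; the cleanly separated picture is what you need for any cutoff to work.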
But it’s actually worse than that. There are many more non-depressed people than depressed people, so the curves could look more like:

[graph: a tall healthy curve dwarfing a small depressed curve, with their tails overlapping]
Under this graph, the sick mean could be four standard deviations out from the healthy mean, and yet a person with a low marker score is approximately equally likely to be depressed or not. This is a Bayesian reasoning problem, and doctors are frighteningly bad at those, but then, they’re worse than chance at frequentist statistics too.
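The base-rate problem can be made concrete with Bayes’ rule. Assume (all numbers hypothetical) a 5% prevalence of depression, healthy scores distributed N(0, 1), and depressed scores a full four standard deviations lower at N(−4, 1):

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Density of a normal distribution."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

PREVALENCE = 0.05  # hypothetical: 1 in 20 people tested is depressed

def posterior_depressed(score, mu_sick=-4.0, mu_healthy=0.0):
    """P(depressed | marker score) via Bayes' rule."""
    sick = PREVALENCE * normal_pdf(score, mu_sick)
    healthy = (1 - PREVALENCE) * normal_pdf(score, mu_healthy)
    return sick / (sick + healthy)

# At the midpoint between the two means the likelihoods cancel exactly,
# so the posterior falls straight back to the 5% prior.
print(round(posterior_depressed(-2.0), 2))   # 0.05

# Even 2.75 standard deviations below the healthy mean -- a decidedly
# "low" score -- the patient is still only about a coin flip to be depressed.
print(round(posterior_depressed(-2.75), 2))  # 0.51
```

Under these made-up numbers, a score deep in the “depressed” half of the range still leaves the diagnosis roughly 50/50, purely because healthy people so heavily outnumber depressed ones.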
In summary, I’m not hopeful that this will prove to be a useful screening tool for depression.
*They don’t actually prove that the marker values of cured people converge with those of never-depressed people; they just fail to prove that they’re statistically different. Those are different things. They also switch between two (equally valid) statistical tests (the t-test and Fisher’s) without saying why, which means there is a high probability the answer is “we liked those answers more.”
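One way to see why “we failed to find a difference” is weak evidence of convergence: with groups this small, the test had little power to detect a real residual difference in the first place. A back-of-the-envelope power calculation (normal approximation, with a hypothetical true effect of half a standard deviation and 16 subjects per group):

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def two_sample_power(delta_sd, n_per_group, z_crit=1.96):
    """Approximate power of a two-sided two-sample z-test to detect a
    true group difference of delta_sd standard deviations."""
    se = math.sqrt(2.0 / n_per_group)  # standard error of the difference in means
    # Probability the observed difference clears the significance bar
    # (ignoring the negligible chance of significance in the wrong direction).
    return 1 - normal_cdf(z_crit - delta_sd / se)

# Cured vs. never-depressed really differing by 0.5 SD, 16 per group:
print(round(two_sample_power(0.5, 16), 2))  # 0.29
```

Under these (invented but plausible) numbers the test would miss a genuine half-standard-deviation difference about 70% of the time, so a non-significant result tells you almost nothing about whether cured patients’ markers have actually normalized.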