Sometimes, I am almost more embarrassed than angered at what the people in the lab coats – we can hardly call them ‘scientists’ – are willing to say to cover their suddenly-exposed posteriors. Nothing gets them all aflutter more than people daring to turn a gimlet eye on their methods and results. After all, careers and grants are on the line! Truth? What is that?
Case in point: late last year, a study came out from something called The Reproducibility Project that called into question the validity of psychological studies. If you read the entirety of the linked post, you’ll note that the NYT at first ran the established standard take on this: of course there are a few problems here and there – science is hard, after all – but, with a few minor tweaks and a little improved diligence, Science will continue to March On just the same as always. Nothing to see here, move along.
This is the CYA version of the argument that the academic pseudoscientists use to justify their existence: because the discoveries of modern physics make the computer upon which I’m now typing possible, I must believe what sociologists say, for example, and happily and gratefully agree that my tax dollars will pay their salaries.
Bull. The NYT article linked above was, against all expectations, revised to reveal the true horror of the claims being made: that the bulk of the 100 studies – more than 60 – for which replication was attempted failed in such a way that it would be disingenuous – it would be a lie – to call them science at all. As the Times so carefully understated:
The vetted studies were considered part of the core knowledge by which scientists understand the dynamics of personality, relationships, learning and memory. Therapists and educators rely on such findings to help guide decisions, and the fact that so many of the studies were called into question could sow doubt in the scientific underpinnings of their work.
Called into question? Like a husband surprising his wife in bed with the milkman calls into question her fidelity?
Yet, clearly, this could not be allowed to stand.(1) So, today, some psychologists in an article in the Christian Science Monitor rode to the rescue:
The replication paper “provides not a shred of evidence for a replication crisis,” Daniel Gilbert, the first author of the new article in Science commenting on the paper from August, tells The Christian Science Monitor in a phone interview.
See? We can stand down – we have been told by a guy in a lab coat that there’s no problem. At. All.
So Dr. Gilbert, a psychology professor at Harvard University, and three of his colleagues pored over that information in a quest to see if it held up.
And the reviewing team, none of whom had papers tested by the original study, found a few crucial errors that could have led to such dismal results.
Their gripes start with the way studies were selected to be replicated. As Gilbert explains, the 100 studies replicated were from just two disciplines of psychology, social and cognitive psychology, and were not randomly sampled. Instead, the team selected studies published in three prominent psychology journals and the studies had to meet a certain list of criteria, including how complex the methods were.
Note above, from the NYT article, that the studies to be replicated were chosen because they are important – because they form “part of the core knowledge by which scientists understand the dynamics of personality, relationships, learning and memory.” Why would Dr. Gilbert imagine that it is more important for the studies to have been chosen at random than that important studies failed to hold up? Idiotic misdirection.
The “problem” that these studies were chosen from only two specialized sub-disciplines – two central and entrenched sub-disciplines, I would add – is again irrelevant. Who cares? What Gilbert would need to show is that these problems are somehow restricted to these two disciplines, while the other myriad areas are pristine, or at least not fetid piles. Which he doesn’t do at all. Instead, he changes direction again:
But when it came down to replicating the studies, other errors were made. “You might naïvely think that the word replication, since it contains the word replica, means that these studies were done in exactly the same way as the original studies,” Gilbert says. In fact, he points out, some of the studies were conducted using different methods or different sample populations.
Yoo-hoo! Earth to Dr. Gilbert! *Results* are what are supposed to be replicable. Here’s a Science 101 example for you: I boil water near sea level in my lab using distilled water, a Bunsen Burner, an Erlenmeyer flask and a nice lab thermometer. I find it boils at right around 100C. Then I go to my nearby near sea level home, fill a sauce pan from the tap, throw it on the stove and stick a good-quality candy thermometer in it – and it still boils right around 100C. Sure, minerals in the water, changes in air pressure, quality of the thermometers and so on almost certainly will result in some small variation in *results*. But within what we technically call “damn close enough” both methods yield the same results. Close is all we’re really looking for, most of the time. The fun comes in identifying why results using different methods aren’t *exactly* the same – that’s where the more interesting discoveries often take place!
This reveals another important aspect to real science. The people trying to replicate results should in fact use “different methods or different sample populations” insofar as the claimed results are not restricted to those methods and populations. If I claim that water boils at around 100C near sea level, then it’s completely fair to test my assertion by boiling water from different sources in different ways – tap water, bottled water, ice, whatever, brought to a boil on a stove, in a lab, over an open flame, whatever. Now, if somebody disputed my assertion by pointing out that motor oil doesn’t boil at 100C, then I’d have a defense – I said Water! If they say dirty water doesn’t boil until 103C, then we’d need to look into it, OR add the restriction that clean water boils at 100C. Or both.
And, embarrassing as it is (2), the replication team “rigorously redid the experiments in close collaboration with the original authors.” So, Dr. Gilbert, if the method and populations were critical to the results (3), would not the original authors have pointed that out to the replication team?
“It doesn’t stop there,” Gilbert says. It turns out that the researchers made a mathematical error when calculating how many of the studies fail to replicate simply based on chance. Based on their erroneous calculations, the number of studies that failed to replicate far outnumbered those expected to fail by chance. But when that calculation was corrected, says Gilbert, their results could actually be explained by chance alone.
“Any one of [these mistakes] would cast grave doubt on this article,” Gilbert says. “Together, in my view, they utterly eviscerate the conclusion that psychology doesn’t replicate.”
A math error? Chance? So, are we to conclude that the results of studies in psychology are so subject to chance results that 60%+ of them can be expected to yield nonsense? Sure sounds like it. Because all the other reasons presented are bullsh*t.
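For the curious, the calculation being disputed is essentially a binomial tail probability: given some failure rate "expected by chance," how surprising is the observed number of failed replications? Here is a minimal sketch with purely illustrative numbers – the 10% chance-failure rate below is an assumption for the example, not a figure taken from either paper (only the 100 studies and the 60-plus failures come from the reporting above):

```python
import math

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly from the definition."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Illustrative assumption: suppose every original finding were true and each
# replication attempt still had a 10% chance of failing "by chance alone."
n, observed_failures, p_fail = 100, 60, 0.10

p_value = binom_tail(n, observed_failures, p_fail)
print(f"P(>= {observed_failures} failures out of {n} by chance) = {p_value:.2e}")
```

Under any remotely plausible chance-failure rate, 60-plus failures out of 100 is astronomically unlikely – which is why the argument turns entirely on whose calculation of that baseline rate is correct.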
Note the use of the word ‘eviscerate’ – nice. Utterly. Yep, because if the replication results were to hold up, why, grants and tenure might be at risk. Something here must be eviscerated.
But this is a happy occasion! Let’s not bicker about who killed who! A Dr. Nosek, who heads up the Center for Open Science which sponsored the replication team, starts playing nice and backtracking – he and his team need grants and tenure, too, after all.
Dr. Nosek tells the Monitor in a phone interview that his team wasn’t trying to conclude why the original studies’ results only matched the replicated results about 40 percent of the time. It could be that the original studies were wrong or the replications were wrong, either by chance or by inconsistent methods, he says.
Or perhaps there were conditions necessary to get the original result that the scientists didn’t consider but could in fact further inform the results, he says.
“We don’t have sufficient evidence to draw a conclusion of what combination of these contributed to the results that we observed,” he says.
It could simply come down to how science works.
When reproduction follows, that’s “how science accumulates knowledge,” Nosek says. “A scientific claim becomes credible by the ability to independently reproduce it.”
Nope, I’m betting that the chief causes are rampant and egregious overreach, if not out-and-out fraud, perpetrated because you can’t get a job or a grant as a soft scientist until you’ve ‘discovered’ something shocking or subversive or otherwise earth-shaking. You’ve got thousands and thousands of people seeking PhDs in these fields every year, people with college loans to pay off, people working with other people who have already gotten their job, their grant and even their tenure by getting those shocking results somehow, even when the research is rushed, underfunded and ultimately unreproducible.
Typically, grads in physics can go work on Wall Street or at Google if they don’t hack it in their PhD programs; grads in sociology go work at Starbucks. If they’re lucky.
(1) I recall what happened among my Democratic friends and family when the IRS scandal first broke: there were a few days of sincere disorientation, until somebody in authority told them that it was all just a misunderstanding, nothing really bad had happened. And they believed it. As Zed so aptly put it, they are “everything we’ve come to expect from years of government training.”
(2) While it is perfectly reasonable to contact the original study’s authors to make sure you understand what you’re after, once you start the replication effort, you should be able to do it based on the information in the original study. Caesar’s wife, at the very least.
(3) In other words, if the results of the study were limited to only well-off college kids taking psychology classes under grant-wielding professors, that should be stated. Which means any attempts at generalizing the findings to apply to real people are bogus from the get-go.