February 22, 2011

The Decline Effect Is Stupid

we were surprised to find the data fit well within the two axes. Further research is needed

Is there something wrong with the scientific method? asks Jonah Lehrer in The New Yorker.

The premise of the article is a well-known phenomenon called the Decline Effect. As described in the story, that's when exciting new results, initially robust, seem not to pan out over time. Today a series of studies shows X, next year studies show less than X, and in ten years it's no better than nothing.

To be clear, this is what the Decline Effect is not: the finding of better data that shows your initial findings were wrong. The initial findings are right-- they happened-- but they happen less and less each time you repeat the experiments. The Decline Effect is a problem with replication.

An example is ESP: the article describes a study in which a guy showed remarkable ability to "guess which card I'm holding." He was right 50% of the time. That happened. But in subsequent experiments, he could do it less. And less. And then, not at all.

Many critics of Lehrer's article read this and say, aha! The real explanation is regression to the mean. Flip a coin and get heads nine times in a row: it could happen, but if we flip that coin enough times we will see that it is ultimately 50/50.
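The critics' version is easy to simulate. Here is a minimal sketch, assuming a setup that is not in Lehrer's article: 20,000 imaginary labs each flip a fair coin nine times, only the labs that get eight or nine heads count as "remarkable," and those same labs then flip again. The function names and every parameter are hypothetical, chosen only for illustration.

```python
import random
from statistics import mean


def heads(rng, flips):
    """Count heads in `flips` tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(flips))


def fluke_then_replicate(labs=20000, flips=9, threshold=8, seed=0):
    """Select labs with a striking first run, then rerun only those labs."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    first = [heads(rng, flips) for _ in range(labs)]
    # Keep only the labs whose first run looked remarkable.
    flukes = [h for h in first if h >= threshold]
    # Same fair coin, same number of flips, second attempt.
    replications = [heads(rng, flips) for _ in flukes]
    return len(flukes), mean(flukes), mean(replications)


n_flukes, first_mean, second_mean = fluke_then_replicate()
print(f"{n_flukes} flukes: first run {first_mean:.2f} heads, "
      f"replication {second_mean:.2f} heads")
```

By construction the selected first runs average at least eight heads, while the replications should drift back toward the 4.5 you expect from a fair coin. That is all regression to the mean is.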

But that explanation is incorrect: the article explicitly states that the Decline Effect is not regression to the mean.

The most likely explanation for the decline is an obvious one: regression to the mean. As the experiment is repeated, that is, an early statistical fluke gets canceled out... And yet Schooler has noticed that many of the data sets that end up declining seem statistically solid--that is, they contain enough data that any regression to the mean shouldn't be dramatic. "These are the results that pass all the tests," he says. "The odds of them being random are typically quite remote, like one in a million. This means that the decline effect should almost never happen. But it happens all the time!..."

And this is why Schooler believes that the decline effect deserves more attention: its ubiquity seems to violate the laws of statistics.

Lehrer believes that the Decline Effect is an inexplicable byproduct of the scientific method itself.

So? What gives?

By now, many scientists have weighed in on this article, offering the usual list of explanations-- publication bias, selection bias, regression to the mean. But while these are real problems in the pursuit of science, the real explanation of the Decline Effect goes unmentioned.

A hint of "what gives" is contained in the rest of Schooler's quote, above:

"...This means that the decline effect should almost never happen. But it happens all the time! Hell, it's happened to me multiple times."

The true explanation for the Decline Effect is one no one cites because the place you would cite it is the cause itself. I am not exaggerating when I say that the cause of the Decline Effect is The New Yorker.

II.

The Decline Effect is a phenomenon not of the scientific method but of statistics, so right there you know we are out of the realm of logic and into the realm of "well, this sort of looks like a plausible graph, what should we do with it?" Here's the article's money quote:

But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It's as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn't yet have an official name, but it's occurring across a wide range of fields, from psychology to ecology.

A wide range of fields from the almost entirely made-up to the slightly less made-up are losing their "truth?" This phenomenon isn't occurring in physics. You could (and people did) build a Saturn V launch platform on the unscarred edifice of Maxwell's equations, and then 40 years later build an iPhone on top of that same edifice. It's amazing what you can do with the black magic of electromagnetic theory.

Psychology, e.g., is different, because it attempts to model the particular minds of some humans at this particular time in this particular culture, and those models may apply 3 or 3000 years from now, or they might not. Ecology attempts to form a static model of the dynamic relationship of constantly evolving organisms to each other and their environment which we are wrenching to and fro in real-time. But there is no static "reality" in these fields to observe. In these soft sciences, the observation of reality doesn't just change the results, sometimes the observation actually changes the reality almost completely.

In these regression sciences, we throw a ton of data into VisiCalc and see what curves we can fit to them. And then, with a wink and a nod, we issue extremely broad press releases and don't correct the journalists or students when they confuse correlation with causation. We save that piercing insight for the cushy expert witness gigs.

The problem isn't that the Decline Effect happens in science; the problem is that we think psychology and ecology and economics are sciences. They can be approached scientifically, but their conclusions cannot be considered valid outside of their immediate context. The truth, to the extent there is any, is that these fields of study are models, and every model has its error value, its epsilon that we arbitrarily set to be the difference between the model and observed reality. Quantitative monetary theory predicts that given this money supply and this interest rate, inflation should be 2%, but inflation is actually 0.4%. Then let's just set epsilon to -1.6% and presto! Economics is a Science.

III.

To make its point about the Decline Effect-- and unintentionally making mine about science-- the article predictably focuses on the psych drugs that we hate to love to take, that keep the McMansions heated and the au pairs blondily Russian. "They were found, scientifically, to be great, and now we know, scientifically, that they're not!" Medicine is not a science, and despite the white coats and antisocial demeanor doctors are not scientists. Docs and patients both need to get that into their heads and plan accordingly. That's why we say doctors practice medicine. If medicine were a hard science, doctors would not have been surprised and puzzled by the effects of some of these drugs. You can show me PowerPoint slides of depression rating scales for as long as the waitress keeps refilling my drink, but none of that "science" explains why imipramine doubles the mania rate, Depakote does nothing to it, and Zoloft lowers it, with apparent disregard for their scientific classifications.

The problem isn't the Decline Effect, the problem is you believed the data had the force of F=ma. No one should be surprised when medical "truths" turn out to be wrong-- they were never true to begin with. And if you made sweeping policy proclamations based on them, well, you got what you paid for.

IV.

But for all this imprecision, the criticism-- by folks like Jonah Lehrer-- directed at the "social" sciences is even worse. Eggheads are collecting data in routine and predictable ways. They are at least consistently using statistical analysis to analyze that data. It isn't art history by postdocs with warez Photoshop.

So when I read this, I have to manually push in my temporal artery:

Many researchers began to argue that the expensive pharmaceuticals weren't any better than first-generation antipsychotics, which have been in use since the fifties. "In fact, sometimes they now look even worse," John Davis, a professor of psychiatry at the University of Illinois at Chicago, told me.

Shiver me timbers. Okay, Professor Davis, now that your conclusion about the inferiority of the expensive drugs has been read by an audience twenty-five times larger than that of any study you've ever read, let alone written, can you please show us the data that supports your conclusion that atypicals are less efficacious? Oh, that's not what you meant. I'm confused, what do you mean by "worse?" Wait, were you talking about depression or schizophrenia? OCD? I'm lost, let's back up. And while you're at it, please define for us/Jonah Lehrer the other technical terms: "sometimes," "they," "now," "look," and "even," because I have no idea what the hell they mean in this context, and, big money down, you don't either.

This is where the "scientific method" is breaking down. Not in the lab or at the clinical trial. It's breaking down in the sloppiness of the critics. If any researchers want to argue about the efficacy of new drugs over the old ones, there are ways and places to do that. The New Yorker is not among them, because it lets scienticians get away with sloppy soundbites, and leaves anywhere from nine to 3M layman readers with the impression that scientists "know" "the newer meds" are "worse."

And the moment you talk to The New Yorker, your misinterpreted statistical association becomes truth. Certainly for the layman's mind, but also in the mind of the Professor. I'm going to bring up Depakote again until I get a public apology-- do you know how many times a day I have to correct psychiatrists that Depakote does not have "a lot of studies" supporting its efficacy in maintenance bipolar-- let alone an actual indication?

Left alone in his office with a stack of contradictory papers, he probably wouldn't be so flippant about it all. It's slow, excruciating, unexciting work that is the pursuit of science.

But that won't get you any grant money, let alone quoted in The New Yorker.

V.

An example:

What Møller discovered is that female barn swallows were far more likely to mate with male birds that had long, symmetrical feathers. This suggested that the picky females were using symmetry as a proxy for the quality of male genes. Møller's paper, which was published in Nature, set off a frenzy of research. Here was an easily measured, widely applicable indicator of genetic quality, and females could be shown to gravitate toward it. Aesthetics was really about genetics....In the three years following, there were ten independent tests of the role of fluctuating asymmetry in sexual selection, and nine of them found a relationship between symmetry and male reproductive success. It didn't matter if scientists were looking at the hairs on fruit flies or replicating the swallow studies--females seemed to prefer males with mirrored halves.

That's what Lehrer wrote. I know you didn't read it all. Here's what you read:

"Females seem to prefer symmetric males."

The actual study suggested nothing about what the picky females were doing. Lehrer inferred it. By the time we get to the end of the paragraph all the reader remembers is that women prefer to have sex with symmetric guys, which is simply, undeniably, not true. But none of the studies in that paragraph ever concluded that. They each made specific conclusions about the specific creature they were studying. And if you think I'm splitting hairs, then you are the reason for the "Decline Effect."

Scientifically detected associations, in specific situations and contexts, are then generalized by the popular press-- or at least by the profession's internal pop culture-- and those generalizations are used as working knowledge. Those generalizations, which were never true, are the starting point for the future decline in effect that Lehrer is worried about.

When the article then goes on to describe the breakdown of this sweeping generalization in studies after 1994 (on other species), it attributes that to the Decline Effect. It's not. When you look at the studies together, what you should have inferred is "symmetry is an associated factor in mate selection by females in only some species and not others and more research is needed to explain why." Instead, the article attributes its inability to summarize the variety and complexity of nature in a 140-character Twitter message to an underlying failure in the 500-year-old guiding principle of science.

Worse, as the article points out, sometimes journals want to publish only confirmatory findings, which set the stage for the discovery of a Decline Effect later on. But the article doesn't go far enough: they're not looking for confirmation of a previous study, they are looking for confirmation of a sweeping generalization. Not: "Zyprexa is more efficacious on the PANSS than Haldol for schizophrenia," but "Don't we already know atypicals are better than typicals?" And then those same journals, in the future, will only want negative data because their new sweeping generalization will be popular at Harvard via a grant from NIMH; all the Pharma guys moved on to Ohio. That's not the Decline Effect: it's a pendulum swinging wildly from one extreme to the other, over a pit, in which is tied a guy. You're the guy.

VI.

Here's an example of how sloppy science becomes enshrined as "truth" by popular press outlets like The New Yorker.

In 2001, Michael Jennions, a biologist at the Australian National University, set out to analyze "temporal trends" across a wide range of subjects in ecology and evolutionary biology. He looked at hundreds of papers and forty-four meta-analyses (that is, statistical syntheses of related studies), and discovered a consistent decline effect over time, as many of the theories seemed to fade into irrelevance.

Look at that sentence, inadvertently hitting on the truth: the decline effect happened as the theories became irrelevant-- not the other way around. The question isn't what does science say is true; the question is, what does the author want to be true?

But how can the author will a meta-analysis to show what he wants it to show? Maybe he could manipulate an individual study, but wouldn't a "study of studies" be immune to his dark sorcery?

Imagine a study of Prozac vs. placebo in 10000 patients, and Prozac is awesome. Imagine two more studies, each with 6 patients, and Prozac doesn't beat placebo in those. I now have three studies. My meta-analysis concludes: "Prozac was found to be superior to placebo in only a third of studies." Boom-- Associate Professor.

When meta-analyses look at only a few studies (e.g. N=4), if even one of them is a poorly designed study you can overwhelm-- or purposely extinguish-- what might actually be a real effect.
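To see how much the counting method matters, here is a rough sketch of the Prozac example in Python. The response counts are hypothetical (the example above gives only the sample sizes), and weighting by sample size is a simplification; real meta-analyses typically use inverse-variance weights. The function names are mine, for illustration only.

```python
# Each study: (patients per arm, responders on drug, responders on placebo).
# All response counts are invented for illustration.
studies = [
    (5000, 3000, 2000),  # the 10,000-patient trial: drug clearly ahead
    (3, 1, 1),           # 6-patient trial: no difference
    (3, 1, 1),           # 6-patient trial: no difference
]


def risk_difference(per_arm, drug, placebo):
    """Response rate on drug minus response rate on placebo."""
    return drug / per_arm - placebo / per_arm


def vote_count(studies):
    """Count studies where the drug beat placebo, ignoring study size."""
    wins = sum(risk_difference(*s) > 0 for s in studies)
    return wins, len(studies)


def weighted_effect(studies):
    """Average the risk differences, weighting each study by its total N."""
    total = sum(2 * s[0] for s in studies)
    return sum(2 * s[0] * risk_difference(*s) for s in studies) / total


wins, n = vote_count(studies)
print(f"Vote count: drug superior in {wins} of {n} studies")
print(f"Size-weighted risk difference: {weighted_effect(studies):.3f}")
```

The vote count reports "superior in only a third of studies." The size-weighted estimate stays close to the 20-point difference seen in the one trial that had enough patients to measure anything.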

In theory, researchers are supposed to be vigilant about the kinds of studies they lump together, making sure they are all similarly designed, etc. In practice, researchers are not, on purpose. Researchers all know what they want to find, and maliciously or unconsciously the studies to be included are selected, and, surprise, the researcher's hypothesis is supported. I have a blog full of examples, but conduct your own experiment: take any meta-analysis, look only at the author's name, find out where he works-- and guess everything else.

While you're wasting your time with that, the author of that meta-analysis is talking to The New Yorker and changing reality: "well, studies have shown that..."

VII.

This is going to get worse as the internet allows for popular discussion but not for access to the primary data. I am contacted all the time by the media, "hey, what do you think about the new study that finds that women are hotter when they're ovulating?" I try to drop some knowledge in a media friendly way, but at least a third of the time the reporter just wants me to agree with that atrocious study and speculate wildly. "Do you think it's because their boobs get bigger?" Let's find out.

It's easy to go through Lehrer's examples and identify the culprits of the supposed Decline Effect, but the best example of why "science" goes bad is, not surprisingly, offered by Lehrer himself. In (brace yourself) Wired, Jonah Lehrer answers some questions about his New Yorker article. Recap: his premise is that the Decline Effect is real, occurs in all sciences, may be a function of the scientific method itself, and eats away at even the most robust findings.

Question 1: Does this mean I don't have to believe in climate change?

Me: I'm afraid not. One of the sad ironies of scientific denialism is that we tend to be skeptical of precisely the wrong kind of scientific claims.

Get that?

Instead of wasting public debate on creationism or the rhetoric of Senator Inhofe [critic of climate change], I wish we'd spend more time considering the value of spinal fusion surgery, or second generation antipsychotics, or the verity of the latest gene association study.

Jonah Lehrer is the Decline Effect. I think he is a good and earnest person, and I know he was previously a scientist himself, but he ultimately grades the science he's not knowledgeable about based on value judgments. Which is fine, it's his life, though I wonder if deep down he believes it. If he goes psychotic, will he actually want me to give him Haldol over Abilify?

The trouble for the Earth is... he writes for The New Yorker. And Wired. Which means that his value judgments carry more weight than the science itself.

If they didn't, I, and those who are real scientists, wouldn't have to explain why the Decline Effect doesn't exist; I wouldn't have to waste time rebutting his article.

But I do. That's the problem.

The Last Psychiatrist