
There is a very, very strange phenomenon in the scientific literature called the decline effect. For some inexplicable reason, many positive scientific effects seem to decline over time as more and more research into them is conducted. A theory that initially seems to be supported by overwhelmingly positive evidence can eventually wind up looking significantly less supported; in some weird way, it can seem as though facts are becoming less factual over time.
For example, initial parapsychological research seemed to provide evidence for psychic ability (ESP in particular), but the effect declined in subsequent studies. A subject of Joseph Banks Rhine who had initially been able to guess cards at rates vastly above chance in repeated tests gradually became worse at doing so. In fact, over the years this subject’s guesses fell to barely above chance.
Jonathan Schooler, who posited the theory of verbal overshadowing (i.e. that verbally describing something impairs your ability to remember it more than simply observing it does), noticed a similar decline in his results. Since the initial publication of verbal overshadowing theory in 1990, he has found it increasingly hard to demonstrate positive results. He called this “cosmic habituation”, and joked that the cosmos was interfering with his studies.
Regression to the mean
Some scientists link this to the statistical effect called regression to the mean: the phenomenon whereby a variable that is extreme on its first measurement will tend to be closer to the average on its second measurement. According to this theory, the statistical flukes that produced the strikingly positive first studies simply even out over time. Flukes can sneak into even the best of studies, which is why a large number of studies are required to test a single phenomenon; only multiple, large data sets gathered under numerous and distinct conditions can lead towards acceptance of a particular theory. The problem with this explanation for the decline effect is that in many of these early studies with positive results, the data sets are large enough to render overly positive statistical flukes highly improbable; at the very least, it would be highly unlikely to see such flukes occurring across a wide number of studies with large data sets. Yes, there should be some fluctuation in results over time, but a consistent downward trend in effect size cannot be explained by regression to the mean alone.
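To make the mechanism concrete, here is a minimal simulation (my own illustrative sketch; none of the numbers come from the studies discussed here). A thousand hypothetical labs measure a non-existent effect; only the most extreme initial results attract attention, and when those same labs re-measure, the noise re-rolls and the estimates fall back towards the true value:

```python
import random

random.seed(42)

TRUE_EFFECT = 0.0   # assume there is no real effect at all
NOISE = 1.0         # per-study measurement noise
N_LABS = 1000       # hypothetical labs each running an initial study

def run_study():
    """One study's estimated effect: the true effect plus random noise."""
    return random.gauss(TRUE_EFFECT, NOISE)

# Each lab runs a first study; only the most striking results (top 5%) get noticed.
first_results = sorted(run_study() for _ in range(N_LABS))
striking = first_results[int(0.95 * N_LABS):]

# The same labs run follow-up studies. The noise re-rolls, so on average the
# follow-ups land back near the true effect: regression to the mean.
followups = [run_study() for _ in striking]

print(f"mean of striking initial results: {sum(striking) / len(striking):.2f}")
print(f"mean of their follow-up results:  {sum(followups) / len(followups):.2f}")
```

The striking initial results average around 2.0, while their follow-ups average around 0.0; that gap is the whole of regression to the mean, no cosmic interference required.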
Unpublished results
There is another potential explanation for this effect: academic journals are heavily skewed towards publishing positive results, which hides the true number of negative scientific studies. The vast majority of negative scientific results remain unpublished because of this bias. This is a pervasive problem the world over, and is likely to be skewing the available scientific data pertaining to certain phenomena. When it turned out that Eli Lilly (the maker of antidepressants like Prozac) had withheld the results of about a third of the trials conducted to win government approval, the clinical community was up in arms (and quite rightly so).
Unfortunately, this is par for the course in contemporary science. Why is it happening? One possible reason is that companies have a vested interest in publishing only positive results for their products, be they drugs, medical devices or shampoo; this was likely the case with Prozac. After all, if you can show that your drug has a great body of positive evidence behind it, doctors are more likely to prescribe it and patients are more likely to take it. Another possibility is that researchers simply feel they have failed if they produce negative results. If you have spent years designing a study, carefully selecting your variables, experimental design and hypotheses, getting your hopes up that you might discover a new effect, then a set of negative results can come as a crushing blow. Furthermore, journals are simply not that willing to publish negative results; they’re just not as exciting. If anything, negative results may end up undermining someone’s entire life’s work, or cause researchers to lose funding.
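This kind of file-drawer filter is easy to simulate. In the sketch below (my own illustrative numbers, nothing to do with any real trial data), every study estimates a small but genuine effect with noise, yet only the estimates that clear an arbitrary “impressiveness” threshold make it into print. The published literature then overstates the effect, so later, unfiltered replications inevitably look like a decline:

```python
import random

random.seed(7)

TRUE_EFFECT = 0.2        # a small but real underlying effect
NOISE = 1.0              # per-study measurement noise
N_STUDIES = 500          # studies actually conducted
PUBLISH_THRESHOLD = 0.5  # only estimates this 'impressive' get published

# Every study estimates the true effect plus noise...
estimates = [random.gauss(TRUE_EFFECT, NOISE) for _ in range(N_STUDIES)]

# ...but the weak and negative results stay in the file drawer.
published = [e for e in estimates if e > PUBLISH_THRESHOLD]

print(f"true effect:               {TRUE_EFFECT:.2f}")
print(f"mean of all studies:       {sum(estimates) / len(estimates):.2f}")
print(f"mean of published studies: {sum(published) / len(published):.2f}")
print(f"fraction unpublished:      {1 - len(published) / N_STUDIES:.0%}")
```

Run it and the published mean comes out at several times the true effect, while most of the evidence never sees the light of day.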
Unfortunately, as worrying as all this is, unpublished results are not the explanation for the decline effect either. For one, they don’t explain the personal academic strife of someone like Schooler, who repeatedly failed to replicate his own positive findings in his own later studies. Nor does publication bias explain declines in cases where the initial positive results were never published at all. The decline effect has hit researchers regardless of whether they published their data or not.

So what the hell is going on?
To be honest, the jury is still out here. It could be any number of things. Regression to the mean could be partially to blame, along with unpublished results. Perhaps, as mentioned here, we should stop treating the results of the “soft sciences” (psychology, biology and the like) as factual, and instead treat them as tending towards a soft truth. There may be no actual static reality to be measured here, just a useful approximation towards something like it. Apparently, the decline effect doesn’t occur in physics.
The thing to remember is that science is not an infallible, immutable entity. Scientists are just people, and people behave rather strangely sometimes (especially in large numbers). All humans suffer from various cognitive biases, regardless of how smart and well educated they are. One of the most problematic for science is that humans hugely enjoy confirming a hypothesis, and really hate it when a hypothesis is not confirmed. There could be a great deal of unwitting selective reporting happening on a regular basis, with researchers subconsciously finding rational and acceptable ways to demonstrate that their hypothesis is correct without exploring the data as fully as they could have.
Jonah Lehrer, in his famous article on the decline effect, writes:
One of the classic examples of selective reporting concerns the testing of acupuncture in different countries. While acupuncture is widely accepted as a medical treatment in various Asian countries, its use is much more contested in the West. These cultural differences have profoundly influenced the results of clinical trials. Between 1966 and 1995, there were forty-seven studies of acupuncture in China, Taiwan, and Japan, and every single trial concluded that acupuncture was an effective treatment. During the same period, there were ninety-four clinical trials of acupuncture in the United States, Sweden, and the U.K., and only fifty-six per cent of these studies found any therapeutic benefits. As Palmer notes, this wide discrepancy suggests that scientists find ways to confirm their preferred hypothesis, disregarding what they don’t want to see. Our beliefs are a form of blindness.
Lehrer has been accused of being too critical of the scientific method, but that is not entirely fair. He is giving us fair warning: even the brightest and most honest of researchers can make mistakes, with subconscious cognitive biases nudging them towards a particular result.