In this blog I have featured many psychological experiments. Stanley Milgram persuaded ordinary people to administer what they thought were 450-volt electric shocks to subjects in what was framed as a learning experiment. Philip Zimbardo famously had 24 Stanford students play the roles of guards and prisoners in a two-week experiment that had to be suspended in under a week. And David Rosenhan managed to get eight sane people, including himself, admitted to various psychiatric hospitals in the USA by briefly simulating auditory hallucinations. What proved more difficult was getting them out. All these experiments have achieved iconic status in psychology.
Here is one experiment I haven’t featured before. A study observed that political moderates were better at perceiving shades of grey accurately than left-wing or right-wing extremists. It is a striking result, and it does seem to have a ring of truth about it. The researchers could, perhaps should, have stopped there. They could have sent it to a renowned psychology journal, because this result was eminently publishable. However, they decided to run the test again with a much larger sample. Alas, the result didn’t appear the second time. Now their hopes of publishing a paper had gone up in smoke. They couldn’t send both studies to a journal – the pair just wouldn’t be published. Had they sent only the first one, it would have been far more likely to be accepted.
So, how reliable are psychological experiments? Scientific claims are not based on someone in authority saying that they are true. Reputation is not decisive. What matters is that the findings can be independently reproduced: another scientist following the same procedure will achieve the same results.
Enter Brian Nosek, a psychology professor at the University of Virginia and the man behind the study about the shades of grey. Nosek, a social psychologist and the co-founder and director of the Center for Open Science, persuaded 270 of his peers to take part in repeating 100 experiments that had been published in prestigious psychology journals in 2008, to see if they could get the same results a second time around. The Reproducibility Project, which began in 2011, was supposed to take between six and nine months. In reality it lasted three years. The project was a curious enterprise: there were no eureka moments, and this was not cutting-edge research that would lead to fame and glory. Nevertheless, it was vital that it be carried out.
The results were finally published in the prestigious journal Science last summer. 97% of the 100 original studies reported statistically significant results. This is what you’d expect: many experiments fail to produce meaningful results, and those are not generally published. This is known as publication bias, or sometimes the “file drawer effect” – results that do not support the researchers’ hypotheses stay in their file drawers. In the new study, just 36% of the replications reached statistical significance. This needs careful analysis; just because something fails to replicate doesn’t mean it isn’t true. Some of these failures could be down to luck, poor execution, or an inability to reproduce the conditions needed to show the effect. Nevertheless, the results were not good.
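The arithmetic behind publication bias is worth making concrete. A minimal sketch in Python, using entirely made-up base rates and power (these numbers are illustrative assumptions, not figures from the Reproducibility Project), shows how a journal full of significant results can still yield a low replication rate:

```python
# Hypothetical numbers for illustration only:
p_true = 0.10  # fraction of tested hypotheses that are actually true
power = 0.35   # chance a study detects a real effect (statistical power)
alpha = 0.05   # chance a study "detects" a non-existent effect (false positive)

# Among all significant, hence publishable, results...
sig_true = p_true * power          # real effects that reached significance
sig_false = (1 - p_true) * alpha   # false positives that reached significance
share_true = sig_true / (sig_true + sig_false)

# A replication of a published finding succeeds at the power rate if the
# effect is real, and at the false-positive rate if it is not.
replication_rate = share_true * power + (1 - share_true) * alpha

print(f"Share of published findings that are real: {share_true:.0%}")
print(f"Expected replication rate: {replication_rate:.0%}")
```

With these assumed inputs, under half of the published findings are real, and the expected replication rate falls well below 50% – even though every original paper reported a significant result and nobody committed fraud.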
What is going on is probably not fraud; the problems are more subtle than that. We must stop treating single studies as unassailable versions of the truth. There is pressure to publish. It is novel findings that are sought, and there is little incentive for attempts at replication, such as those carried out by the Reproducibility Project. I also think we want many of these studies to be true; there is something seductive about them. I remember reading about one experiment which showed that if you are holding something warm, such as a cup of coffee, you are more likely to perceive someone else as emotionally “warm”, and more likely to behave in a friendly, generous way. This may well be true, but I would like to know whether or not it has been replicated.
And I fear that this lack of reproducibility is not confined to psychology. Another study found that around $28 billion worth of research per year in medical fields is non-reproducible. Discoveries should be thoroughly examined and repeatedly observed before they are universally accepted. Published and true are not synonyms. Scepticism is what makes science so powerful. We are now seeing reforms, and they need to continue: more transparent reporting, a clear hypothesis stated before any data is analysed, and sharing of results so that they can be vetted. I look forward to more work from the Center for Open Science in the future. There may not be much glory in it, but the researchers who do it will be making the world a better place.