Authors can be deceitful, reviewers can be incompetent, but these aren’t the only ways for mistakes to creep into science. The truth is that the very nature of the scientific process means that most scientific papers published are likely to be wrong.
The Scientific Process
We naturally make assumptions about our world, from the time we are children up until we are adults. We base those assumptions on the world we see before us. The sun rises in the east every day, and sets in the west. A hammer will fall faster than a feather. We assume that rules like these have been in place forever, and will hold forever.
Except when they don’t. Near the poles, there are days when the sun never rises, and other days where the sun never sets. On the moon, a feather will fall just as fast as a hammer. Our assumptions change as we observe more. We use science to work out which of our assumptions hold out the longest, and which ones do not.
We build up a picture of the wider world around us from assumptions that are based on observations. The problem is, sometimes, our observations can lie to us. Our assumptions can be wrong.
Guinness’ Secret Weapon
This starts with a brewery. That brewery was Guinness, and they had a problem. The business of brewing was highly competitive, and they needed an edge. It is also a subtle business: the quality of the hops, barley and yeast all affect the final product. Without understanding how that worked, the beer they produced could vary, and one batch might taste different from another. They wanted to standardise the process, and make their brew better.
In 1899, the Guinness Brewery was on the lookout for scientists to help them achieve this. They were lucky enough to recruit an extraordinary young man named William Sealy Gosset. What he would find during his work at Guinness would go beyond brewing, and affect all of science.
One of Guinness’ problems was counting yeast. Yeast ferments the brew, and is essential to the process of making beer. They needed to keep an eye on the yeast numbers to standardise the fermentation process. This was one of Gosset’s tasks. But there was no way he could count every single yeast cell in the vat.
He could only examine small samples from the beer vats, counting each yeast cell in a sample through a microscope. Theoretically, he could estimate the total number of yeast cells in the vat from one sample. If you took a millilitre sample from one litre and counted 100 yeast cells in it, then surely there must be 100,000 yeast cells per litre. Right?
Except Gosset realised that it wasn’t that straightforward. He took multiple samples, and found very different yeast numbers in each of them. Even when he took them from the same vat, at the same time.
This difference was caused by variation. Turbulence within the vat could throw clumps of yeast together, or push them apart, which meant that the number of yeast cells captured in one sample was in some part down to sheer happenstance.
This is what made Gosset’s estimates of the total yeast numbers differ from the actual numbers of yeast in the vat. It was all down to “Statistical Error”.
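We can see this effect without a vat. Here is a minimal sketch, assuming (hypothetically) a "true" density of 100 yeast cells per millilitre, and drawing each 1 ml sample by letting every cell in the vat land in the sample independently:

```python
import random

random.seed(42)

def sample_count(density):
    """Count the yeast cells caught in one 1 ml sample.

    Approximates the randomness of sampling: each of density * 100
    candidate cells has a 1-in-100 chance of ending up in our sample,
    so the count follows a binomial (roughly Poisson) distribution
    with mean equal to `density`.
    """
    return sum(1 for _ in range(density * 100) if random.random() < 0.01)

# Ten samples from the same vat, at the same time
counts = [sample_count(100) for _ in range(10)]
print(counts)                    # each sample implies a different density
print(min(counts), max(counts))  # the spread is the statistical error
```

Every sample comes from the same vat with the same true density, yet no two counts agree, just as Gosset found.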
A Room full of British Men
To get an idea of what statistical error is, here is an example. You find yourself in a room with ten British men. Try not to swoon. You want to know the weight of the average British man. You can try to figure it out by weighing all the men in the room. If you take the average of their weight, you would hope that it reflects the national average.
So you take that measurement, and you find that the average is 116 kg. Which is not what you expected. It’s much higher than the weight of the average American man, at 88 kg. So surely, this must mean that British men are much fatter than Americans… right?
But then you notice that one of the men in the room happened to be Carl Thompson who, at 412 kg, is currently Britain’s fattest man. By sheer bad luck, you managed to have an outlier in your measurements, which completely threw off your conclusion.
You don’t even need the help of Carl Thompson to throw off your calculations. If you took a sample of actors like Daniel Radcliffe, Benedict Cumberbatch and Tom Hiddleston, then you’d get a weight far lower than you would expect. That’s because staying attractive is practically their job description.
As long as you are in a room where more people edge towards one extreme than the other, the average you find will not be the “true” average for the whole of the UK. This is the essence of statistical error. It is the difference between the average you measure, and the “true” average.
In a room of ten people, we have a pretty small sample of the British population. One person can completely throw off your calculations. But if you have a larger room, with more people, that changes.
For instance, imagine now that you’re in a room with 99 average British men and Carl Thompson. If we take the average now, it comes out to 86.9 kg, as opposed to 116 kg. When you take more measurements, you are less likely to have them thrown off by outliers. With more people in the room, we get a much better picture of the range of weights adult males can have.
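We can check the arithmetic. A minimal sketch, assuming every non-outlier weighs roughly 83.6 kg (a hypothetical "true" average back-calculated from the figures above):

```python
# Hypothetical weights: 83.6 kg for an average British man,
# 412 kg for Carl Thompson, the outlier.
average_man, outlier = 83.6, 412

def room_average(n_average_men):
    """Average weight of a room of n average men plus one outlier."""
    weights = [average_man] * n_average_men + [outlier]
    return sum(weights) / len(weights)

print(round(room_average(9), 1))   # room of 10: ~116.4 kg, outlier dominates
print(round(room_average(99), 1))  # room of 100: ~86.9 kg, close to the truth
```

The same outlier that pushed a ten-man room to roughly 116 kg barely nudges a hundred-man room above the true average.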
A better understanding of how much weight varies allows us to understand “Statistical Error” better, and to see where it comes from. It’s easy to see why the weight of British men may vary; we all have different lifestyles. But what if we tried measuring something that we can be certain of?
How Long is a Piece of String ?
Let’s say we cut a ten centimetre length of string. Surely it will always be ten centimetres. Right?
If you cut it using a ruler, then that really depends on how accurate your ruler is. Is it exactly ten centimetres? If I were to measure the length from the top-most strand to the bottom-most one using an electron microscope, would it be exactly ten centimetres? If I counted it lengthwise atom by atom, would it add up to 10 cm exactly?
A speck of dust could attach itself to the end of the string, lengthening it. An atom might be sheared off. The elasticity of the string can vary based on the environment. There will be tiny fluctuations that can change its length on the nanoscale level. When you start to reach the limits of our capabilities, all measurements vary.
This becomes a problem for making scientific discoveries, because it can lead us to believe things that aren’t true, simply by chance.
The finicky world of testing hypotheses
We talked about how science works by trying to figure out which of our assumptions about the universe hold up to the most rigorous tests. Another word for a scientific assumption is ‘hypothesis’.
Where do these hypotheses come from?
Basically, we make them up. We’ve made them up from the dawn of time, and will continue to make them up. The only difference is that these days, we try to base our hypotheses on testable observations. They can come from us directly observing something. Or, we can build a theory based on our observations that leads to new hypotheses to be tested.
For example, based on the movement of the sun, you can come to the conclusion that the sun revolves around the earth. This can give you the idea that the earth is the centre of the universe. But if the earth is the centre of the universe, that creates a whole new set of hypotheses that can be tested. Do the planets in the sky move like they are orbiting the earth? It turns out they don’t.
The standard model of particle physics was developed based on observations from particle accelerators, and predicted the existence of the Higgs boson. When the Large Hadron Collider started smashing particles, did those particles behave as if the Higgs boson existed? It turns out that they did.
Through rigorous testing, and experiment, our picture of the world gets closer to what the world actually is. That is the ideal of science. Unfortunately, reality doesn’t quite live up to the ideal.
Why most published research studies are wrong
This is where John Ioannidis comes in. In 2005 he published the most important scientific article of this century. It’s a paper that affects nearly all scientific fields, and has implications that cut right into the heart of the scientific process. It was titled “Why Most Published Research Findings Are False”.
The world of hypothesis testing gets a lot more complicated when variance and statistical error rear their ugly heads. We wrongly assumed that the average British man was over 20 kg heavier than an American simply due to the chance inclusion of one man. Other experiments can be thrown off by variance in the same way.
That means that data, even when collected honestly, can occasionally lead us to conclusions that are actually wrong. It might be a rare occurrence, but it does happen. The problem is that the number of hypotheses I can make up out of thin air is unlimited, whereas the number of those hypotheses that are actually true is limited by reality.
So here are the main factors in play here:
- There are many more possible false hypotheses than true ones
- A small percentage of those false hypotheses will be “proven” true
- That small percentage is culled from a much bigger number of false hypotheses.
- Which means that the small percentage of false hypotheses that appear true can actually outnumber the hypotheses that really are true.
Imagine you have a robot in a room testing every possible hypothesis once. I mean every possible hypothesis. It will be able to correctly confirm that the “True” hypotheses are correct. But it will be testing a nearly infinite number of false hypotheses also. Some of these will appear true, when they are not true. Since even a small percentage of “near infinite” can be huge, the robot will actually find itself accepting more false hypotheses than true ones. Thus, its model of the world will be skewed.
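The robot's predicament can be sketched with made-up but conventional numbers: assume only 1 in 100 invented hypotheses is actually true, the robot wrongly "confirms" 5% of false hypotheses (the statistical error), and correctly confirms 80% of true ones:

```python
import random

random.seed(0)
N_HYPOTHESES = 100_000
TRUE_FRACTION = 0.01  # assumption: 1 in 100 made-up hypotheses is true
ALPHA = 0.05          # assumption: 5% of false hypotheses "pass" by chance
POWER = 0.8           # assumption: 80% of true hypotheses are confirmed

true_accepted = false_accepted = 0
for _ in range(N_HYPOTHESES):
    is_true = random.random() < TRUE_FRACTION
    if is_true:
        if random.random() < POWER:
            true_accepted += 1   # a real discovery
    elif random.random() < ALPHA:
        false_accepted += 1      # a false hypothesis that looks true

print(true_accepted, false_accepted)
```

With these rates, the robot accepts on the order of 800 true hypotheses but nearly 5,000 false ones: even a small error rate applied to a vast pool of false hypotheses swamps the genuine discoveries.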
Fortunately, this isn’t an insurmountable situation. If you have that robot re-test what it believes to be true, it will be able to weed out most of the false hypotheses. Without these cluttering things up, it can form new models of the world, that generate new hypotheses needing to be tested. Then the cycle begins once more.
Science is constantly in the middle of this process. At any one time, most published scientific studies are wrong.
At the very beginning of this series, I railed against Aristotle for all of his mistakes. It’s pretty fashionable to bash Aristotle. He was wrong about a lot. He thought buzzards had three testicles, that the brain was primarily there to cool the blood, and that the brain produced phlegm.
We can all share a good laugh at his expense. But whilst we have the benefit of centuries of scientific discovery, he did not. He operated right at the beginning, before that legacy existed. He had no giants to stand on.
The hypotheses he set forth to explain his observations had been plucked out of thin air. It’s no wonder that many of them reflected his own biases and those of his intellectual forebears. When he saw that the brain was surrounded by mere bone, and only a tiny amount of flesh, he assumed that it must be for heat conduction. The production of phlegm from the head, a liquid that could also conduct heat away, further lent credence to his hypothesis. When he looked through the viscera of a dead bird, he didn’t have a handy anatomical guide to tell him which organ was which. He was working at the boundaries of what people knew at the time.
Modern scientists operate at the edge of knowledge today in just the same way. No doubt, future generations will snark about the mistakes we make now, just as I snarked at Aristotle. That is a good thing.
The real reason many thinkers hate Aristotle is not that he was wrong. The problem came when his ideas were made sacrosanct by his followers, who took him at his word and treated it as holy writ. We make the same mistake when we decide to treat peer-reviewed articles as if they are set in stone, and not up for debate. Every time someone states that they can “trust” the results of an article from a top-tier journal like “Science” or “Nature”, we repeat the mistake of Aristotle’s followers. Although instead of putting one man on a pedestal, we enshrine a publishing process.
Peer Review is about more than what gets published
When we talk about peer review being broken, we are talking about how it has ultimately failed to live up to its own myth: that a single scientific study, vetted by experts, can be taken as fact. But now you can see how Peer Review can never live up to that standard. Scientists, journals and the lay public simply latched onto that myth because it was convenient to believe, not because it was true.
What we think of as Peer Review has only been around for a hundred years, and has only been ascendant in the last fifty. So you might ask why I decided to start my history so much further back in the past.
The concept of Peer Review is so much bigger than what people usually talk about. It is also incredibly simple. It’s getting someone who understands a work to examine it so that they can give their own opinion and advice.
If we are simply examining the concept, then not only is “Peer Review” not dead, it is more alive than it ever was. Just as the invention of the photocopier in the 1950s suddenly allowed more journals to perform Peer Review, a new technology is making waves now. The internet has changed the way we read, critique, and talk about scientific articles. Next week, in the final part of this series, I will talk about the effects it has already wrought on Peer Review, and speculate on what the future holds.
p.s. Sorry this was late, stuff kept happening