When you visit your doctor, you might assume that the treatment they prescribe has solid evidence to back it up. But you’d be wrong. Only one in ten medical treatments are supported by high-quality evidence, our latest research shows.
The analysis, which is published in the Journal of Clinical Epidemiology, included 154 Cochrane systematic reviews published between 2015 and 2019. Only 15 (9.9%) had high-quality evidence according to GRADE (grading of recommendations, assessment, development and evaluation), the gold-standard method for rating the quality of evidence. Among these, only two had statistically significant results – meaning that the results were unlikely to have arisen due to random error – and were believed by the review authors to be useful in clinical practice. Using the same system, 37% had moderate, 31% had low, and 22% had very low-quality evidence.
The GRADE system looks at things like risk of bias. For example, studies that are “blinded” – in which patients don’t know whether they are getting the actual treatment or a placebo – offer higher-quality evidence than “unblinded” studies. Blinding is important because people who know what treatment they are getting can experience greater placebo effects than those who do not know what treatment they are getting.
Among other things, GRADE also considers whether the results were imprecise or inconsistent, for example because of differences in the way the treatment was used. Lack of high-quality evidence, according to GRADE, means that future studies might overturn the results.
The 154 reviews were chosen because they were updates of reviews included in a previous study of 608 systematic reviews, conducted in 2016. This allowed us to check whether reviews that had been updated with new evidence had higher-quality evidence. They didn’t. In the 2016 study, 13.5% of reviews (about one in seven) reported that treatments were supported by high-quality evidence, so there was a trend towards lower quality as more evidence was gathered.
There were a few limitations to the study. First, the sample in the study may not have been representative, and other studies have found that over 40% of medical treatments are likely to be effective. Also, the sample was not large enough to check whether certain types of medical treatment (pharmacological, surgical, psychological) were better supported than others. It is also possible that the “gold standard” for ranking evidence (GRADE) is too strict.
Too many low-quality studies
Many poor-quality trials are being published, and our study merely reflected this. Because of the pressure to “publish or perish” to survive in academia, more and more studies are being done. In PubMed alone – a database of published medical papers – more than 12,000 new clinical trials are published every year. That’s more than 30 trials published every day. Systematic reviews were designed to synthesise these, but now there are too many of those, too: over 2,000 per year published in PubMed alone.
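As a rough check on the per-day figure, the yearly counts can be divided by the number of days in a year. The snippet below is a minimal sketch: the yearly counts are the lower bounds quoted above, while the 365-day year and the helper name `per_day` are assumptions made for illustration.

```python
def per_day(per_year: float, days_in_year: int = 365) -> float:
    """Convert a yearly count into an average daily rate."""
    return per_year / days_in_year


# Lower bounds quoted in the article ("more than 12,000", "over 2,000").
trials_per_year = 12_000
reviews_per_year = 2_000

print(f"Trials per day: {per_day(trials_per_year):.1f}")
print(f"Systematic reviews per day: {per_day(reviews_per_year):.1f}")
```

On these figures, that works out to roughly 33 new trials and between five and six new systematic reviews every day.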
The evidence-based medicine movement has been banging a drum about the need to improve the quality of research for more than 30 years, but, paradoxically, there is no evidence that things have improved despite a proliferation of guidelines and guidance.
In 1994, Doug Altman, a professor of statistics in medicine at Oxford University, pleaded for less, but better, research. This would have been good, but the opposite has happened. Inevitably, the tsunami of trials published every year, combined with the need to publish in order to survive in academia, has led to a great deal of rubbish being published, and this has not changed over time.
Poor-quality evidence is serious: without good evidence, we simply can’t be sure that the treatments we use work.
GRADE system too harsh
A carpenter should only blame their tools as a last resort, so the excuse that GRADE doesn’t work should only be used cautiously. Yet it’s probably true that the GRADE system is too harsh for some contexts. For example, it is near impossible for any trial evaluating a particular exercise regime to be of high quality.
An exercise trial cannot be “blinded”: anyone doing exercise will know they are in the exercise group, while those in the control group will know they are not doing exercise. Also, it is hard to make large groups of people do exactly the same exercise, whereas it is easier to make everyone take the same pill. These inherent problems condemn exercise trials to being judged to be of lower quality, no matter how useful safe exercise is.
Also, our method was strict. Whereas the systematic reviews had many outcomes (each of which could be high quality), we focused on the primary outcomes. For example, the primary outcome in a review of painkillers would be a reduction in pain. The review might also measure a range of secondary outcomes, from anxiety reduction to patient satisfaction.
Focusing on the primary outcomes prevents spurious findings. If we look at many outcomes, there is a danger that one of them will be high quality just by chance. To offset the strictness of this approach, we also looked at whether any outcome – even if it wasn’t the primary outcome – had high-quality evidence. We found that one in five treatments had high-quality evidence for any outcome.
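The danger of looking at many outcomes can be illustrated with a simple probability calculation. The sketch below uses the analogous case of statistical significance rather than GRADE ratings: it assumes that each outcome has an independent 5% chance of looking positive purely by chance. Both the 5% threshold and the independence assumption are illustrative and do not come from the study, and the function name is hypothetical.

```python
def chance_of_spurious_finding(n_outcomes: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent outcomes
    looks positive purely by chance, given a per-outcome rate alpha."""
    return 1 - (1 - alpha) ** n_outcomes


# Illustrative outcome counts; none of these come from the study.
for n in (1, 5, 10, 20):
    print(f"{n:>2} outcomes: {chance_of_spurious_finding(n):.0%}")
```

Under these assumptions, the chance of at least one spurious result rises from 5% with a single outcome to about 40% with ten, which is why restricting attention to the primary outcome is the more conservative choice.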
On average, most of the medical treatments whose effectiveness has been tested in systematic reviews are not supported by high-quality evidence. We need less, but better, research to address uncertainties so that we can become more confident that the treatments we take work.
Jeremy Howick has received funding from the British Medical Association, the National Institute for Health Research, and the Medical Research Council. The study upon which this article was based was not externally funded.