How Confidence Intervals Become Confusion Intervals


Reference

James McCormack, Ben Vandermeer, G Michael Allan, "How confidence intervals become confusion intervals" (2013). BMC Medical Research Methodology, Vol. 13, No. 1. Publisher: Springer Science and Business Media LLC. DOI: 10.1186/1471-2288-13-134

Abstract

In this paper, we review how researchers can look at very similar data yet have completely different conclusions based purely on an over-reliance of statistical significance and an unclear understanding of confidence intervals. The dogmatic adherence to statistical significant thresholds can lead authors to write dichotomized absolute conclusions while ignoring the broader interpretations of very consistent findings. We describe three examples of controversy around the potential benefit of a medication, a comparison between new medications, and a medication with a potential harm. The examples include the highest levels of evidence, both meta-analyses and randomized controlled trials. We will show how in each case the confidence intervals and point estimates were very similar. The only identifiable differences to account for the contrasting conclusions arise from the serendipitous finding of confidence intervals that either marginally cross or just fail to cross the line of statistical significance.

Bib

@Article{mccormack2013,
  title = {How confidence intervals become confusion intervals},
  abstract = {In this paper, we review how researchers can look at very similar data yet have completely different conclusions based purely on an over-reliance of statistical significance and an unclear understanding of confidence intervals. The dogmatic adherence to statistical significant thresholds can lead authors to write dichotomized absolute conclusions while ignoring the broader interpretations of very consistent findings. We describe three examples of controversy around the potential benefit of a medication, a comparison between new medications, and a medication with a potential harm. The examples include the highest levels of evidence, both meta-analyses and randomized controlled trials. We will show how in each case the confidence intervals and point estimates were very similar. The only identifiable differences to account for the contrasting conclusions arise from the serendipitous finding of confidence intervals that either marginally cross or just fail to cross the line of statistical significance.},
  volume = {13},
  issn = {1471-2288},
  url = {http://dx.doi.org/10.1186/1471-2288-13-134},
  doi = {10.1186/1471-2288-13-134},
  number = {1},
  journal = {BMC Medical Research Methodology},
  publisher = {Springer Science and Business Media LLC},
  author = {McCormack, James and Vandermeer, Ben and Allan, G Michael},
  year = {2013},
  month = {oct}
}

Quotes (3)

Same Data, Different Results

Most published reports of clinical studies begin with an abstract – likely the first and perhaps only thing many clinicians, the media and patients will read. Within that abstract, authors/investigators typically provide a brief summary of the results and a 1–2 sentence conclusion. At times, the conclusion of one study will be different, even diametrically opposed, to another despite the authors looking at similar data. In these cases, readers may assume that these individual authors somehow found dramatically different results. While these reported differences may be true some of the time, radically diverse conclusions and ensuing controversies may simply be due to tiny differences in confidence intervals combined with an over-reliance and misunderstanding of a “statistically significant difference.” Unfortunately, this misunderstanding can lead to therapeutic uncertainty for front-line clinicians when in fact the overall data on a particular issue is remarkably consistent.

Page 1
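The quote above describes how nearly identical results can yield opposite dichotomous conclusions. A minimal sketch of the mechanism, using hypothetical trial numbers (not taken from the paper): two trials with the same relative risk of 0.8, where only the sample size differs, produce heavily overlapping confidence intervals, yet one CI marginally crosses 1.0 ("no effect") while the other just excludes it ("significant benefit").

```python
import math

def rr_ci(events_tx, n_tx, events_ctl, n_ctl, z=1.96):
    """95% CI for a relative risk via the usual log-normal approximation."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    # Standard error of log(RR) from the 2x2 table counts
    se = math.sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# Hypothetical Trial A: 80/1000 events vs 100/1000 events
rr_a, lo_a, hi_a = rr_ci(80, 1000, 100, 1000)
# Hypothetical Trial B: identical event rates, twice the sample size
rr_b, lo_b, hi_b = rr_ci(160, 2000, 200, 2000)

print(f"Trial A: RR {rr_a:.2f}, 95% CI {lo_a:.2f}-{hi_a:.2f}")  # upper limit just above 1.0
print(f"Trial B: RR {rr_b:.2f}, 95% CI {lo_b:.2f}-{hi_b:.2f}")  # upper limit just below 1.0
```

Both trials estimate the same 20% relative risk reduction, and the two intervals overlap almost entirely, yet a reader applying the 0.05 threshold dogmatically would call Trial A "no effect" and Trial B "effective" — exactly the confusion the authors describe.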

Recommendations on Presenting of Statistical Results in Medical Literature

We encourage authors to avoid statements like “X has no effect on mortality” as they are likely to be both untrue and misleading. This is especially true as results get “close” to being statistically significant. Results should speak for themselves. For that to happen, readers (clinicians and science reporters) need to understand the language of statistics and approach authors’ conclusions with a critical eye. We are not trying to say that the reader should not review the abstract but when authors’ conclusions differ from others, readers must examine and compare the actual results. In fact, all but one of the meta-analyses provided point estimates and CIs in the abstracts. This facilitates quick comparisons to other studies reported to be “completely different,” and to determine if the CIs demonstrate clinically important differences. The problem lies in the authors’ conclusions, which often have little to do with their results but rather what they want the results to show. We encourage journal editors to challenge authors’ conclusions, particularly when they argue they have found something unique or different than other researchers but the difference is based solely on tiny variations in CIs or p-value (statistically significant or not).

We are not suggesting the elimination of statistical testing or statistical significance, but rather that all people (authors, publishers, regulators etc.) who write about medical interventions use common sense and good judgment when presenting results that differ from others and not be so beholden to the “magical” statistical significance level of 0.05. We urge them to consider the degree to which the results of the “differing” study overlap with their own, the true difference in the point estimates and range of possible effects, where the preponderance of the effect lies and how clinicians might apply the evidence.

It appears that readers of the papers discussed here would be better served by reviewing the actual results than reading the authors’ conclusions. To do that, clinicians need to be able to interpret the meaning of CIs and statistical significance.

Page 5

Black and White Medical Conclusions

It appears that medical authors feel the need to make black and white conclusions when their data almost never allows for such dichotomous statements. This is particularly true when comparing results to similar studies with largely overlapping CIs. Virtually all of the conclusion confusion discussed in this paper can be linked to slavish adherence to an arbitrary threshold for statistical significance. Even if the threshold is reasonable, it still cannot be used to make dichotomous conclusions.

Page 5