Scientific Method


The originators of the statistical frameworks that underlie modern epidemiologic studies recognized that their methods could not be interpreted properly without an understanding of their philosophical underpinnings. Neyman held that inductive reasoning was an illusion and that the only meaningful quantities in an experiment were constraints, fixed before the experiment, on the number of statistical “errors” we would make. Fisher rejected such mechanistic approaches to inference, believing in a more flexible, inductive approach to science. One of Fisher’s developments, mathematical likelihood, fit that approach.

The p value, which Fisher intended to be used in a similarly flexible manner, invited misinterpretation because it occupied a peculiar middle ground. Because of its resemblance to the pretrial α error, it was absorbed into the hypothesis test framework. This created two illusions: that an “error rate” could be measured after an experiment, and that this posttrial “error rate” could be regarded as a measure of inductive evidence. Even though Fisher, Neyman, and many others recognized these as fallacies, their perpetuation has been encouraged by the manner in which we use the p value today.

One consequence is that we overestimate the evidence for associations, particularly when p values fall in the range of 0.001–0.05, creating misleading impressions of their plausibility. Another is that we minimize the importance of judgment in inference, because its role is unclear when postexperiment evidential strength is thought to be measurable with preexperiment “error rates.” Many experienced epidemiologists have tried to correct these problems by offering guidelines on how p values should be used. We may be more effective if, in the spirit of Fisher and Neyman, we instead focus on clarifying what p values mean, and on what we mean by the “scientific method.”
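One way to make the overestimation concrete, though it is not part of this passage’s argument, is the minimum Bayes factor for a normally distributed test statistic, exp(−z²/2) (Edwards, Lindman & Savage 1963): the strongest odds any alternative hypothesis can claim against the null given the observed z. The sketch below assumes a two-sided test and the normal approximation; it is illustrative, not a method advocated in the text.

```python
from math import exp

from scipy.stats import norm

# Minimum Bayes factor for a two-sided p value under a normal test
# statistic: exp(-z^2 / 2), the best possible case for any alternative.
# A value of 0.15 means the data favor the alternative over the null by
# at most about 7 to 1, far weaker evidence than a "1 in 20 error rate"
# reading of p = 0.05 suggests.
for p in (0.05, 0.01, 0.001):
    z = norm.ppf(1 - p / 2)      # z-score whose two-sided p value is p
    min_bf = exp(-z ** 2 / 2)    # smallest attainable Bayes factor
    print(f"p = {p:<5}  z = {z:.2f}  minimum Bayes factor = {min_bf:.3f}")
```

Under these assumptions, p = 0.05 yields a bound of about 0.15 and p = 0.001 a bound of about 0.005; in each case the null remains several times more plausible than a reading of the p value as a posttrial error rate would imply, consistent with the overestimation described above.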