How Not to Be Wrong: The Power of Mathematical Thinking
Excerpts
Statistics pioneer Francis Ysidro Edgeworth proposed that the curve [the bell curve] be called the “gendarme’s hat.”
The law of averages is not very well named, because laws should be true, and this one is false.
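The "law of averages" suggests that after a run of heads, tails is somehow due. The sketch below checks this for a fair coin, assuming independent flips. It enumerates every equally likely sequence and computes the exact conditional probability of heads after a streak. The function name `prob_heads_after_streak` is hypothetical and used only for illustration.

```python
from fractions import Fraction
from itertools import product

def prob_heads_after_streak(streak: int) -> Fraction:
    """Exact P(next flip is heads | previous `streak` flips were all heads).

    Assumes a fair coin with independent flips, so every sequence of
    streak + 1 flips is equally likely; we simply count sequences.
    """
    sequences = list(product("HT", repeat=streak + 1))
    # Keep only sequences that begin with `streak` heads in a row.
    after_streak = [s for s in sequences if all(c == "H" for c in s[:streak])]
    # Of those, count the ones whose next flip is also heads.
    heads_next = [s for s in after_streak if s[streak] == "H"]
    return Fraction(len(heads_next), len(after_streak))

for k in (1, 5, 10):
    print(k, prob_heads_after_streak(k))
```

However long the streak, the answer stays exactly 1/2: the coin does not remember its history, so nothing "evens out" the next flip.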
We certainly have built-in intuition for thinking about uncertain things, but it’s much harder to articulate. There’s a reason that the mathematical theory of probability came so late in mathematical history, and appears so late in the math curriculum, when it appears at all. When you try to think carefully about what probability means, you get a little woozy.
It’s not enough that the data be consistent with your theory; they have to be inconsistent with the negation of your theory, the dreaded null hypothesis.
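Being "inconsistent with the null hypothesis" is usually quantified by a p-value: how likely data at least this extreme would be if the null were true. The sketch below computes an exact one-sided p-value for a coin, assuming the null hypothesis is "the coin is fair". The flip counts are hypothetical and do not come from the book.

```python
from math import comb

def binomial_p_value(heads: int, flips: int) -> float:
    """One-sided p-value: probability, under the null hypothesis of a fair
    coin, of seeing at least `heads` heads in `flips` flips."""
    # Count the sequences with heads..flips heads, divide by all 2**flips sequences.
    tail = sum(comb(flips, k) for k in range(heads, flips + 1))
    return tail / 2 ** flips

print(binomial_p_value(60, 100))
print(binomial_p_value(55, 100))
```

Sixty heads in 100 flips gives a p-value near 0.03, below the conventional 0.05 cutoff, so the fair-coin null is rejected; 55 heads does not clear that bar. Both results are *consistent* with a biased coin, but only the first is inconsistent enough with the null.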
So: significance. In common language it means something like “important” or “meaningful.” But the significance test that scientists use doesn’t measure importance.
That’s the power of the method, but also its danger. The truth is, the null hypothesis, if we take it literally, is probably just about always false. When you drop a powerful drug into a patient’s bloodstream, it’s hard to believe the intervention has exactly zero effect on the probability that the patient will develop esophageal cancer, or thrombosis, or bad breath. Every part of the body speaks to every other, in a complex feedback loop of influence and control. Everything you do either gives you cancer or prevents it. In principle, if you carry out a powerful enough study, you can find out which it is. But those effects are usually so minuscule that they can be safely ignored. Just because we can detect them doesn’t always mean they matter.
If only we could go back in time to the dawn of statistical nomenclature and declare that a result passing Fisher’s test with a p-value of less than 0.05 was “statistically noticeable” or “statistically detectable” instead of “statistically significant”! That would be truer to the meaning of the method, which merely counsels us about the existence of an effect but is silent about its size or importance. But it’s too late for that. We have the language we have.
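The point that a powerful enough study can "detect" a minuscule effect can be made concrete. The sketch below runs a standard two-proportion z-test on hypothetical rates (10.0% versus 10.1%, a 0.1 percentage point difference). It assumes equal group sizes and that the observed rates are exact. The effect is the same size in both calls; only the sample size changes.

```python
from math import erfc, sqrt

def two_proportion_p_value(p1: float, p2: float, n: int) -> float:
    """Two-sided p-value of a two-proportion z-test.

    Assumes each group has n subjects and the observed rates are p1 and p2.
    """
    pooled = (p1 + p2) / 2
    se = sqrt(pooled * (1 - pooled) * 2 / n)  # standard error of p1 - p2
    z = abs(p1 - p2) / se
    # For a standard normal Z, P(|Z| >= z) = erfc(z / sqrt(2)).
    return erfc(z / sqrt(2))

small_study = two_proportion_p_value(0.100, 0.101, 1_000)
huge_study = two_proportion_p_value(0.100, 0.101, 10_000_000)
print(small_study, huge_study)
```

With 1,000 subjects per group, the difference is nowhere near significant. With ten million, the p-value is astronomically small. The effect is "statistically detectable" in the second case, but it is no more important than before.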
Shakespeare failed the significance test. Skinner writes: “In spite of the seeming richness of alliteration in the sonnets, there is no significant evidence of a process of alliteration in the behavior of the poet to which any serious attention should be given. So far as this aspect of poetry is concerned, Shakespeare might as well have drawn his words out of a hat.”
That itself would be evidence that some nonrandom process was at work. The second picture might look “more random” to the naked eye, but it is not; it testifies that the points have a built-in disinclination to crowd.
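Truly random points clump, while points that look evenly scattered are usually being kept apart. The sketch below compares the closest pair of points in two clouds. The first places 100 points independently and uniformly at random. The second places one jittered point per cell of a 10 x 10 grid. The jittered grid is an assumed stand-in for "points that refuse to crowd"; it is not the book's picture. The helper names are hypothetical.

```python
import math
import random

def min_pairwise_distance(points):
    """Smallest distance between any two points (brute force, O(n^2))."""
    return min(math.dist(p, q)
               for i, p in enumerate(points)
               for q in points[i + 1:])

def uniform_points(n, rng):
    """n points placed independently and uniformly in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def spread_points(side, rng, jitter=0.25):
    """One point per cell of a side x side grid, nudged randomly within
    a quarter-cell of the center: scattered-looking, but never crowded."""
    cell = 1 / side
    pts = []
    for i in range(side):
        for j in range(side):
            dx = rng.uniform(-jitter, jitter) * cell
            dy = rng.uniform(-jitter, jitter) * cell
            pts.append(((i + 0.5) * cell + dx, (j + 0.5) * cell + dy))
    return pts

rng = random.Random(0)
print(min_pairwise_distance(uniform_points(100, rng)))
print(min_pairwise_distance(spread_points(10, rng)))
```

With the jitter capped at a quarter cell, no two grid points can come closer than half a cell (0.05 here). The uniform cloud, by contrast, almost always contains pairs much closer than that.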
We can call the number of digits the “fake logarithm,” or flogarithm.
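A minimal sketch of the flogarithm for positive integers, assuming base 10. The name `flog` is hypothetical. For any positive integer n, log10(n) < flog(n) <= log10(n) + 1, which is why the digit count works as a rough stand-in for the true logarithm.

```python
import math

def flog(n: int) -> int:
    """Fake logarithm: the number of decimal digits of a positive integer."""
    if n <= 0:
        raise ValueError("flog is defined here only for positive integers")
    return len(str(n))

# Compare the fake logarithm with the real base-10 logarithm.
for n in (7, 99, 100, 12345, 10**9):
    print(n, flog(n), round(math.log10(n), 3))
```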
proponent
When they don’t think anyone’s listening, scientists call this practice “torturing the data until it confesses.”
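One reason tortured data confesses is simple arithmetic. If you run many tests of hypotheses that are actually false, some will pass by chance. The sketch below assumes the tests are independent and each uses the conventional 0.05 threshold.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` independent tests of
    true null hypotheses comes out 'significant' at level alpha."""
    return 1 - (1 - alpha) ** num_tests

for k in (1, 5, 20, 100):
    print(k, round(chance_of_false_positive(k), 3))
```

With twenty independent analyses, the chance that at least one comes out significant is about 64%, even when there is nothing to find.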
For Neyman and Pearson, the purpose of statistics isn’t to tell us what to believe, but to tell us what to do. Statistics is about making decisions, not answering questions.
If an effect can’t be replicated, despite repeated trials, science backs apologetically away.
When Fisher says that “no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas,” he is saying exactly that scientific inference can’t, or at least shouldn’t, be carried out purely mechanically; our preexisting ideas and beliefs must always be allowed to play a part.
If you’ve ever used America’s most popular sort-of-illegal psychotropic substance, you know what it feels like to have too-flat priors. Every single stimulus that greets you, no matter how ordinary, seems intensely meaningful. Each experience grabs hold of your attention and demands that you take notice. It’s a very interesting mental state to be in. But it’s not conducive to making good inferences.
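Here, "too-flat priors" means giving far-fetched hypotheses nearly as much prior weight as mundane ones. Bayes' rule shows the effect. The sketch uses the odds form of Bayes' rule with a hypothetical piece of evidence that is three times likelier if the hypothesis is true. Only the prior changes between runs.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

evidence = 3.0  # hypothetical: observation is 3x likelier if the hypothesis is true
for prior in (0.5, 0.1, 0.01):
    print(prior, round(posterior(prior, evidence), 3))
```

With a flat 50% prior, the same weak evidence pushes belief to 75%. With a skeptical 1% prior, belief stays under 3%. Flattening the prior is exactly what makes every ordinary stimulus seem meaningful.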
EXPECTED VALUE IS NOT THE VALUE YOU EXPECT
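A standard illustration of this heading (not taken from the book) is a fair six-sided die. Its expected value is 3.5, a number no single roll can produce. The sketch uses exact fractions to avoid rounding. The helper name `expected_value` is hypothetical.

```python
from fractions import Fraction

def expected_value(outcomes):
    """Expected value of a list of (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)

die = [(face, Fraction(1, 6)) for face in range(1, 7)]
ev = expected_value(die)
print(ev)                                   # average payoff per roll over the long run
print(any(face == ev for face, _ in die))   # can a single roll ever show it?
```

The expected value describes the long-run average over many rolls, not the outcome you should expect on any one roll.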
Discussed
A Method of Estimating Plane Vulnerability Based on Damage of Survivors · 1943 · Abraham Wald