Critique of McClintock's Study of Human Menstrual Cycles

Alex Reinhart · 2015

from Statistics Done Wrong: The Woefully Complete Guide

McClintock’s study of human menstrual cycles went something like this:
Find groups of women who live together in close contact, for instance, college students in dormitories.
Every month or so, ask each woman when her last menstrual period began and to list the other women with whom she spent the most time.
Use these lists to split the women into groups that tend to spend time together.
For each group of women, see how far the average woman’s period start date deviates from the group average.
Small deviations would mean the women’s cycles were aligned, all starting at around the same time. Then the researchers tested whether the deviations decreased over time, which would indicate that the women were synchronizing. To do this, they checked the mean deviation at five different points throughout the study, testing whether the deviation decreased more than could be expected by chance.
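To make the deviation measure concrete, here is a minimal sketch. It assumes the measure is the mean absolute deviation of onset days from the group average, which may not be exactly what McClintock computed. The function name and the two example groups are hypothetical, and the sketch ignores the fact that onset dates wrap around from one cycle to the next.

```python
from statistics import mean

def mean_deviation(onset_days):
    """Mean absolute deviation of onset days from the group average."""
    center = mean(onset_days)
    return mean(abs(day - center) for day in onset_days)

# Hypothetical onset days (day of the month) for two groups of four women.
aligned_group = [10, 11, 12, 11]
scattered_group = [2, 9, 17, 24]

print(mean_deviation(aligned_group))    # 0.5
print(mean_deviation(scattered_group))  # 7.5
```

A small value, as in the first group, is what aligned cycles would look like under this measure.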
Unfortunately, the statistical test they used assumed that if there was no synchronization, the deviations would randomly increase and decrease from one period to another. But imagine two women in the study who start with aligned cycles. One has an average gap of 28 days between periods and the other a gap of roughly 30 days. Their cycles will diverge consistently over the course of the study, starting two days apart, then four days, and so on, with only a bit of random variation because periods are not perfectly timed. Similarly, two women can start the study not aligned but gradually align.
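The drift in this example is simple arithmetic, sketched below. The sketch assumes perfectly regular cycles of exactly 28 and 30 days with no random variation, which real cycles do not have, and the helper name is made up for illustration.

```python
def onset_days(first_onset, cycle_length, n_cycles):
    """Onset day of each period for a perfectly regular cycle (no random variation)."""
    return [first_onset + k * cycle_length for k in range(n_cycles)]

# Both women start aligned on day 0 but have different cycle lengths.
woman_a = onset_days(0, 28, 6)
woman_b = onset_days(0, 30, 6)

# Gap in days between corresponding periods.
gaps = [b - a for a, b in zip(woman_a, woman_b)]
print(gaps)  # [0, 2, 4, 6, 8, 10]
```

The gap grows steadily in one direction rather than bouncing up and down at random, which is exactly the pattern the statistical test assumed would not happen by chance.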
For comparison, if you’ve ever been stuck in traffic, you’ve probably seen how two turn signals blinking at different rates will gradually synchronize and then go out of phase again. If you’re stuck at the intersection long enough, you’ll see this happen multiple times. But to the best of my knowledge, there are no turn signal pheromones.
So we would actually expect two unaligned menstrual cycles to fall into alignment, at least temporarily. The researchers failed to account for this effect in their statistical tests.
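The same arithmetic shows temporary alignment appearing with no interaction at all. This is a sketch under assumed numbers: one woman with a perfectly regular 28-day cycle starting on day 0, another with a perfectly regular 26-day cycle whose onsets fall on day 10, day 36, and so on. The helper name and the numbers are hypothetical.

```python
def nearest_onset_gap(day, other_first_onset, other_cycle_length):
    """Days between `day` and the closest onset of a perfectly regular other cycle."""
    offset = (day - other_first_onset) % other_cycle_length
    return min(offset, other_cycle_length - offset)

# For each of woman A's onsets (days 0, 28, 56, ...), find the distance
# to woman B's nearest onset (B's onsets are 26 days apart, one on day 10).
gaps = [nearest_onset_gap(28 * k, 10, 26) for k in range(10)]
print(gaps)  # [10, 8, 6, 4, 2, 0, 2, 4, 6, 8]
```

A study that happened to cover only the first six cycles would see the gap shrink steadily from ten days to zero, just like the turn signals.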
They also made an error calculating synchronization at the beginning of the study: if one woman’s period started four days before the study began and another’s started four days after, the difference is only eight days. But periods before the beginning of the study were not counted, so the recorded difference was between the fourth day and the first woman’s next period, as much as three weeks later.
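Here is the arithmetic behind that example, as a small sketch. The 28-day cycle length is an assumption chosen for illustration, and the variable names are hypothetical. Day 0 is the start of the study.

```python
CYCLE_LENGTH = 28  # assumed cycle length in days, for illustration only

first_woman_onset = -4   # four days before the study began
second_woman_onset = 4   # four days after the study began

# The real difference between the two onsets.
true_gap = second_woman_onset - first_woman_onset

# Onsets before day 0 are not recorded, so the first woman's
# first recorded onset is her next period.
first_woman_recorded = first_woman_onset + CYCLE_LENGTH
recorded_gap = first_woman_recorded - second_woman_onset

print(true_gap, recorded_gap)  # 8 20
```

Under these assumptions, two women who are eight days apart get recorded as 20 days apart, so the study begins with an inflated deviation that has nowhere to go but down.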
These two errors combined meant that the scientists were able to obtain statistically significant results even when there was no synchronization effect beyond what would occur without pheromones.