p-value distribution of the Mann–Whitney U test in the finite case

When we work with null hypothesis significance testing and the null hypothesis is true, the distribution of observed p-value is asymptotically uniform. However, the distribution shape is not always uniform in the finite case. For example, when we work with rank-based tests like the Mann–Whitney U test, the distribution of the p-values is discrete with a limited set of possible values. This should be taken into account when we design a testing procedure for small samples and choose the significance level.

Previously, we already discussed the minimum reasonable significance level of the Mann-Whitney U test for small samples. In this post, we explore the full distribution of the p-values for this case.

Student’s t-test

We start with the Student’s t-test to check the p-value distribution in the “simple” case. Let’s generate $$10\,000$$ pairs of samples of size $$5$$ from the standard normal distribution, calculate the p-value using the two-sided Student’s t-test, and build the density plot for the observed p-values:

As we can see, the distribution looks uniform. And this is the desired property of a statistical test. Indeed, the specified significant level $$\alpha$$ is used to specify the desired false-positive rate. Mathematically, it can be expressed as $$\mathbb{P}(p \leq \alpha) = \alpha$$, which is a definition of the uniform distribution. Now let us see what would happen if we switch to the Mann–Whitney U test.

Mann–Whitney U test

Now we generate $$10\,000$$ pairs of samples of size $$n$$ from the standard normal distribution, calculate the p-value using the two-sided Mann–Whitney U test, and build the density plot for the observed p-values. Here is the result for $$n=3$$:

As we can see, if both samples contain exactly three elements each, the p-value always belongs to the following set (assuming the distribution is continuous, the samples do not contain ties): $$\{ 0.1, 0.2, 0.4, 0.7, 1.0 \}$$. Based on the above plot, we can even guess the probability of observing each p-value:

$\mathbb{P}(p = 0.1) = 0.1,$

$\mathbb{P}(p = 0.2) = 0.1,$

$\mathbb{P}(p = 0.4) = 0.2,$

$\mathbb{P}(p = 0.7) = 0.3,$

$\mathbb{P}(p = 1.0) = 0.3.$

Thus, $$\mathbb{P}(p \leq \alpha) = \alpha$$ is true only for $$\alpha$$ values from the same set. However, it is not true for other $$\alpha$$ values. Thus,

$\mathbb{P}(p \leq \alpha) = 0,\quad\textrm{for}\quad \alpha \in [0;0.1),$

$\mathbb{P}(p \leq \alpha) = 0.1,\quad\textrm{for}\quad \alpha \in [0.1;0.2),$

$\mathbb{P}(p \leq \alpha) = 0.2,\quad\textrm{for}\quad \alpha \in [0.2;0.4),$

$\mathbb{P}(p \leq \alpha) = 0.4,\quad\textrm{for}\quad \alpha \in [0.4;0.7),$

$\mathbb{P}(p \leq \alpha) = 0.7,\quad\textrm{for}\quad \alpha \in [0.7;1.0),$

If changes of $$\alpha$$ within any of these intervals (e.g., from $$\alpha = 0.19$$ to $$\alpha = 0.11$$) will not affect the test result.

Now let us look at the same distribution for $$n=5$$, $$n=7$$, and $$n=15$$:

As we can see, as $$n$$ grows, we get more distinct values in the observed distribution of p-values, but the list of the exact values is always limited. It can also be easily shown that when we compare two samples of sizes $$n$$ and $$m$$ using the two-sided Mann–Whitney U test, all possible p-values can be expressed as $$2k/C_{n+m}^n,\;k\in \mathbb{N}$$, and $$2/C_{n+m}^n$$ is the minimum possible value.

The source code of this post and all the relevant files are available on GitHub.
Share: