When Python's Mann-Whitney U test returns extremely distorted p-values
In the previous post, I have discussed a huge difference between p-values evaluated via the R implementation of the Mann-Whitney U test between the exact and asymptotic implementations. This issue is not unique only to R, it is relevant for other statistical packages in other languages as well. In this post, we review this problem in the Python package SciPy.
In SciPy1, the Mann-Whitney U test is available via the function mannwhitneyu. Similarly to R, it has two methods to estimate the value: the exact one and the normally approximated one. The difference between these methods can be shown using the following snippet:
import numpy as np
from scipy.stats import mannwhitneyu
x = np.arange(101, 106) # n = 5
y = np.arange(1, 51) # m = 50
print(mannwhitneyu(x, y, alternative="greater", method="exact"))
print(mannwhitneyu(x, y, alternative="greater", method="asymptotic"))
This code gives the following output:
MannwhitneyuResult(statistic=250.0, pvalue=2.874586670369134e-07)
MannwhitneyuResult(statistic=250.0, pvalue=0.00013370277743792665)
The results match the situation we observed in R: the approximated value is 465 times larger than the exact value.
However, the default strategy for choosing between exact
and asymptotic
is different between R and SciPy.
In R, wilcox.text
uses
the exact strategy when n < 50 AND m < 50
(and there are no tied values).
In SciPy, mannwhitneyu
uses
the exact strategy when n <= 8 OR m <= 8
(and there are no tied values).
Therefore, SciPy switches to the asymptotic
methods when both samples contain at least 9 elements.
Here is a snippet that illustrates this transition:
import numpy as np
from scipy.stats import mannwhitneyu
x = np.arange(101, 109) # n = 8
y = np.arange(1, 9) # m = 8
print("n = m = 8")
print("auto: " + str(mannwhitneyu(x, y, alternative="greater")))
print("exact: " + str(mannwhitneyu(x, y, alternative="greater", method="exact")))
print("asymptotic: " + str(mannwhitneyu(x, y, alternative="greater", method="asymptotic")))
print("")
x = np.arange(101, 110) # n = 9
y = np.arange(1, 10) # m = 9
print("n = m = 9")
print("auto: " + str(mannwhitneyu(x, y, alternative="greater")))
print("exact: " + str(mannwhitneyu(x, y, alternative="greater", method="exact")))
print("asymptotic: " + str(mannwhitneyu(x, y, alternative="greater", method="asymptotic")))
Here is the corresponding output:
n = m = 8
auto: MannwhitneyuResult(statistic=64.0, pvalue=7.77000777000777e-05)
exact: MannwhitneyuResult(statistic=64.0, pvalue=7.77000777000777e-05)
asymptotic: MannwhitneyuResult(statistic=64.0, pvalue=0.00046955284955859495)
n = m = 9
auto: MannwhitneyuResult(statistic=81.0, pvalue=0.00020614740103084563)
exact: MannwhitneyuResult(statistic=81.0, pvalue=2.0567667626491154e-05)
asymptotic: MannwhitneyuResult(statistic=81.0, pvalue=0.00020614740103084563)
Thus, we should be careful when using mannwhitneyu
from SciPy:
the asymptotic
method is enabled by default even for relatively small samples
which can lead to significant errors in p-value evaluation for extreme values of the U statistic.
We consider SciPy 1.10.1 ↩︎