## Corner case of the Brunner–Munzel test

The Brunner–Munzel test is a nonparametric significance test, which can be considered an alternative to the Mann–Whitney U test. However, the Brunner–Munzel test has a corner case that can cause some practical issues with applying this test to real data. In this post, I briefly discuss the test itself and the corresponding corner case.

## Examples of the Mann–Whitney U test misuse cases

The Mann–Whitney U test is one of the most popular nonparametric statistical tests. Its alternative hypothesis claims that one distribution is stochastically greater than the other. However, people often misuse this test: they apply it to check whether two distributions are not identical or whether the distribution medians differ (without any additional assumptions on the shapes of the distributions). In this post, I show several cases in which the Mann–Whitney U test is not applicable for comparing two distributions.

## Types of finite-sample consistency with the standard deviation

Let us say we have a robust dispersion estimator $$\operatorname{T}(X)$$. If it is asymptotically consistent with the standard deviation, we can use such an estimator as a robust replacement for the standard deviation under normality. Thanks to asymptotic consistency, we can use the estimator “as is” for large samples. However, if the sample size is small, we typically need finite-sample bias-correction factors to make the estimator unbiased. Here we should clearly understand what kind of consistency we need.

There are various ways to estimate the standard deviation. Let us consider a sample of random variables $$X = \{ X_1, X_2, \ldots, X_n \}$$. The most popular estimator of the standard deviation is given by

$s(X) = \sqrt{\frac{1}{n - 1} \sum_{i=1}^n (X_i - \overline{X})^2}.$

Using this definition, we get an unbiased estimator for the population variance: $$\mathbb{E}[s^2(X)] = 1$$ for a population with unit variance. However, $$s(X)$$ is a biased estimator for the population standard deviation: $$\mathbb{E}[s(X)] \neq 1$$. Under normality, $$\mathbb{E}[s(X)] = c_4(n)$$ in this case, so the corresponding unbiased estimator is $$s(X) / c_4(n)$$, where $$c_4(n)$$ is a correction factor defined as follows:

$c_4(n) = \sqrt{\frac{2}{n-1}} \cdot \frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}.$
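To get a feeling for the magnitude of this factor, here is a minimal Python sketch that evaluates $$c_4(n)$$ directly from the formula above. The function name `c4` and the use of `math.lgamma` (to avoid overflow of the gamma function for large $$n$$) are my own choices for illustration, not part of the original post.

```python
import math


def c4(n: int) -> float:
    """Evaluate c4(n) = sqrt(2 / (n - 1)) * Gamma(n / 2) / Gamma((n - 1) / 2)."""
    if n < 2:
        raise ValueError("c4(n) is defined only for n >= 2")
    # Work with log-gamma values so that the ratio stays finite for large n.
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2 / (n - 1)) * math.exp(log_ratio)


if __name__ == "__main__":
    for n in (2, 5, 10, 100):
        print(n, round(c4(n), 4))
```

The factor is noticeably below 1 for tiny samples (for $$n = 2$$ it equals $$\sqrt{2/\pi} \approx 0.798$$) and approaches 1 as $$n$$ grows, which is why the correction matters mostly for small samples.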

When we define finite-sample bias-correction factors for a robust standard deviation replacement, we should choose which kind of consistency we need. In this post, I briefly explore available options.

## Thoughts about outlier removal and ozone holes

Imagine you work with some data and assume that the underlying distribution is approximately normal. In such cases, the data analysis typically involves non-robust statistics like the mean and the standard deviation. While these metrics are highly efficient under normality, they make the analysis procedure fragile: a single extreme value can corrupt all the results. You may not expect any significant outliers, but you can never be 100% sure. To avoid unexpected surprises and ensure the reliability of the results, it may be tempting to automatically exclude all outliers from the collected samples. While this approach is widely adopted, it conceals an essential part of the obtained data and can lead to fallacious conclusions.

Let me recite a classic story about ozone holes, which is typically used to illustrate the danger of blind outlier removal:

## Nonparametric effect size: Cohen's d vs. Glass's delta

In the previous posts, I discussed the idea of nonparametric effect size measures consistent with Cohen’s d under normality. However, Cohen’s d is not always the best effect size measure, even in the normal case.

In this post, we briefly discuss a case study in which a nonparametric version of Glass’s delta is preferable to the previously suggested Cohen’s d-consistent measure.

## Trinal statistical thresholds

When we design a test for practical significance, which compares two samples, we should somehow express the threshold. The most popular options are the shift, the ratio, and the effect size. Unfortunately, if we have little information about the underlying distributions, it’s hard to get a reliable test based only on a single threshold. And it’s almost impossible to define a generic threshold that fits all situations. After struggling with a lot of different thresholding approaches, I came up with the idea of setting a trinal threshold that includes three individual thresholds for the shift, the ratio, and the effect size.

In this post, I show some examples in which a single threshold is not enough.

## Trimmed Hodges-Lehmann location estimator, Part 2: Gaussian efficiency

In the previous post, we introduced the trimmed Hodges-Lehmann location estimator. For a sample $$\mathbf{x} = \{ x_1, x_2, \ldots, x_n \}$$, it is defined as follows:

$\operatorname{THL}(\mathbf{x}, k) = \underset{k < i < j \leq n - k}{\operatorname{median}}\biggl(\frac{x_{(i)} + x_{(j)}}{2}\biggr).$

We also derived the exact expressions for its asymptotic and finite-sample breakdown points. In this post, we explore its Gaussian efficiency.
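The definition of $$\operatorname{THL}$$ above can be sketched in Python as follows. This is a straightforward quadratic-time illustration under my own assumptions: the function name `thl` and the argument check are hypothetical, and the code simply drops the $$k$$ smallest and $$k$$ largest order statistics before taking the median of pairwise averages with $$i < j$$.

```python
from itertools import combinations
from statistics import median


def thl(x, k=0):
    """Trimmed Hodges-Lehmann estimate: median of (x_(i) + x_(j)) / 2 for k < i < j <= n - k."""
    n = len(x)
    if k < 0 or n - 2 * k < 2:
        raise ValueError("need at least two elements after trimming")
    # Keep order statistics x_(k+1), ..., x_(n-k) (1-based indices).
    core = sorted(x)[k:n - k]
    pairwise_means = [(a + b) / 2 for a, b in combinations(core, 2)]
    return median(pairwise_means)


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 100]
    print(thl(sample, k=1))  # 3.0
```

For the toy sample `[1, 2, 3, 4, 100]` with `k=1`, the trimmed core is `[2, 3, 4]`, the pairwise averages are 2.5, 3, and 3.5, and the result is 3.0, so the extreme value 100 does not affect the estimate.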

## Trimmed Hodges-Lehmann location estimator, Part 1: breakdown point

For a sample $$\mathbf{x} = \{ x_1, x_2, \ldots, x_n \}$$, the Hodges-Lehmann location estimator is defined as follows:

$\operatorname{HL}(\mathbf{x}) = \underset{i < j}{\operatorname{median}}\biggl(\frac{x_i + x_j}{2}\biggr).$

Its asymptotic Gaussian efficiency is $$\approx 96\%$$, while its asymptotic breakdown point is $$\approx 29\%$$. This makes the Hodges-Lehmann location estimator a decent robust alternative to the mean.
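As a quick illustration of the definition, here is a naive Python sketch of the estimator. The function name `hl` is hypothetical, and the implementation enumerates all pairs with $$i < j$$, so it takes quadratic time and is meant only for small samples.

```python
from itertools import combinations
from statistics import median


def hl(x):
    """Hodges-Lehmann location estimate: median of (x_i + x_j) / 2 over all pairs i < j."""
    if len(x) < 2:
        raise ValueError("need at least two elements")
    pairwise_means = [(a + b) / 2 for a, b in combinations(x, 2)]
    return median(pairwise_means)


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 100]
    print(hl(sample))  # 3.25
```

For the toy sample `[1, 2, 3, 4, 100]`, the arithmetic mean is 22, while this sketch returns 3.25: a single extreme value barely moves the estimate.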

While the Gaussian efficiency is quite impressive (almost as efficient as the mean), the breakdown point is not as great as in the case of the median (which has a breakdown point of $$50\%$$). Could we change this trade-off a little bit and make this estimator more robust, sacrificing a small portion of efficiency? Yes, we can!

In this post, I want to present the idea of the trimmed Hodges-Lehmann location estimator and provide the exact equation for its breakdown point.

## Median of the shifts vs. shift of the medians, Part 2: Gaussian efficiency

In the previous post, we discussed the difference between shifts of the medians and the Hodges-Lehmann location shift estimator. In this post, we conduct a simple numerical simulation to evaluate the Gaussian efficiency of these two estimators.

## Median of the shifts vs. shift of the medians, Part 1

Let us say that we have two samples $$x = \{ x_1, x_2, \ldots, x_n \}$$, $$y = \{ y_1, y_2, \ldots, y_m \}$$, and we want to estimate the shift of locations between them. In the case of the normal distribution, this task is quite simple and has a lot of straightforward solutions. However, in the nonparametric case, the location shift is an ambiguous metric which heavily depends on the chosen estimator. In the context of this post, we consider two approaches that may look similar. The first one is the shift of the medians:

$\newcommand{\DSM}{\Delta_{\operatorname{SM}}} \DSM = \operatorname{median}(y) - \operatorname{median}(x).$

The second one is the median of all pairwise shifts, also known as the Hodges-Lehmann location shift estimator:

$\newcommand{\DHL}{\Delta_{\operatorname{HL}}} \DHL = \operatorname{median}(y_j - x_i).$

In the case of the normal distributions, these estimators are consistent. However, this post will show an example of multimodal distributions that lead to opposite signs of $$\DSM$$ and $$\DHL$$.
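The two definitions can be compared with a small Python sketch. The function names and the toy bimodal samples below are my own and are not the example from the post; they are chosen only so that the disagreement in sign is easy to verify by hand.

```python
from itertools import product
from statistics import median


def shift_of_medians(x, y):
    """Shift of the medians: median(y) - median(x)."""
    return median(y) - median(x)


def hl_shift(x, y):
    """Hodges-Lehmann shift: median of all pairwise differences y_j - x_i."""
    return median([yj - xi for xi, yj in product(x, y)])


if __name__ == "__main__":
    # Toy bimodal samples (illustrative data, not taken from the post).
    x = [0, 0, 0, 10, 10]
    y = [-9, -9, 1, 1, 1]
    print(shift_of_medians(x, y))  # 1
    print(hl_shift(x, y))          # -9
```

Here the medians are 0 and 1, so the shift of the medians is positive. Among the 25 pairwise differences, 16 are negative (four equal to -19 and twelve equal to -9) and nine are equal to 1, so their median is -9.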