Resistance to the low-density regions: the Hodges-Lehmann location estimator

Andrey Akinshin · 2022-12-13

In the previous posts, I discussed the concept of a resistance function that shows the sensitivity of the given estimator to the low-density regions. I already showed how this function behaves for the mean, the sample median, and the Harrell-Davis median. In this post, I explore this function for the Hodges-Lehmann location estimator.

The resistance function

As was shown in the previous post, we define the function of resistance to the low-density regions as follows:

$$ R(T, n, s) = \max_{s \leq k \leq n} R(T, n, s, k), $$

$$ R(T, n, s, k) = |T(\mathbf{x}_k) - T(\mathbf{x}_{k-s})|, $$

$$ \mathbf{x}_k = \{ \underbrace{0, 0, \ldots, 0}_{k}, \underbrace{1, 1, \ldots, 1}_{n-k} \}, $$

where $T$ is an estimator, $n$ is the sample size, and $s$ is the number of sample values that jump from one mode to the other (the samples $\mathbf{x}_{k-s}$ and $\mathbf{x}_k$ differ in exactly $s$ elements).
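To make the definition concrete, here is a minimal Python sketch of the resistance function. The helper names (`bimodal_sample`, `resistance_k`, `resistance`) are illustrative assumptions, not code from the previous posts, and NumPy is assumed to be available:

```python
import numpy as np

def bimodal_sample(n, k):
    """x_k: k zeros followed by n - k ones."""
    return np.concatenate([np.zeros(k), np.ones(n - k)])

def resistance_k(estimator, n, s, k):
    """R(T, n, s, k) = |T(x_k) - T(x_{k-s})|."""
    return abs(estimator(bimodal_sample(n, k)) - estimator(bimodal_sample(n, k - s)))

def resistance(estimator, n, s):
    """R(T, n, s): the maximum of R(T, n, s, k) over s <= k <= n."""
    return max(resistance_k(estimator, n, s, k) for k in range(s, n + 1))

# Any function that maps a 1-D array to a number can be plugged in as T.
print(resistance(np.mean, 10, 1))
print(resistance(np.median, 10, 1))
print(resistance(np.median, 11, 1))
```

For the mean, moving $s$ elements always shifts the estimate by $s/n$. For the sample median, the whole change is concentrated around $k \approx n/2$.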

Resistance of the Hodges-Lehmann location estimator

For a sample $\mathbf{x} = \{ x_1, x_2, \ldots, x_n \}$, the Hodges-Lehmann location estimator is defined as follows:

$$ \newcommand{\HL}{\operatorname{HL}} \HL(\mathbf{x}) = \underset{i < j}{\textrm{median}} \Bigg( \frac{x_i + x_j}{2} \Bigg). $$
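The definition above translates almost literally into code. The following is a naive sketch (quadratic in time and memory, so only suitable for small samples). The function name `hodges_lehmann` is a hypothetical choice, and the sketch assumes at least two sample elements:

```python
from itertools import combinations
from statistics import median

def hodges_lehmann(x):
    """Median of the pairwise averages (x_i + x_j) / 2 over all i < j."""
    # combinations(x, 2) yields each unordered pair of elements exactly once,
    # which matches the i < j condition in the definition.
    return median((a + b) / 2 for a, b in combinations(x, 2))

# A single extreme value barely moves the estimate.
print(hodges_lehmann([1, 2, 3, 4, 100]))  # 3.25
```

In this example, the ten pairwise averages are 1.5, 2, 2.5, 2.5, 3, 3.5, 50.5, 51, 51.5, 52, and their median is $(3 + 3.5) / 2 = 3.25$.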

Now it’s time to build the plot of $R(T, n, s)$ that adds the Hodges-Lehmann location estimator to the previously covered estimators (the mean, the sample median, and the Harrell-Davis median). In this experiment, we consider $n \leq 100$, $s \in \{1, 2, 3, 4, 5, 6\}$. Here are the plots:

As we can see, the resistance function value for the Hodges-Lehmann location estimator is $0.5$ when the sample size $n$ is sufficiently large.

Deep view of the Hodges-Lehmann location estimator resistance function

Now we explore how $R(\HL, n, s, k)$ depends on $k$:

As we can see, most of the $R(\HL, n, s, k)$ values are zeros, except for two regions of $k$ values in which the value is $0.5$. The positions of these regions correspond to the breakdown point of the Hodges-Lehmann location estimator (its asymptotic value is 29%). Thus, $R(\HL, n, s, k) = 0.5$ for the $k$ values around $0.29 \cdot n$ and $0.71 \cdot n$.
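The location of these two regions can be checked numerically. The sketch below is a brute-force illustration with hypothetical helper names (`hl`, `bimodal_sample`, `hl_resistance_profile`). It is not the code used to build the plots, and it assumes NumPy:

```python
import numpy as np

def hl(x):
    # Hodges-Lehmann: median of (x_i + x_j) / 2 over all pairs i < j
    i, j = np.triu_indices(len(x), k=1)
    return np.median((x[i] + x[j]) / 2)

def bimodal_sample(n, k):
    # x_k: k zeros followed by n - k ones
    return np.concatenate([np.zeros(k), np.ones(n - k)])

def hl_resistance_profile(n, s):
    # Maps each k in s..n to R(HL, n, s, k) = |HL(x_k) - HL(x_{k-s})|
    estimates = [hl(bimodal_sample(n, k)) for k in range(n + 1)]
    return {k: abs(estimates[k] - estimates[k - s]) for k in range(s, n + 1)}

profile = hl_resistance_profile(n=100, s=1)
# Keep only the k values where the estimate actually changes
jumps = {k: float(r) for k, r in profile.items() if r > 0}
print(jumps)
```

For $n = 100$ and $s = 1$, the only nonzero values appear at $k = 30$ and $k = 71$. This matches a simple counting argument: the pairwise averages of $\mathbf{x}_k$ consist of $k(k-1)/2$ zeros, $k(n-k)$ values of $0.5$, and $(n-k)(n-k-1)/2$ ones. The median stays at $0$ while the zeros form a majority, which requires roughly $k/n > 1/\sqrt{2} \approx 0.71$. By symmetry, it stays at $1$ for roughly $k/n < 1 - 1/\sqrt{2} \approx 0.29$, and it equals $0.5$ in between.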

Deep view of various resistance functions

In the previous section, we got interesting plots describing $R(\HL, n, s, k)$. Now let us compare it with similar plots for other previously covered estimators (the mean, the sample median, and the Harrell-Davis median):

Compared to the sample median, the Hodges-Lehmann location estimator has two $R=0.5$ regions instead of one, but it never reaches $R=1$. Compared to the Harrell-Davis median, the Hodges-Lehmann location estimator has a much higher maximum resistance value, $R(\HL, n, s) = 0.5$, but $R(\HL, n, s, k) = 0$ for the middle part of $k$ values (which is much better than the positive values of the Harrell-Davis median). Considering the extremely high Gaussian efficiency of the Hodges-Lehmann location estimator (asymptotically $3/\pi \approx 95.5\%$) and its breakdown point ($29\%$), this estimator can be a good choice for estimating the location of multimodal distributions.
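Part of this comparison can be reproduced with a short sketch. It covers only the mean, the sample median, and the Hodges-Lehmann location estimator. The Harrell-Davis median is left out to keep the example free of extra dependencies. The helper names and the choice of $n = 101$ are assumptions made for illustration:

```python
import numpy as np

def hl(x):
    # Hodges-Lehmann: median of (x_i + x_j) / 2 over all pairs i < j
    i, j = np.triu_indices(len(x), k=1)
    return np.median((x[i] + x[j]) / 2)

def bimodal_sample(n, k):
    # x_k: k zeros followed by n - k ones
    return np.concatenate([np.zeros(k), np.ones(n - k)])

def resistance(estimator, n, s):
    # R(T, n, s): the largest |T(x_k) - T(x_{k-s})| over s <= k <= n
    estimates = [estimator(bimodal_sample(n, k)) for k in range(n + 1)]
    return max(abs(estimates[k] - estimates[k - s]) for k in range(s, n + 1))

n = 101  # odd sample size, so the sample median is a single order statistic
print("s  mean    median  HL")
for s in range(1, 7):
    row = [float(resistance(t, n, s)) for t in (np.mean, np.median, hl)]
    print(s, "  ".join(f"{r:.4f}" for r in row))
```

With an odd $n$, the sample median flips between the two modes in a single step, so its resistance value is $1$ for every $s$. The Hodges-Lehmann location estimator has to pass through the intermediate value of $0.5$, so its resistance value stays at $0.5$ for these small values of $s$.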