# Resistance to the low-density regions: the Hodges-Lehmann location estimator

In the previous posts, I discussed the concept of a resistance function that shows the sensitivity of the given estimator to the low-density regions. I already showed how this function behaves for the mean, the sample median, and the Harrell-Davis median. In this post, I explore this function for the Hodges-Lehmann location estimator.

### The resistance function

As was shown in the previous post, we define the function of resistance to the low-density regions as follows:

$R(T, n, s) = \max_{s \leq k \leq n} R(T, n, s, k),$

$R(T, n, s, k) = |T(\mathbf{x}_k) - T(\mathbf{x}_{k-s})|,$

$\mathbf{x}_k = \{ \underbrace{0, 0, \ldots, 0}_{k}, \underbrace{1, 1, \ldots, 1}_{n-k} \},$

where $$T$$ is an estimator, $$n$$ is the sample size, $$s$$ is the number of sample values that jump from the first mode to the second one.

### Resistance of the Hodges-Lehmann location estimator

For a sample $$\mathbf{x} = \{ x_1, x_2, \ldots, x_n \}$$, the Hodges-Lehmann location estimator is defined as follows:

$\newcommand{\HL}{\operatorname{HL}} \HL(\mathbf{x}) = \underset{i < j}{\textrm{median}} \Bigg( \frac{x_i + x_j}{2} \Bigg).$

Now it’s time to build the plot of $$R(T, n, s)$$ that compares the mean, the sample median, and the Harrell-Davis median. In this experiment, we consider $$n \leq 100$$, $$s \in \{1, 2, 3, 4, 5, 6\}$$. Here are the plots:

As we can see, the resistance function value for the Hodges-Lehmann location estimator is $$0.5$$ when the sample size $$n$$ is sufficiently large.

### Deep view of the Hodges-Lehmann location estimator resistance function

Now we explore how $$R(\HL, n, s, k)$$ depends on $$k$$:

As we can see, most of the $$R(\HL, n, s, k)$$ values are zeros expect two regions of $$k$$ values in which the value is $$0.5$$. These values correspond to the breakdown point of the Hodges-Lehmann location estimator (its asymptotic value is 29%). Thus, $$R(\HL, n, s, k) = 0.5$$ for the $$k$$ values around $$0.29 \cdot n$$ and $$0.71 \cdot n$$.

### Deep view of various resistance functions

In the previous section, we got interesting plots describing $$R(\HL, n, s, k)$$. Now let us compare it with similar plots for other previously covered estimators (the mean, the sample median, and the Harrell-Davis median):

Compared to the sample median, the Hodges-Lehmann location estimator has two $$R=0.5$$ regions instead of one, but it never reaches $$R=1$$. Compared to the Harrell-Davis median, the Hodges-Lehmann location estimator has much higher $$R(\HL, n, s) = 0.5$$, but $$R(\HL, n, s, k) = 0$$ for the middle part of $$k$$ values (which is much better than the positive values of the Harrell-Davis median). Considering the extremely high Gaussian efficiency of the Hodges-Lehmann location estimator ($$94\%$$) and its low breakdown point ($$29\%$$), this estimator can be a good choice for estimating the location of multimodal distributions.

Share: