Hodges-Lehmann Gaussian efficiency: location shift vs. shift of locations

by Andrey Akinshin · 2023-09-12

Let us consider two samples $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_m)$. The one-sample Hodges-Lehmann location estimator is defined as the median of the Walsh (pairwise) averages:

$$ \operatorname{HL}(\mathbf{x}) = \underset{1 \leq i \leq j \leq n}{\operatorname{median}} \left(\frac{x_i + x_j}{2} \right), \quad \operatorname{HL}(\mathbf{y}) = \underset{1 \leq i \leq j \leq m}{\operatorname{median}} \left(\frac{y_i + y_j}{2} \right). $$
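To make the definition concrete, here is a minimal Python sketch of the one-sample estimator. It follows the formula above literally, including the pairs with $i = j$. The function name `hl_location` is my own choice for illustration, and the quadratic enumeration is only meant for small samples.

```python
from itertools import combinations_with_replacement
from statistics import median


def hl_location(x):
    """One-sample Hodges-Lehmann estimate: median of Walsh averages (i <= j)."""
    # combinations_with_replacement yields every pair (x_i, x_j) with i <= j,
    # so each element is also paired with itself.
    walsh_averages = [(a + b) / 2 for a, b in combinations_with_replacement(x, 2)]
    return median(walsh_averages)


print(hl_location([1, 2, 3]))   # 2.0
print(hl_location([1, 2, 10]))  # 3.75
```

For the sample $(1, 2, 10)$, the Walsh averages are $1, 1.5, 2, 5.5, 6, 10$, so the median is $(2 + 5.5) / 2 = 3.75$.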

For these two samples, we can define the shift as the difference between the two location estimates:

$$ \Delta_{\operatorname{HL}}(\mathbf{x}, \mathbf{y}) = \operatorname{HL}(\mathbf{x}) - \operatorname{HL}(\mathbf{y}). $$

The two-sample Hodges-Lehmann location shift estimator is defined as the median of pairwise differences:

$$ \operatorname{HL}(\mathbf{x}, \mathbf{y}) = \underset{1 \leq i \leq n,\,\, 1 \leq j \leq m}{\operatorname{median}} \left(x_i - y_j \right). $$
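The two-sample estimator can be sketched in the same spirit. This is a straightforward illustration of the formula above, not an optimized implementation: it enumerates all $n \cdot m$ pairwise differences, and the name `hl_shift` is hypothetical.

```python
from itertools import product
from statistics import median


def hl_shift(x, y):
    """Two-sample Hodges-Lehmann shift estimate: median of all x_i - y_j."""
    # product(x, y) enumerates every (x_i, y_j) pair across the two samples.
    return median(a - b for a, b in product(x, y))


print(hl_shift([1, 2, 3], [0, 0, 0]))  # 2
print(hl_shift([5, 7], [1, 2]))        # 4.5
```

In the second call, the pairwise differences are $4, 3, 6, 5$, and their median is $4.5$.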

In previous posts (1, 2), I compared the location shift estimator with the difference of median estimators. In this post, I compare the difference of two location estimates $\Delta_{\operatorname{HL}}$ with the location shift estimate $\operatorname{HL}$ in terms of Gaussian efficiency. Before starting this study, I expected $\operatorname{HL}$ to be more efficient than $\Delta_{\operatorname{HL}}$. Let us find out whether my intuition is correct!

For the baseline, we consider the difference between the means:

$$ \Delta_{\operatorname{mean}}(\mathbf{x}, \mathbf{y}) = \operatorname{mean}(\mathbf{x}) - \operatorname{mean}(\mathbf{y}). $$

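The Gaussian efficiency of $\Delta_{\operatorname{HL}}$ and $\operatorname{HL}$ relative to $\Delta_{\operatorname{mean}}$ is defined as follows, where $\mathbb{V}_{\mathcal{N}}$ denotes the variance of an estimator under the normal distribution: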

$$ \operatorname{eff}_{\mathcal{N}}(\Delta_{\operatorname{HL}}) = \frac{\mathbb{V}_{\mathcal{N}}[\Delta_{\operatorname{mean}}]}{\mathbb{V}_{\mathcal{N}}[\Delta_{\operatorname{HL}}]}, \quad \operatorname{eff}_{\mathcal{N}}(\operatorname{HL}) = \frac{\mathbb{V}_{\mathcal{N}}[\Delta_{\operatorname{mean}}]}{\mathbb{V}_{\mathcal{N}}[\operatorname{HL}]}. $$
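As a sanity check for the numerator: assuming that $\mathbf{x}$ and $\mathbf{y}$ are independent samples from $\mathcal{N}(0, 1)$, the variance of the baseline has a simple closed form, which a simulated value can be compared against:

$$ \mathbb{V}_{\mathcal{N}}[\Delta_{\operatorname{mean}}] = \mathbb{V}[\operatorname{mean}(\mathbf{x})] + \mathbb{V}[\operatorname{mean}(\mathbf{y})] = \frac{1}{n} + \frac{1}{m}, $$

which reduces to $2/n$ when $n = m$.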

Numerical simulations

We conduct the following simulation:

  • Enumerate the sample size $n$ from $3$ to $50$ (both samples have the same size, $m = n$).
  • For each $n$, generate $500\,000$ pairs of random samples of size $n$ from $\mathcal{N}(0, 1)$.
  • For each pair of samples, estimate the shift between them using $\Delta_{\operatorname{HL}}$, $\operatorname{HL}$, and $\Delta_{\operatorname{mean}}$.
  • Calculate the Gaussian efficiency of $\Delta_{\operatorname{HL}}$ and $\operatorname{HL}$ using the above equations.
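The steps above can be sketched in Python as follows. This is a scaled-down illustration rather than the exact code behind the results: it uses only the standard library, a naive quadratic implementation of both estimators, equal sample sizes, and far fewer iterations than $500\,000$, so the numbers it prints are noisy. All function names, the seed, and the iteration count are my assumptions.

```python
import random
from itertools import combinations_with_replacement, product
from statistics import mean, median, pvariance


def hl_location(x):
    """Median of Walsh averages (x_i + x_j) / 2 over i <= j."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(x, 2))


def hl_shift(x, y):
    """Median of pairwise differences x_i - y_j."""
    return median(a - b for a, b in product(x, y))


def gaussian_efficiency(n, iterations=2000, seed=42):
    """Estimate eff(Delta_HL) and eff(HL) for two N(0, 1) samples of size n."""
    rng = random.Random(seed)  # fixed seed for reproducibility (assumption)
    delta_hl, hl, delta_mean = [], [], []
    for _ in range(iterations):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        delta_hl.append(hl_location(x) - hl_location(y))
        hl.append(hl_shift(x, y))
        delta_mean.append(mean(x) - mean(y))
    # Efficiency is the baseline variance divided by the estimator variance.
    baseline = pvariance(delta_mean)
    return baseline / pvariance(delta_hl), baseline / pvariance(hl)


if __name__ == "__main__":
    for n in (3, 5, 10):
        eff_delta_hl, eff_hl = gaussian_efficiency(n)
        print(f"n={n}: eff(Delta_HL)={eff_delta_hl:.3f}, eff(HL)={eff_hl:.3f}")
```

Because all three estimators are evaluated on the same pairs of samples, the variance ratios are more stable than the individual variances, but distinguishing small differences between $\Delta_{\operatorname{HL}}$ and $\operatorname{HL}$ still requires many more iterations than this sketch uses.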

Here are the results:

Surprisingly, the shift of the Hodges-Lehmann location estimators $\Delta_{\operatorname{HL}}$ turned out to be more efficient under normality than the Hodges-Lehmann location shift estimator $\operatorname{HL}$ (the only exception is $n=m=4$). For $n,m \geq 15$, the difference is almost negligible, but it is noticeable for small sample sizes.

References

  • [Hodges1963]
    Hodges, J. L., and E. L. Lehmann. 1963. Estimates of location based on rank tests. The Annals of Mathematical Statistics 34 (2):598–611.
    DOI:10.1214/aoms/1177704172