Hodges-Lehmann Gaussian efficiency: location shift vs. shift of locations


Let us consider two samples \(\mathbf{x} = (x_1, x_2, \ldots, x_n)\) and \(\mathbf{y} = (y_1, y_2, \ldots, y_m)\). The one-sample Hodges-Lehman location estimator is defined as the median of the Walsh (pairwise) averages:

\[\operatorname{HL}(\mathbf{x}) = \underset{1 \leq i \leq j \leq n}{\operatorname{median}} \left(\frac{x_i + x_j}{2} \right), \quad \operatorname{HL}(\mathbf{y}) = \underset{1 \leq i \leq j \leq m}{\operatorname{median}} \left(\frac{y_i + y_j}{2} \right). \]

For these two samples, we can also define the shift between these two estimations:

\[\Delta_{\operatorname{HL}}(\mathbf{x}, \mathbf{y}) = \operatorname{HL}(\mathbf{x}) - \operatorname{HL}(\mathbf{y}). \]

The two-sample Hodges-Lehmann location shift estimator is defined as the median of pairwise differences:

\[\operatorname{HL}(\mathbf{x}, \mathbf{y}) = \underset{1 \leq i \leq n,\,\, 1 \leq j \leq m}{\operatorname{median}} \left(x_i - y_j \right). \]

Previously, I already compared the location shift estimator with the difference of median estimators (1, 2). In this post, I compare the difference between two location estimations and the shift estimations in terms of Gaussian efficiency. Before I started this study, I expected that \(\operatorname{HL}\) should be more efficient than \(\Delta_{\operatorname{HL}}\). Let us find out if my intuition is correct or not!

For the baseline, we consider the difference between the means:

\[\Delta_{\operatorname{mean}}(\mathbf{x}, \mathbf{y}) = \operatorname{mean}(\mathbf{x}) - \operatorname{mean}(\mathbf{y}). \]

The relative Gaussian efficiency of \(\Delta_{\operatorname{HL}}\) and \(\operatorname{HL}\) to \(\Delta_{\operatorname{mean}}\) is defined as follows:

\[\operatorname{eff}_{\mathcal{N}}(\Delta_{\operatorname{HL}}) = \frac{\mathbb{V}_{\mathcal{N}}[\Delta_{\operatorname{mean}}]}{\mathbb{V}_{\mathcal{N}}[\Delta_{\operatorname{HL}}]}, \quad \operatorname{eff}_{\mathcal{N}}(\operatorname{HL}) = \frac{\mathbb{V}_{\mathcal{N}}[\Delta_{\operatorname{mean}}]}{\mathbb{V}_{\mathcal{N}}[\operatorname{HL}]}. \]

Numerical simulations

We conduct the following simulation:

  • Enumerate the sample size \(n\) from \(3\) to \(50\).
  • For each \(n\), generate \(500\,000\) pairs of random samples from \(\mathcal{N}(0, 1)\).
  • For each pair of samples, estimate the shift between them using \(\Delta_{\operatorname{HL}}\), \(\operatorname{HL}\), and \(\Delta_{\operatorname{mean}}\).
  • Calculate the Gaussian efficiency of \(\Delta_{\operatorname{HL}}\) and \(\operatorname{HL}\) using the above equations.

Here are the results:

Surprisingly, but the shift of the Hodges-Lehmann location estimators \(\Delta_{\operatorname{HL}}\) turned out to be more efficient under normality than Hodges-Lehmann location shift estimator \(\operatorname{HL}\) (with the only exception of \(n=m=4\)). For \(n,m \geq 15\), the difference is almost negligible, but it’s tangible for small sample sizes.

References

  • [Hodges1963]
    Hodges, J. L., and E. L. Lehmann. 1963. Estimates of location based on rank tests. The Annals of Mathematical Statistics 34 (2):598–611.
    DOI:10.1214/aoms/1177704172