# Hodges-Lehmann Gaussian efficiency: location shift vs. shift of locations

Let us consider two samples $$\mathbf{x} = (x_1, x_2, \ldots, x_n)$$ and $$\mathbf{y} = (y_1, y_2, \ldots, y_m)$$. The one-sample Hodges-Lehmann location estimator [Hodges1963] is defined as the median of the Walsh (pairwise) averages:

$\operatorname{HL}(\mathbf{x}) = \underset{1 \leq i \leq j \leq n}{\operatorname{median}} \left(\frac{x_i + x_j}{2} \right), \quad \operatorname{HL}(\mathbf{y}) = \underset{1 \leq i \leq j \leq m}{\operatorname{median}} \left(\frac{y_i + y_j}{2} \right).$
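A minimal NumPy sketch of the one-sample estimator follows. The function name `hodges_lehmann` is illustrative and does not come from any library. It enumerates all index pairs with $$i \leq j$$ via `np.triu_indices`, so the $$n(n+1)/2$$ Walsh averages include each $$x_i$$ itself ($$i = j$$). For $$\mathbf{x} = (1, 2, 6)$$, the Walsh averages are $$1, 1.5, 2, 3.5, 4, 6$$, and their median is $$2.75$$.

```python
import numpy as np

def hodges_lehmann(x):
    """One-sample Hodges-Lehmann estimator: median of the Walsh averages (x_i + x_j) / 2, i <= j."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x))  # all index pairs with i <= j (diagonal included)
    return float(np.median((x[i] + x[j]) / 2))

print(hodges_lehmann([1, 2, 6]))  # 2.75
```

This brute-force version needs $$O(n^2)$$ memory, which is fine for the small samples considered in this post.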

For these two samples, we can also define the difference between the two location estimates:

$\Delta_{\operatorname{HL}}(\mathbf{x}, \mathbf{y}) = \operatorname{HL}(\mathbf{x}) - \operatorname{HL}(\mathbf{y}).$

The two-sample Hodges-Lehmann location shift estimator is defined as the median of pairwise differences:

$\operatorname{HL}(\mathbf{x}, \mathbf{y}) = \underset{1 \leq i \leq n,\,\, 1 \leq j \leq m}{\operatorname{median}} \left(x_i - y_j \right).$
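The two estimators generally give different values on the same data. For $$\mathbf{x} = (1, 2, 6)$$ and $$\mathbf{y} = (0, 1, 2)$$, the Walsh averages of $$\mathbf{y}$$ are $$0, 0.5, 1, 1, 1.5, 2$$, so $$\operatorname{HL}(\mathbf{y}) = 1$$, while $$\operatorname{HL}(\mathbf{x}) = 2.75$$; hence $$\Delta_{\operatorname{HL}} = 1.75$$. The median of the nine pairwise differences, computed by the sketch below, is $$1$$ instead. The name `hl_shift` is illustrative; `np.subtract.outer` builds the full $$n \times m$$ matrix of differences.

```python
import numpy as np

def hl_shift(x, y):
    """Two-sample Hodges-Lehmann shift: median of all n * m differences x_i - y_j."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diffs = np.subtract.outer(x, y)  # diffs[i, j] = x[i] - y[j], shape (n, m)
    return float(np.median(diffs))   # median over the flattened n * m differences

print(hl_shift([1, 2, 6], [0, 1, 2]))  # 1.0
```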

I have already compared the location shift estimator with the difference of medians in previous posts (1, 2). In this post, I compare the difference of two location estimates $$\Delta_{\operatorname{HL}}$$ with the location shift estimate $$\operatorname{HL}$$ in terms of Gaussian efficiency. Before starting this study, I expected $$\operatorname{HL}$$ to be more efficient than $$\Delta_{\operatorname{HL}}$$. Let us find out whether my intuition is correct!

For the baseline, we consider the difference between the means:

$\Delta_{\operatorname{mean}}(\mathbf{x}, \mathbf{y}) = \operatorname{mean}(\mathbf{x}) - \operatorname{mean}(\mathbf{y}).$

The Gaussian efficiencies of $$\Delta_{\operatorname{HL}}$$ and $$\operatorname{HL}$$ relative to $$\Delta_{\operatorname{mean}}$$ are defined as follows:

$\operatorname{eff}_{\mathcal{N}}(\Delta_{\operatorname{HL}}) = \frac{\mathbb{V}_{\mathcal{N}}[\Delta_{\operatorname{mean}}]}{\mathbb{V}_{\mathcal{N}}[\Delta_{\operatorname{HL}}]}, \quad \operatorname{eff}_{\mathcal{N}}(\operatorname{HL}) = \frac{\mathbb{V}_{\mathcal{N}}[\Delta_{\operatorname{mean}}]}{\mathbb{V}_{\mathcal{N}}[\operatorname{HL}]}.$
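For orientation, here is the standard large-sample picture that finite-sample results can be compared against (it is not part of the simulation below). Under $$\mathcal{N}(\mu, \sigma^2)$$, the one-sample Hodges-Lehmann estimator has asymptotic variance $$\pi \sigma^2 / (3n)$$, and the two-sample shift estimator has asymptotic variance $$(\pi \sigma^2 / 3)(1/n + 1/m)$$. Since $$\mathbf{x}$$ and $$\mathbf{y}$$ are independent, the variances of the two one-sample estimates add up, which gives:

$\mathbb{V}_{\mathcal{N}}[\Delta_{\operatorname{mean}}] = \sigma^2 \left( \frac{1}{n} + \frac{1}{m} \right), \quad \mathbb{V}_{\mathcal{N}}[\Delta_{\operatorname{HL}}] \approx \mathbb{V}_{\mathcal{N}}[\operatorname{HL}] \approx \frac{\pi \sigma^2}{3} \left( \frac{1}{n} + \frac{1}{m} \right),$

$\operatorname{eff}_{\mathcal{N}}(\Delta_{\operatorname{HL}}) \to \frac{3}{\pi} \approx 0.955, \quad \operatorname{eff}_{\mathcal{N}}(\operatorname{HL}) \to \frac{3}{\pi} \approx 0.955 \quad (n, m \to \infty).$

So, to first order, both estimators are equally efficient, and any difference between them is a finite-sample effect.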

### Numerical simulations

We conduct the following simulation:

• Enumerate the sample size $$n = m$$ from $$3$$ to $$50$$.
• For each $$n$$, generate $$500\,000$$ pairs of random samples of size $$n$$ from $$\mathcal{N}(0, 1)$$.
• For each pair of samples, estimate the shift between them using $$\Delta_{\operatorname{HL}}$$, $$\operatorname{HL}$$, and $$\Delta_{\operatorname{mean}}$$.
• Calculate the Gaussian efficiency of $$\Delta_{\operatorname{HL}}$$ and $$\operatorname{HL}$$ using the definitions above.
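The simulation steps above can be sketched in Python with NumPy. This is a hypothetical reimplementation, not the code behind the published results: the names `estimates` and `efficiencies` are illustrative, and the number of iterations is reduced from $$500\,000$$ to $$20\,000$$ per sample size (processed in chunks to bound memory), so the resulting efficiencies carry noticeably more Monte Carlo noise. `np.var` uses `ddof=0` by default; the normalization cancels in the variance ratio.

```python
import numpy as np

def estimates(x, y):
    """Return (Delta_mean, Delta_HL, HL) for each row pair of x and y, both of shape (k, n)."""
    k, n = x.shape
    i, j = np.triu_indices(n)  # Walsh-average index pairs with i <= j (assumes n = m)
    hl_x = np.median((x[:, i] + x[:, j]) / 2, axis=1)
    hl_y = np.median((y[:, i] + y[:, j]) / 2, axis=1)
    # All n * n pairwise differences x_i - y_j for every row, flattened per row
    diffs = (x[:, :, None] - y[:, None, :]).reshape(k, -1)
    return x.mean(axis=1) - y.mean(axis=1), hl_x - hl_y, np.median(diffs, axis=1)

def efficiencies(n, iterations=20_000, chunk=1_000, seed=1):
    """Monte Carlo estimates of eff(Delta_HL) and eff(HL) for n = m under N(0, 1)."""
    rng = np.random.default_rng(seed)
    parts = []
    for _ in range(iterations // chunk):  # chunks keep the (chunk, n, n) array small
        x = rng.standard_normal((chunk, n))
        y = rng.standard_normal((chunk, n))
        parts.append(estimates(x, y))
    d_mean, d_hl, shift = (np.concatenate(p) for p in zip(*parts))
    v_mean = np.var(d_mean)
    return float(v_mean / np.var(d_hl)), float(v_mean / np.var(shift))

for n in (3, 4, 5, 10, 15, 30, 50):
    eff_delta_hl, eff_hl = efficiencies(n)
    print(f"n = m = {n:2d}: eff(Delta_HL) = {eff_delta_hl:.3f}, eff(HL) = {eff_hl:.3f}")
```

Vectorizing over a whole chunk of sample pairs keeps the code short and reasonably fast; replacing the tuple in the final loop with `range(3, 51)` covers the full grid at the cost of a longer run.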

Here are the results:

Surprisingly, the difference of the Hodges-Lehmann location estimates $$\Delta_{\operatorname{HL}}$$ turned out to be more efficient under normality than the Hodges-Lehmann location shift estimator $$\operatorname{HL}$$ (the only exception is $$n=m=4$$). For $$n, m \geq 15$$, the difference is almost negligible, but it is noticeable for small sample sizes.

• [Hodges1963] Hodges, J. L., and E. L. Lehmann. 1963. Estimates of location based on rank tests. The Annals of Mathematical Statistics 34 (2): 598–611. DOI: 10.1214/aoms/1177704172