Median of the shifts vs. shift of the medians, Part 2: Gaussian efficiency

Andrey Akinshin · 2022-12-27
This post evaluates the Gaussian efficiency of the Hodges–Lehmann location shift estimator against the difference of sample medians. In Pragmastat, Shift formalizes the more efficient approach. Confidence intervals are available via ShiftBounds.

In the previous post, we discussed the difference between shifts of the medians and the Hodges-Lehmann EstimatorHodges-Lehmann location shift estimator. In this post, we conduct a simple numerical simulation to evaluate the Gaussian efficiency of these two estimators.

Estimators

We consider two samples of equal size $n$: $x = \{ x_1, x_2, \ldots, x_n \}$, $y = \{ y_1, y_2, \ldots, y_n \}$. We define the shifts of the medians as

$$ \newcommand{\DSM}{\Delta_{\operatorname{SM}}} \DSM = \operatorname{median}(y) - \operatorname{median}(x). $$

and the Hodges-Lehmann location shift estimator as

$$ \newcommand{\DHL}{\Delta_{\operatorname{HL}}} \DHL = \operatorname{median}(y_j - x_i). $$

We also consider the classic estimator that estimates the difference of means:

$$ \newcommand{\Dbase}{\Delta_{\operatorname{0}}} \Dbase = \operatorname{mean}(y) - \operatorname{mean}(x). $$

The Gaussian efficiency of $\DSM$ and $DHL$ can be defined as follows:

$$ e(\DSM) = \frac{\mathbb{V}[\Dbase]}{\mathbb{V}[\DSM]},\quad e(\DHL) = \frac{\mathbb{V}[\Dbase]}{\mathbb{V}[\DHL]}. $$

Numerical simulations

We conduct the following simulation:

  • Enumerate the sample size $n$ from $3$ to $100$.
  • For each $n$, generate $100\,000$ pairs of random samples from $\mathcal{N}(0, 1)$.
  • For each pair of samples, estimate the shift between them using $\Dbase$, $\DSM$, and $\DHL$.
  • Calculate the Gaussian efficiency of $\DSM$ and $DHL$ using the above equations.

Here are the results:

As we can see, the Hodges-Lehmann location shift estimator is much more efficient than the shift of the medians.

References

  • [Hodges1963]
    Hodges, J. L., and E. L. Lehmann. 1963. Estimates of location based on rank tests. The Annals of Mathematical Statistics 34 (2):598–611.
    DOI:10.1214/aoms/1177704172