Statistical efficiency of the Hodges-Lehmann median estimator, Part 2



In the previous post, we evaluated the relative statistical efficiency of the Hodges-Lehmann median estimator against the sample median under the normal distribution. In this post, we extend this experiment to a set of light-tailed and heavy-tailed distributions.

Introduction

The Hodges-Lehmann median estimator is defined as the sample median of all pairwise averages of the given sample. However, the explicit formula depends on which index pairs are included. Following an approach from [Park2020], we consider three options:

\[\operatorname{HL}_1 = \underset{i < j}{\operatorname{median}}\Big(\frac{x_i + x_j}{2}\Big),\quad \operatorname{HL}_2 = \underset{i \leq j}{\operatorname{median}}\Big(\frac{x_i + x_j}{2}\Big),\quad \operatorname{HL}_3 = \underset{\forall i, j}{\operatorname{median}}\Big(\frac{x_i + x_j}{2}\Big). \]
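The three variants differ only in which index pairs enter the median. A minimal Python sketch of these definitions could look as follows; the function name `hodges_lehmann` and the `variant` parameter are illustrative choices, not taken from [Park2020], and the straightforward \(O(n^2)\) enumeration is only meant for small samples:

```python
from itertools import combinations, combinations_with_replacement, product
from statistics import median


def hodges_lehmann(x, variant=1):
    """Median of pairwise averages (illustrative helper, not a library API).

    variant=1: pairs with i < j   (HL_1)
    variant=2: pairs with i <= j  (HL_2)
    variant=3: all pairs (i, j)   (HL_3)
    """
    x = list(x)
    if variant == 1:
        pairs = combinations(x, 2)                   # i < j, needs n >= 2
    elif variant == 2:
        pairs = combinations_with_replacement(x, 2)  # i <= j
    elif variant == 3:
        pairs = product(x, repeat=2)                 # every ordered pair
    else:
        raise ValueError("variant must be 1, 2, or 3")
    return median((a + b) / 2 for a, b in pairs)


sample = [1, 2, 4]
print([hodges_lehmann(sample, v) for v in (1, 2, 3)])
```

For the sample `[1, 2, 4]`, the pairwise averages for \(i < j\) are 1.5, 2.5, and 3, so \(\operatorname{HL}_1 = 2.5\); adding the diagonal terms 1, 2, and 4 shifts the median to \(\operatorname{HL}_2 = 2.25\).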

We also consider the classic Harrell-Davis quantile estimator [Harrell1982], which can be used to estimate the median with \(p = 0.5\):

\[Q_\textrm{HD}(p) = \sum_{i=1}^{n} W_{i} \cdot x_{(i)}, \quad W_{i} = I_{i/n}(a, b) - I_{(i-1)/n}(a, b), \quad a = p(n+1),\; b = (1-p)(n+1) \]

where \(I_t(a, b)\) denotes the regularized incomplete beta function and \(x_{(i)}\) is the \(i^\textrm{th}\) order statistic.
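To make the weights \(W_i\) concrete, here is a dependency-free Python sketch of the formula above. It is an illustration under stated assumptions, not a reference implementation: the helper `beta_cdf` approximates \(I_t(a, b)\) with composite Simpson's rule and assumes \(a, b \geq 1\) (which holds for the median when \(n \geq 1\)); in practice, one would more likely call a library routine for the regularized incomplete beta function.

```python
import math


def beta_cdf(t, a, b, steps=2000):
    """Approximate I_t(a, b) with composite Simpson's rule.

    Illustrative helper; assumes a >= 1 and b >= 1 so the density is bounded.
    """
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(u):
        if u == 0.0:
            # Limit of the beta density at zero for a >= 1.
            return b if a == 1.0 else 0.0
        return math.exp(log_norm + (a - 1.0) * math.log(u)
                        + (b - 1.0) * math.log1p(-u))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * pdf(k * h)
    return total * h / 3.0


def harrell_davis(x, p=0.5):
    """Harrell-Davis estimate: weighted sum of order statistics."""
    xs = sorted(x)
    n = len(xs)
    a, b = p * (n + 1), (1 - p) * (n + 1)
    cdf = [beta_cdf(i / n, a, b) for i in range(n + 1)]
    # W_i = I_{i/n}(a, b) - I_{(i-1)/n}(a, b)
    return sum((cdf[i] - cdf[i - 1]) * xs[i - 1] for i in range(1, n + 1))


print(harrell_davis([1, 2, 3, 4, 5]))
```

Because the weights for \(p = 0.5\) are symmetric around the middle order statistic, a symmetric sample such as `[1, 2, 3, 4, 5]` yields an estimate of 3 up to numerical error.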

In addition, we consider the trimmed Harrell-Davis quantile estimator based on the highest density interval of size \(1/\sqrt{n}\) (we denote it as \(Q_{\operatorname{THD-SQRT}}\)).

Simulation study

To evaluate the relative statistical efficiency of the listed median estimators against the sample median, we use the following scheme:

  • Enumerate different sample size values \(n\) from \(3\) to \(20\).
  • Enumerate various light-tailed and heavy-tailed distributions.
  • For each distribution and each sample size, generate \(1\,000\) samples from the given distribution.
  • For each sample, estimate the median using the sample median, the Harrell-Davis quantile estimator \(Q_{\operatorname{HD}}\), the trimmed Harrell-Davis quantile estimator \(Q_{\operatorname{THD-SQRT}}\), and three versions of the Hodges-Lehmann median estimator \(\operatorname{HL}_1\), \(\operatorname{HL}_2\), \(\operatorname{HL}_3\).
  • Estimate the relative statistical efficiency of each estimator against the sample median for each distribution and sample size.
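The scheme above can be sketched in a few lines of Python. This is a simplified illustration, not the code behind the figure: it assumes that relative efficiency is measured as the ratio of mean squared errors (sample median over the evaluated estimator), it covers only \(\operatorname{HL}_1\) under the standard normal distribution, and the names `hl1`, `relative_efficiency`, and the seed value are hypothetical.

```python
import random
import statistics
from itertools import combinations


def hl1(x):
    """HL_1: median of pairwise averages over i < j."""
    return statistics.median((a + b) / 2 for a, b in combinations(x, 2))


def relative_efficiency(estimator, sampler, true_median, n,
                        iterations=1000, seed=1729):
    """MSE(sample median) / MSE(estimator); assumed definition of efficiency."""
    rng = random.Random(seed)  # fixed seed for reproducibility (arbitrary value)
    se_baseline = 0.0
    se_estimator = 0.0
    for _ in range(iterations):
        sample = [sampler(rng) for _ in range(n)]
        se_baseline += (statistics.median(sample) - true_median) ** 2
        se_estimator += (estimator(sample) - true_median) ** 2
    return se_baseline / se_estimator


def standard_normal(rng):
    return rng.gauss(0.0, 1.0)


for n in range(3, 21):
    eff = relative_efficiency(hl1, standard_normal, 0.0, n)
    print(n, round(eff, 3))
```

Values above 1 mean that the evaluated estimator has a smaller mean squared error than the sample median for that sample size. Other distributions can be plugged in by passing a different `sampler` together with the true median of that distribution.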

The results of the performed simulation study are shown in the following figure:


As we can see, the Hodges-Lehmann median estimator performs well in the light-tailed case. However, in the heavy-tailed case, the Harrell-Davis quantile estimator and its trimmed modification have better relative statistical efficiency.

References

  • [Harrell1982]
    Harrell, F.E. and Davis, C.E., 1982. A new distribution-free quantile estimator. Biometrika, 69(3), pp.635-640.
    https://doi.org/10.2307/2335999
  • [Park2020]
    Park, C., Kim, H. and Wang, M., 2020. Investigation of finite-sample properties of robust location and scale estimators. Communications in Statistics - Simulation and Computation, pp.1-27.
    https://doi.org/10.1080/03610918.2019.1699114