Weighted Hodges-Lehmann location estimator and mixture distributions


The classic non-weighted Hodges-Lehmann location estimator of a sample \(\mathbf{x} = (x_1, x_2, \ldots, x_n)\) is defined as follows:

\[\operatorname{HL}(\mathbf{x}) = \underset{1 \leq i \leq j \leq n}{\operatorname{median}} \left(\frac{x_i + x_j}{2} \right), \]

where \(\operatorname{median}\) is the sample median. Previously, we defined a weighted version of the Hodges-Lehmann location estimator as follows:

\[\operatorname{WHL}(\mathbf{x}, \mathbf{w}) = \underset{1 \leq i \leq j \leq n}{\operatorname{wmedian}} \left(\frac{x_i + x_j}{2},\; w_i \cdot w_j \right), \]

where \(\mathbf{w} = (w_1, w_2, \ldots, w_n)\) is the vector of weights and \(\operatorname{wmedian}\) is the weighted median. For simplicity, in this post, the Hyndman-Fan Type 7 quantile estimator is used as the base for the weighted median.
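The two definitions above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the post does not spell out the weighted median, so the `weighted_quantile_hf7` helper below uses one possible weighted generalization of the Type 7 estimator (based on Kish's effective sample size). All function names are hypothetical.

```python
import numpy as np

def weighted_quantile_hf7(x, w, p):
    """Weighted Type 7 quantile.

    ASSUMPTION: this generalization (Kish's effective sample size plus a
    linear window over cumulative weights) is one possible choice; the
    post itself does not define the weighted median in detail.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order]
    n_eff = w.sum() ** 2 / (w ** 2).sum()                # Kish's effective sample size
    t = np.concatenate(([0.0], np.cumsum(w / w.sum())))  # cumulative normalized weights
    left = (n_eff - 1) * p / n_eff                       # left edge of the Type 7 window
    cdf = np.clip(n_eff * (t - left), 0.0, 1.0)          # window CDF evaluated at t
    return float(np.sum(np.diff(cdf) * x))               # weighted sum of order statistics

def pairwise(x):
    """Pairwise averages (x_i + x_j) / 2 for all i <= j, with the index arrays."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x))
    return (x[i] + x[j]) / 2, i, j

def hl(x):
    """Classic Hodges-Lehmann estimator: median of pairwise averages."""
    averages, _, _ = pairwise(x)
    return float(np.median(averages))

def whl(x, w):
    """Weighted Hodges-Lehmann estimator: pair (i, j) gets weight w_i * w_j."""
    w = np.asarray(w, dtype=float)
    averages, i, j = pairwise(x)
    return weighted_quantile_hf7(averages, w[i] * w[j], 0.5)
```

With equal weights, this sketch of \(\operatorname{WHL}\) agrees with \(\operatorname{HL}\) up to floating-point error, because the weighted quantile then reduces to the usual Type 7 estimator that `np.median` implements.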

In this post, we consider a numerical simulation that compares the sampling distributions of \(\operatorname{HL}\) and \(\operatorname{WHL}\) in the case of a mixture distribution.

Numerical simulation

We consider the following mixture of two normal distributions:

\[\frac{1}{3} \mathcal{N}(0, 1) + \frac{2}{3} \mathcal{N}(10, 1). \]

For the sample size of \(n=10\), we build the following two sampling distributions:

  • \(\operatorname{HL}(\mathbf{x})\), where \(\mathbf{x}\) is randomly taken from \(\frac{1}{3} \mathcal{N}(0, 1) + \frac{2}{3} \mathcal{N}(10, 1)\);
  • \(\operatorname{WHL}(\mathbf{x}, \mathbf{w})\), where
    • \(x_1, x_2, x_3, x_4, x_5\) are randomly taken from \(\mathcal{N}(0, 1)\),
    • \(x_6, x_7, x_8, x_9, x_{10}\) are randomly taken from \(\mathcal{N}(10, 1)\),
    • \(\mathbf{w} = (1, 1, 1, 1, 1, 2, 2, 2, 2, 2)\).
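The setup above can be sketched as a small Monte Carlo experiment. This is a minimal sketch, not the code used to produce the plots: the `simulate` function, the number of iterations, and the seed are hypothetical choices, and the weighted median again relies on an assumed weighted generalization of the Type 7 estimator.

```python
import numpy as np

def weighted_median_hf7(x, w):
    # ASSUMPTION: weighted Type 7 median via Kish's effective sample size;
    # the post does not define the weighted median in detail.
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order]
    n_eff = w.sum() ** 2 / (w ** 2).sum()
    t = np.concatenate(([0.0], np.cumsum(w / w.sum())))
    left = (n_eff - 1) * 0.5 / n_eff
    cdf = np.clip(n_eff * (t - left), 0.0, 1.0)
    return float(np.sum(np.diff(cdf) * x))

def hl(x):
    # Median of pairwise averages over all i <= j
    i, j = np.triu_indices(len(x))
    return float(np.median((x[i] + x[j]) / 2))

def whl(x, w):
    # Weighted median of pairwise averages with weights w_i * w_j
    i, j = np.triu_indices(len(x))
    return weighted_median_hf7((x[i] + x[j]) / 2, w[i] * w[j])

def simulate(n_iter=2000, seed=1):
    """Return simulated values of HL (mixture sample) and WHL (balanced sample)."""
    rng = np.random.default_rng(seed)
    w = np.array([1.0] * 5 + [2.0] * 5)
    hl_values = np.empty(n_iter)
    whl_values = np.empty(n_iter)
    for k in range(n_iter):
        # HL: each of the 10 observations comes from N(10, 1) with
        # probability 2/3 and from N(0, 1) otherwise
        from_second = rng.random(10) < 2 / 3
        x_mix = rng.normal(np.where(from_second, 10.0, 0.0), 1.0)
        hl_values[k] = hl(x_mix)
        # WHL: fixed split, 5 observations per mode, weights 1 and 2
        x_bal = np.concatenate((rng.normal(0.0, 1.0, 5), rng.normal(10.0, 1.0, 5)))
        whl_values[k] = whl(x_bal, w)
    return hl_values, whl_values
```

Calling `hl_values, whl_values = simulate()` and then comparing histograms or standard deviations of the two arrays is one way to inspect the two sampling distributions.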

Here are the corresponding sampling distribution density plots:

As we can see, rebalancing the sub-samples observed from the existing modes via weights led to higher statistical efficiency and removed the bimodality of the sampling distribution.