Weighted Hodges-Lehmann location estimator and mixture distributions

Andrey Akinshin · 2023-10-03

The classic non-weighted Hodges-Lehmann location estimator of a sample $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ is defined as follows:

$$ \operatorname{HL}(\mathbf{x}) = \underset{1 \leq i \leq j \leq n}{\operatorname{median}} \left(\frac{x_i + x_j}{2} \right), $$

where $\operatorname{median}$ is the sample median. Previously, we defined a weighted version of the Hodges-Lehmann location estimator as follows:

$$ \operatorname{WHL}(\mathbf{x}, \mathbf{w}) = \underset{1 \leq i \leq j \leq n}{\operatorname{wmedian}} \left(\frac{x_i + x_j}{2},\; w_i \cdot w_j \right), $$

where $\mathbf{w} = (w_1, w_2, \ldots, w_n)$ is the vector of weights and $\operatorname{wmedian}$ is the weighted median. For simplicity, the weighted median here is based on the Hyndman-Fan Type 7 quantile estimator.
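To make both definitions concrete, here is a minimal Python sketch. The post only states that the weighted median is based on the Type 7 estimator; the weighted construction below is an assumption: Kish's effective sample size $n^* = (\sum w_i)^2 / \sum w_i^2$ replaces $n$, and the cumulative normalized weights replace the uniform grid $i/n$. All function names are hypothetical. With equal weights, this weighted Type 7 median reduces to the ordinary sample median, so $\operatorname{WHL}$ with equal weights coincides with $\operatorname{HL}$.

```python
from statistics import median


def hodges_lehmann(x):
    """Classic HL: the median of all pairwise averages (x_i + x_j) / 2 with i <= j."""
    n = len(x)
    return median([(x[i] + x[j]) / 2 for i in range(n) for j in range(i, n)])


def weighted_quantile_type7(values, weights, p):
    """Weighted Hyndman-Fan Type 7 quantile (ASSUMED construction).

    Kish's effective sample size n_eff replaces n, and cumulative normalized
    weights replace the uniform grid i/n. With equal weights, this reduces
    to the classic Type 7 quantile.
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    n_eff = total ** 2 / sum(w * w for _, w in pairs)  # Kish's effective sample size
    h = (n_eff - 1) * p + 1

    def cdf7(u):
        # Linear ramp from 0 to 1 on [(h - 1) / n_eff, h / n_eff]
        return min(1.0, max(0.0, n_eff * u - h + 1))

    result, cum = 0.0, 0.0
    for x, w in pairs:
        left = cum / total
        cum += w
        # Each sorted value gets the ramp mass that falls inside its weight interval
        result += (cdf7(cum / total) - cdf7(left)) * x
    return result


def weighted_hodges_lehmann(x, w):
    """WHL: weighted median of pairwise averages, pair (i, j) weighted by w_i * w_j."""
    n = len(x)
    averages, pair_weights = [], []
    for i in range(n):
        for j in range(i, n):
            averages.append((x[i] + x[j]) / 2)
            pair_weights.append(w[i] * w[j])
    return weighted_quantile_type7(averages, pair_weights, 0.5)


x = [1.0, 2.0, 3.0, 10.0]
print(hodges_lehmann(x))                         # 2.75
print(weighted_hodges_lehmann(x, [1, 1, 1, 1]))  # 2.75 (equal weights reproduce HL)
```

Both functions enumerate all $n(n+1)/2$ pairs and sort them, which costs $O(n^2 \log n)$; that is negligible for $n = 10$.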

In this post, we conduct a numerical simulation that compares the sampling distributions of $\operatorname{HL}$ and $\operatorname{WHL}$ in the case of a mixture distribution.

Numerical simulation

We consider the following mixture of two normal distributions:

$$ \frac{1}{3} \mathcal{N}(0, 1) + \frac{2}{3} \mathcal{N}(10, 1). $$
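Drawing from this mixture amounts to first picking a component (the first one with probability $1/3$) and then drawing from the chosen normal distribution. A minimal sketch using Python's standard `random` module; the function name `sample_mixture` and the seed are illustrative.

```python
import random


def sample_mixture(n, rng):
    """Draw n values from (1/3) N(0, 1) + (2/3) N(10, 1)."""
    values = []
    for _ in range(n):
        # Choose the component: N(0, 1) with probability 1/3, otherwise N(10, 1)
        mean = 0.0 if rng.random() < 1 / 3 else 10.0
        values.append(rng.gauss(mean, 1.0))
    return values


rng = random.Random(2023)
sample = sample_mixture(10, rng)
print([round(v, 2) for v in sample])
```

Note that the number of observations taken from each mode is random here (binomial with $p = 1/3$ for the first mode), so a sample of size 10 can be noticeably unbalanced.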

For the sample size $n=10$, we build the following two sampling distributions:

  • $\operatorname{HL}(\mathbf{x})$, where $\mathbf{x}$ is randomly taken from $\frac{1}{3} \mathcal{N}(0, 1) + \frac{2}{3} \mathcal{N}(10, 1)$;
  • $\operatorname{WHL}(\mathbf{x}, \mathbf{w})$, where
    • $x_1, x_2, x_3, x_4, x_5$ are randomly taken from $\mathcal{N}(0, 1)$,
    • $x_6, x_7, x_8, x_9, x_{10}$ are randomly taken from $\mathcal{N}(10, 1)$,
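    • $\mathbf{w} = (1, 1, 1, 1, 1, 2, 2, 2, 2, 2)$, so the total weights of the two sub-samples ($5$ and $10$) reproduce the mixture proportions $\frac{1}{3}$ and $\frac{2}{3}$.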
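The two sampling distributions described above can be sketched in plain Python as follows. The number of iterations (10,000) and the seed are assumptions, since they are not stated here, and the weighted median again uses an assumed weighted Type 7 construction based on Kish's effective sample size. The helpers are defined inline so the block runs on its own; all names are hypothetical.

```python
import random
import statistics


def hodges_lehmann(x):
    # Median of all pairwise averages (x_i + x_j) / 2 with i <= j
    n = len(x)
    return statistics.median([(x[i] + x[j]) / 2 for i in range(n) for j in range(i, n)])


def weighted_hodges_lehmann(x, w):
    # Pairwise averages (i <= j), sorted together with their weights w_i * w_j
    n = len(x)
    pairs = sorted(((x[i] + x[j]) / 2, w[i] * w[j]) for i in range(n) for j in range(i, n))
    total = sum(pw for _, pw in pairs)
    # ASSUMPTION: weighted Type 7 median based on Kish's effective sample size
    n_eff = total ** 2 / sum(pw * pw for _, pw in pairs)
    h = (n_eff - 1) * 0.5 + 1

    def cdf7(u):
        # Linear ramp from 0 to 1 on [(h - 1) / n_eff, h / n_eff]
        return min(1.0, max(0.0, n_eff * u - h + 1))

    result, cum = 0.0, 0.0
    for avg, pw in pairs:
        left = cum / total
        cum += pw
        result += (cdf7(cum / total) - cdf7(left)) * avg
    return result


def simulate(iterations=10_000, seed=1729):
    """Collect HL and WHL estimates for n = 10 (iteration count and seed are assumptions)."""
    rng = random.Random(seed)
    weights = [1] * 5 + [2] * 5
    hl_estimates, whl_estimates = [], []
    for _ in range(iterations):
        # HL: ten draws from the mixture, so the per-mode counts are random
        x = [rng.gauss(0.0 if rng.random() < 1 / 3 else 10.0, 1.0) for _ in range(10)]
        hl_estimates.append(hodges_lehmann(x))
        # WHL: exactly five draws per mode; the N(10, 1) draws get weight 2
        y = [rng.gauss(0.0, 1.0) for _ in range(5)]
        y += [rng.gauss(10.0, 1.0) for _ in range(5)]
        whl_estimates.append(weighted_hodges_lehmann(y, weights))
    return hl_estimates, whl_estimates


hl_est, whl_est = simulate()
for name, est in (("HL", hl_est), ("WHL", whl_est)):
    print(f"{name}: mean = {statistics.mean(est):.3f}, sd = {statistics.stdev(est):.3f}")
```

The printed standard deviations give a rough numeric view of the efficiency difference; a histogram or density plot of `hl_est` and `whl_est` shows the shapes of the two sampling distributions.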

Here are the corresponding sampling distribution density plots:

As we can see, rebalancing the sub-samples observed from the two modes with weights led to higher statistical efficiency and removed the bimodality.