Weighted quantile estimation for a weighted mixture distribution

Update: this blog post is a part of research that aimed to build weighed versions of various quantile estimators. A preprint with final results is available on arXiv: arXiv:2304.07265 [stat.ME]. Some information in this blog post can be obsolete: please, use the preprint as the primary reference.

Let \(\mathbf{x} = \{ x_1, x_2, \ldots, x_n \}\) be a sample of size \(n\). We assign non-negative weight coefficients \(w_i\) with a positive sum for all sample elements:

\[\mathbf{w} = \{ w_1, w_2, \ldots, w_n \}, \quad w_i \geq 0, \quad \sum_{i=1}^{n} w_i > 0. \]

For simplification, we also consider normalized (standardized) weights \(\overline{\mathbf{w}}\):

\[\overline{\mathbf{w}} = \{ \overline{w}_1, \overline{w}_2, \ldots, \overline{w}_n \}, \quad \overline{w}_i = \frac{w_i}{\sum_{i=1}^{n} w_i}. \]

In the non-weighted case, we can consider a quantile estimator \(\operatorname{Q}(\mathbf{x}, p)\) that estimates the \(p^\textrm{th}\) quantile of the underlying distribution. We want to build a weighted quantile estimator \(\operatorname{Q}(\mathbf{x}, \mathbf{w}, p)\) so that we can estimate the quantiles of a weighed sample.

In this post, we consider a specific problem of estimating quantiles of a weighted mixture distribution.

For example, we can consider three distributions given by their cumulative distribution functions (CDFs) \(F_X\), \(F_Y\), and \(F_Z\) with weight coefficients \(w_X\), \(w_Y\), and \(w_Z\). Their weighted mixture is given by \(F=\overline{w}_X F_X + \overline{w}_Y F_Y + \overline{w}_Z F_Z\). Let us say that we have samples \(\mathbf{x}\), \(\mathbf{y}\), and \(\mathbf{z}\) from \(F_X\), \(F_Y\), and \(F_Z\); and we want to estimate the quantile function \(F^{-1}\) of the mixture distribution \(F\). If each sample contains a sufficient number of elements, we can consider a straightforward approach:

  1. Obtain estimations \(\hat{F}^{-1}_X\), \(\hat{F}^{-1}_Y\), \(\hat{F}^{-1}_Z\) of the distribution quantile functions based on the given samples;
  2. Invert quantile functions and obtain estimations \(\hat{F}_X\), \(\hat{F}_Y\), \(\hat{F}_Z\) of the CDFs for each distribution;
  3. Combine these CDFs and build an estimation \(\hat{F}=\overline{w}_X\hat{F}_X+\overline{w}_Y\hat{F}_Y+\overline{w}_Z\hat{F}_Z\) of the mixture CDF;
  4. Invert \(\hat{F}\) and get the estimation \(\hat{F}^{-1}\) of the mixture distribution quantile function.

The approach performs well only when the sample sizes are large enough so that we can efficiently estimate sample quantiles.