# Weighted quantile estimation for a weighted mixture distribution

Update: this blog post is part of research that aimed to build weighted versions of various quantile estimators. A preprint with the final results is available on arXiv: arXiv:2304.07265 [stat.ME]. Some information in this blog post may be obsolete; please use the preprint as the primary reference.

Let $$\mathbf{x} = \{ x_1, x_2, \ldots, x_n \}$$ be a sample of size $$n$$. We assign non-negative weight coefficients $$w_i$$ with a positive sum for all sample elements:

$\mathbf{w} = \{ w_1, w_2, \ldots, w_n \}, \quad w_i \geq 0, \quad \sum_{i=1}^{n} w_i > 0.$

For simplicity, we also consider normalized (standardized) weights $$\overline{\mathbf{w}}$$:

$\overline{\mathbf{w}} = \{ \overline{w}_1, \overline{w}_2, \ldots, \overline{w}_n \}, \quad \overline{w}_i = \frac{w_i}{\sum_{i=1}^{n} w_i}.$
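The normalization above can be illustrated with a short sketch. It assumes NumPy is available; the function name `normalize_weights` is hypothetical and chosen only for this example. It enforces the two conditions stated above (non-negative weights, positive sum) before dividing.

```python
import numpy as np


def normalize_weights(w):
    """Return weights scaled to sum to 1 (hypothetical helper for illustration)."""
    w = np.asarray(w, dtype=float)
    # The definition requires w_i >= 0 for every element.
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")
    total = w.sum()
    # The definition also requires a strictly positive sum.
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return w / total
```

For instance, `normalize_weights([1, 2, 1])` gives the normalized weights 0.25, 0.5, and 0.25.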

In the non-weighted case, we can consider a quantile estimator $$\operatorname{Q}(\mathbf{x}, p)$$ that estimates the $$p^\textrm{th}$$ quantile of the underlying distribution. We want to build a weighted quantile estimator $$\operatorname{Q}(\mathbf{x}, \mathbf{w}, p)$$ so that we can estimate the quantiles of a weighted sample.

In this post, we consider a specific problem of estimating quantiles of a weighted mixture distribution.

For example, we can consider three distributions given by their cumulative distribution functions (CDFs) $$F_X$$, $$F_Y$$, and $$F_Z$$ with weight coefficients $$w_X$$, $$w_Y$$, and $$w_Z$$. Their weighted mixture is given by $$F=\overline{w}_X F_X + \overline{w}_Y F_Y + \overline{w}_Z F_Z$$. Suppose that we have samples $$\mathbf{x}$$, $$\mathbf{y}$$, and $$\mathbf{z}$$ from $$F_X$$, $$F_Y$$, and $$F_Z$$, respectively, and we want to estimate the quantile function $$F^{-1}$$ of the mixture distribution $$F$$. If each sample contains a sufficient number of elements, we can consider a straightforward approach:

1. Obtain estimates $$\hat{F}^{-1}_X$$, $$\hat{F}^{-1}_Y$$, $$\hat{F}^{-1}_Z$$ of the distribution quantile functions based on the given samples;
2. Invert the quantile functions and obtain estimates $$\hat{F}_X$$, $$\hat{F}_Y$$, $$\hat{F}_Z$$ of the CDFs for each distribution;
3. Combine these CDFs and build an estimate $$\hat{F}=\overline{w}_X\hat{F}_X+\overline{w}_Y\hat{F}_Y+\overline{w}_Z\hat{F}_Z$$ of the mixture CDF;
4. Invert $$\hat{F}$$ and get the estimate $$\hat{F}^{-1}$$ of the mixture distribution quantile function.
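The four steps above can be sketched in code. This is a minimal illustration under a stated assumption, not the estimator from the preprint: each per-sample CDF estimate is taken to be the plain empirical CDF, a step function that jumps by $$1/n_k$$ at every element of the $$k$$-th sample. Under that assumption, the mixture CDF estimate $$\hat{F}$$ is a step function that jumps by $$\overline{w}_k / n_k$$ at each pooled element, and inverting it means finding the smallest pooled value at which $$\hat{F}$$ reaches $$p$$. The function name `mixture_quantile` is hypothetical.

```python
import numpy as np


def mixture_quantile(samples, weights, p):
    """Estimate the p-th quantile of a weighted mixture (illustrative sketch).

    Assumes each component CDF is estimated by its empirical CDF.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    samples = [np.asarray(s, dtype=float) for s in samples]
    if len(samples) != len(weights) or any(s.size == 0 for s in samples):
        raise ValueError("need one weight per sample and no empty samples")
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    w_bar = weights / weights.sum()  # normalized mixture weights

    # Steps 1-2: the empirical CDF of sample k jumps by 1/n_k at each element.
    # Step 3: in the mixture, that jump is scaled by the normalized weight.
    values = np.concatenate(samples)
    jumps = np.concatenate(
        [np.full(s.size, w / s.size) for s, w in zip(samples, w_bar)]
    )

    # Evaluate the mixture CDF estimate at the sorted pooled values.
    order = np.argsort(values, kind="stable")
    values, jumps = values[order], jumps[order]
    cdf = np.cumsum(jumps)

    # Step 4: invert the step function (smallest value with cdf >= p).
    # The clamp guards against cdf[-1] falling slightly below 1 due to rounding.
    idx = min(int(np.searchsorted(cdf, p, side="left")), values.size - 1)
    return float(values[idx])
```

As a usage example, with samples `[1, 2, 3, 4]` and `[10, 20]` and equal weights, the mixture CDF estimate at the sorted pooled values is 0.125, 0.25, 0.375, 0.5, 0.75, 1.0, so `mixture_quantile([[1, 2, 3, 4], [10, 20]], [1, 1], 0.6)` returns `10.0`. Because the result is always one of the observed values, the sketch inherits the coarseness of the empirical CDF on small samples.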

This approach performs well only when each sample is large enough for its quantiles to be estimated reliably.