Quantile estimators based on k order statistics, Part 7: Optimal threshold for the trimmed Harrell-Davis quantile estimator
In the previous post, we obtained a nice quantile estimator. To be specific, we considered a trimmed modification of the Harrell-Davis quantile estimator based on the highest density interval of the given size. The interval size is a parameter that controls the trade-off between statistical efficiency and robustness. While it’s nice to be able to control this trade-off, we also need a default value that can be used as a starting point when we have neither estimator breakdown point requirements nor prior knowledge about distribution properties.
After a series of unsuccessful attempts, it seems that I have found an acceptable solution.
We should build the new estimator based on the highest density interval of size $1/\sqrt{n}$, where $n$ is the sample size.
All posts from this series:
- Quantile estimators based on k order statistics, Part 1: Motivation (2021-08-03)
- Quantile estimators based on k order statistics, Part 2: Extending Hyndman-Fan equations (2021-08-10)
- Quantile estimators based on k order statistics, Part 3: Playing with the Beta function (2021-08-17)
- Quantile estimators based on k order statistics, Part 4: Adopting trimmed Harrell-Davis quantile estimator (2021-08-24)
- Quantile estimators based on k order statistics, Part 5: Improving trimmed Harrell-Davis quantile estimator (2021-08-31)
- Quantile estimators based on k order statistics, Part 6: Continuous trimmed Harrell-Davis quantile estimator (2021-09-07)
- Quantile estimators based on k order statistics, Part 7: Optimal threshold for the trimmed Harrell-Davis quantile estimator (2021-09-14)
- Quantile estimators based on k order statistics, Part 8: Winsorized Harrell-Davis quantile estimator (2021-09-21)
The approach
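The general idea is the same as in one of the previous posts.
We express the estimation of the $p^\textrm{th}$ quantile as a weighted sum of all order statistics:

$$
Q(p) = \sum_{i=1}^{n} W_i \cdot x_{(i)}, \quad
W_i = F(r_i) - F(l_i), \quad
l_i = (i - 1) / n, \quad
r_i = i / n,
$$

where $F$ is the CDF of a specific distribution. In the case of the classic Harrell-Davis quantile estimator, it is the Beta distribution with parameters $\alpha = (n+1)p$, $\beta = (n+1)(1-p)$, so $F_{\textrm{HD}}(u) = I_u(\alpha, \beta)$, where $I_u$ is the regularized incomplete beta function.
In the case of the trimmed Harrell-Davis quantile estimator, we use only a part of the Beta distribution inside the $[L; R]$ window:

$$
F_{\textrm{THD}}(u) = \begin{cases}
0 & \textrm{for } u < L, \\
\big( F_{\textrm{HD}}(u) - F_{\textrm{HD}}(L) \big) / \big( F_{\textrm{HD}}(R) - F_{\textrm{HD}}(L) \big) & \textrm{for } L \leq u \leq R, \\
1 & \textrm{for } R < u.
\end{cases}
$$

In the previous post, we discussed the idea of choosing $[L; R]$ as the highest density interval of the given size $D$. Here we use $D = 1/\sqrt{n}$.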
If we don’t have any specific requirements for the estimator (e.g., the desired breakdown point) and we have no prior knowledge about distribution properties (e.g., the presence of a heavy tail), such an estimator looks like a good default option.
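To make the construction concrete, here is a minimal pure-Python sketch of the trimmed estimator. It rests on several assumptions that are not spelled out in the text above: the interval size defaults to $1/\sqrt{n}$ (as the THD-SQRT label suggests), the Beta parameters are $\alpha = (n+1)p$ and $\beta = (n+1)(1-p)$ as in the classic Harrell-Davis estimator, and $0 < p < 1$. All function names (`beta_cdf`, `beta_hdi`, `thd_weights`, `thd_quantile`) are hypothetical helpers introduced for illustration only; the incomplete beta function is evaluated with a textbook continued fraction rather than a library call.

```python
import math


def _beta_cf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction of the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for term in (even, odd):
            d = 1.0 + term * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + term / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_hdi(a, b, size):
    """Interval [left, left + size] with the largest Beta(a, b) probability mass."""
    if size >= 1.0:
        return 0.0, 1.0
    if a <= 1.0:  # density is non-increasing: the interval sticks to 0
        return 0.0, size
    if b <= 1.0:  # density is non-decreasing: the interval sticks to 1
        return 1.0 - size, 1.0
    mode = (a - 1.0) / (a + b - 2.0)

    def log_pdf(x):  # unnormalized log-density, -inf at the boundaries
        if x <= 0.0 or x >= 1.0:
            return -math.inf
        return (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x)

    # For a unimodal density, the optimum satisfies pdf(left) == pdf(left + size).
    lo, hi = max(0.0, mode - size), min(mode, 1.0 - size)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if log_pdf(mid) < log_pdf(mid + size):
            lo = mid
        else:
            hi = mid
    left = 0.5 * (lo + hi)
    return left, left + size


def thd_weights(n, p, size=None):
    """Weights of the order statistics; size=None means the assumed 1/sqrt(n)."""
    if size is None:
        size = 1.0 / math.sqrt(n)
    a, b = (n + 1) * p, (n + 1) * (1 - p)
    left, right = beta_hdi(a, b, size)
    cdf_left, cdf_right = beta_cdf(left, a, b), beta_cdf(right, a, b)

    def truncated_cdf(x):  # Beta CDF restricted to [left, right] and rescaled
        if x <= left:
            return 0.0
        if x >= right:
            return 1.0
        return (beta_cdf(x, a, b) - cdf_left) / (cdf_right - cdf_left)

    return [truncated_cdf(i / n) - truncated_cdf((i - 1) / n) for i in range(1, n + 1)]


def thd_quantile(sample, p, size=None):
    """Trimmed Harrell-Davis estimate of the p-th quantile (0 < p < 1)."""
    xs = sorted(sample)
    return sum(w * x for w, x in zip(thd_weights(len(xs), p, size), xs))
```

Passing `size=1.0` disables trimming, so the same function yields the classic Harrell-Davis weights. With the default size, order statistics whose segment $[(i-1)/n;\, i/n]$ lies completely outside the window get a weight of exactly zero, which is where the robustness comes from.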
Numerical simulations
The relative efficiency value depends on five parameters:
- Target quantile estimator
- Baseline quantile estimator
- Estimated quantile
- Sample size
- Distribution
As target quantile estimators, we use:
- HD: Classic Harrell-Davis quantile estimator
- THD-SQRT: The trimmed modification of the Harrell-Davis quantile estimator described above, based on the highest density interval of size $1/\sqrt{n}$
The conventional baseline quantile estimator in such simulations is the traditional quantile estimator that is defined as a linear combination of two subsequent order statistics. To be more specific, we are going to use the Type 7 quantile estimator from the Hyndman-Fan classification, or HF7. It can be expressed as follows (assuming one-based indexing):

$$
Q_{\textrm{HF7}}(p) = x_{(\lfloor h \rfloor)} + (h - \lfloor h \rfloor) \big( x_{(\lceil h \rceil)} - x_{(\lfloor h \rfloor)} \big), \quad h = (n - 1) p + 1.
$$
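A minimal Python sketch of the HF7 estimator may help to see how the one-based position $h = (n-1)p + 1$ maps onto two neighboring order statistics. The function name `hf7_quantile` is a hypothetical helper, not something taken from an existing library.

```python
import math


def hf7_quantile(sample, p):
    """Hyndman-Fan Type 7 estimate of the p-th quantile (0 <= p <= 1)."""
    xs = sorted(sample)
    n = len(xs)
    h = (n - 1) * p + 1          # one-based fractional position
    lower = math.floor(h)        # index of the left order statistic
    upper = math.ceil(h)         # index of the right order statistic
    x_lower, x_upper = xs[lower - 1], xs[upper - 1]  # switch to zero-based indexing
    # Linear interpolation between the two neighboring order statistics
    return x_lower + (h - lower) * (x_upper - x_lower)
```

For `p = 0.5`, this reduces to the usual sample median: the middle element for odd `n` and the average of the two middle elements for even `n`.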
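Thus, we are going to estimate the relative efficiency of the HD and THD-SQRT quantile estimators against the traditional quantile estimator HF7.
For the $p^\textrm{th}$ quantile, the classic relative efficiency can be calculated as the ratio of the estimator mean squared errors ($\textrm{MSE}$):

$$
\textrm{Efficiency}(p) =
\frac{\textrm{MSE}(Q_{\textrm{HF7}}, p)}{\textrm{MSE}(Q_{\textrm{Target}}, p)} =
\frac{\mathbb{E}\big[ (Q_{\textrm{HF7}}(p) - \theta_p)^2 \big]}{\mathbb{E}\big[ (Q_{\textrm{Target}}(p) - \theta_p)^2 \big]},
$$

where $\theta_p$ is the true value of the $p^\textrm{th}$ quantile.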
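The simulation loop behind such a comparison can be sketched as follows, assuming the usual definition of relative efficiency as the ratio of the baseline MSE to the target MSE. The helper names (`mse`, `relative_efficiency`), the iteration count, and the seed are illustrative assumptions, not the settings used for the actual simulations. To keep the sketch short, the usage lines compare the sample mean against the sample median on Uniform(0,1) as a stand-in; any pair of estimators with the signature `estimator(sample) -> float` can be plugged in instead.

```python
import random
import statistics


def mse(estimator, sampler, true_value, n, iterations, seed):
    """Monte Carlo estimate of the mean squared error of an estimator."""
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(iterations):
        sample = [sampler(rng) for _ in range(n)]
        squared_errors.append((estimator(sample) - true_value) ** 2)
    return statistics.fmean(squared_errors)


def relative_efficiency(target, baseline, sampler, true_value, n,
                        iterations=1000, seed=42):
    """MSE(baseline) / MSE(target); values above 1 favor the target."""
    # The same seed is used twice so that both estimators see identical samples.
    baseline_mse = mse(baseline, sampler, true_value, n, iterations, seed)
    target_mse = mse(target, sampler, true_value, n, iterations, seed)
    return baseline_mse / target_mse


# Stand-in usage: median of Uniform(0,1), whose true value is 0.5
uniform = lambda rng: rng.random()
efficiency = relative_efficiency(statistics.fmean, statistics.median,
                                 uniform, true_value=0.5, n=10)
print(f"Relative efficiency: {efficiency:.2f}")
```

Since the MSE depends on the sample size, the ratio has to be recomputed for each value of `n` and for each estimated quantile.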
We are also going to use the following distributions:
- Uniform(0,1): Continuous uniform distribution
- Tri(0,1,2): Triangular distribution
- Tri(0,0.2,2): Triangular distribution
- Beta(2,4): Beta distribution
- Beta(2,10): Beta distribution
- Normal(0,1^2): Standard normal distribution
- Weibull(1,2): Weibull distribution
- Student(3): Student distribution
- Gumbel(0,1): Gumbel distribution
- Exp(1): Exponential distribution
- Cauchy(0,1): Standard Cauchy distribution
- Pareto(1,0.5): Pareto distribution
- Pareto(1,2): Pareto distribution
- LogNormal(0,1^2): Log-normal distribution
- LogNormal(0,2^2): Log-normal distribution
- LogNormal(0,3^2): Log-normal distribution
- Weibull(1,0.5): Weibull distribution
- Weibull(1,0.3): Weibull distribution
- Frechet(0,1,1): Frechet distribution
- Frechet(0,1,3): Frechet distribution
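For each distribution, a simulation needs two things: random samples and the true quantile value. When the quantile function is available in closed form, both can come from the same expression via inverse transform sampling. The sketch below covers only four of the listed distributions, and it assumes the standard parameterizations (rate 1 for Exp(1); location 0 and scale 1 for Cauchy(0,1) and Gumbel(0,1)). The names `QUANTILE_FUNCTIONS` and `draw_sample` are hypothetical.

```python
import math
import random

# Closed-form quantile functions Q(p) for 0 < p < 1 (assumed parameterizations)
QUANTILE_FUNCTIONS = {
    "Uniform(0,1)": lambda p: p,
    "Exp(1)": lambda p: -math.log1p(-p),
    "Cauchy(0,1)": lambda p: math.tan(math.pi * (p - 0.5)),
    "Gumbel(0,1)": lambda p: -math.log(-math.log(p)),
}


def _open_uniform(rng):
    """Uniform random number from the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() may return exactly 0.0, which breaks log()
        u = rng.random()
    return u


def draw_sample(name, n, rng):
    """Inverse transform sampling: apply Q to uniform random numbers."""
    quantile = QUANTILE_FUNCTIONS[name]
    return [quantile(_open_uniform(rng)) for _ in range(n)]


rng = random.Random(1729)
sample = draw_sample("Gumbel(0,1)", 5, rng)
true_median = QUANTILE_FUNCTIONS["Gumbel(0,1)"](0.5)
```

The same dictionary entry provides the true quantile value that the MSE is measured against, so the sampler and the reference value cannot drift apart.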
Simulation Results
Conclusion
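The trimmed modification of the Harrell-Davis quantile estimator based on the highest density interval of size $1/\sqrt{n}$ looks like a good default option when we have neither specific requirements for the estimator (e.g., the desired breakdown point) nor prior knowledge about distribution properties (e.g., the presence of a heavy tail).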