## Optimal window of the trimmed Harrell-Davis quantile estimator, Part 2: Trying Planck-taper window

In the previous post, I discussed the problem of non-smooth quantile-respectful density estimation (QRDE) which is generated by the trimmed Harrell-Davis quantile estimator based on the highest density interval of the given width. I assumed that non-smoothness was caused by a non-smooth rectangular window which was used to build the truncated beta distribution. In this post, we are going to try another option: the Planck-taper window.
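For reference, here is a minimal sketch of the standard Planck-taper window on the unit interval. The function name `planck_taper` and the parameter `eps` (the fraction of each edge spent on the smooth transition) are illustrative assumptions. How the window is combined with the truncated beta distribution is not shown in this summary, so the sketch covers only the window itself.

```python
import numpy as np
from scipy.special import expit


def planck_taper(x, eps=0.1):
    """Planck-taper window on [0, 1] (hypothetical helper, 0 < eps <= 0.5)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # The window is symmetric around 0.5, so fold the right half onto the left.
    d = np.minimum(x, 1.0 - x)
    w = np.zeros_like(d)          # zero at the edges and outside [0, 1]
    w[d >= eps] = 1.0             # flat top in the middle
    rising = (d > 0) & (d < eps)  # smooth transition region
    z = eps / d[rising] - eps / (eps - d[rising])
    w[rising] = expit(-z)         # equals 1 / (1 + exp(z)), without overflow
    return w
```

Unlike the rectangular window, this one is infinitely differentiable at the points where it reaches 0 and 1, which is the property that motivates trying it here.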

## Optimal window of the trimmed Harrell-Davis quantile estimator, Part 1: Problems with the rectangular window

In the previous post, we obtained a nice version of the trimmed Harrell-Davis quantile estimator, which offers a good trade-off between robustness and statistical efficiency of quantile estimates. Unfortunately, it has a severe drawback. If we build a quantile-respectful density estimation based on the suggested estimator, we won’t get a smooth density function as in the case of the classic Harrell-Davis quantile estimator.

In this blog post series, we are going to find a way to improve the trimmed Harrell-Davis quantile estimator so that it gives a smooth density function and keeps its advantages in terms of robustness and statistical efficiency.

## Beta distribution highest density interval of the given width

In one of the previous posts, I discussed the idea of the trimmed Harrell-Davis quantile estimator based on the highest density interval of the given width. Since the Harrell-Davis quantile estimator uses the Beta distribution, we should be able to find the beta distribution highest density interval of the given width. In this post, I will show how to do this.
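One way to set up the problem, given here as a sketch rather than as the derivation from the post itself: let $$f$$ and $$F$$ be the density and the CDF of $$\operatorname{Beta}(\alpha, \beta)$$ with $$\alpha > 1$$, $$\beta > 1$$, and let $$D$$ be the requested width (the symbols $$l$$, $$D$$, and $$M$$ are notation assumed for this illustration). The interval $$[l, l + D]$$ should capture as much probability mass as possible, so we look for a stationary point of the mass as a function of $$l$$:

$$
\frac{d}{dl}\bigl(F(l + D) - F(l)\bigr) = f(l + D) - f(l) = 0,
\qquad 0 \le l \le 1 - D.
$$

Since $$f(x) \propto x^{\alpha - 1} (1 - x)^{\beta - 1}$$, this is equivalent to

$$
l^{\alpha - 1} (1 - l)^{\beta - 1} = (l + D)^{\alpha - 1} (1 - l - D)^{\beta - 1}.
$$

The density is unimodal with the mode $$M = (\alpha - 1) / (\alpha + \beta - 2)$$, so the interval must contain $$M$$, and the root can be bracketed by

$$
\max(0,\, M - D) \le l \le \min(M,\, 1 - D).
$$

In general, the equation has no closed-form solution and has to be solved numerically. In the symmetric case $$\alpha = \beta$$, it gives $$l = (1 - D) / 2$$. When $$\alpha \le 1$$ or $$\beta \le 1$$, the density is monotone or U-shaped, and the interval is attached to one of the ends of $$[0, 1]$$.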

## Quantile estimators based on k order statistics, Part 8: Winsorized Harrell-Davis quantile estimator

In the previous post, we discussed the trimmed modification of the Harrell-Davis quantile estimator based on the highest density interval of size $$\sqrt{n}/n$$. This quantile estimator showed a decent level of statistical efficiency. However, the research wouldn’t be complete without a comparison with the winsorized modification. Let’s fix it!

## Quantile estimators based on k order statistics, Part 7: Optimal threshold for the trimmed Harrell-Davis quantile estimator

In the previous post, we obtained a nice quantile estimator. To be specific, we considered a trimmed modification of the Harrell-Davis quantile estimator based on the highest density interval of the given size. The interval size is a parameter that controls the trade-off between statistical efficiency and robustness. While it’s nice to have the ability to control this trade-off, there is also a need for a default value, which could be used as a starting point when we have neither estimator breakdown point requirements nor prior knowledge about distribution properties.

After a series of unsuccessful attempts, it seems that I have found an acceptable solution. We should build the new estimator based on $$\sqrt{n}/n$$ order statistics. In this post, I’m going to briefly explain the idea behind the suggested estimator and share some numerical simulations that compare the proposed estimator and the classic Harrell-Davis quantile estimator.
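The idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation from the post: the function name `trimmed_hd_quantile` is hypothetical, the weights are taken from the Beta distribution with parameters $$p(n+1)$$ and $$(1-p)(n+1)$$ truncated to its highest density interval, and the default interval width is $$\sqrt{n}/n$$.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import beta


def trimmed_hd_quantile(sample, p, width=None):
    """Trimmed Harrell-Davis estimate of the p-th quantile (sketch, 0 < p < 1)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    if width is None:
        width = 1.0 / np.sqrt(n)  # sqrt(n) / n, the default discussed above
    a, b = p * (n + 1), (1 - p) * (n + 1)
    dist = beta(a, b)

    if width >= 1:
        left, right = 0.0, 1.0  # no trimming: classic Harrell-Davis
    elif a > 1 and b > 1:
        # Unimodal density: the interval ends have equal density values.
        mode = (a - 1) / (a + b - 2)
        left = brentq(lambda l: dist.pdf(l) - dist.pdf(l + width),
                      max(0.0, mode - width), min(mode, 1.0 - width))
        right = left + width
    else:
        # Monotone or U-shaped density: the interval sits at one of the ends.
        left = max((0.0, 1.0 - width),
                   key=lambda l: dist.cdf(l + width) - dist.cdf(l))
        right = left + width

    # CDF of the truncated Beta distribution evaluated at 0, 1/n, ..., 1.
    grid = np.arange(n + 1) / n
    mass = dist.cdf(right) - dist.cdf(left)
    cdf = np.clip((dist.cdf(grid) - dist.cdf(left)) / mass, 0.0, 1.0)
    weights = np.diff(cdf)  # weight of the i-th order statistic
    return float(weights @ x)
```

For example, for the sample `[1, 2, ..., 9, 1000]` and `p = 0.5`, only the four central order statistics get non-zero weights, so the outlier does not affect the median estimate at all.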

## Quantile estimators based on k order statistics, Part 6: Continuous trimmed Harrell-Davis quantile estimator

In my previous post, I tried the idea of using the trimmed modification of the Harrell-Davis quantile estimator based on the highest density interval of the given width. The width was defined so that it covers exactly k order statistics (the width equals $$(k-1)/n$$). I was pretty satisfied with the result and decided to continue evolving this approach. While “k order statistics” is a good mental model that describes the trimmed interval, it doesn’t actually require an integer k. In fact, we can use any real number as the trimming percentage.

In this post, we are going to perform numerical simulations that check the statistical efficiency of the trimmed Harrell-Davis quantile estimator with different trimming percentages.

## Quantile estimators based on k order statistics, Part 5: Improving trimmed Harrell-Davis quantile estimator

During the last several months, I have been experimenting with different variations of the trimmed Harrell-Davis quantile estimator. My original idea of using the highest density interval based on the fixed area percentage (e.g., HDI 95% or HDI 99%) led to a set of problems with overtrimming. I tried to solve them with a manually customized trimming strategy, but this approach turned out to be too inconvenient; it was too hard to come up with optimal thresholds. One of the main problems was the suboptimal number of elements that we actually aggregate to obtain the quantile estimation. So, I decided to try an approach that involves exactly k order statistics. The idea looked promising, but numerical simulations didn’t show the appropriate efficiency level.

This bothered me the whole week. It sounded so reasonable to trim the Harrell-Davis quantile estimator using exactly k order statistics. Why didn’t this work as expected? Finally, I have found a fatal flaw in my previous approach: while it was a good idea to fix the size of the trimming window, I mistakenly chose its location following the equation from the Hyndman-Fan Type 7 quantile estimator!

In this post, we fix this problem and try another modification of the trimmed Harrell-Davis quantile estimator based on k order statistics and highest density intervals at the same time.

## Quantile estimators based on k order statistics, Part 4: Adopting trimmed Harrell-Davis quantile estimator

In the previous posts, I discussed various aspects of quantile estimators based on k order statistics. I already tried a few weight functions that aggregate the sample values into quantile estimates (see the posts about an extension of the Hyndman-Fan Type 7 equation and about the adjusted regularized incomplete beta function). In this post, I continue my experiments and try to adapt the trimmed modifications of the Harrell-Davis quantile estimator to this approach.

## Quantile estimators based on k order statistics, Part 3: Playing with the Beta function

In the previous two posts, I discussed the idea of quantile estimators based on k order statistics. I already covered the motivation behind this idea and the statistical efficiency of such estimators using the extended Hyndman-Fan equations as a weight function. Now it’s time to experiment with the Beta function as a primary way to aggregate k order statistics into a single quantile estimation!

## Quantile estimators based on k order statistics, Part 2: Extending Hyndman-Fan equations

In the previous post, I described the idea of using quantile estimators based on k order statistics. Potentially, such estimators could be more robust than estimators based on all sample elements (like Harrell-Davis, Sfakianakis-Verginis, or Navruz-Özdemir) and more statistically efficient than traditional quantile estimators (based on 1 or 2 order statistics). Moreover, we should be able to control this trade-off based on the business requirements (e.g., setting the desired breakdown point).

The only challenging thing here is choosing the weight function that aggregates k order statistics into a single quantile estimation. We are going to try several options, perform Monte Carlo simulations for each of them, and compare the results. A reasonable starting point is an extension of the traditional quantile estimators. In this post, we are going to extend the Hyndman-Fan Type 7 quantile estimator (nowadays, it’s one of the most popular estimators). It estimates quantiles as a linear interpolation of two consecutive order statistics. We are going to make some modifications, so that the new version is based on k order statistics.
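As a baseline, the classic Hyndman-Fan Type 7 estimator can be sketched in a few lines. The function name `hf7_quantile` is an assumption for this illustration, and the sketch shows only the original two-order-statistic version; the extension to k order statistics is not reproduced here.

```python
import math


def hf7_quantile(sample, p):
    """Classic Hyndman-Fan Type 7 quantile estimate for 0 <= p <= 1 (sketch)."""
    xs = sorted(sample)
    n = len(xs)
    h = (n - 1) * p              # fractional zero-based position in the sorted sample
    lo = math.floor(h)           # index of the lower order statistic
    hi = min(lo + 1, n - 1)      # index of the upper one (clamped for p = 1)
    # Linear interpolation between the two neighboring order statistics.
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])
```

For instance, `hf7_quantile([4, 1, 3, 2], 0.5)` interpolates halfway between the order statistics 2 and 3. Whatever the sample size, only two sample elements contribute to the result.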

Spoiler: this approach doesn’t seem like an optimal one. I’m pretty disappointed with its statistical efficiency on samples from light-tailed distributions. So, what’s the point of writing a blog post about an inefficient approach? Because of the following reasons:

1. I believe it’s crucial to share negative results. Sometimes, knowledge about approaches that don’t work could be more important than knowledge about more effective techniques. Negative results give you a broader view of the problem and protect you from wasting your time on potentially promising (but not so useful) ideas.
2. Negative results improve research completeness. When we present an approach, it’s essential to not only show why it solves problems well, but also why it solves problems better than other similar approaches.
3. While I wouldn’t recommend my extension of the Hyndman-Fan Type 7 quantile estimator to the k order statistics case as the default quantile estimator, there are some specific cases where it could be useful. For example, if we estimate the median based on small samples from a symmetric light-tailed distribution, it could outperform not only the original version but also the Harrell-Davis quantile estimator. The “negativity” of the negative results always exists in a specific context. So, there may be cases when negative results for the general case transform to positive results for a particular niche problem.
4. Finally, it’s my personal blog, so I have the freedom to write on any topic I like. My blog posts are not publications in scientific journals (which typically don’t welcome negative results), but rather research notes about conducted experiments. It’s important for me to keep records of all the experiments I perform regardless of the usefulness of the results.

So, let’s briefly look at the results of this not-so-useful approach.