## Customization of the nonparametric Cohen's d-consistent effect size

One year ago, I published a post called Nonparametric Cohen's d-consistent effect size. During this year, I got a lot of internal and external feedback, both from my own statistical experiments and from people who tried the suggested approach. It seems that the nonparametric version of Cohen’s d works much better with real-life not-so-normal data. While the classic Cohen’s d, which is based on the non-robust arithmetic mean and the non-robust standard deviation, can be easily corrupted by a single outlier, my approach is much more resistant to unexpected extreme values. Also, it allows exploring the difference between specific quantiles of the considered samples, which can be useful in the nonparametric case.

However, I wasn’t satisfied with the results of all of my experiments. While I still like the basic idea (replace the mean with the median; replace the standard deviation with the median absolute deviation), it turned out that the final results heavily depend on the chosen quantile estimator. To be more specific, the original Harrell-Davis quantile estimator is not always optimal; in most cases, it’s better to replace it with its trimmed modification. However, the particular choice of the quantile estimator depends on the situation. Also, the consistency constant for the median absolute deviation should be adjusted according to the current sample size and the chosen quantile estimator. Of course, the median absolute deviation can also be replaced by other dispersion estimators that can be used as consistent estimators of the standard deviation.
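As a rough illustration of the basic idea (median instead of mean, median absolute deviation instead of standard deviation), here is a minimal sketch. It makes several assumptions that are not taken from the text above: it uses the plain sample median rather than the Harrell-Davis estimator or its trimmed modification, it uses the asymptotic normal consistency constant 1.4826 without any sample-size adjustment, and it pools the two MAD values the same way the classic pooled standard deviation is built. The function names are hypothetical.

```python
import math
import statistics

# Asymptotic constant that makes the MAD consistent with the standard
# deviation under normality. Assumption: no small-sample adjustment here.
MAD_CONSISTENCY = 1.4826


def mad(sample, constant=MAD_CONSISTENCY):
    """Scaled median absolute deviation around the sample median."""
    center = statistics.median(sample)
    return constant * statistics.median([abs(v - center) for v in sample])


def pooled_mad(x, y):
    """Pool two MAD values by analogy with the classic pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_variance = ((nx - 1) * mad(x) ** 2 + (ny - 1) * mad(y) ** 2) / (nx + ny - 2)
    return math.sqrt(pooled_variance)


def nonparametric_effect_size(x, y):
    """Median shift between y and x, expressed in pooled MAD units."""
    return (statistics.median(y) - statistics.median(x)) / pooled_mad(x, y)


x = [1, 2, 3, 4, 5]
y = [3, 4, 5, 6, 7]
y_with_outlier = [3, 4, 5, 6, 700]

clean = nonparametric_effect_size(x, y)
corrupted = nonparametric_effect_size(x, y_with_outlier)
print(clean, corrupted)
```

In this toy example, replacing the largest value of `y` with an extreme outlier changes neither the median nor the MAD, so the effect size stays the same. Swapping in another quantile estimator or another dispersion estimator only requires replacing `statistics.median` and `mad`.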

In this post, I want to give a brief overview of possible customizations of the suggested metrics.

## Robust alternative to statistical efficiency

Statistical efficiency is a common measure of the quality of an estimator. Typically, it’s expressed via the mean square error ($$\operatorname{MSE}$$). For the given estimator $$T$$ and the true parameter value $$\theta$$, the $$\operatorname{MSE}$$ can be expressed as follows:

$$
\operatorname{MSE}(T) = \operatorname{E}[(T-\theta)^2]
$$

In numerical simulations, the $$\operatorname{MSE}$$ can’t be used as a robust metric because its breakdown point is zero (a corruption of a single measurement leads to a corrupted result). Typically, it’s not a problem for light-tailed distributions. Unfortunately, in the heavy-tailed case, the $$\operatorname{MSE}$$ becomes an unreliable and unreproducible metric because it can be easily spoiled by a single outlier.

I suggest an alternative way to compare statistical estimators. Instead of using the non-robust $$\operatorname{MSE}$$, we can use robust quantile estimations of the absolute error distribution. In this post, I want to share numerical simulations that show the problem of irreproducible $$\operatorname{MSE}$$ values and how they can be replaced by reproducible quantile values.
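The contrast between the two metrics can be sketched on a tiny hand-made set of estimates. The numbers below are made up for illustration and are not simulation results; the helper names are hypothetical, and the quantile of the absolute errors is computed with plain linear interpolation between order statistics (Hyndman-Fan Type 7), which is only one of many possible choices.

```python
import math
import statistics


def mse(estimates, true_value):
    """Mean squared error of a collection of estimates."""
    return statistics.fmean([(t - true_value) ** 2 for t in estimates])


def abs_error_quantile(estimates, true_value, p):
    """p-th quantile of |T - theta| using linear interpolation (Type 7)."""
    errors = sorted(abs(t - true_value) for t in estimates)
    h = (len(errors) - 1) * p
    lower = math.floor(h)
    upper = min(lower + 1, len(errors) - 1)
    return errors[lower] + (h - lower) * (errors[upper] - errors[lower])


theta = 10.0
estimates = [9.8, 10.1, 10.0, 9.9, 10.2]
corrupted = [9.8, 10.1, 10.0, 9.9, 1000.0]  # one estimate is spoiled

print(mse(estimates, theta), mse(corrupted, theta))
print(abs_error_quantile(estimates, theta, 0.5),
      abs_error_quantile(corrupted, theta, 0.5))
```

A single spoiled estimate inflates the MSE by several orders of magnitude, while the median of the absolute errors does not move at all.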

## Improving the efficiency of the Harrell-Davis quantile estimator for special cases using custom winsorizing and trimming strategies

Let’s say we want to estimate the median based on a small sample ($$3 \leq n \leq 7$$) from a right-skewed heavy-tailed distribution with high statistical efficiency.

The traditional median estimator is the most robust estimator, but it’s not the most efficient one. Typically, the Harrell-Davis quantile estimator provides better efficiency, but it’s not robust (its breakdown point is zero), so it may have worse efficiency in the given case. The winsorized and trimmed modifications of the Harrell-Davis quantile estimator provide a good trade-off between efficiency and robustness, but they require a proper winsorizing/trimming rule. A reasonable choice of such a rule for medium-size samples is based on the highest density interval of the Beta distribution (as described here). Unfortunately, this approach may be suboptimal for small samples. E.g., if we use the 99% highest density interval to estimate the median, it starts to trim sample values only for $$n \geq 8$$.
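To make the zero breakdown point concrete, here is a minimal sketch of the classic (untrimmed) Harrell-Davis estimator restricted to the median. In this case the weight of the $$i$$-th order statistic is the probability that a $$\operatorname{Beta}((n+1)/2, (n+1)/2)$$ variable falls into $$[(i-1)/n, i/n]$$. To stay dependency-free, the sketch approximates these probabilities with Simpson's rule instead of calling a library Beta CDF; that numerical shortcut, the step count, and the function names are assumptions of this example. It does not implement the winsorized or trimmed modifications.

```python
import math


def beta_pdf(t, a, b):
    """Density of the Beta(a, b) distribution for a, b >= 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * t ** (a - 1) * (1 - t) ** (b - 1)


def integrate_simpson(f, left, right, steps=200):
    """Composite Simpson's rule; `steps` must be even."""
    h = (right - left) / steps
    total = f(left) + f(right)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(left + k * h)
    return total * h / 3


def hd_median_weights(n):
    """Harrell-Davis weights of the order statistics for p = 0.5."""
    a = b = (n + 1) / 2
    return [
        integrate_simpson(lambda t: beta_pdf(t, a, b), (i - 1) / n, i / n)
        for i in range(1, n + 1)
    ]


def hd_median(sample):
    """Weighted sum of all order statistics (classic Harrell-Davis median)."""
    ordered = sorted(sample)
    weights = hd_median_weights(len(ordered))
    return sum(w * v for w, v in zip(weights, ordered))


weights = hd_median_weights(3)       # analytically 7/27, 13/27, 7/27
estimate = hd_median([1, 2, 100])    # analytically 733/27
print(weights, estimate)
```

For $$n = 3$$, every order statistic gets a non-zero weight, so the single extreme value 100 drags the estimate far away from the sample median of 2. Trimming or winsorizing rules aim to suppress exactly such tail weights.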

In this post, we are going to discuss custom winsorizing/trimming strategies for special cases of the quantile estimation problem.

## Comparing the efficiency of the Harrell-Davis, Sfakianakis-Verginis, and Navruz-Özdemir quantile estimators

In the previous posts, I discussed the statistical efficiency of different quantile estimators (Efficiency of the Harrell-Davis quantile estimator and Efficiency of the winsorized and trimmed Harrell-Davis quantile estimators).

In this post, I continue this research and compare the efficiency of the Harrell-Davis quantile estimator, the Sfakianakis-Verginis quantile estimators, and the Navruz-Özdemir quantile estimator.

## Dispersion exponential smoothing

In the previous post, I showed how to apply exponential smoothing to quantiles using the weighted Harrell-Davis quantile estimator. This technique allows getting smooth and stable moving median estimations. In this post, I’m going to discuss how to use the same approach to estimate moving dispersion.

## Quantile exponential smoothing

One of the popular problems in time series analysis is estimating the moving “average” value. Let’s define the “average” as a central tendency metric like the mean or the median. When we talk about the moving value, we assume that we are interested in the average value “at the end” of the time series instead of the average of all available observations.

One of the most straightforward approaches to estimate the moving average is the simple moving mean. Unfortunately, this approach is not robust: outliers can instantly spoil the evaluated mean value. As an alternative, we can consider the simple moving median. I already discussed a few such methods: the MP² quantile estimator and a moving quantile estimator based on partitioning heaps (a modification of the Hardle-Steiger method). When we talk about simple moving averages, we typically assume that we estimate the average value over the last $$k$$ observations ($$k$$ is the window size). This approach is also known as unweighted moving averages because all target observations have the same weight.

As an alternative to the simple moving average, we can also consider the weighted moving average. In this case, we assign a weight for each observation and aggregate the whole time series according to these weights. A famous example of such a weight function is exponential smoothing. And the simplest form of exponential smoothing is the exponentially weighted moving mean. This approach estimates the weighted moving mean using exponentially decreasing weights. Switching from the simple moving mean to the exponentially weighted moving mean provides some benefits in terms of smoothness and estimation efficiency.

Although exponential smoothing has advantages over the simple moving mean, it still estimates the mean value, which is not robust. We can improve the robustness of this approach if we reuse the same idea for weighted moving quantiles. It’s possible because quantiles can also be estimated for weighted samples. In one of my previous posts, I showed how to adapt the Hyndman-Fan Type 7 and Harrell-Davis quantile estimators to weighted samples. In this post, I’m going to show how we can use this technique to estimate the weighted moving quantiles using exponentially decreasing weights.
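The idea of combining exponentially decreasing weights with a weighted quantile can be sketched as follows. This is a deliberate simplification: it uses the most basic step-function weighted quantile (the smallest value whose cumulative weight reaches the target share), not the weighted Hyndman-Fan Type 7 or weighted Harrell-Davis estimators mentioned above. The half-life parametrization of the weights and all function names are assumptions of this example.

```python
def exponential_weights(n, half_life):
    """Weights for n observations: the newest gets 1, and the weight
    halves every `half_life` steps back in time."""
    return [2.0 ** (-(n - 1 - i) / half_life) for i in range(n)]


def weighted_quantile(values, weights, p):
    """Smallest value whose cumulative weight reaches p of the total weight."""
    pairs = sorted(zip(values, weights))
    threshold = p * sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]  # guards against floating-point shortfall at p = 1


def exponentially_weighted_quantile(series, half_life, p=0.5):
    """Moving quantile 'at the end' of the series."""
    return weighted_quantile(series, exponential_weights(len(series), half_life), p)


def exponentially_weighted_mean(series, half_life):
    """Exponentially weighted moving mean, for comparison."""
    weights = exponential_weights(len(series), half_life)
    return sum(w * v for w, v in zip(weights, series)) / sum(weights)


level_shift = [1, 1, 1, 1, 10, 10, 10]
with_outlier = [5, 5, 5, 5, 5, 1000]

shift_median = exponentially_weighted_quantile(level_shift, half_life=1)
outlier_median = exponentially_weighted_quantile(with_outlier, half_life=2)
outlier_mean = exponentially_weighted_mean(with_outlier, half_life=2)
print(shift_median, outlier_median, outlier_mean)
```

In the first series, the weighted median follows the recent level shift even though most observations still belong to the old level. In the second series, a single fresh outlier spoils the exponentially weighted mean but leaves the weighted median untouched.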

## Improving quantile-respectful density estimation for discrete distributions using jittering

In my previous posts, I already discussed the problems that arise when we try to build the kernel density estimation (KDE) for samples with ties. We may get such samples in real life from discrete or mixed discrete/continuous distributions. Even if the original distribution is continuous, we may observe artificial sample discretization due to a limited resolution of the measuring tool. Such discretization may lead to inaccurate density plots due to undersmoothing. The problem can be resolved using a nice technique called jittering. I also discussed how to apply jittering to get a smoother version of KDE.

However, I’m not a huge fan of KDE for two reasons. The first one is the problem of choosing a proper bandwidth value. With poorly chosen bandwidth, we can easily get oversmoothing or undersmoothing even without the discretization problem. The second one is an inconsistency between the KDE-based probability density function and evaluated sample quantiles. It could lead to inconsistent visualizations (e.g., KDE-based violin plots with non-KDE-based quantile values) or it could introduce problems for algorithms that require density function and quantile values at the same time. The inconsistency could be resolved using quantile-respectful density estimation (QRDE). This kind of estimation builds the density function which matches the evaluated sample quantiles. To get a smooth QRDE, we also need a smooth quantile estimator like the Harrell-Davis quantile estimator. The robustness and computational efficiency of this approach can be improved using the winsorized and trimmed modifications of the Harrell-Davis quantile estimator (which also have a decent statistical efficiency level).

Unfortunately, the straightforward QRDE calculation is not always applicable for samples with ties because it’s impossible to build an “honest” density function for discrete distributions without using the Dirac delta function. This is a severe problem for QRDE-based algorithms like the lowland multimodality detection algorithm. In this post, I will show how jittering could help to solve this problem and get a smooth QRDE on samples with ties.

## How to build a smooth density estimation for a discrete sample using jittering

Let’s say you have a sample with tied values. If you draw a kernel density estimation (KDE) for such a sample, you may get a serrated pattern like this:

KDE requires samples from continuous distributions while tied values arise in discrete or mixture distributions. Even if the original distribution is continuous, you may observe artificial sample discretization due to a limited resolution of the measuring tool. This effect may lead to distorted density plots like in the above picture.

The problem could be solved using a nice technique called jittering. In the simplest case, jittering just adds random noise to each measurement. Such a trick removes all ties from the sample and allows building a smooth density estimation.

However, there are many different ways to apply jittering. The trickiest question here is how to choose proper noise values. In this post, I want to share one of my favorite jittering approaches. It generates a non-randomized noise pattern with a low risk of noticeable sample corruption.
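To show what a non-randomized noise pattern could look like, here is one possible sketch. It is not necessarily the approach described in the post: it simply spreads each group of tied values evenly inside an interval whose width equals the measurement resolution, so that no value moves by more than half of the resolution. The `resolution` parameter and the function name are hypothetical.

```python
from itertools import groupby


def jitter_ties(sample, resolution):
    """Deterministically spread tied values inside
    [value - resolution / 2, value + resolution / 2].

    A group of k equal values is mapped to the midpoints of k equal
    slices of that interval; unique values stay unchanged.
    """
    result = []
    for value, group in groupby(sorted(sample)):
        k = len(list(group))
        for j in range(k):
            offset = ((j + 0.5) / k - 0.5) * resolution
            result.append(value + offset)
    return result


x = [3.82, 4.61, 4.89, 4.91, 5.31, 5.6, 5.66, 7.00, 7.00, 7.00]
jittered = jitter_ties(x, resolution=0.01)
print(jittered)
```

Because the offsets are fixed by the group size, repeated runs produce the same jittered sample, and the three tied 7.00 values become three distinct values within 0.005 of the original.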

## Kernel density estimation and discrete values

Kernel density estimation (KDE) is a popular technique for data visualization. Based on the given sample, it allows estimating the probability density function (PDF) of the underlying distribution. Here is an example of KDE for x = {3.82, 4.61, 4.89, 4.91, 5.31, 5.6, 5.66, 7.00, 7.00, 7.00} (normal kernel, Sheather & Jones bandwidth selector):

KDE is a simple and straightforward way to build a PDF, but it’s not always the best one. In addition to my concerns about bandwidth selection, continuous use of KDE creates an illusion that all distributions are smooth and continuous. In practice, it’s not always true.

In the above picture, the distribution looks pretty continuous. However, the picture hides the fact that we have three 7.00 elements in the original sample. With continuous distributions, the probability of getting tied observations (that have the same value) is almost zero. If a sample contains ties, we are most likely working with either a discrete distribution or a mixture of discrete and continuous distributions. A KDE for such a sample may significantly differ from the actual PDF. Thus, this technique may mislead us instead of providing insights about the true underlying distribution.

In this post, we discuss the usage of PDF and PMF with continuous and discrete distributions. Also, we look at examples of corrupted density estimation plots for distributions with discrete features.