Finite-Sample Gaussian Efficiency: Shamos vs. Rousseeuw-Croux Qn Scale Estimators

Previously, we compared the finite-sample Gaussian efficiency of the Rousseeuw-Croux scale estimators and the QAD estimator. In this post, we compare the finite-sample Gaussian efficiency of the Shamos scale estimator and the Rousseeuw-Croux $Q_n$ scale estimator. This is a particularly interesting comparison. In the famous “Alternatives to the Median Absolute Deviation” (1993) paper by Peter J. Rousseeuw and Christophe Croux, the authors presented $Q_n$ as an improved version of the Shamos estimator. Both estimators are based on the set of pairwise absolute differences between the elements of the sample. The Shamos estimator takes the median of this set and, therefore, has an asymptotic breakdown point of $\approx 29\%$ and an asymptotic Gaussian efficiency of $\approx 86\%$. $Q_n$ takes the first quartile of this set and, therefore, has an asymptotic breakdown point of $\approx 50\%$ (like the median) and an asymptotic Gaussian efficiency of $\approx 82\%$. It sounds like a good deal: we trade $4$ percentage points of asymptotic Gaussian efficiency for $21$ percentage points of asymptotic breakdown point. What could possibly stop us from using $Q_n$ everywhere instead of the Shamos estimator?
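To make the comparison concrete, here is a minimal Python sketch of both estimators built on the set of pairwise absolute differences. The function names are mine, and the constants ($\approx 1.0484$ for Shamos, $\approx 2.2191$ for $Q_n$) are the asymptotic consistency factors for the normal distribution; the finite-sample bias corrections are intentionally omitted.

```python
import statistics
from itertools import combinations

def shamos(x):
    # Shamos estimator: the median of the pairwise absolute differences.
    # 1.048358 is the asymptotic normal-consistency constant
    # (1 / (sqrt(2) * Phi^{-1}(3/4))); finite-sample corrections omitted.
    diffs = [abs(a - b) for a, b in combinations(x, 2)]
    return 1.048358 * statistics.median(diffs)

def qn(x):
    # Rousseeuw-Croux Qn: the k-th order statistic (roughly the first
    # quartile) of the pairwise absolute differences, k = C(h, 2),
    # h = floor(n/2) + 1. 2.21914 is the asymptotic consistency constant.
    n = len(x)
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    h = n // 2 + 1
    k = h * (h - 1) // 2
    return 2.21914 * diffs[k - 1]
```

Both functions enumerate all $n(n-1)/2$ pairs, so this naive form is $O(n^2)$; the original paper gives an $O(n \log n)$ algorithm for $Q_n$.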

Well, there is a catch. A breakdown point of $29\%$ is actually a practically reasonable value. If more than $29\%$ of the sample consists of outliers, we should probably treat them not as outliers but as a separate mode. Such a situation should be handled by a multimodality detector and lead us to a different approach. Using dispersion estimators on multimodal distributions is potentially misleading. When such a multimodality diagnostic scheme is in place, there is no practical need for a higher breakdown point.

Thus, the breakdown point of $50\%$ is not such an impressive property of $Q_n$. Meanwhile, the drop in Gaussian efficiency is not so pleasant. $4\%$ may sound like a negligible difference, but it is only the asymptotic value. In real life, we typically work with finite samples. Let us explore the actual finite-sample Gaussian efficiency values of these estimators.
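A straightforward way to obtain finite-sample Gaussian efficiency values is a Monte Carlo simulation: draw many samples from the standard normal distribution, apply the estimator and the sample standard deviation to each, and take the ratio of the variances of the two sampling distributions. A minimal sketch (the function name and defaults are illustrative):

```python
import random
import statistics

def gaussian_efficiency(estimator, n, iterations=2000, seed=42):
    # Monte Carlo sketch of finite-sample Gaussian efficiency relative
    # to the sample standard deviation: the ratio of the variances of
    # the two sampling distributions under N(0, 1) samples of size n.
    rng = random.Random(seed)
    est_vals, sd_vals = [], []
    for _ in range(iterations):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        est_vals.append(estimator(sample))
        sd_vals.append(statistics.stdev(sample))
    return statistics.variance(sd_vals) / statistics.variance(est_vals)
```

By construction, the standard deviation itself has efficiency $1$, and a less efficient estimator (with a wider sampling distribution) yields a value below $1$.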

Two-Pass Change Point Detection for Temporary Interval Condensation

When we choose a change point detection algorithm, the most important thing is to clearly understand why we want to detect the change points. The knowledge of the final business goals is essential. In this post, I show a simple example of how a business requirement can be translated into algorithm adjustments.

Inconsistent Violin Plots

The usefulness and meaningfulness of violin plots are dubious (e.g., see this video and the corresponding discussion). This type of plot inherits the issues of density plots (e.g., the bandwidth selection problem) and box plots, and it also introduces new problems. One such problem is data inconsistency: the default density and box-plot components are often incompatible with each other. In this post, I show an example of this inconsistency.

Sporadic Noise Problem in Change Point Detection

We consider the problem of change point detection at the end of a time series. Let us say that we systematically monitor readings of an indicator, and we want to react to noticeable changes in the measured values as fast as possible. When there are no changes in the underlying distribution, any alerts about detected change points should be considered false positives. Typically, in such problems, we adopt the i.i.d. assumption, which states that in the absence of change points, all the measurements are independent and identically distributed. Such an assumption significantly simplifies the mathematical model, but unfortunately, it is rarely fully satisfied in real life. If we want to build a reliable change point detection system, it is important to be aware of possible real-life artifacts that introduce deviations from the declared model. In this post, I discuss the problem of sporadic noise.

Resistance to the Low-Density Regions: The Hodges-Lehmann Location Estimator Based on the Harrell-Davis Quantile Estimator

Previously, I have discussed the topic of the resistance to the low-density regions of various estimators, including the Hodges-Lehmann location estimator ($\operatorname{HL}$). In general, $\operatorname{HL}$ is a great estimator with great statistical efficiency and a decent breakdown point. Unfortunately, it has low resistance to the low-density regions around the $29^\textrm{th}$ and $71^\textrm{st}$ percentiles, which may cause trouble in the case of multimodal distributions. I am trying to find a modification of $\operatorname{HL}$ that performs almost the same as the original $\operatorname{HL}$, but has increased resistance. One of the ideas I had was using the Harrell-Davis quantile estimator instead of the sample median to evaluate $\operatorname{HL}$. Regrettably, this idea did not turn out to be successful: such an estimator has a resistance function similar to the original $\operatorname{HL}$. I believe that it is important to share negative results, and therefore this post contains a bunch of plots illustrating the results of the relevant numerical simulations.
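For reference, here is a minimal sketch of the classic $\operatorname{HL}$ estimator: the median of all pairwise Walsh averages. The discussed modification replaces `statistics.median` with a Harrell-Davis quantile estimate at $p = 0.5$; the Beta-weighted Harrell-Davis machinery is omitted here for brevity.

```python
import statistics
from itertools import combinations_with_replacement

def hodges_lehmann(x):
    # Hodges-Lehmann location estimator: the median of all pairwise
    # Walsh averages (x_i + x_j) / 2, including the i = j pairs.
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(x, 2)]
    return statistics.median(walsh)
```

The naive enumeration above is $O(n^2)$ in memory and time; faster algorithms exist, but this form is enough for simulation studies on moderate sample sizes.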

Median vs. Hodges-Lehmann: Compare Efficiency under Heavy-Tailedness

In the previous post, I shared some thoughts on how to evaluate the statistical efficiency of estimators under heavy-tailed distributions. In this post, I apply the described ideas to actually compare the efficiency of the Mean, the Sample Median, and the Hodges-Lehmann location estimator under various distributions.

Statistical efficiency is an essential characteristic, which has to be taken into account when we choose between different estimators. When the underlying distribution is normal or at least light-tailed, evaluating the statistical efficiency is typically not so hard. However, when the underlying distribution is heavy-tailed, problems appear. The statistical efficiency is usually expressed via the mean squared error or via the variance, which are not robust. Therefore, heavy-tailedness may lead to distorted or even infinite efficiency values, which is quite impractical.

So, how do we compare the efficiency of estimators under a heavy-tailed distribution? Let’s say we want to compare the efficiency of the mean and the median. Under the normal distribution (so-called Gaussian efficiency), this task is trivial: we build the sampling distribution of the mean and the sampling distribution of the median, estimate the variance of each, and take the ratio of these variances. However, if we are interested in the median, we are probably expecting some outliers. Most of the significant real-life outliers come from heavy-tailed distributions. Therefore, Gaussian efficiency is not the most interesting metric. It makes sense to evaluate the efficiency of the considered estimators under various heavy-tailed distributions.

Unfortunately, the variance is not a robust measure and is too sensitive to tails: if the sampling distribution is also non-normal or even heavy-tailed, the meaningfulness of the true variance value decreases. It seems reasonable to consider alternative robust measures of dispersion. Which one should we choose? Maybe the Median Absolute Deviation (MAD)? Well, the asymptotic Gaussian efficiency of MAD is only ~37%. And here we have the same problem: should we trust the Gaussian efficiency under heavy-tailedness? Therefore, we should first evaluate the efficiency of dispersion estimators. But we can’t do it without a previously chosen dispersion estimator! And could we truly express the actual relative efficiency between two estimators under tricky asymmetric multimodal heavy-tailed distributions using a single number?
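The variance-ratio scheme described above can be sketched in a few lines. This is a Monte Carlo sketch with illustrative names and defaults; the `dispersion` argument is the knob under discussion, as it can be swapped for a robust measure when the sampling distributions are heavy-tailed.

```python
import random
import statistics

def relative_efficiency(est1, est2, sampler, n, iterations=2000, seed=42,
                        dispersion=statistics.variance):
    # Monte Carlo sketch of relative efficiency: build the sampling
    # distribution of each estimator and take the ratio of a chosen
    # dispersion measure (the variance by default) over the two.
    rng = random.Random(seed)
    d1, d2 = [], []
    for _ in range(iterations):
        sample = [sampler(rng) for _ in range(n)]
        d1.append(est1(sample))
        d2.append(est2(sample))
    return dispersion(d2) / dispersion(d1)

# Efficiency of the median relative to the mean under normality;
# the asymptotic value is 2/pi (about 0.64).
eff = relative_efficiency(statistics.median, statistics.fmean,
                          lambda rng: rng.gauss(0, 1), n=100)
```

Passing a robust dispersion measure instead of `statistics.variance` is exactly where the chicken-and-egg problem above appears: the result now depends on the efficiency of the dispersion estimator itself.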

Finite-Sample Gaussian Efficiency: Quantile Absolute Deviation vs. Rousseeuw-Croux Scale Estimators

In this post, we discuss the finite-sample Gaussian efficiency of various robust dispersion estimators. The classic standard deviation has the highest possible Gaussian efficiency of $100\%$, but it is not robust: a single outlier can completely destroy the estimation. A typical robust alternative to the standard deviation is the Median Absolute Deviation ($\operatorname{MAD}$). While the $\operatorname{MAD}$ is highly robust (the breakdown point is $50\%$), it is not efficient: its asymptotic Gaussian efficiency is only $37\%$. Common alternatives to the $\operatorname{MAD}$ are the Rousseeuw-Croux $S_n$ and $Q_n$ scale estimators, which provide higher efficiency while keeping the breakdown point of $50\%$. In one of my recent preprints, I introduced the concept of the Quantile Absolute Deviation ($\operatorname{QAD}$) and its specific cases: the Standard Quantile Absolute Deviation ($\operatorname{SQAD}$) and the Optimal Quantile Absolute Deviation ($\operatorname{OQAD}$). Let us review the finite-sample and asymptotic values of the Gaussian efficiency for these estimators.
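To fix the definitions, here is a minimal sketch of $\operatorname{MAD}$ and a generic $\operatorname{QAD}$: the $p$-th quantile of the absolute deviations from the median, multiplied by a consistency factor. The helper quantile function, the names, and the default factor are illustrative; the finite-sample consistency factors are omitted.

```python
import statistics

def _quantile(xs, p):
    # Linear-interpolation (type 7) sample quantile, p in [0, 1].
    s = sorted(xs)
    h = (len(s) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])

def mad(x):
    # Median absolute deviation around the median; 1.4826 is the
    # asymptotic factor that makes it consistent for the standard
    # deviation under normality.
    m = statistics.median(x)
    return 1.4826 * statistics.median(abs(v - m) for v in x)

def qad(x, p, factor=1.0):
    # Quantile absolute deviation sketch: the p-th quantile of the
    # absolute deviations from the median, times a consistency factor.
    # MAD is the special case p = 0.5, factor = 1.4826.
    m = statistics.median(x)
    return factor * _quantile([abs(v - m) for v in x], p)
```

The specific choices of $p$ (and the corresponding factors) that define $\operatorname{SQAD}$ and $\operatorname{OQAD}$ are derived in the preprint.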

Mann-Whitney U Test and Heteroscedasticity

The Mann-Whitney U test is a good nonparametric test that mostly targets changes in location. However, it does not properly support all types of differences between two distributions. Specifically, it poorly handles changes in variance. In this post, I briefly discuss its behavior in reaction to scaling a distribution without introducing a location change.
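The phenomenon is easy to reproduce with a small simulation. Below is a sketch (names and defaults are illustrative) that computes the U statistic directly and applies the standard normal approximation at $\alpha = 0.05$ to two equal-location normal samples that differ only in scale; the rejection rate stays close to the nominal level even for a large scale change, because $P(X > Y)$ remains $1/2$.

```python
import math
import random

def mann_whitney_u(x, y):
    # U statistic: the number of pairs (x_i, y_j) with x_i > y_j,
    # counting ties as 0.5 (naive O(n*m) enumeration).
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1
            elif a == b:
                u += 0.5
    return u

def u_rejection_rate(scale, n=30, iterations=500, seed=42):
    # Fraction of simulations where the two-sided normal-approximation
    # U test (alpha = 0.05, equal sample sizes) rejects, when the
    # second sample differs from the first only in scale.
    rng = random.Random(seed)
    mu = n * n / 2                          # E[U] under the null
    sigma = math.sqrt(n * n * (2 * n + 1) / 12)  # SD[U] under the null
    rejections = 0
    for _ in range(iterations):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, scale) for _ in range(n)]
        z = (mann_whitney_u(x, y) - mu) / sigma
        if abs(z) > 1.96:
            rejections += 1
    return rejections / iterations
```

Note that heteroscedasticity still distorts the test slightly: the true variance of $U$ no longer matches the null formula, so the actual rejection rate drifts away from the nominal $5\%$, without ever providing real power against the scale change.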