## Standard trimmed Harrell-Davis median estimator

In one of the previous posts, I suggested a new measure of dispersion called
*the standard quantile absolute deviation around the median* (\(\operatorname{SQAD}\)) which can be used as an alternative
to the median absolute deviation (\(\operatorname{MAD}\)) as a consistent estimator for the standard deviation under normality.
The Gaussian efficiency of \(\operatorname{SQAD}\) is \(54\%\) (compared with \(37\%\) for MAD),
and its breakdown point is \(32\%\) (compared with \(50\%\) for MAD).
\(\operatorname{SQAD}\) is a symmetric dispersion measure around the median:
the interval \([\operatorname{Median} - \operatorname{SQAD}; \operatorname{Median} + \operatorname{SQAD}]\)
covers \(68\%\) of the distribution.
In the case of the normal distribution, this corresponds to the interval \([\mu - \sigma; \mu + \sigma]\).

If we use \(\operatorname{SQAD}\), we accept the breakdown point of \(32\%\). This makes the sample median a suboptimal choice of median estimator: the sample median is highly robust (its breakdown point is \(50\%\)), but its Gaussian efficiency is relatively poor. Once we settle on \(\operatorname{SQAD}\), there is no reason to require a breakdown point above \(32\%\). Therefore, we can trade some of the median's robustness for efficiency and pair \(\operatorname{SQAD}\) with a complementary median estimator.

In this post, we introduce the standard trimmed Harrell-Davis median estimator, which shares its breakdown point with \(\operatorname{SQAD}\) and provides better finite-sample efficiency than the sample median.
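As a reminder of the building block involved, the classic (non-trimmed) Harrell-Davis median estimator averages *all* order statistics with weights taken from a \(\operatorname{Beta}\big(\frac{n+1}{2}, \frac{n+1}{2}\big)\) distribution; the trimmed variant, as I understand it, zeroes the weights outside a highest-density interval of that Beta distribution. Here is a minimal dependency-free sketch of the classic estimator (the Beta CDF is evaluated by naive Simpson integration purely for illustration):

```python
import math

def beta_cdf(x, a, b, steps=2000):
    """Regularized incomplete beta function via naive Simpson integration
    (a sketch only; a real implementation would use a library routine)."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(t):
        if t <= 0 or t >= 1:
            return 0.0
        return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

    h = x / steps
    s = pdf(0) + pdf(x)
    for i in range(1, steps):
        s += pdf(i * h) * (4 if i % 2 else 2)
    return s * h / 3

def harrell_davis_median(xs):
    """Weighted sum of order statistics; the weight of the i-th order statistic
    is the Beta((n+1)/2, (n+1)/2) probability mass of [(i-1)/n, i/n]."""
    xs = sorted(xs)
    n = len(xs)
    a = b = (n + 1) / 2
    return sum(
        (beta_cdf((i + 1) / n, a, b) - beta_cdf(i / n, a, b)) * xs[i]
        for i in range(n)
    )

print(harrell_davis_median([1, 2, 3, 4, 5, 6, 7]))  # close to 4 by symmetry
```

The trimming step (restricting the weights to a highest-density interval and renormalizing) is what lowers the breakdown point from \(50\%\) to match \(\operatorname{SQAD}\); it is not shown here.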

Read more

## Optimal quantile absolute deviation

We consider the quantile absolute deviation around the median defined as follows:

\[\newcommand{\E}{\mathbb{E}} \newcommand{\PR}{\mathbb{P}} \newcommand{\Q}{\operatorname{Q}} \newcommand{\OQAD}{\operatorname{OQAD}} \newcommand{\QAD}{\operatorname{QAD}} \newcommand{\median}{\operatorname{median}} \newcommand{\Exp}{\operatorname{Exp}} \newcommand{\SD}{\operatorname{SD}} \newcommand{\V}{\mathbb{V}} \QAD(X, p) = K_p \Q(|X - \median(X)|, p), \]

where \(\Q\) is a quantile estimator, and \(K_p\) is a scale constant which we use to make \(\QAD(X, p)\) an asymptotically consistent estimator for the standard deviation under the normal distribution.

In this post, we obtain the exact values of \(K_p\), derive the corresponding equation for the asymptotic Gaussian efficiency of \(\QAD(X, p)\), and find the point at which \(\QAD(X, p)\) achieves the highest Gaussian efficiency.
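The closed form of \(K_p\) follows from a standard fact: under normality, \(|X - \operatorname{median}(X)|\) is half-normal, so its \(p\)-quantile equals \(\sigma \Phi^{-1}\big(\frac{1+p}{2}\big)\), which gives \(K_p = 1/\Phi^{-1}\big(\frac{1+p}{2}\big)\). A minimal sketch (the type-7 linear-interpolation sample quantile is my assumption here; the posts may use a different quantile estimator \(\Q\)):

```python
import random
import statistics
from statistics import NormalDist

def quantile(xs, p):
    """Type-7 (linear interpolation) sample quantile -- one possible choice of Q."""
    xs = sorted(xs)
    h = (len(xs) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def qad(xs, p):
    """Quantile absolute deviation around the median, scaled so that it is a
    consistent estimator for the standard deviation under normality:
    K_p = 1 / PhiInv((1 + p) / 2)."""
    k_p = 1 / NormalDist().inv_cdf((1 + p) / 2)
    m = statistics.median(xs)
    return k_p * quantile([abs(x - m) for x in xs], p)

random.seed(42)
sample = [random.gauss(0, 1) for _ in range(100_000)]
print(qad(sample, 0.7))  # should be close to 1 for N(0, 1)
```

Note that at \(p \approx 0.6827\) (the half-normal quantile of \(1\)) the constant \(K_p\) is exactly \(1\), which is what makes that \(p\) a natural default.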

Read more

## Quantile absolute deviation of the Pareto distribution

In this post, we derive the exact equation for the quantile absolute deviation around the median of the Pareto(1,1) distribution.
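As a preview of the kind of computation involved (assuming \(\textrm{Pareto}(1,1)\) means scale \(x_m = 1\) and shape \(\alpha = 1\), so \(F(x) = 1 - 1/x\) for \(x \geq 1\) and the median is \(2\)), the unscaled deviation quantile comes from inverting the CDF of \(|X - 2|\):

\[
\mathbb{P}(|X - 2| \leq y) =
\begin{cases}
F(2+y) - F(2-y) = \dfrac{2y}{4 - y^2}, & 0 \leq y < 1, \\[6pt]
F(2+y) = 1 - \dfrac{1}{2+y}, & y \geq 1,
\end{cases}
\qquad\Rightarrow\qquad
\Q(|X - 2|,\, p) =
\begin{cases}
\dfrac{\sqrt{1 + 4p^2} - 1}{p}, & p \leq 2/3, \\[6pt]
\dfrac{1}{1 - p} - 2, & p > 2/3.
\end{cases}
\]

This sketch omits the normalization constant \(K_p\); the post itself covers the full details.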

Read more

## Quantile absolute deviation of the Exponential distribution

In this post, we derive the exact equation for the quantile absolute deviation around the median of the Exponential distribution.
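As a preview (assuming unit rate, so \(F(x) = 1 - e^{-x}\) and the median is \(\ln 2\); a general rate \(\lambda\) just scales the result by \(1/\lambda\)), the key step is the CDF of \(|X - \ln 2|\):

\[
\mathbb{P}(|X - \ln 2| \leq y) =
\begin{cases}
F(\ln 2 + y) - F(\ln 2 - y) = \sinh(y), & 0 \leq y \leq \ln 2, \\[4pt]
F(\ln 2 + y) = 1 - \dfrac{e^{-y}}{2}, & y \geq \ln 2,
\end{cases}
\qquad\Rightarrow\qquad
\Q(|X - \ln 2|,\, p) =
\begin{cases}
\ln\!\big(p + \sqrt{1 + p^2}\big), & p \leq 3/4, \\[4pt]
-\ln\big(2(1-p)\big), & p > 3/4.
\end{cases}
\]

This sketch omits the normalization constant \(K_p\); the post itself covers the full details.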

Read more

## Quantile absolute deviation of the Uniform distribution

In this post, we derive the exact equation for the quantile absolute deviation around the median of the Uniform distribution.
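As a preview (assuming \(\mathcal{U}(0, 1)\); any other uniform distribution scales the result linearly), this case is the simplest of the series: \(|X - 1/2|\) is itself uniform on \([0, 1/2]\), so

\[
\mathbb{P}(|X - 1/2| \leq y) = 2y \ \text{ for } y \in [0, 1/2]
\qquad\Rightarrow\qquad
\Q(|X - 1/2|,\, p) = \frac{p}{2}.
\]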

Read more

## Quantile absolute deviation of the Normal distribution

In this post, we derive the exact equation for the quantile absolute deviation around the median of the Normal distribution.
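As a preview, under \(\mathcal{N}(\mu, \sigma^2)\) the deviation \(|X - \mu|\) is half-normal, so

\[
\mathbb{P}(|X - \mu| \leq y) = 2\Phi(y/\sigma) - 1 = p
\qquad\Rightarrow\qquad
\Q(|X - \mu|,\, p) = \sigma\, \Phi^{-1}\!\left(\frac{1+p}{2}\right),
\]

which is exactly why the constant \(K_p = 1/\Phi^{-1}\big(\frac{1+p}{2}\big)\) makes \(\QAD(X, p)\) consistent for \(\sigma\).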

Read more

## Standard quantile absolute deviation

The median absolute deviation (MAD) is a popular robust replacement for the standard deviation (StdDev).
It’s truly robust: its breakdown point is \(50\%\).
However, it’s not so efficient when we use it as a consistent estimator for the standard deviation under normality:
the asymptotic relative efficiency against StdDev (we call it the *Gaussian efficiency*) is only about \(37\%\).

In practice, such robustness is not always essential, while we typically want to have the highest possible efficiency.
I already described the concept of the *quantile absolute deviation* which aims
to provide a customizable trade-off between robustness and efficiency.
In this post, I would like to suggest a new default option for this measure of dispersion
called the *standard quantile absolute deviation*.
Its Gaussian efficiency is \(\approx 54\%\), while the breakdown point is \(\approx 32\%\).
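A convenient property of this default: at \(p = \mathbb{P}(|Z| \leq 1) \approx 0.6827\), the consistency constant \(1/\Phi^{-1}\big(\frac{1+p}{2}\big)\) is exactly \(1\), so the estimator reduces to a single sample quantile of the absolute deviations. A minimal sketch (the linear-interpolation sample quantile is my assumption; the post may use a different quantile estimator):

```python
import random
import statistics
from statistics import NormalDist

# Coverage of [mu - sigma, mu + sigma] under normality: P(|Z| <= 1) ~ 0.6827.
P_SQAD = 2 * NormalDist().cdf(1) - 1

def sqad(xs):
    """Standard quantile absolute deviation around the median (sketch).
    At p = P(|Z| <= 1) the normality-consistency constant equals 1."""
    m = statistics.median(xs)
    devs = sorted(abs(x - m) for x in xs)
    h = (len(devs) - 1) * P_SQAD
    lo = int(h)
    hi = min(lo + 1, len(devs) - 1)
    return devs[lo] + (h - lo) * (devs[hi] - devs[lo])

random.seed(1)
sample = [random.gauss(10, 2) for _ in range(100_000)]
print(sqad(sample))  # close to sigma = 2 under normality
```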

Read more

## Asymptotic Gaussian efficiency of the quantile absolute deviation

I have already discussed the concept of the quantile absolute deviation in several previous posts. In this post, we derive the equation for the relative statistical efficiency of the quantile absolute deviation against the standard deviation under the normal distribution (the so-called *Gaussian efficiency*).

Read more

## Finite-sample efficiency of the Rousseeuw-Croux estimators

The Rousseeuw-Croux \(S_n\) and \(Q_n\) estimators are robust and efficient measures of scale.
Their breakdown points are equal to \(0.5\), which is also the breakdown point of the median absolute deviation (MAD).
However, their statistical efficiency values are much better than the efficiency of MAD.
To be specific, the MAD asymptotic relative Gaussian efficiency against the standard deviation is about \(37\%\),
whereas the corresponding values for \(S_n\) and \(Q_n\) are \(58\%\) and \(82\%\) respectively.
Although these numbers are quite impressive, they are only *asymptotic* values.
In practice, we work with finite samples.
And the *finite-sample efficiency* could be much lower than the asymptotic one.
In this post, we perform a simulation study in order to obtain the actual finite-sample efficiency values
for these two estimators.
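The general shape of such a study can be sketched as follows. Note the simplifications: this uses plain medians instead of the low/high medians from the original Rousseeuw-Croux definition of \(S_n\), applies only the asymptotic consistency constant \(1.1926\) without finite-sample bias corrections, and measures efficiency as a ratio of variances over repeated normal samples; the post's actual methodology may differ.

```python
import random
import statistics

def s_n(xs, c=1.1926):
    """Simplified Rousseeuw-Croux S_n: c * med_i med_{j != i} |x_i - x_j|.
    (Plain medians stand in for the paper's lomed/himed; sketch only.)"""
    meds = [
        statistics.median(abs(x - y) for j, y in enumerate(xs) if j != i)
        for i, x in enumerate(xs)
    ]
    return c * statistics.median(meds)

def gaussian_efficiency(estimator, n, reps=500, seed=0):
    """Finite-sample relative efficiency against the standard deviation,
    estimated as Var(StdDev) / Var(estimator) over repeated N(0, 1) samples."""
    rng = random.Random(seed)
    sd_vals, est_vals = [], []
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        sd_vals.append(statistics.stdev(xs))
        est_vals.append(estimator(xs))
    return statistics.variance(sd_vals) / statistics.variance(est_vals)

print(gaussian_efficiency(s_n, n=50))  # finite-sample efficiency estimate
```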

Read more

## Caveats of using the median absolute deviation

The median absolute deviation is a measure of dispersion which can be used as a robust alternative to the standard deviation. It works great for slight deviations from normality (e.g., for contaminated normal distributions or slightly skewed unimodal distributions). Unfortunately, if we apply it to distributions with huge deviations from normality, we may run into trouble. In this post, I discuss some of the most important caveats to keep in mind when using the median absolute deviation.
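One classic caveat is easy to demonstrate: when more than half of the sample values are tied (as happens with heavily discretized or zero-inflated data), the MAD collapses to zero even though the data clearly have nonzero spread. A quick illustration, using the usual \(1.4826\) consistency constant:

```python
import statistics

def mad(xs, c=1.4826):
    """Median absolute deviation around the median, scaled to be consistent
    with the standard deviation under normality."""
    m = statistics.median(xs)
    return c * statistics.median(abs(x - m) for x in xs)

# More than half of the values are tied at 0, so the median absolute
# deviation is 0 -- MAD reports no dispersion at all.
data = [0, 0, 0, 0, 0, 0, 1, 2, 5, 9, 14]
print(mad(data))               # 0.0
print(statistics.stdev(data))  # clearly nonzero
```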

Read more