Andrey Akinshin's blog (Page 14)

Gamma effect size powered by the middle non-zero quantile absolute deviation

February 22, 2022 Mathematics Statistics Research [Research] Gamma Effect Size [Research] Quantile absolute deviation

In previous posts, I covered the concept of the gamma effect size. It’s a nonparametric effect size that is consistent with Cohen’s d under the normal distribution. However, the original definition has drawbacks: this statistic becomes zero if half of the sample elements are equal to each other. Last time, I suggested a workaround for this problem: we can replace the median absolute deviation with the quantile absolute deviation. Unfortunately, this trick requires parameter tuning: we should choose a proper quantile position to make this approach work. Today I want to suggest a strategy that provides a way to make a generic choice: we can use the middle non-zero quantile absolute deviation.

Middle non-zero quantile absolute deviation

February 15, 2022 Mathematics Statistics Research [Research] Quantile absolute deviation

The median absolute deviation ($\operatorname{MAD}$) around the median is a popular robust measure of statistical dispersion. Unfortunately, if we work with discrete distributions, we could get zero $\operatorname{MAD}$ values. This causes problems when $\operatorname{MAD}$ is used as a denominator. The same problem affects some other quantile-based measures of dispersion, such as the interquartile range ($\operatorname{IQR}$).

This problem could be solved using the quantile absolute deviation around the median. However, it’s not always clear how to choose the right quantile to estimate. In this post, I’m going to suggest a way of making this choice that is consistent with the classic $\operatorname{MAD}$ under continuous distributions (and samples without tied values).
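
To make the zero-$\operatorname{MAD}$ issue concrete, here is a minimal Python sketch. The helper names `mad` and `qad` and the linear-interpolation quantile rule are assumptions chosen for illustration; they are not taken from the post, which may use a different quantile estimator.

```python
from math import floor
from statistics import median


def quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted list (illustrative choice)."""
    h = (len(sorted_values) - 1) * p
    lo = floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def mad(x):
    """Unscaled median absolute deviation around the median."""
    m = median(x)
    return median(abs(v - m) for v in x)


def qad(x, p):
    """Unscaled p-th quantile of the absolute deviations around the median."""
    m = median(x)
    deviations = sorted(abs(v - m) for v in x)
    return quantile(deviations, p)


# More than half of the values are tied, so the median deviation collapses to zero.
sample = [0, 0, 0, 0, 0, 1, 2, 3]
print(mad(sample))        # 0
print(qad(sample, 0.75))  # 1.25
```

With this sample, moving from the median of the deviations to a higher quantile gives a non-zero result, which is the idea behind choosing the quantile position carefully.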

Unbiased median absolute deviation based on the trimmed Harrell-Davis quantile estimator

February 8, 2022 Mathematics Statistics Research [Research] Unbiased median absolute deviation

The median absolute deviation ($\operatorname{MAD}$) is a robust measure of scale. For a sample $x = \{ x_1, x_2, \ldots, x_n \}$, it’s defined as follows:

$$ \operatorname{MAD}_n = C_n \cdot \operatorname{median}(|x - \operatorname{median}(x)|) $$

where $\operatorname{median}$ is a median estimator and $C_n$ is a scale factor. With the right scale factor, $\operatorname{MAD}$ becomes a consistent estimator of the standard deviation under the normal distribution. For huge samples, we can use the asymptotic value of $C_n$, which is

$$ C_\infty = \dfrac{1}{\Phi^{-1}(3/4)} \approx 1.4826022185056. $$

For small samples, we should use adjusted values of $C_n$ which depend on the sample size. However, $C_n$ depends not only on the sample size but also on the median estimator. I have already covered how to obtain these values for the traditional median estimator and the Harrell-Davis median estimator. It’s time to get the $C_n$ values for the trimmed Harrell-Davis median estimator.
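
As a quick check of the definition above, the following Python sketch computes $C_\infty$ and a scaled $\operatorname{MAD}$ with the traditional sample median. The names `C_INF` and `mad` are illustrative assumptions, and the sketch uses only the asymptotic factor: it does not include the small-sample $C_n$ adjustments or the trimmed Harrell-Davis estimator discussed in the post.

```python
from statistics import NormalDist, median

# Asymptotic scale factor: 1 / Phi^{-1}(3/4)
C_INF = 1 / NormalDist().inv_cdf(0.75)


def mad(x, scale=C_INF):
    """Scaled median absolute deviation around the traditional sample median."""
    m = median(x)
    return scale * median(abs(v - m) for v in x)


print(C_INF)                 # approximately 1.4826
print(mad([1, 2, 3, 4, 5]))  # C_INF * 1, since the median deviation is 1
```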

Median absolute deviation vs. Shamos estimator

February 1, 2022 Mathematics Statistics Research Shamos Estimator

There are multiple ways to estimate statistical dispersion. The standard deviation is the most popular one, but it’s not robust: a single outlier could heavily corrupt the results. Fortunately, we have robust measures of dispersion like the median absolute deviation and the Shamos estimator. In this post, we perform numerical simulations and compare these two estimators on different distributions and sample sizes.
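
As a rough illustration of the estimators being compared, the Python sketch below computes the standard deviation, the $\operatorname{MAD}$, and the Shamos estimator on a sample with one outlier. The Shamos estimator is taken here as the median of all pairwise absolute differences. The function names and the asymptotic normal-consistency scale factors are assumptions for this example; the post's simulations may use finite-sample factors instead.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist, median, stdev

Z75 = NormalDist().inv_cdf(0.75)
MAD_SCALE = 1 / Z75                 # asymptotic factor for MAD
SHAMOS_SCALE = 1 / (sqrt(2) * Z75)  # asymptotic factor for the Shamos estimator


def mad(x, scale=MAD_SCALE):
    """Scaled median absolute deviation around the median."""
    m = median(x)
    return scale * median(abs(v - m) for v in x)


def shamos(x, scale=SHAMOS_SCALE):
    """Scaled median of pairwise absolute differences |x_i - x_j|, i < j."""
    return scale * median(abs(a - b) for a, b in combinations(x, 2))


clean = [1, 2, 3, 4, 5]
corrupted = [1, 2, 3, 4, 1000]  # the last value is replaced by an outlier

for name, estimator in [("stdev", stdev), ("mad", mad), ("shamos", shamos)]:
    print(name, estimator(clean), estimator(corrupted))
```

On the corrupted sample, the standard deviation jumps by two orders of magnitude, while both robust estimators stay close to their values on the clean sample.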

Moving extended P² quantile estimator

January 25, 2022 Mathematics Statistics Research P² Quantile Estimator

In the previous posts, I discussed the P² quantile estimator (a sequential estimator which takes $O(1)$ memory and estimates a single predefined quantile), the moving P² quantile estimator (a moving modification of P² which estimates quantiles within the moving window), and the extended P² quantile estimator (a sequential estimator which takes $O(m)$ memory and estimates $m$ predefined quantiles).

Now it’s time to build the moving modification of the extended P² quantile estimator which estimates $m$ predefined quantiles using $O(m)$ memory within the moving window.

Extended P² quantile estimator

January 18, 2022 Mathematics Statistics Research P² Quantile Estimator

I already covered the P² quantile estimator and its possible implementation improvements in several blog posts. This sequential estimator uses $O(1)$ memory and allows estimating a single predefined quantile. Now it’s time to discuss the extended P² quantile estimator that allows estimating multiple predefined quantiles. This extended version was suggested in the paper “Simultaneous estimation of several percentiles”. In this post, we briefly discuss the approach from this paper and how we can improve its implementation.

P² quantile estimator marker adjusting order

January 11, 2022 Mathematics Statistics Research P² Quantile Estimator

I have already written a few blog posts about the P² quantile estimator (which is a sequential estimator that uses $O(1)$ memory).

In this post, we continue improving the P² implementation so that it gives better estimations for streams with a small number of elements.

P² quantile estimator initialization strategy

January 4, 2022 Mathematics Statistics Research Quantile Estimators P² Quantile Estimator

Update: the estimator accuracy could be improved using a bunch of patches.

The P² quantile estimator is a sequential estimator that uses $O(1)$ memory. Thus, for the given sequence of numbers, it allows estimating quantiles without storing values. I have already written a few blog posts about it.

I tried this estimator in various contexts, and it shows pretty decent results. However, recently I stumbled on a corner case: if we want to estimate an extreme quantile ($p < 0.1$ or $p > 0.9$), this estimator provides inaccurate results on streams with a small number of elements ($n < 10$). While it looks like a minor issue, it would be nice to fix it. In this post, we briefly discuss choosing a better initialization strategy to work around this problem.
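
For readers unfamiliar with P², here is a minimal Python sketch of the classic five-marker algorithm as described by Jain and Chlamtac. It is not the author's implementation: the class name `P2QuantileEstimator`, the 0-based marker positions, and the fallback used for fewer than six observations are assumptions made for this example, and it uses the classic initialization rather than the improved strategy the post proposes.

```python
import random


class P2QuantileEstimator:
    """Sketch of the classic P² estimator: five markers, O(1) memory."""

    def __init__(self, p):
        self.p = p
        self.count = 0
        self.q = []                                     # marker heights
        self.n = [0, 1, 2, 3, 4]                        # actual marker positions
        self.ns = [0.0, 2 * p, 4 * p, 2 + 2 * p, 4.0]   # desired marker positions
        self.dns = [0.0, p / 2, p, (1 + p) / 2, 1.0]    # desired position increments

    def add(self, x):
        self.count += 1
        if self.count <= 5:
            # The first five observations initialize the marker heights.
            self.q.append(x)
            self.q.sort()
            return
        q, n = self.q, self.n
        # Find the cell k that contains x, extending the extreme markers if needed.
        if x < q[0]:
            q[0] = x
            k = 0
        elif x >= q[4]:
            q[4] = x
            k = 3
        else:
            k = next(i for i in range(4) if x < q[i + 1])
        for i in range(k + 1, 5):
            n[i] += 1
        for i in range(5):
            self.ns[i] += self.dns[i]
        # Move the three inner markers toward their desired positions.
        for i in (1, 2, 3):
            d = self.ns[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                d = 1 if d > 0 else -1
                candidate = self._parabolic(i, d)
                if not q[i - 1] < candidate < q[i + 1]:
                    candidate = self._linear(i, d)
                q[i] = candidate
                n[i] += d

    def _parabolic(self, i, d):
        q, n = self.q, self.n
        return q[i] + d / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + d) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - d) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
        )

    def _linear(self, i, d):
        q, n = self.q, self.n
        return q[i] + d * (q[i + d] - q[i]) / (n[i + d] - n[i])

    def value(self):
        if self.count == 0:
            raise ValueError("no observations")
        if self.count <= 5:
            # Fallback for tiny streams: nearest order statistic (illustrative choice).
            return self.q[round((self.count - 1) * self.p)]
        return self.q[2]


# Usage: estimate the median of a shuffled stream of 1..1001 (true median: 501).
rng = random.Random(42)
stream = list(range(1, 1002))
rng.shuffle(stream)
estimator = P2QuantileEstimator(0.5)
for value in stream:
    estimator.add(value)
print(estimator.value())
```

The estimator keeps only five heights and their positions regardless of the stream length, which is where the $O(1)$ memory bound comes from.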