Customization of the nonparametric Cohen's d-consistent effect size


One year ago, I published a post called "Nonparametric Cohen's d-consistent effect size". Over the past year, I have received a lot of feedback, both from my own statistical experiments and from people who tried the suggested approach. It seems that the nonparametric version of Cohen’s d works much better with real-life, not-so-normal data. While the classic Cohen’s d, which is based on the non-robust arithmetic mean and the non-robust standard deviation, can be easily corrupted by a single outlier, my approach is much more resistant to unexpected extreme values. It also allows exploring the difference between specific quantiles of the considered samples, which can be useful in the nonparametric case.

However, I wasn’t satisfied with the results of all of my experiments. While I still like the basic idea (replace the mean with the median; replace the standard deviation with the median absolute deviation), it turned out that the final results depend heavily on the quantile estimator used. To be more specific, the original Harrell-Davis quantile estimator is not always optimal; in most cases, it’s better to replace it with its trimmed modification. However, the particular choice of quantile estimator depends on the situation. Also, the consistency constant for the median absolute deviation should be adjusted according to the sample size and the quantile estimator used. Of course, the median absolute deviation itself can also be replaced by other dispersion estimators that consistently estimate the standard deviation.

In this post, I want to give a brief overview of possible customizations of the suggested metrics.

The generic equations

Let’s say we have two samples $x = \{ x_1, x_2, \ldots, x_{n_x} \}$ and $y = \{ y_1, y_2, \ldots, y_{n_y} \}$. The “classic” Cohen’s d can be defined as follows:

$$ d = \frac{\overline{y}-\overline{x}}{s} $$

where $s$ is the pooled standard deviation:

$$ s = \sqrt{\frac{(n_x - 1) s^2_x + (n_y - 1) s^2_y}{n_x + n_y - 2}}. $$
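
As a rough illustration, the two formulas above can be written in Python using only the standard library. The function names `pooled_std` and `cohens_d` and the sample values are made up for this sketch.

```python
import math
import statistics


def pooled_std(x, y):
    """Pooled standard deviation s of two samples (each needs at least 2 values)."""
    nx, ny = len(x), len(y)
    # statistics.variance uses the (n - 1) denominator, i.e. s_x^2 and s_y^2.
    vx, vy = statistics.variance(x), statistics.variance(y)
    return math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))


def cohens_d(x, y):
    """Classic Cohen's d: difference of means over the pooled standard deviation."""
    return (statistics.fmean(y) - statistics.fmean(x)) / pooled_std(x, y)


# Illustrative data only: y is x shifted by 2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 4.0, 5.0, 6.0, 7.0]
print(cohens_d(x, y))  # about 1.265
```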

And here is the quantile-specific effect size suggested in the previous post:

$$ \gamma_p = \frac{Q_p(y) - Q_p(x)}{\operatorname{PMAD}_{xy}} $$

where $Q_p$ is an estimator of the $p^\textrm{th}$ quantile and $\operatorname{PMAD}_{xy}$ is the pooled median absolute deviation:

$$ \operatorname{PMAD}_{xy} = \sqrt{\frac{(n_x - 1) \operatorname{MAD}^2_x + (n_y - 1) \operatorname{MAD}^2_y}{n_x + n_y - 2}}, $$

$\operatorname{MAD}_x$ and $\operatorname{MAD}_y$ are the median absolute deviations of $x$ and $y$:

$$ \operatorname{MAD}_x = C_{n_x} \cdot Q_{0.5}(|x_i - Q_{0.5}(x)|), \quad \operatorname{MAD}_y = C_{n_y} \cdot Q_{0.5}(|y_i - Q_{0.5}(y)|), $$

$C_{n_x}$ and $C_{n_y}$ are consistency constants that make $\operatorname{MAD}$ a consistent estimator of the standard deviation.
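
The definitions above can be sketched in Python with the quantile estimator and the consistency constant passed in as parameters, since these are exactly the pieces one may want to swap. Two assumptions are made here purely for illustration: `sample_quantile` is a simple placeholder based on linear interpolation between order statistics (it is not the Harrell-Davis estimator), and `C_ASYMPTOTIC` is the asymptotic normal constant $1 / \Phi^{-1}(0.75) \approx 1.4826$ used for every sample size. All names are hypothetical.

```python
import math
import statistics

# Asymptotic consistency constant for normal data: 1 / Phi^{-1}(0.75), about 1.4826.
# Assumption: used here for every n; a size-specific constant may be preferable.
C_ASYMPTOTIC = 1.0 / statistics.NormalDist().inv_cdf(0.75)


def sample_quantile(sample, p):
    """Placeholder Q_p: linear interpolation between order statistics."""
    s = sorted(sample)
    h = (len(s) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def mad(sample, q=sample_quantile, c=C_ASYMPTOTIC):
    """Scaled median absolute deviation: c * Q_0.5(|v - Q_0.5(sample)|)."""
    center = q(sample, 0.5)
    return c * q([abs(v - center) for v in sample], 0.5)


def pooled_mad(x, y, q=sample_quantile, c=C_ASYMPTOTIC):
    """PMAD_xy: same pooling formula as the pooled standard deviation."""
    nx, ny = len(x), len(y)
    mx, my = mad(x, q, c), mad(y, q, c)
    return math.sqrt(((nx - 1) * mx**2 + (ny - 1) * my**2) / (nx + ny - 2))


def gamma_effect_size(x, y, p=0.5, q=sample_quantile, c=C_ASYMPTOTIC):
    """gamma_p = (Q_p(y) - Q_p(x)) / PMAD_xy."""
    return (q(y, p) - q(x, p)) / pooled_mad(x, y, q, c)


# Illustrative data only: y contains one extreme value.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 4.0, 5.0, 6.0, 1000.0]
print(gamma_effect_size(x, y))  # about 1.349
```

Any other callable with the signature `q(sample, p)` can be passed instead of the placeholder, and `c` can be replaced by a constant chosen for the actual sample size.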

For the normal distribution, Cohen’s d is approximately equal to $\gamma_{0.5}$:

$$ d = \frac{\overline{y}-\overline{x}}{s} \approx \frac{Q_{0.5}(y) - Q_{0.5}(x)}{\operatorname{PMAD}_{xy}} = \gamma_{0.5}. $$

Thus, $\gamma_{0.5}$ can be used as a robust alternative to the original Cohen’s d.
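
One way to see why the approximation holds is the following sketch, under assumptions that are not spelled out above: let $x_i \sim \mathcal{N}(\mu_x, \sigma_x^2)$ and $y_i \sim \mathcal{N}(\mu_y, \sigma_y^2)$, let $\Phi^{-1}$ denote the standard normal quantile function, and take the asymptotic constant $C = 1 / \Phi^{-1}(0.75) \approx 1.4826$ (finite samples may need an adjusted value). For large samples, the median approaches the mean, and the unscaled median absolute deviation approaches a fixed fraction of the standard deviation:

$$ Q_{0.5}(x) \to \mu_x, \quad Q_{0.5}(|x_i - Q_{0.5}(x)|) \to \Phi^{-1}(0.75) \cdot \sigma_x \approx 0.6745 \cdot \sigma_x, \quad \operatorname{MAD}_x \to \sigma_x, $$

and likewise for $y$. Since $\operatorname{PMAD}_{xy}$ combines $\operatorname{MAD}_x$ and $\operatorname{MAD}_y$ with the same weights that $s$ applies to $s_x$ and $s_y$, the numerator and the denominator of $\gamma_{0.5}$ approach those of $d$.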

Customization

There are several things that we could customize in the above equations.

Summary

There are three main ways to adopt the nonparametric Cohen’s d-consistent effect size:

  • An easy way
    If you want to get the simplest solution, just use the traditional quantile estimator (if $n$ is odd, the median is the middle element of the sorted sample; if $n$ is even, the median is the arithmetic average of the two middle elements of the sorted sample). The $\operatorname{MAD}$ consistency constant should be taken from the main table of this post.
  • A relatively easy way
    If you want to get a relatively simple but more efficient solution, use the trimmed modification of the Harrell-Davis quantile estimator and the $\operatorname{MAD}$ consistency constant from the main table of this post.
  • A hard way
    If you want to get the most efficient solution, you should spend some time on research. First of all, you should explore all available options (you can find some by following the links below). Next, you should think about the properties of your data sets (what kind of distribution you have and what your typical sample sizes are). Finally, you should try different approaches with your data and check which one provides the most reliable results.
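
The last step of the hard way (try different approaches and check which one is most reliable) could be organized as a small simulation harness like the one below. Everything in it is an assumption made for illustration: the data model in `draw` (normal data with occasional huge outliers), the reliability criterion (center and interquartile range of the estimates over repeated samples), the asymptotic constant `C`, and all function names. In practice, `draw` would be replaced by resampling from real data, and the list of candidates would include the estimators under consideration.

```python
import math
import random
import statistics

# Asymptotic MAD constant for normal data (assumption: not adjusted for n).
C = 1.0 / statistics.NormalDist().inv_cdf(0.75)


def pooled(sx, sy, nx, ny):
    """Pooling formula shared by the standard deviation and the MAD."""
    return math.sqrt(((nx - 1) * sx**2 + (ny - 1) * sy**2) / (nx + ny - 2))


def cohens_d(x, y):
    s = pooled(statistics.stdev(x), statistics.stdev(y), len(x), len(y))
    return (statistics.fmean(y) - statistics.fmean(x)) / s


def mad(sample):
    m = statistics.median(sample)
    return C * statistics.median([abs(v - m) for v in sample])


def gamma_median(x, y):
    """gamma_0.5 based on the traditional sample median."""
    s = pooled(mad(x), mad(y), len(x), len(y))
    return (statistics.median(y) - statistics.median(x)) / s


def draw(rng, n, shift, outlier_rate):
    """Hypothetical data model: N(shift, 1) with occasional huge outliers."""
    return [
        rng.gauss(shift, 1.0) + (100.0 if rng.random() < outlier_rate else 0.0)
        for _ in range(n)
    ]


def spread_of_estimates(estimator, trials=300, n=30, shift=1.0,
                        outlier_rate=0.05, seed=42):
    """Return (median, interquartile range) of the estimates over many trials."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        x = draw(rng, n, 0.0, outlier_rate)
        y = draw(rng, n, shift, outlier_rate)
        values.append(estimator(x, y))
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q3 - q1


# The uncontaminated effect size in this model is 1.0 (shift / sigma).
for name, estimator in [("Cohen's d", cohens_d), ("gamma_0.5", gamma_median)]:
    center, iqr = spread_of_estimates(estimator)
    print(f"{name}: center={center:.3f}, IQR={iqr:.3f}")
```

An estimator whose center stays close to the uncontaminated effect size and whose spread stays small across such trials is a reasonable candidate for the data at hand.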

Further reading


References (17)

  1. A single outlier could completely distort your Cohen's d value (2021-01-26)
  2. Comparing distribution quantiles using gamma effect size (2021-02-02)
  3. Improving the efficiency of the Harrell-Davis quantile estimator for special cases using custom winsorizing and trimming strategies (2021-05-25)
  4. Comparing the efficiency of the Harrell-Davis, Sfakianakis-Verginis, and Navruz-Özdemir quantile estimators (2021-05-18)
  5. Efficiency of the Harrell-Davis quantile estimator (2021-03-23)
  6. Misleading standard deviation (2021-02-23)
  7. Navruz-Özdemir quantile estimator (2021-03-16)
  8. Nonparametric Cohen's d-consistent effect size (2020-06-25)
  9. Quantile absolute deviation: estimating statistical dispersion around quantiles (2020-12-01)
  10. Robust alternative to statistical efficiency (2021-06-01)
  11. Sfakianakis-Verginis quantile estimator (2021-03-09)
  12. Shamos Estimator
  13. Trimmed modification of the Harrell-Davis quantile estimator (2021-03-30)
  14. Unbiased median absolute deviation (2021-02-09)
  15. Unbiased median absolute deviation based on the Harrell-Davis quantile estimator (2021-02-16)
  16. Winsorized modification of the Harrell-Davis quantile estimator (2021-03-02)
  17. Efficiency of the winsorized and trimmed Harrell-Davis quantile estimators (2021-04-06)