A single outlier could completely distort your Cohen's d value

by Andrey Akinshin · 2021-01-26

Cohen’s d is a popular way to estimate the effect size between two samples. It works excellent for perfectly normal distributions. Usually, people think that slight deviations from normality shouldn’t produce a noticeable impact on the result. Unfortunately, it’s not always true. In fact, a single outlier value can completely distort the result even in large samples.

In this post, I will present some illustrations for this problem and will show how to fix it.

Cohen’s d

First of all, let’s recall the definition of Cohen’s d. For two samples $x = \{ x_1, x_2, \ldots, x_n \}$ and $y = \{ y_1, y_2, \ldots, y_n \}$, the Cohen’s d is defined as follows ([Cohen1988]):

$$ d_{xy} = \frac{\overline{y}-\overline{x}}{s_{xy}} $$

where $s_{xy}$ is the pooled standard deviation:

$$ s_{xy} = \sqrt{\frac{(n_x - 1) s^2_x + (n_y - 1) s^2_y}{n_x + n_y - 2}}. $$

There is a rule of thumb that is widely used to interpret Cohen’s d value:


E.g., if $d_{xy} < 0.2$, we can say that the difference between $x$ and $y$ is small. If $d_{xy} > 0.8$, the difference is large

The problem

Now we are going to discuss a simple example that demonstrates the effect of a single outlier. Let’s consider the two following small samples:

$$ x = \{ -1.4,\; -1,\; -0.2,\; 0,\; 0.2,\; 1,\; 1.4 \} \quad \big( \overline{x} = 0,\; s_x = 1 \big), $$ $$ y = \{ -0.4,\; 0,\; 0.8,\; 1,\; 1.2,\; 2,\; 2.4 \} \quad \big( \overline{y} = 0,\; s_y = 1 \big). $$

Thus, the Cohen’s d equals $1$:

$$ d_{xy} = \frac{\overline{y}-\overline{x}}{s_{xy}} = \frac{1 - 0}{1} = 1. $$

We can see that $d_{xy}$ describes a large effect (because it’s larger than 0.8 which is the large effect threshold).

Now let’s replace the last element of $y$ with a high outlier and build a new sample $z$:

$$ z = \{ -0.4,\; 0,\; 0.8,\; 1,\; 1.2,\; 2,\; 100 \} \quad \big( \overline{z} \approx 14.08,\; s_{z} \approx 37.89 \big). $$

Since the mean value has been significantly increased ($\overline{z} \approx 14.08 \gg \overline{y} = 0$), we could expect that the Cohen’s d value should be increased as well. However, we observe an opposite situation because of the increased pooled standard deviation:

$$ s_{xz} = \sqrt{\frac{(n_x - 1) s^2_x + (n_z - 1) s^2_z}{n_x + n_z - 2}} = \sqrt{\frac{6\cdot 1^2 + 6\cdot 37.89^2}{12}} \approx 26.8. $$ $$ d_{xz} = \frac{\overline{z}-\overline{x}}{s_{xz}} \approx \frac{14.08}{26.8} \approx 0.53. $$

As we can see, this outlier spoiled our conclusion. Now, the Cohen’s d equals 0.53 (medium effect) instead of 1.0 (large effect). Technically, the result is correct (because the standard deviation of $z$ is huge), but it doesn’t properly describe the actual difference between $x$ and $y$.

Here you could say that the size of the considered samples is too small, it’s not enough to get a reasonable Cohen’s d value. OK, let’s see what kind of situation we get on larger samples.

Numerical simulations

Let’s conduct the following simulation:

  • Generate random sample $x = \{x_1, \ldots, x_n \}$ from $\mathcal{N}(0, 1^2)$.
  • Generate random sample $y = \{y_1, \ldots, y_n \}$ from $\mathcal{N}(1, 1^2)$ and replace $y_n$ with $y_n = 100$.
  • Calculate the Cohen’s d value between $x$ and $y$.
  • Repeat steps previous three steps 1000 times.
  • Build a distribution based on 1000 collected Cohen’s d values.

Below you can see corresponding density plots (KDE, normal kernel, Sheather & Jones) for $n = 50$, $n = 500$, and $n = 1000$.

In these simulation, we got the following results:

  • $n=50$: all Cohen’s d values are inside $[0.23; 0.37]$
  • $n=500$: all Cohen’s d values are inside $[0.30; 0.43]$
  • $n=1000$: all Cohen’s d values are inside $[0.39; 0.51]$

As you can see, instead of the expected large effect ($d = 1$), we constantly get small or medium effect ($d < 0.52$). Even when $n = 1000$, a single extreme number could completely distort the result.

So, how to solve this problem?

The solution

In one of the previous posts, I described a nonparametric effect size estimator which is consistent with Cohen’s d. Here is a quick definition:

The effect size $\gamma_p$ can be estimated as follows:

$$ \gamma_p = \frac{Q_p(y) - Q_p(x)}{\mathcal{PMAD}_{xy}}, $$

where $\mathcal{PMAD}_{xy}$ is the pooled median absolute deviation:

$$ \mathcal{PMAD}_{xy} = \sqrt{\frac{(n_y - 1) \mathcal{MAD}^2_y + (n_y - 1) \mathcal{MAD}^2_y}{n_x + n_y - 2}}. $$

In this post, we are going to apply it only for the median (we need only $\gamma_{0.5}$). In order to improve the accuracy, we use the Harrell-Davis quantile estimator (harrell1982) to estimate the median ($Q_{0.5}$) and the median absolute deviation ($\mathcal{MAD}_x$, $\mathcal{MAD}_y$). The consistency constant $C$ for $\mathcal{MAD}$ equals $1.4826$, which makes $\mathcal{MAD}$ a consistent estimator for the standard deviation estimation.

Thus, in the case of the normal distribution, $\gamma_p$ could be used as a good approximation of the Cohen’s d. In this case of non-normal distribution, it provides a robust and stable alternative to the Cohen’s d.

Let’s repeat the above simulation with outliers and build corresponding density plots for $\gamma_{0.5}$:

As you can see, a single outlier couldn’t spoil the $\gamma_{0.5}$ values. The obtained effect size values are normally distributed around $1.0$, which is the true effect size value.


The $\gamma_p$ effect size is a good alternative for Cohen’s d. For the normal distributions, it works similar to Cohen’s d. $\gamma_p$ based on two robust metrics (the Harrell-Davis powered medians and median absolution deviations) instead of non-robust metrics in the original Cohen’s d equation (the mean and the standard deviation). Thus $\gamma_p$ is also robust and works well even when we observe deviations from normality. In the case of non-normal distributions, it allows comparing individual quantiles instead of focusing only on the central tendency like the mean or the median.