A single outlier could completely distort your Cohen's d value

Cohen’s d is a popular way to estimate the effect size between two samples. It works excellent for perfectly normal distributions. Usually, people think that slight deviations from normality shouldn’t produce a noticeable impact on the result. Unfortunately, it’s not always true. In fact, a single outlier value can completely distort the result even in large samples.

In this post, I will present some illustrations for this problem and will show how to fix it.

Cohen’s d

First of all, let’s recall the definition of Cohen’s d. For two samples \(x = \{ x_1, x_2, \ldots, x_n \}\) and \(y = \{ y_1, y_2, \ldots, y_n \}\), the Cohen’s d is defined as follows ([Cohen1988]):

\[d_{xy} = \frac{\overline{y}-\overline{x}}{s_{xy}} \]

where \(s_{xy}\) is the pooled standard deviation:

\[s_{xy} = \sqrt{\frac{(n_x - 1) s^2_x + (n_y - 1) s^2_y}{n_x + n_y - 2}}. \]

There is a rule of thumb that is widely used to interpret Cohen’s d value:


E.g., if \(d_{xy} < 0.2\), we can say that the difference between \(x\) and \(y\) is small. If \(d_{xy} > 0.8\), the difference is large

The problem

Now we are going to discuss a simple example that demonstrates the effect of a single outlier. Let’s consider the two following small samples:

\[x = \{ -1.4,\; -1,\; -0.2,\; 0,\; 0.2,\; 1,\; 1.4 \} \quad \big( \overline{x} = 0,\; s_x = 1 \big), \]

\[y = \{ -0.4,\; 0,\; 0.8,\; 1,\; 1.2,\; 2,\; 2.4 \} \quad \big( \overline{y} = 0,\; s_y = 1 \big). \]

Thus, the Cohen’s d equals \(1\):

\[d_{xy} = \frac{\overline{y}-\overline{x}}{s_{xy}} = \frac{1 - 0}{1} = 1. \]

We can see that \(d_{xy}\) describes a large effect (because it’s larger than 0.8 which is the large effect threshold).

Now let’s replace the last element of \(y\) with a high outlier and build a new sample \(z\):

\[z = \{ -0.4,\; 0,\; 0.8,\; 1,\; 1.2,\; 2,\; 100 \} \quad \big( \overline{z} \approx 14.08,\; s_{z} \approx 37.89 \big). \]

Since the mean value has been significantly increased (\(\overline{z} \approx 14.08 \gg \overline{y} = 0\)), we could expect that the Cohen’s d value should be increased as well. However, we observe an opposite situation because of the increased pooled standard deviation:

\[s_{xz} = \sqrt{\frac{(n_x - 1) s^2_x + (n_z - 1) s^2_z}{n_x + n_z - 2}} = \sqrt{\frac{6\cdot 1^2 + 6\cdot 37.89^2}{12}} \approx 26.8. \]

\[d_{xz} = \frac{\overline{z}-\overline{x}}{s_{xz}} \approx \frac{14.08}{26.8} \approx 0.53. \]

As we can see, this outlier spoiled our conclusion. Now, the Cohen’s d equals 0.53 (medium effect) instead of 1.0 (large effect). Technically, the result is correct (because the standard deviation of \(z\) is huge), but it doesn’t properly describe the actual difference between \(x\) and \(y\).

Here you could say that the size of the considered samples is too small, it’s not enough to get a reasonable Cohen’s d value. OK, let’s see what kind of situation we get on larger samples.

Numerical simulations

Let’s conduct the following simulation:

  • Generate random sample \(x = \{x_1, \ldots, x_n \}\) from \(\mathcal{N}(0, 1^2)\).
  • Generate random sample \(y = \{y_1, \ldots, y_n \}\) from \(\mathcal{N}(1, 1^2)\) and replace \(y_n\) with \(y_n = 100\).
  • Calculate the Cohen’s d value between \(x\) and \(y\).
  • Repeat steps previous three steps 1000 times.
  • Build a distribution based on 1000 collected Cohen’s d values.

Below you can see corresponding density plots (KDE, normal kernel, Sheather & Jones) for \(n = 50\), \(n = 500\), and \(n = 1000\).

In these simulation, we got the following results:

  • \(n=50\): all Cohen’s d values are inside \([0.23; 0.37]\)
  • \(n=500\): all Cohen’s d values are inside \([0.30; 0.43]\)
  • \(n=1000\): all Cohen’s d values are inside \([0.39; 0.51]\)

As you can see, instead of the expected large effect (\(d = 1\)), we constantly get small or medium effect (\(d < 0.52\)). Even when \(n = 1000\), a single extreme number could completely distort the result.

So, how to solve this problem?

The solution

In one of the previous posts, I described a nonparametric effect size estimator which is consistent with Cohen’s d. Here is a quick definition:

The effect size \(\gamma_p\) can be estimated as follows:

\[\gamma_p = \frac{Q_p(y) - Q_p(x)}{\mathcal{PMAD}_{xy}}, \]

where \(\mathcal{PMAD}_{xy}\) is the pooled median absolute deviation:

\[\mathcal{PMAD}_{xy} = \sqrt{\frac{(n_y - 1) \mathcal{MAD}^2_y + (n_y - 1) \mathcal{MAD}^2_y}{n_x + n_y - 2}}. \]

In this post, we are going to apply it only for the median (we need only \(\gamma_{0.5}\)). In order to improve the accuracy, we use the Harrell-Davis quantile estimator ([Harrell1982]) to estimate the median (\(Q_{0.5}\)) and the median absolute deviation (\(\mathcal{MAD}_x\), \(\mathcal{MAD}_y\)). The consistency constant \(C\) for \(\mathcal{MAD}\) equals \(1.4826\), which makes \(\mathcal{MAD}\) a consistent estimator for the standard deviation estimation.

Thus, in the case of the normal distribution, \(\gamma_p\) could be used as a good approximation of the Cohen’s d. In this case of non-normal distribution, it provides a robust and stable alternative to the Cohen’s d.

Let’s repeat the above simulation with outliers and build corresponding density plots for \(\gamma_{0.5}\):

As you can see, a single outlier couldn’t spoil the \(\gamma_{0.5}\) values. The obtained effect size values are normally distributed around \(1.0\), which is the true effect size value.


The \(\gamma_p\) effect size is a good alternative for Cohen’s d. For the normal distributions, it works similar to Cohen’s d. \(\gamma_p\) based on two robust metrics (the Harrell-Davis powered medians and median absolution deviations) instead of non-robust metrics in the original Cohen’s d equation (the mean and the standard deviation). Thus \(\gamma_p\) is also robust and works well even when we observe deviations from normality. In the case of non-normal distributions, it allows comparing individual quantiles instead of focusing only on the central tendency like the mean or the median.