## The median absolute deviation value of the Gumbel distribution

The Gumbel distribution is not only a useful model in the extreme value theory, but it’s also a nice example of a slightly right-skewed distribution (skewness \(\approx 1.14\)). Here is its density plot:

In some of my statistical experiments, I like to use the Gumbel distribution as a sample generator for hypothesis checking or unit tests.
I also prefer the median absolute deviation (MAD) over the standard deviation as a measure of dispersion because it’s more robust in the case of non-parametric distributions.
Numerical hypothesis verification often requires the exact value of the median absolute deviation of the original distribution.
I didn’t find this value in the reference tables, so I decided to do another exercise and derive it myself.
In this post, you will find a short derivation and the result (spoiler: the exact value is `0.767049251325708 * β`

).
The general approach of the MAD derivation is common for most distributions, so it can be easily reused.

### General approach

The median absolute deviation of a sample has a simple definition:

\[\mathcal{MAD}_0 = \textrm{Median}(|x_i - \textrm{Median}(x_i)|). \]

The exact formula for the distribution looks similar:

\[\mathcal{MAD}_0 = \textrm{Median}(|X - \textrm{Median}(X)|). \]

It’s convenient to use a scaled version of MAD in many practical cases to make it a consistent estimator for the standard deviation estimation. It’s often denoted as \(\mathcal{MAD}\) without any clarification. To avoid misinterpretation and highlight that we are working with a non-scaled version of MAD, we denote it as \(\mathcal{MAD}_0\) instead of just \(\mathcal{MAD}\).

We also denote the median distribution value as \(M = \textrm{Median}(X)\). Thus, the \(\mathcal{MAD}_0\) definition can be rewritten as

\[\mathcal{MAD}_0 = \textrm{Median}(|X - M|). \]

Since \(\mathcal{MAD}_0\) is the median of \(|X - M|\), we can conclude that the range \([M - \mathcal{MAD}_0; M + \mathcal{MAD}_0]\) contains \(50\%\) of the distribution:

This can be expressed using the distribution CDF (let’s call it \(F\)):

\[\begin{equation} \label{eq:main}\tag{1} F(M + \mathcal{MAD}_0) - F(M - \mathcal{MAD}_0) = 0.5. \end{equation} \]

Equation (\(\ref{eq:main}\)) is an important property of the median absolute deviation, which we will use to calculate it’s exact value.

## Gumbel distribution

The Gumbel distribution is parametrized by \(\mu\) (location) and \(\beta\) (scale). For these parameters, the median value and the CDF are well-known (see [HoSM], 1.3.6.6.16):

\[M = \mu - \beta \cdot \ln(\ln(2)), \quad F(x) = e^{-e^{-(x-\mu)/\beta}}. \]

The \(\mathcal{MAD}_0\) value is directly proportional to the scale parameter \(\beta\), so let’s introduce an auxiliary variable \(p\) for the “descaled version” of \(\mathcal{MAD}_0\):

\[p = \mathcal{MAD}_0 / \beta. \]

Next, let’s express \(F(M + \mathcal{MAD}_0)\) via \(p\):

\[\begin{split} F(M + \mathcal{MAD}_0) & = F(\mu - \beta \cdot \ln(\ln(2)) + p \beta) = \\ & = e^{-e^{-(\mu - \beta \cdot \ln(\ln(2)) + p \beta - \mu)/\beta}} = \\ & = e^{-e^{\ln(\ln(2)) - p}} = e^{-\ln(2) e^{-p}} = 0.5^{e^{-p}}, \end{split} \]

In the same way, we can show that

\[F(M - \mathcal{MAD}_0) = 0.5^{e^{p}}. \]

Now, equation (\(\ref{eq:main}\)) can be transformed to

\[0.5^{e^{-p}} - 0.5^{e^{p}} = 0.5. \]

This equation can be solved numerically. Here is a short R script which calculates the result:

```
library(rootSolve)
options(digits = 15)
f <- function(p) 0.5^exp(-p) - 0.5^exp(p) - 0.5
uniroot.all(f, c(-10, 10), tol = 1e-15)
```

The numerical solution is \(p = 0.767049251325708 \). Since \(p = \mathcal{MAD}_0 / \beta\), the exact solution looks as follows:

\[\mathcal{MAD}_0 = 0.767049251325708 \beta. \]

Now we can use it in experiments (copy-pastable version: `0.767049251325708 * β`

).

### Numerical simulation

Let’s double-check that our calculations are correct. Below you can see an R script that generates 1000 random samples from the Gumbel distribution with \(\mu = 0,\; \beta = 1\) (1000 elements in each sample). Next, it calculates the MAD estimation for each sample using the Harrell-Davis quantile estimator for the median estimations (see [Harrell1982]). After that, it calculates the median of all MAD estimations and prints the result.

```
library(evd)
library(Hmisc)
# Setting seed for rgumbel to achieve deterministic result
set.seed(38)
# Harrell-Davis-powered median
hdmedian <- function(x) as.numeric(hdquantile(x, 0.5))
# MAD value which is based on the Harrell-Davis-powered median
hdmad <- function(x) hdmedian(abs(x - hdmedian(x)))
# 1000 trails of MAD estimating for a sample from the Gubmel distribution
mads <- sapply(1:1000, function(i) hdmad(rgumbel(1000)))
# Print median of all trials
hdmedian(mads)
```

This script’s output is `0.7670284`

which is close enough to the exact value `0.767049251325708`

.

### Conclusion

The Gumbel distribution can be used as a good model of a slightly right-skewed distribution. If we are playing with an algorithm that involves MAD estimations (e.g., MAD-based outlier detector), we can check the precision of our calculations using the exact MAD value which is \(\mathcal{MAD}_0 = 0.767049251325708 \beta\).

The general approach can be reused for any other distributions with known median and CDF. You can get the exact \(\mathcal{MAD}_0\) value by solving the equation \(F(M + \mathcal{MAD}_0) - F(M - \mathcal{MAD}_0) = 0.5\) which is correct for the distribution of any form.

### References

**[HoSM]**

NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, October 6, 2020. https://doi.org/10.18434/M32189**[Harrell1982]**

Harrell, F.E. and Davis, C.E., 1982. A new distribution-free quantile estimator.*Biometrika*, 69(3), pp.635-640.

https://pdfs.semanticscholar.org/1a48/9bb74293753023c5bb6bff8e41e8fe68060f.pdf