# The median absolute deviation value of the Gumbel distribution

The Gumbel distribution is not only a useful model in the extreme value theory, but it’s also a nice example of a slightly right-skewed distribution (skewness $$\approx 1.14$$). Here is its density plot:

In some of my statistical experiments, I like to use the Gumbel distribution as a sample generator for hypothesis checking or unit tests. I also prefer the median absolute deviation (MAD) over the standard deviation as a measure of dispersion because it’s more robust in the case of non-parametric distributions. Numerical hypothesis verification often requires the exact value of the median absolute deviation of the original distribution. I didn’t find this value in the reference tables, so I decided to do another exercise and derive it myself. In this post, you will find a short derivation and the result (spoiler: the exact value is 0.767049251325708 * β). The general approach of the MAD derivation is common for most distributions, so it can be easily reused.

### General approach

The median absolute deviation of a sample has a simple definition:

$\mathcal{MAD}_0 = \textrm{Median}(|x_i - \textrm{Median}(x_i)|).$

The exact formula for the distribution looks similar:

$\mathcal{MAD}_0 = \textrm{Median}(|X - \textrm{Median}(X)|).$

It’s convenient to use a scaled version of MAD in many practical cases to make it a consistent estimator for the standard deviation estimation. It’s often denoted as $$\mathcal{MAD}$$ without any clarification. To avoid misinterpretation and highlight that we are working with a non-scaled version of MAD, we denote it as $$\mathcal{MAD}_0$$ instead of just $$\mathcal{MAD}$$.

We also denote the median distribution value as $$M = \textrm{Median}(X)$$. Thus, the $$\mathcal{MAD}_0$$ definition can be rewritten as

$\mathcal{MAD}_0 = \textrm{Median}(|X - M|).$

Since $$\mathcal{MAD}_0$$ is the median of $$|X - M|$$, we can conclude that the range $$[M - \mathcal{MAD}_0; M + \mathcal{MAD}_0]$$ contains $$50\%$$ of the distribution:

This can be expressed using the distribution CDF (let’s call it $$F$$):

$$$\label{eq:main}\tag{1} F(M + \mathcal{MAD}_0) - F(M - \mathcal{MAD}_0) = 0.5.$$$

Equation ($$\ref{eq:main}$$) is an important property of the median absolute deviation, which we will use to calculate it’s exact value.

## Gumbel distribution

The Gumbel distribution is parametrized by $$\mu$$ (location) and $$\beta$$ (scale). For these parameters, the median value and the CDF are well-known (see [HoSM], 1.3.6.6.16):

$M = \mu - \beta \cdot \ln(\ln(2)), \quad F(x) = e^{-e^{-(x-\mu)/\beta}}.$

The $$\mathcal{MAD}_0$$ value is directly proportional to the scale parameter $$\beta$$, so let’s introduce an auxiliary variable $$p$$ for the “descaled version” of $$\mathcal{MAD}_0$$:

$p = \mathcal{MAD}_0 / \beta.$

Next, let’s express $$F(M + \mathcal{MAD}_0)$$ via $$p$$:

$\begin{split} F(M + \mathcal{MAD}_0) & = F(\mu - \beta \cdot \ln(\ln(2)) + p \beta) = \\ & = e^{-e^{-(\mu - \beta \cdot \ln(\ln(2)) + p \beta - \mu)/\beta}} = \\ & = e^{-e^{\ln(\ln(2)) - p}} = e^{-\ln(2) e^{-p}} = 0.5^{e^{-p}}, \end{split}$

In the same way, we can show that

$F(M - \mathcal{MAD}_0) = 0.5^{e^{p}}.$

Now, equation ($$\ref{eq:main}$$) can be transformed to

$0.5^{e^{-p}} - 0.5^{e^{p}} = 0.5.$

This equation can be solved numerically. Here is a short R script which calculates the result:

library(rootSolve)
options(digits = 15)
f <- function(p) 0.5^exp(-p) - 0.5^exp(p) - 0.5
uniroot.all(f, c(-10, 10), tol = 1e-15)


The numerical solution is $$p = 0.767049251325708$$. Since $$p = \mathcal{MAD}_0 / \beta$$, the exact solution looks as follows:

$\mathcal{MAD}_0 = 0.767049251325708 \beta.$

Now we can use it in experiments (copy-pastable version: 0.767049251325708 * β).

### Numerical simulation

Let’s double-check that our calculations are correct. Below you can see an R script that generates 1000 random samples from the Gumbel distribution with $$\mu = 0,\; \beta = 1$$ (1000 elements in each sample). Next, it calculates the MAD estimation for each sample using the Harrell-Davis quantile estimator for the median estimations (see [Harrell1982]). After that, it calculates the median of all MAD estimations and prints the result.

library(evd)
library(Hmisc)

# Setting seed for rgumbel to achieve deterministic result
set.seed(38)
# Harrell-Davis-powered median
hdmedian <- function(x) as.numeric(hdquantile(x, 0.5))
# MAD value which is based on the Harrell-Davis-powered median
hdmad <- function(x) hdmedian(abs(x - hdmedian(x)))
# 1000 trails of MAD estimating for a sample from the Gubmel distribution

This script’s output is 0.7670284 which is close enough to the exact value 0.767049251325708.
The Gumbel distribution can be used as a good model of a slightly right-skewed distribution. If we are playing with an algorithm that involves MAD estimations (e.g., MAD-based outlier detector), we can check the precision of our calculations using the exact MAD value which is $$\mathcal{MAD}_0 = 0.767049251325708 \beta$$.
The general approach can be reused for any other distributions with known median and CDF. You can get the exact $$\mathcal{MAD}_0$$ value by solving the equation $$F(M + \mathcal{MAD}_0) - F(M - \mathcal{MAD}_0) = 0.5$$ which is correct for the distribution of any form.