The median absolute deviation value of the Gumbel distribution


The Gumbel distribution is not only a useful model in the extreme value theory, but it’s also a nice example of a slightly right-skewed distribution (skewness $\approx 1.14$). Here is its density plot:


In some of my statistical experiments, I like to use the Gumbel distribution as a sample generator for hypothesis checking or unit tests. I also prefer the median absolute deviation (MAD) over the standard deviation as a measure of dispersion because it’s more robust in the case of non-parametric distributions. Numerical hypothesis verification often requires the exact value of the median absolute deviation of the original distribution. I didn’t find this value in the reference tables, so I decided to do another exercise and derive it myself. In this post, you will find a short derivation and the result (spoiler: the exact value is 0.767049251325708 * β). The general approach of the MAD derivation is common for most distributions, so it can be easily reused.

General approach

The median absolute deviation of a sample has a simple definition:

$$ \mathcal{MAD}_0 = \textrm{Median}(|x_i - \textrm{Median}(x_i)|). $$

The exact formula for the distribution looks similar:

$$ \mathcal{MAD}_0 = \textrm{Median}(|X - \textrm{Median}(X)|). $$

It’s convenient to use a scaled version of MAD in many practical cases to make it a consistent estimator for the standard deviation estimation. It’s often denoted as $\mathcal{MAD}$ without any clarification. To avoid misinterpretation and highlight that we are working with a non-scaled version of MAD, we denote it as $\mathcal{MAD}_0$ instead of just $\mathcal{MAD}$.

We also denote the median distribution value as $M = \textrm{Median}(X)$. Thus, the $\mathcal{MAD}_0$ definition can be rewritten as

$$ \mathcal{MAD}_0 = \textrm{Median}(|X - M|). $$

Since $\mathcal{MAD}_0$ is the median of $|X - M|$, we can conclude that the range $[M - \mathcal{MAD}_0; M + \mathcal{MAD}_0]$ contains $50\%$ of the distribution:


This can be expressed using the distribution CDF (let’s call it $F$):

$$ \begin{equation} \label{eq:main}\tag{1} F(M + \mathcal{MAD}_0) - F(M - \mathcal{MAD}_0) = 0.5. \end{equation} $$

Equation ($\ref{eq:main}$) is an important property of the median absolute deviation, which we will use to calculate it’s exact value.

Gumbel distribution

The Gumbel distribution is parametrized by $\mu$ (location) and $\beta$ (scale). For these parameters, the median value and the CDF are well-known (see [HoSM], 1.3.6.6.16):

$$ M = \mu - \beta \cdot \ln(\ln(2)), \quad F(x) = e^{-e^{-(x-\mu)/\beta}}. $$

The $\mathcal{MAD}_0$ value is directly proportional to the scale parameter $\beta$, so let’s introduce an auxiliary variable $p$ for the “descaled version” of $\mathcal{MAD}_0$:

$$ p = \mathcal{MAD}_0 / \beta. $$

Next, let’s express $F(M + \mathcal{MAD}_0)$ via $p$:

$$ \begin{split} F(M + \mathcal{MAD}_0) & = F(\mu - \beta \cdot \ln(\ln(2)) + p \beta) = \\ & = e^{-e^{-(\mu - \beta \cdot \ln(\ln(2)) + p \beta - \mu)/\beta}} = \\ & = e^{-e^{\ln(\ln(2)) - p}} = e^{-\ln(2) e^{-p}} = 0.5^{e^{-p}}, \end{split} $$

In the same way, we can show that

$$ F(M - \mathcal{MAD}_0) = 0.5^{e^{p}}. $$

Now, equation ($\ref{eq:main}$) can be transformed to

$$ 0.5^{e^{-p}} - 0.5^{e^{p}} = 0.5. $$

This equation can be solved numerically. Here is a short R script which calculates the result:

library(rootSolve)
options(digits = 15)
f <- function(p) 0.5^exp(-p) - 0.5^exp(p) - 0.5
uniroot.all(f, c(-10, 10), tol = 1e-15)

The numerical solution is $p = 0.767049251325708 $. Since $p = \mathcal{MAD}_0 / \beta$, the exact solution looks as follows:

$$ \mathcal{MAD}_0 = 0.767049251325708 \beta. $$

Now we can use it in experiments (copy-pastable version: 0.767049251325708 * β).

Numerical simulation

Let’s double-check that our calculations are correct. Below you can see an R script that generates 1000 random samples from the Gumbel distribution with $\mu = 0,\; \beta = 1$ (1000 elements in each sample). Next, it calculates the MAD estimation for each sample using the Harrell-Davis quantile estimator for the median estimations (see harrell1982). After that, it calculates the median of all MAD estimations and prints the result.

library(evd)
library(Hmisc)

# Setting seed for rgumbel to achieve deterministic result
set.seed(38)
# Harrell-Davis-powered median
hdmedian <- function(x) as.numeric(hdquantile(x, 0.5))
# MAD value which is based on the Harrell-Davis-powered median
hdmad <- function(x) hdmedian(abs(x - hdmedian(x)))
# 1000 trails of MAD estimating for a sample from the Gubmel distribution
mads <- sapply(1:1000, function(i) hdmad(rgumbel(1000)))
# Print median of all trials
hdmedian(mads)

This script’s output is 0.7670284 which is close enough to the exact value 0.767049251325708.

Conclusion

The Gumbel distribution can be used as a good model of a slightly right-skewed distribution. If we are playing with an algorithm that involves MAD estimations (e.g., MAD-based outlier detector), we can check the precision of our calculations using the exact MAD value which is $\mathcal{MAD}_0 = 0.767049251325708 \beta$.

The general approach can be reused for any other distributions with known median and CDF. You can get the exact $\mathcal{MAD}_0$ value by solving the equation $F(M + \mathcal{MAD}_0) - F(M - \mathcal{MAD}_0) = 0.5$ which is correct for the distribution of any form.

References


References (2)

  1. DoubleMAD outlier detector based on the Harrell-Davis quantile estimator (2020-06-22) 4 2 Mathematics Research Statistics
  2. A New distribution-free Quantile Estimator (1982) by Frank E Harrell et al. 25 Mathematics Statistics Decent
  1. Case study: Accuracy of the MAD estimation using the Harrell-Davis quantile estimator (Gumbel distribution) (2021-01-05) 5 Mathematics Statistics
  2. Quantile absolute deviation: estimating statistical dispersion around quantiles (2020-12-01) 3 11 Mathematics Statistics Research