Unbiased median absolute deviation



The median absolute deviation (\(\textrm{MAD}\)) is a robust measure of scale. For distribution \(X\), it can be calculated as follows:

\[\textrm{MAD} = C \cdot \textrm{median}(|X - \textrm{median}(X)|) \]

where \(C\) is a constant scale factor. This metric can be used as a robust alternative to the standard deviation. If we want to use the \(\textrm{MAD}\) as a consistent estimator for the standard deviation under the normal distribution, we should set

\[C = C_{\infty} = \dfrac{1}{\Phi^{-1}(3/4)} \approx 1.4826022185056, \]

where \(\Phi^{-1}\) is the quantile function of the standard normal distribution (or the inverse of the cumulative distribution function). If \(X\) is the normal distribution, we get \(\textrm{MAD} = \sigma\) where \(\sigma\) is the standard deviation.
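As a quick check of the constant above, here is a minimal sketch that evaluates \(1 / \Phi^{-1}(3/4)\) with the Python standard library. The variable names are only illustrative.

```python
from statistics import NormalDist

# Phi^{-1}(3/4): the 0.75 quantile of the standard normal distribution.
q75 = NormalDist().inv_cdf(0.75)

# Asymptotic scale factor that makes MAD consistent for sigma under normality.
c_inf = 1.0 / q75

print(f"Phi^-1(3/4) = {q75:.13f}")
print(f"C_inf       = {c_inf:.13f}")
```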

Now let’s consider a sample \(x = \{ x_1, x_2, \ldots, x_n \}\). Let’s denote the median absolute deviation for a sample of size \(n\) as \(\textrm{MAD}_n\). The corresponding equation looks similar to the definition of \(\textrm{MAD}\) for a distribution:

\[\textrm{MAD}_n = C_n \cdot \textrm{median}(|x - \textrm{median}(x)|). \]

Let’s assume that \(\textrm{median}\) is the straightforward definition of the median: if \(n\) is odd, the median is the middle element of the sorted sample; if \(n\) is even, the median is the arithmetic average of the two middle elements of the sorted sample. We can still use \(C_n = C_{\infty}\) for extremely large sample sizes. However, for small \(n\), \(\textrm{MAD}_n\) with \(C_n = C_{\infty}\) becomes a biased estimator. If we want to get an unbiased version, we should adjust the value of \(C_n\).
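The sample definition above can be sketched in a few lines of Python. The function name `mad` and its default argument are illustrative choices, not something from the referenced papers. `statistics.median` follows the same convention as described above (middle element for odd \(n\), average of the two middle elements for even \(n\)).

```python
from statistics import NormalDist, median

# Asymptotic scale factor C_inf = 1 / Phi^{-1}(3/4).
C_INF = 1.0 / NormalDist().inv_cdf(0.75)


def mad(x, c=C_INF):
    """Return c * median(|x_i - median(x)|) for a non-empty sample x."""
    m = median(x)
    return c * median([abs(v - m) for v in x])


# With c = 1 we get the raw (unscaled) median absolute deviation.
print(mad([1, 2, 3, 4, 100], c=1.0))
print(mad([1, 2, 3, 4, 100]))
```

Note that the outlier `100` barely affects the result, which is the reason to prefer this metric over the standard deviation for contaminated data.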

In this post, we look at the possible approaches and learn how to get the value of \(C_n\) that makes \(\textrm{MAD}_n\) an unbiased estimator of the median absolute deviation for any \(n\).

The bias

Let’s briefly discuss the impact of the bias on our measurements. To illustrate the problem, we take \(100,000\) samples of size \(n = 5\) from the standard normal distribution and calculate \(\textrm{MAD}_5\) for each of them using \(C = 1\). The obtained numbers form the following distribution:


If we try to use \(\textrm{MAD}_5\) with \(C = 1\) as a standard deviation estimator, it would be a biased estimator. Indeed, the standard deviation equals \(1\) (the true value), but the expected value of \(\textrm{MAD}_5\) is about \(E[\textrm{MAD}_5] \approx 0.5542\). In order to make it unbiased, we should set \(C_5 = 1 / 0.5542 \approx 1.804\). If we repeat the experiment with the modified scale factor, we get a modified version of our distribution:


Now \(E[\textrm{MAD}_5] \approx 1\), which makes \(\textrm{MAD}_5\) an unbiased estimator.
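The experiment described above can be reproduced with a small Monte Carlo sketch. The helper names, the seed, and the use of the standard `random` module are assumptions made for illustration; the exact numbers vary slightly with the seed and the number of samples.

```python
import random
from statistics import median


def mad_raw(x):
    """Median absolute deviation with scale factor C = 1."""
    m = median(x)
    return median([abs(v - m) for v in x])


def estimate_expected_mad(n, samples=100_000, seed=42):
    """Monte Carlo estimate of E[MAD_n] (C = 1) for standard normal samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += mad_raw(sample)
    return total / samples


expected_mad_5 = estimate_expected_mad(5)
c_5 = 1.0 / expected_mad_5  # scale factor that removes the bias for n = 5
print(f"E[MAD_5] with C = 1: {expected_mad_5:.4f}")
print(f"Estimated C_5:       {c_5:.3f}")
```

Up to simulation noise, the printed values should be close to the \(0.5542\) and \(1.804\) quoted above.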

Note that \(C_5 = 1.804\) differs from \(C_{\infty} \approx 1.4826\) which is the proper scale factor for \(n \to \infty\). Each sample size needs its own scale factor to make \(\textrm{MAD}_n\) unbiased. Let’s review some papers and look at different approaches to find the optimal scale factor value.

Literature overview

One of the first mentions of the median absolute deviation can be found in [Hampel1974]. In this paper, Frank R. Hampel introduced \(\textrm{MAD}\) as a robust measure of scale (attributed to Gauss). I have found four papers that describe unbiased versions: [Croux1992], [Williams2011], [Hayes2014], and [Park2020]. Let’s briefly discuss the approaches from these papers.

The Croux-Rousseeuw approach

In [Croux1992], Christophe Croux and Peter J. Rousseeuw described an unbiased version of \(\textrm{MAD}\). They suggested using the following equations:

\[C_n = \dfrac{b_n}{\Phi^{-1}(3/4)}. \]

For \(n \leq 9\), the approximated values of \(b_n\) were defined as follows:

| n | \(b_n\) |
|---|---------|
| 2 | 1.196 |
| 3 | 1.495 |
| 4 | 1.363 |
| 5 | 1.206 |
| 6 | 1.200 |
| 7 | 1.140 |
| 8 | 1.129 |
| 9 | 1.107 |

For \(n > 9\), they suggested using the following equation:

\[b_n = \dfrac{n}{n-0.8}. \]
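The Croux-Rousseeuw rule above can be sketched as a small lookup-plus-formula function. The function name `c_croux` and the error handling for \(n < 2\) are assumptions for illustration; the constants are the ones listed above.

```python
from statistics import NormalDist

Q75 = NormalDist().inv_cdf(0.75)  # Phi^{-1}(3/4)

# Approximate b_n values for 2 <= n <= 9, as listed above.
B_CROUX = {2: 1.196, 3: 1.495, 4: 1.363, 5: 1.206,
           6: 1.200, 7: 1.140, 8: 1.129, 9: 1.107}


def c_croux(n):
    """Scale factor C_n = b_n / Phi^{-1}(3/4) following the rule above."""
    if n < 2:
        raise ValueError("n must be at least 2")
    b_n = B_CROUX[n] if n <= 9 else n / (n - 0.8)
    return b_n / Q75


for n in (5, 10, 100):
    print(n, round(c_croux(n), 4))
```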

The Williams approach

In [Williams2011], Dennis C. Williams improved this approach. Firstly, he provided updated \(b_n\) values for small \(n\):

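| n | \(b_n\) by Croux | \(b_n\) by Williams |
|---|------------------|---------------------|
| 2 | 1.196 | 1.197 |
| 3 | 1.495 | 1.490 |
| 4 | 1.363 | 1.360 |
| 5 | 1.206 | 1.217 |
| 6 | 1.200 | 1.189 |
| 7 | 1.140 | 1.138 |
| 8 | 1.129 | 1.127 |
| 9 | 1.107 | 1.101 |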

Secondly, he also introduced a small correction for the general equation:

\[b_n = \dfrac{n}{n-0.801}. \]

He also discussed another form of approximation for bias-correction factors of this kind:

\[b_n \cong 1 + cn^{-d}. \]

In his paper, he applied the above equation only to the Shorth (the smallest interval that contains at least half of the data points), but this approach can also be applied to other measures of scale.
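Here is a minimal sketch of the Williams variant for \(\textrm{MAD}_n\), using the updated small-sample values and the corrected general equation. It assumes that the general equation applies for \(n > 9\), as in the Croux-Rousseeuw approach; the function names are illustrative. The power-law form \(1 + cn^{-d}\) is not included because no constants for \(\textrm{MAD}\) are given above.

```python
from statistics import NormalDist

Q75 = NormalDist().inv_cdf(0.75)  # Phi^{-1}(3/4)

# Updated b_n values for 2 <= n <= 9 from the Williams column above.
B_WILLIAMS = {2: 1.197, 3: 1.490, 4: 1.360, 5: 1.217,
              6: 1.189, 7: 1.138, 8: 1.127, 9: 1.101}


def b_williams(n):
    """Bias-correction factor b_n; the n > 9 cut-off is an assumption."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return B_WILLIAMS[n] if n <= 9 else n / (n - 0.801)


def c_williams(n):
    """Scale factor C_n = b_n / Phi^{-1}(3/4)."""
    return b_williams(n) / Q75


print(round(c_williams(5), 4))
print(b_williams(10) - 10 / (10 - 0.8))  # tiny change vs. the n / (n - 0.8) rule
```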

The Hayes approach

Next, in [Hayes2014], Kevin Hayes suggested another kind of prediction equation for \(n \geq 9\):

\[C_n = \dfrac{1}{\hat{a}_n} \]

where

\[\hat{a}_n = \Phi^{-1}(3/4) \Bigg( 1 - \dfrac{\alpha}{n} - \dfrac{\beta}{n^2} \Bigg). \]

Here are the suggested constants:

| n | \(\alpha\) | \(\beta\) |
|------|--------|-------|
| odd | 0.7635 | 0.565 |
| even | 0.7612 | 1.123 |
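The Hayes prediction equation with the constants above can be sketched as follows. The function name `c_hayes` and the error raised for \(n < 9\) are illustrative assumptions.

```python
from statistics import NormalDist

Q75 = NormalDist().inv_cdf(0.75)  # Phi^{-1}(3/4)


def c_hayes(n):
    """Scale factor C_n = 1 / a_n, using separate constants for odd and even n."""
    if n < 9:
        raise ValueError("the prediction equation is stated for n >= 9")
    alpha, beta = (0.7635, 0.565) if n % 2 == 1 else (0.7612, 1.123)
    a_n = Q75 * (1.0 - alpha / n - beta / n**2)
    return 1.0 / a_n


for n in (9, 10, 50, 51):
    print(n, round(c_hayes(n), 4))
```

As \(n\) grows, both \(\alpha / n\) and \(\beta / n^2\) vanish, so the result approaches \(C_{\infty}\).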

The Park-Kim-Wang approach

Finally, in [Park2020], Chanseok Park, Haewon Kim, and Min Wang aggregated all of the previous results. They used the following form of the main equation:

\[C_n = \dfrac{1}{\Phi^{-1}(3/4) \cdot (1+A_n)} \]

For \(n > 100\), they suggested two approaches. The first one is based on [Hayes2014] (the same equation for both odd and even \(n\) values):

\[A_n = -\dfrac{0.76213}{n} - \dfrac{0.86413}{n^2} \]

The second one is based on [Williams2011]:

\[A_n = -0.804168866 \cdot n^{-1.008922} \]

Both approaches produce almost identical results, so it doesn’t actually matter which one to use.
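To see how close the two large-sample variants are, here is a minimal sketch that evaluates both forms of \(A_n\) and the resulting \(C_n\). The function name and the `variant` parameter are illustrative, not part of the paper.

```python
from statistics import NormalDist

Q75 = NormalDist().inv_cdf(0.75)  # Phi^{-1}(3/4)


def c_park_large(n, variant="hayes"):
    """Scale factor C_n for n > 100 using one of the two A_n equations above."""
    if n <= 100:
        raise ValueError("these equations are suggested for n > 100")
    if variant == "hayes":
        a_n = -0.76213 / n - 0.86413 / n**2
    elif variant == "williams":
        a_n = -0.804168866 * n ** (-1.008922)
    else:
        raise ValueError("variant must be 'hayes' or 'williams'")
    return 1.0 / (Q75 * (1.0 + a_n))


for n in (101, 500, 1000, 10_000):
    h = c_park_large(n, "hayes")
    w = c_park_large(n, "williams")
    print(n, round(h, 6), round(w, 6), f"diff={abs(h - w):.1e}")
```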

For \(2 \leq n \leq 100\), they suggested using predefined constants (the values below are calculated based on Table A2 from [Park2020]):

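| n | \(C_n\) | n | \(C_n\) |
|----|----------|-----|----------|
| 1 | NA | 51 | 1.505611 |
| 2 | 1.772150 | 52 | 1.505172 |
| 3 | 2.204907 | 53 | 1.504575 |
| 4 | 2.016673 | 54 | 1.504417 |
| 5 | 1.803927 | 55 | 1.503713 |
| 6 | 1.763788 | 56 | 1.503604 |
| 7 | 1.686813 | 57 | 1.503095 |
| 8 | 1.671843 | 58 | 1.502864 |
| 9 | 1.632940 | 59 | 1.502253 |
| 10 | 1.624681 | 60 | 1.502085 |
| 11 | 1.601308 | 61 | 1.501611 |
| 12 | 1.596155 | 62 | 1.501460 |
| 13 | 1.580754 | 63 | 1.501019 |
| 14 | 1.577272 | 64 | 1.500841 |
| 15 | 1.566339 | 65 | 1.500331 |
| 16 | 1.563769 | 66 | 1.500343 |
| 17 | 1.555284 | 67 | 1.499877 |
| 18 | 1.553370 | 68 | 1.499772 |
| 19 | 1.547206 | 69 | 1.499291 |
| 20 | 1.545705 | 70 | 1.499216 |
| 21 | 1.540681 | 71 | 1.498922 |
| 22 | 1.539302 | 72 | 1.498838 |
| 23 | 1.535165 | 73 | 1.498491 |
| 24 | 1.534053 | 74 | 1.498399 |
| 25 | 1.530517 | 75 | 1.497917 |
| 26 | 1.529996 | 76 | 1.497901 |
| 27 | 1.526916 | 77 | 1.497489 |
| 28 | 1.526422 | 78 | 1.497544 |
| 29 | 1.523608 | 79 | 1.497248 |
| 30 | 1.523031 | 80 | 1.497185 |
| 31 | 1.520732 | 81 | 1.496797 |
| 32 | 1.520333 | 82 | 1.496779 |
| 33 | 1.518509 | 83 | 1.496428 |
| 34 | 1.517941 | 84 | 1.496501 |
| 35 | 1.516279 | 85 | 1.496295 |
| 36 | 1.516070 | 86 | 1.496089 |
| 37 | 1.514425 | 87 | 1.495794 |
| 38 | 1.513989 | 88 | 1.495796 |
| 39 | 1.512747 | 89 | 1.495557 |
| 40 | 1.512418 | 90 | 1.495420 |
| 41 | 1.511078 | 91 | 1.495270 |
| 42 | 1.511041 | 92 | 1.495141 |
| 43 | 1.509858 | 93 | 1.494944 |
| 44 | 1.509499 | 94 | 1.494958 |
| 45 | 1.508529 | 95 | 1.494706 |
| 46 | 1.508365 | 96 | 1.494665 |
| 47 | 1.507535 | 97 | 1.494379 |
| 48 | 1.507247 | 98 | 1.494331 |
| 49 | 1.506382 | 99 | 1.494113 |
| 50 | 1.506307 | 100 | 1.494199 |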

Here is the corresponding plot:


Conclusion

Currently, my tool of choice is the approach from [Park2020]. I verified all the predefined constants and equations from the paper using numerical simulations. I can confirm that the suggested approach produces a reliable estimate of the unbiased median absolute deviation \(\textrm{MAD}_n\).

References

  • [Hampel1974]
    Hampel, Frank R. “The influence curve and its role in robust estimation.” Journal of the American Statistical Association 69, no. 346 (1974): 383-393.
    https://doi.org/10.2307/2285666
  • [Croux1992]
    Croux, Christophe, and Peter J. Rousseeuw. “Time-efficient algorithms for two highly robust estimators of scale.” In Computational Statistics, pp. 411-428. Physica, Heidelberg, 1992.
    https://doi.org/10.1007/978-3-662-26811-7_58
  • [Williams2011]
    Williams, Dennis C. “Finite sample correction factors for several simple robust estimators of normal standard deviation.” Journal of Statistical Computation and Simulation 81, no. 11 (2011): 1697-1702.
    https://doi.org/10.1080/00949655.2010.499516
  • [Hayes2014]
    Hayes, Kevin. “Finite-sample bias-correction factors for the median absolute deviation.” Communications in Statistics-Simulation and Computation 43, no. 10 (2014): 2205-2212.
    https://doi.org/10.1080/03610918.2012.748913
  • [Park2020]
    Park, Chanseok, Haewon Kim, and Min Wang. “Investigation of finite-sample properties of robust location and scale estimators.” Communications in Statistics-Simulation and Computation (2020): 1-27.
    https://doi.org/10.1080/03610918.2019.1699114