# Unbiased median absolute deviation based on the Harrell-Davis quantile estimator

The below text contains an intermediate snapshot of the research and is preserved for historical purposes.

The median absolute deviation ($\textrm{MAD}$) is a robust measure of scale. In the previous post, I showed how to use the unbiased version of the $\textrm{MAD}$ estimator as a robust alternative to the standard deviation. “Unbiasedness” means that such estimator’s expected value equals the true value of the standard deviation. Unfortunately, there is such thing as the bias–variance tradeoff: when we remove the bias of the $\textrm{MAD}$ estimator, we increase its variance and mean squared error ($\textrm{MSE}$).

In this post, I want to suggest a more efficient unbiased $\textrm{MAD}$ estimator. It’s also a consistent estimator for the standard deviation, but it has smaller $\textrm{MSE}$. To build this estimator, we should replace the classic “straightforward” median estimator with the Harrell-Davis quantile estimator and adjust bias-correction factors. Let’s discuss this approach in detail.

## Introduction

Let’s consider a sample $x = \{ x_1, x_2, \ldots, x_n \}$. Its median absolute deviation $\textrm{MAD}_n$ can be defined as follows:

$$ \textrm{MAD}_n = C_n \cdot \textrm{median}(|X - \textrm{median}(X)|) $$where $\textrm{median}$ is a median estimator, $C_n$ is a scale factor (or consistency constant).

Typically, $\textrm{median}$ assumes the classic “straightforward” median estimator:

- If $n$ is odd, the median is the middle element of the sorted sample
- If $n$ is even, the median is the arithmetic average of the two middle elements of the sorted sample

However, we can use other median estimators. Let’s consider $\textrm{median}_{\textrm{HD}}$ which calculates the median using the Harrell-Davis quantile estimator (see harrell1982):

$$ \textrm{median}_{\textrm{HD}}(x) = \sum_{i=1}^n W_i x_i, \quad W_i = I_{i/n} \bigg( \frac{n+1}{2}, \frac{n+1}{2} \bigg) - I_{(i-1)/n} \bigg( \frac{n+1}{2}, \frac{n+1}{2} \bigg) $$where $I_t(a, b)$ denotes the regularized incomplete beta function.

When $n \to \infty$, we have the exact value of $C_n$ which makes $\textrm{MAD}_n$ an unbiased estimator for the standard deviation:

$$ C_n = C_\infty = \dfrac{1}{\Phi^{-1}(3/4)} \approx 1.4826022185056 $$For finite values of $n$, we should use adjusted $C_n$ values. We already discussed these values for the straightforward median estimator in the previous post (this problem is well-covered in park2020). For $\textrm{median}_{\textrm{HD}}$, we should use another set of $C_n$ values. We reuse the approach from hayes2014 with the following notation:

$$ C_n = \dfrac{1}{\hat{a}_n} $$## Factors for small n

Factors for the small values of $n$ can be obtained using numerical experiments. Here is the scheme for the performed simulation:

- Enumerate $n$ from $2$ to $100$
- For each $n$, generate $2\cdot 10^8$ samples from the normal distribution of size $n$ (the Box–Muller transform is used, see [Box1958])
- For each sample, estimate $\textrm{MAD}_n$ using $C_n = 1$
- Calculate the arithmetic average of all $\textrm{MAD}_n$ values for the same $n$: the result is the value of $\hat{a}_n$

As the result of this simulation, I got the following table with $\hat{a}_n$ and $C_n$ values for $n \in \{ 2..100 \}$.

$n$ | $\hat{a}_n$ | $C_n$ | $n$ | $\hat{a}_n$ | $C_n$ |
---|---|---|---|---|---|

1 | NA | NA | 51 | 0.66649 | 1.50039 |

2 | 0.56417 | 1.77250 | 52 | 0.66668 | 1.49998 |

3 | 0.63769 | 1.56816 | 53 | 0.66682 | 1.49966 |

4 | 0.62661 | 1.59589 | 54 | 0.66699 | 1.49926 |

5 | 0.63853 | 1.56611 | 55 | 0.66713 | 1.49895 |

6 | 0.63834 | 1.56656 | 56 | 0.66728 | 1.49863 |

7 | 0.63915 | 1.56458 | 57 | 0.66741 | 1.49833 |

8 | 0.64141 | 1.55908 | 58 | 0.66753 | 1.49805 |

9 | 0.64237 | 1.55675 | 59 | 0.66767 | 1.49774 |

10 | 0.64397 | 1.55288 | 60 | 0.66780 | 1.49746 |

11 | 0.64535 | 1.54955 | 61 | 0.66791 | 1.49720 |

12 | 0.64662 | 1.54651 | 62 | 0.66803 | 1.49694 |

13 | 0.64790 | 1.54346 | 63 | 0.66815 | 1.49667 |

14 | 0.64908 | 1.54064 | 64 | 0.66825 | 1.49644 |

15 | 0.65018 | 1.53803 | 65 | 0.66836 | 1.49621 |

16 | 0.65125 | 1.53552 | 66 | 0.66846 | 1.49597 |

17 | 0.65226 | 1.53313 | 67 | 0.66857 | 1.49574 |

18 | 0.65317 | 1.53101 | 68 | 0.66865 | 1.49555 |

19 | 0.65404 | 1.52896 | 69 | 0.66876 | 1.49531 |

20 | 0.65489 | 1.52698 | 70 | 0.66883 | 1.49514 |

21 | 0.65565 | 1.52520 | 71 | 0.66893 | 1.49493 |

22 | 0.65638 | 1.52351 | 72 | 0.66901 | 1.49475 |

23 | 0.65708 | 1.52190 | 73 | 0.66910 | 1.49456 |

24 | 0.65771 | 1.52043 | 74 | 0.66918 | 1.49437 |

25 | 0.65832 | 1.51902 | 75 | 0.66925 | 1.49422 |

26 | 0.65888 | 1.51772 | 76 | 0.66933 | 1.49402 |

27 | 0.65943 | 1.51647 | 77 | 0.66940 | 1.49387 |

28 | 0.65991 | 1.51536 | 78 | 0.66948 | 1.49370 |

29 | 0.66036 | 1.51433 | 79 | 0.66955 | 1.49354 |

30 | 0.66082 | 1.51328 | 80 | 0.66962 | 1.49339 |

31 | 0.66123 | 1.51233 | 81 | 0.66968 | 1.49325 |

32 | 0.66161 | 1.51146 | 82 | 0.66974 | 1.49312 |

33 | 0.66200 | 1.51057 | 83 | 0.66980 | 1.49298 |

34 | 0.66235 | 1.50977 | 84 | 0.66988 | 1.49281 |

35 | 0.66270 | 1.50899 | 85 | 0.66993 | 1.49270 |

36 | 0.66302 | 1.50824 | 86 | 0.66999 | 1.49257 |

37 | 0.66334 | 1.50753 | 87 | 0.67005 | 1.49244 |

38 | 0.66362 | 1.50688 | 88 | 0.67009 | 1.49233 |

39 | 0.66391 | 1.50623 | 89 | 0.67016 | 1.49219 |

40 | 0.66417 | 1.50563 | 90 | 0.67021 | 1.49207 |

41 | 0.66443 | 1.50504 | 91 | 0.67026 | 1.49196 |

42 | 0.66469 | 1.50447 | 92 | 0.67031 | 1.49185 |

43 | 0.66493 | 1.50393 | 93 | 0.67036 | 1.49174 |

44 | 0.66515 | 1.50341 | 94 | 0.67041 | 1.49161 |

45 | 0.66539 | 1.50289 | 95 | 0.67046 | 1.49152 |

46 | 0.66557 | 1.50246 | 96 | 0.67049 | 1.49144 |

47 | 0.66578 | 1.50200 | 97 | 0.67055 | 1.49131 |

48 | 0.66598 | 1.50155 | 98 | 0.67060 | 1.49121 |

49 | 0.66616 | 1.50115 | 99 | 0.67063 | 1.49114 |

50 | 0.66633 | 1.50076 | 100 | 0.67068 | 1.49102 |

Here is a visualization of this table:

## Factors for huge n

To build the equation for $n > 100$, we also continue the approach suggested in hayes2014 (pages 2208-2209) and use the prediction equation of the following form:

$$ \hat{a}_n = \Phi^{-1}(3/4) \Bigg( 1 - \dfrac{\alpha}{n} - \dfrac{\beta}{n^2} \Bigg). $$We simulated approximations of $\hat{a}_n$ for some large $n$ values:

$n$ | $\hat{a}_n$ |
---|---|

100 | 0.6707 |

110 | 0.6711 |

120 | 0.6714 |

130 | 0.6716 |

140 | 0.6719 |

150 | 0.6720 |

200 | 0.6727 |

250 | 0.6731 |

300 | 0.6733 |

350 | 0.6735 |

400 | 0.6736 |

450 | 0.6737 |

500 | 0.6738 |

1000 | 0.6742 |

1500 | 0.6743 |

2000 | 0.6743 |

Next, we fitted these values using the multiple linear regression. The dependent variable is $y = 1 - \hat{a}_n / \Phi^{-1}(3/4)$; the independent variables are $n^{-1}$ and $n^{-2}$; the intercept is zero:

$$ y = \alpha n^{-1} + \beta n^{-2}. $$The fitted model was adjusted a little bit to get nice-looking values (keeping the good accuracy level):

$$ \alpha = 0.5,\quad \beta = 6.5. $$These values give pretty accurate estimation of $\hat{a}_n$ for $n > 100$ (the accuracy is less than $10^{-4}$).

## The mean squared error

It’s time to compare the straightforward and the Harrell-Davis-based $\textrm{MAD}$ estimators in terms of the mean squared error. To estimate the $\textrm{MSE}$, we also use numerical simulations. Here are the results for some $n$ values:

n | Straightforward | Harrell-Davis |
---|---|---|

3 | 0.690 | 0.272 |

4 | 0.327 | 0.205 |

5 | 0.341 | 0.181 |

10 | 0.136 | 0.100 |

20 | 0.069 | 0.055 |

30 | 0.045 | 0.038 |

40 | 0.034 | 0.029 |

50 | 0.027 | 0.024 |

100 | 0.014 | 0.012 |

As we can see, the difference between estimators for small samples is tangible. For example, for $n = 3$, we got $\textrm{MSE} \approx 0.69$ for the straightforward estimator vs. $\textrm{MSE} \approx 0.27$ for the Harrell-Davis-based estimator. Below you can see density plots of $\textrm{MSE}$ for the above $n$ values.

## Conclusion

In this post, we discussed the unbiased $\textrm{MAD}_n$ estimator which is based on the Harrell-Davis quantile estimator instead of the straightforward median estimator. The suggested estimator is much more efficient for small samples because it has smaller $\textrm{MSE}$. It allows using $\textrm{MAD}_n$ as a efficient alternative to the standard deviation under the normal distribution with higher accuracy.

## References

**[Harrell1982]**

Harrell, F.E. and Davis, C.E., 1982. A new distribution-free quantile estimator.*Biometrika*, 69(3), pp.635-640.

https://doi.org/10.2307/2335999**[Hayes2014]**

Hayes, Kevin. “Finite-sample bias-correction factors for the median absolute deviation.” Communications in Statistics-Simulation and Computation 43, no. 10 (2014): 2205-2212.

https://doi.org/10.1080/03610918.2012.748913**[Park2020]**

Park, Chanseok, Haewon Kim, and Min Wang. “Investigation of finite-sample properties of robust location and scale estimators.” Communications in Statistics-Simulation and Computation (2020): 1-27.

https://doi.org/10.1080/03610918.2019.1699114**[Box1958]**

Box, George EP. “A note on the generation of random normal deviates.” Ann. Math. Statist. 29 (1958): 610-611.

https://doi.org/10.1214/aoms/1177706645