# Unbiased median absolute deviation based on the Harrell-Davis quantile estimator

The median absolute deviation (\(\textrm{MAD}\)) is a robust measure of scale. In the previous post, I showed how to use the unbiased version of the \(\textrm{MAD}\) estimator as a robust alternative to the standard deviation. “Unbiasedness” means that the estimator’s expected value equals the true value of the standard deviation. Unfortunately, there is such a thing as the bias–variance tradeoff: when we remove the bias of the \(\textrm{MAD}\) estimator, we increase its variance and mean squared error (\(\textrm{MSE}\)).

In this post, I want to suggest a more efficient unbiased \(\textrm{MAD}\) estimator. It’s also a consistent estimator for the standard deviation, but it has a smaller \(\textrm{MSE}\). To build this estimator, we replace the classic “straightforward” median estimator with the Harrell-Davis quantile estimator and adjust the bias-correction factors. Let’s discuss this approach in detail.

### Introduction

Let’s consider a sample \(x = \{ x_1, x_2, \ldots, x_n \}\). Its median absolute deviation \(\textrm{MAD}_n\) can be defined as follows:

\[\textrm{MAD}_n = C_n \cdot \textrm{median}(|x - \textrm{median}(x)|) \]

where \(\textrm{median}\) is a median estimator and \(C_n\) is a scale factor (also known as the consistency constant).

Typically, \(\textrm{median}\) assumes the classic “straightforward” median estimator:

- If \(n\) is odd, the median is the middle element of the sorted sample
- If \(n\) is even, the median is the arithmetic average of the two middle elements of the sorted sample
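The two rules above can be sketched in Python (`simple_median` is a hypothetical helper name, not from the post):

```python
def simple_median(x):
    # Classic "straightforward" sample median: the middle element of the
    # sorted sample for odd n, the average of the two middle elements for even n.
    s = sorted(x)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2
```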

However, we can use other median estimators. Let’s consider \(\textrm{median}_{\textrm{HD}}\) which calculates the median using the Harrell-Davis quantile estimator (see [Harrell1982]):

\[\textrm{median}_{\textrm{HD}}(x) = \sum_{i=1}^n W_i x_i, \quad W_i = I_{i/n} \bigg( \frac{n+1}{2}, \frac{n+1}{2} \bigg) - I_{(i-1)/n} \bigg( \frac{n+1}{2}, \frac{n+1}{2} \bigg) \]

where \(I_t(a, b)\) denotes the regularized incomplete beta function.
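Since \(I_t(a, b)\) is exactly the CDF of the Beta distribution, the weights are easy to compute with SciPy. Here is a minimal sketch (the helper name `hd_median` is hypothetical):

```python
import numpy as np
from scipy.stats import beta

def hd_median(x):
    # Harrell-Davis estimate of the median: a weighted sum of the order
    # statistics, with weights taken as differences of the
    # Beta((n+1)/2, (n+1)/2) CDF (the regularized incomplete beta function).
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    a = (n + 1) / 2
    cdf = beta.cdf(np.arange(n + 1) / n, a, a)  # I_{i/n}(a, a) for i = 0..n
    w = np.diff(cdf)                            # weights W_i, summing to 1
    return float(w @ xs)
```

For a symmetric sample such as `[1, 2, 3]`, the symmetric weights return the middle value exactly.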

When \(n \to \infty\), we have the exact value of \(C_n\) which makes \(\textrm{MAD}_n\) an unbiased estimator for the standard deviation:

\[C_n = C_\infty = \dfrac{1}{\Phi^{-1}(3/4)} \approx 1.4826022185056 \]

For finite values of \(n\), we should use adjusted \(C_n\) values. We already discussed these values for the straightforward median estimator in the previous post (this problem is well-covered in [Park2020]). For \(\textrm{median}_{\textrm{HD}}\), we should use another set of \(C_n\) values. We reuse the approach from [Hayes2014] with the following notation:

\[C_n = \dfrac{1}{\hat{a}_n} \]

### Factors for small n

Factors for small values of \(n\) can be obtained using numerical simulations. Here is the scheme of the performed simulation:

- Enumerate \(n\) from \(2\) to \(100\)
- For each \(n\), generate \(2\cdot 10^8\) samples of size \(n\) from the standard normal distribution (the Box–Muller transform is used, see [Box1958])
- For each sample, estimate \(\textrm{MAD}_n\) using \(C_n = 1\)
- Calculate the arithmetic average of all \(\textrm{MAD}_n\) values for the same \(n\): the result is the value of \(\hat{a}_n\)
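The scheme above can be sketched in Python (hypothetical helper names; SciPy’s `beta.cdf` supplies the regularized incomplete beta function; the iteration count is drastically reduced compared to the \(2\cdot 10^8\) used for the actual table, so expect only about three accurate decimal digits):

```python
import numpy as np
from scipy.stats import beta

def hd_weights(n):
    # Harrell-Davis weights for the median: differences of the
    # Beta((n+1)/2, (n+1)/2) CDF over a uniform grid.
    a = (n + 1) / 2
    return np.diff(beta.cdf(np.arange(n + 1) / n, a, a))

def estimate_a_n(n, iterations=50_000, seed=42):
    # Monte Carlo estimate of a_n = E[MAD_n with C_n = 1] under N(0, 1),
    # using the Harrell-Davis median for both the center and the deviations.
    rng = np.random.default_rng(seed)
    w = hd_weights(n)
    total = 0.0
    for _ in range(iterations):
        x = np.sort(rng.standard_normal(n))
        m = float(w @ x)                            # median_HD(x)
        total += float(w @ np.sort(np.abs(x - m)))  # median_HD(|x - m|)
    return total / iterations
```

For \(n = 5\), this yields roughly \(0.6385\), matching the table below.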

As a result of this simulation, I got the following table with \(\hat{a}_n\) and \(C_n\) values for \(n \in \{ 2, \ldots, 100 \}\).

\(n\) | \(\hat{a}_n\) | \(C_n\) | \(n\) | \(\hat{a}_n\) | \(C_n\) |
---|---|---|---|---|---|
1 | NA | NA | 51 | 0.66649 | 1.50039 |
2 | 0.56417 | 1.77250 | 52 | 0.66668 | 1.49998 |
3 | 0.63769 | 1.56816 | 53 | 0.66682 | 1.49966 |
4 | 0.62661 | 1.59589 | 54 | 0.66699 | 1.49926 |
5 | 0.63853 | 1.56611 | 55 | 0.66713 | 1.49895 |
6 | 0.63834 | 1.56656 | 56 | 0.66728 | 1.49863 |
7 | 0.63915 | 1.56458 | 57 | 0.66741 | 1.49833 |
8 | 0.64141 | 1.55908 | 58 | 0.66753 | 1.49805 |
9 | 0.64237 | 1.55675 | 59 | 0.66767 | 1.49774 |
10 | 0.64397 | 1.55288 | 60 | 0.66780 | 1.49746 |
11 | 0.64535 | 1.54955 | 61 | 0.66791 | 1.49720 |
12 | 0.64662 | 1.54651 | 62 | 0.66803 | 1.49694 |
13 | 0.64790 | 1.54346 | 63 | 0.66815 | 1.49667 |
14 | 0.64908 | 1.54064 | 64 | 0.66825 | 1.49644 |
15 | 0.65018 | 1.53803 | 65 | 0.66836 | 1.49621 |
16 | 0.65125 | 1.53552 | 66 | 0.66846 | 1.49597 |
17 | 0.65226 | 1.53313 | 67 | 0.66857 | 1.49574 |
18 | 0.65317 | 1.53101 | 68 | 0.66865 | 1.49555 |
19 | 0.65404 | 1.52896 | 69 | 0.66876 | 1.49531 |
20 | 0.65489 | 1.52698 | 70 | 0.66883 | 1.49514 |
21 | 0.65565 | 1.52520 | 71 | 0.66893 | 1.49493 |
22 | 0.65638 | 1.52351 | 72 | 0.66901 | 1.49475 |
23 | 0.65708 | 1.52190 | 73 | 0.66910 | 1.49456 |
24 | 0.65771 | 1.52043 | 74 | 0.66918 | 1.49437 |
25 | 0.65832 | 1.51902 | 75 | 0.66925 | 1.49422 |
26 | 0.65888 | 1.51772 | 76 | 0.66933 | 1.49402 |
27 | 0.65943 | 1.51647 | 77 | 0.66940 | 1.49387 |
28 | 0.65991 | 1.51536 | 78 | 0.66948 | 1.49370 |
29 | 0.66036 | 1.51433 | 79 | 0.66955 | 1.49354 |
30 | 0.66082 | 1.51328 | 80 | 0.66962 | 1.49339 |
31 | 0.66123 | 1.51233 | 81 | 0.66968 | 1.49325 |
32 | 0.66161 | 1.51146 | 82 | 0.66974 | 1.49312 |
33 | 0.66200 | 1.51057 | 83 | 0.66980 | 1.49298 |
34 | 0.66235 | 1.50977 | 84 | 0.66988 | 1.49281 |
35 | 0.66270 | 1.50899 | 85 | 0.66993 | 1.49270 |
36 | 0.66302 | 1.50824 | 86 | 0.66999 | 1.49257 |
37 | 0.66334 | 1.50753 | 87 | 0.67005 | 1.49244 |
38 | 0.66362 | 1.50688 | 88 | 0.67009 | 1.49233 |
39 | 0.66391 | 1.50623 | 89 | 0.67016 | 1.49219 |
40 | 0.66417 | 1.50563 | 90 | 0.67021 | 1.49207 |
41 | 0.66443 | 1.50504 | 91 | 0.67026 | 1.49196 |
42 | 0.66469 | 1.50447 | 92 | 0.67031 | 1.49185 |
43 | 0.66493 | 1.50393 | 93 | 0.67036 | 1.49174 |
44 | 0.66515 | 1.50341 | 94 | 0.67041 | 1.49161 |
45 | 0.66539 | 1.50289 | 95 | 0.67046 | 1.49152 |
46 | 0.66557 | 1.50246 | 96 | 0.67049 | 1.49144 |
47 | 0.66578 | 1.50200 | 97 | 0.67055 | 1.49131 |
48 | 0.66598 | 1.50155 | 98 | 0.67060 | 1.49121 |
49 | 0.66616 | 1.50115 | 99 | 0.67063 | 1.49114 |
50 | 0.66633 | 1.50076 | 100 | 0.67068 | 1.49102 |

Here is a visualization of this table:

### Factors for huge n

To build an equation for \(n > 100\), we follow the approach suggested in [Hayes2014] (pages 2208–2209) and use a prediction equation of the following form:

\[\hat{a}_n = \Phi^{-1}(3/4) \Bigg( 1 - \dfrac{\alpha}{n} - \dfrac{\beta}{n^2} \Bigg). \]

We simulated approximations of \(\hat{a}_n\) for some large \(n\) values:

\(n\) | \(\hat{a}_n\) |
---|---|
100 | 0.6707 |
110 | 0.6711 |
120 | 0.6714 |
130 | 0.6716 |
140 | 0.6719 |
150 | 0.6720 |
200 | 0.6727 |
250 | 0.6731 |
300 | 0.6733 |
350 | 0.6735 |
400 | 0.6736 |
450 | 0.6737 |
500 | 0.6738 |
1000 | 0.6742 |
1500 | 0.6743 |
2000 | 0.6743 |

Next, we fitted these values using multiple linear regression. The dependent variable is \(y = 1 - \hat{a}_n / \Phi^{-1}(3/4)\); the independent variables are \(n^{-1}\) and \(n^{-2}\); the intercept is zero:

\[y = \alpha n^{-1} + \beta n^{-2}. \]

The fitted model was slightly adjusted to get nice-looking values while keeping good accuracy:

\[\alpha = 0.5,\quad \beta = 6.5. \]

These values give a pretty accurate estimation of \(\hat{a}_n\) for \(n > 100\) (the absolute error is below \(10^{-4}\)).
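Putting the pieces together, the bias-correction factor for \(n > 100\) can be computed without any table lookups. A minimal sketch (hypothetical function names; `statistics.NormalDist` provides \(\Phi^{-1}\)):

```python
from statistics import NormalDist

Z_34 = NormalDist().inv_cdf(0.75)  # Phi^{-1}(3/4) ≈ 0.6744898

def a_n_predicted(n):
    # Prediction equation for n > 100 with the fitted alpha = 0.5, beta = 6.5.
    return Z_34 * (1 - 0.5 / n - 6.5 / n ** 2)

def c_n_predicted(n):
    # Bias-correction factor C_n = 1 / a_n; tends to 1.4826... as n grows.
    return 1 / a_n_predicted(n)
```

For example, `a_n_predicted(200)` is within \(10^{-4}\) of the simulated value \(0.6727\) from the table above.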

### The mean squared error

It’s time to compare the straightforward and the Harrell-Davis-based \(\textrm{MAD}\) estimators in terms of the mean squared error. To estimate the \(\textrm{MSE}\), we also use numerical simulations. Here are the results for some \(n\) values:

\(n\) | Straightforward | Harrell-Davis |
---|---|---|
3 | 0.690 | 0.272 |
4 | 0.327 | 0.205 |
5 | 0.341 | 0.181 |
10 | 0.136 | 0.100 |
20 | 0.069 | 0.055 |
30 | 0.045 | 0.038 |
40 | 0.034 | 0.029 |
50 | 0.027 | 0.024 |
100 | 0.014 | 0.012 |

As we can see, the difference between the estimators is substantial for small samples. For example, for \(n = 3\), we get \(\textrm{MSE} \approx 0.69\) for the straightforward estimator vs. \(\textrm{MSE} \approx 0.27\) for the Harrell-Davis-based estimator. Below you can see density plots of the estimates for the above \(n\) values.
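A scaled-down sketch of such a comparison is shown below (hypothetical helper names; instead of taking the \(C_n\) factors from the tables, it calibrates them empirically on a first batch of samples, then measures the \(\textrm{MSE}\) of the bias-corrected estimator against the true \(\sigma = 1\) on a second batch):

```python
import numpy as np
from scipy.stats import beta

def hd_weights(n):
    # Harrell-Davis weights for the median.
    a = (n + 1) / 2
    return np.diff(beta.cdf(np.arange(n + 1) / n, a, a))

def mad_hd(x, w):
    # Unscaled Harrell-Davis-based MAD.
    xs = np.sort(x)
    m = float(w @ xs)
    return float(w @ np.sort(np.abs(x - m)))

def mad_simple(x):
    # Unscaled straightforward MAD.
    return float(np.median(np.abs(x - np.median(x))))

def mse_of(estimator, n, iterations=40_000, seed=0):
    # Two passes: calibrate C_n = 1 / E[estimate] empirically, then measure
    # the MSE of the bias-corrected estimator against sigma = 1 under N(0, 1).
    rng = np.random.default_rng(seed)
    raw = np.array([estimator(x) for x in rng.standard_normal((iterations, n))])
    c_n = 1 / raw.mean()
    raw2 = np.array([estimator(x) for x in rng.standard_normal((iterations, n))])
    return float(np.mean((c_n * raw2 - 1.0) ** 2))
```

For \(n = 10\), this reproduces the table values of roughly \(0.136\) (straightforward) vs. \(0.100\) (Harrell-Davis) up to Monte Carlo noise.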

### Conclusion

In this post, we discussed an unbiased \(\textrm{MAD}_n\) estimator based on the Harrell-Davis quantile estimator instead of the straightforward median estimator. The suggested estimator is much more efficient for small samples because it has a smaller \(\textrm{MSE}\). This allows using \(\textrm{MAD}_n\) as an efficient alternative to the standard deviation under the normal distribution with higher accuracy.

### References

**[Harrell1982]**
Harrell, F. E., and C. E. Davis. “A new distribution-free quantile estimator.” *Biometrika* 69, no. 3 (1982): 635-640. https://doi.org/10.2307/2335999

**[Hayes2014]**
Hayes, Kevin. “Finite-sample bias-correction factors for the median absolute deviation.” *Communications in Statistics - Simulation and Computation* 43, no. 10 (2014): 2205-2212. https://doi.org/10.1080/03610918.2012.748913

**[Park2020]**
Park, Chanseok, Haewon Kim, and Min Wang. “Investigation of finite-sample properties of robust location and scale estimators.” *Communications in Statistics - Simulation and Computation* (2020): 1-27. https://doi.org/10.1080/03610918.2019.1699114

**[Box1958]**
Box, George E. P. “A note on the generation of random normal deviates.” *Annals of Mathematical Statistics* 29 (1958): 610-611. https://doi.org/10.1214/aoms/1177706645