Finite-sample bias correction factors for Rousseeuw-Croux scale estimators



The Rousseeuw-Croux scale estimators \(S_n\) and \(Q_n\) are efficient alternatives to the median absolute deviation (\(\operatorname{MAD}_n\)). While all three estimators share the same breakdown point of \(50\%\), \(S_n\) and \(Q_n\) have higher statistical efficiency than \(\operatorname{MAD}_n\): the asymptotic Gaussian efficiencies of \(\operatorname{MAD}_n\), \(S_n\), and \(Q_n\) are \(37\%\), \(58\%\), and \(82\%\), respectively.

Using scale constants, we can make \(S_n\) and \(Q_n\) consistent estimators of the standard deviation under normality. The asymptotic values of these constants are well known. However, for finite samples, only approximate values of these constants are known. In this post, we provide refined values of these constants with higher accuracy.

Introduction

The \(S_n\) and \(Q_n\) estimators are presented in [Rousseeuw1993]. For a sample \(x = \{ x_1, x_2, \ldots, x_n \}\), they are defined as follows:

\[\newcommand{\Sn}{\operatorname{S}_n} \newcommand{\Qn}{\operatorname{Q}_n} \newcommand{\MAD}{\operatorname{MAD}} \newcommand{\med}{\operatorname{med}} \newcommand{\lomed}{\operatorname{lomed}} \newcommand{\himed}{\operatorname{himed}} S_n = c_n \cdot 1.1926 \cdot \operatorname{lowmed}_i \; \operatorname{highmed}_j \; |x_i - x_j|, \]

\[Q_n = d_n \cdot 2.2191 \cdot \{ |x_i-x_j|; i < j \}_{(k)}, \]

where

  • \(\operatorname{lowmed}\) is the \(\lfloor (n+1) / 2 \rfloor^\textrm{th}\) order statistic out of \(n\) numbers,
  • \(\operatorname{highmed}\) is the \((\lfloor n / 2 \rfloor + 1)^\textrm{th}\) order statistic out of \(n\) numbers,
  • \(c_n\), \(d_n\) are bias-correction factors for finite samples (some approximations can be found in [Rousseeuw1993]),
  • \(k = \binom{\lfloor n / 2 \rfloor + 1}{2}\),
  • \({}_{(k)}\) is the \(k^\textrm{th}\) order statistic.
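As a sanity check, the definitions above translate directly into a naive \(O(n^2 \log n)\) implementation. This is only a sketch (function names are mine, and only the asymptotic constants are applied; the finite-sample factors \(c_n\), \(d_n\) are omitted) — [Croux1992] gives time-efficient \(O(n \log n)\) algorithms for production use:

```python
def s_n(x):
    """Naive S_n with the asymptotic constant 1.1926 only
    (the finite-sample factor c_n is not applied)."""
    n = len(x)
    # Inner high median: for each i, the (floor(n/2)+1)-th smallest
    # of |x_i - x_j| over all j (including j = i).
    himeds = []
    for xi in x:
        diffs = sorted(abs(xi - xj) for xj in x)
        himeds.append(diffs[n // 2])
    # Outer low median: the floor((n+1)/2)-th smallest over i.
    return 1.1926 * sorted(himeds)[(n + 1) // 2 - 1]

def q_n(x):
    """Naive Q_n with the asymptotic constant 2.2191 only
    (the finite-sample factor d_n is not applied)."""
    n = len(x)
    h = n // 2 + 1
    k = h * (h - 1) // 2  # k = C(h, 2)
    diffs = sorted(abs(x[i] - x[j]) for i in range(n) for j in range(i + 1, n))
    return 2.2191 * diffs[k - 1]
```

For example, for \(x = \{1, 2, 3, 4, 5\}\) both inner order statistics equal \(1\), so `s_n` returns \(1.1926\) and `q_n` returns \(2.2191\).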

In [Croux1992], rough approximations of \(c_n\) and \(d_n\) are given. The values for \(n \leq 9\) are presented in the following table:

| \(n\) | \(c_n\) | \(d_n\) |
|------:|--------:|--------:|
| 2 | 0.743 | 0.399 |
| 3 | 1.851 | 0.994 |
| 4 | 0.954 | 0.512 |
| 5 | 1.351 | 0.844 |
| 6 | 0.993 | 0.611 |
| 7 | 1.198 | 0.857 |
| 8 | 1.005 | 0.669 |
| 9 | 1.131 | 0.872 |

For \(n \geq 10\), they suggested using the following prediction equations:

\[c_n = \frac{n}{n - 0.9}, \quad \textrm{for odd}\; n, \]

\[c_n = 1, \quad \textrm{for even}\; n, \]

\[d_n = \frac{n}{n + 1.4}, \quad \textrm{for odd}\; n, \]

\[d_n = \frac{n}{n + 3.8}, \quad \textrm{for even}\; n. \]
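Put together, the table and the prediction equations give a simple lookup. A sketch in Python (function and table names are mine; the constants are taken verbatim from the text above):

```python
# Finite-sample bias-correction factors from [Croux1992]:
# exact table values for n <= 9, prediction equations for n >= 10.
C_TABLE = {2: 0.743, 3: 1.851, 4: 0.954, 5: 1.351, 6: 0.993, 7: 1.198, 8: 1.005, 9: 1.131}
D_TABLE = {2: 0.399, 3: 0.994, 4: 0.512, 5: 0.844, 6: 0.611, 7: 0.857, 8: 0.669, 9: 0.872}

def c_factor(n):
    if n <= 9:
        return C_TABLE[n]
    return n / (n - 0.9) if n % 2 == 1 else 1.0

def d_factor(n):
    if n <= 9:
        return D_TABLE[n]
    return n / (n + 1.4) if n % 2 == 1 else n / (n + 3.8)
```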

In the R package robustbase, adjusted values of \(d_n\) are used:

| \(n\) | \(d_n\) |
|------:|--------:|
| 2 | 0.399356 |
| 3 | 0.99365 |
| 4 | 0.51321 |
| 5 | 0.84401 |
| 6 | 0.61220 |
| 7 | 0.85877 |
| 8 | 0.66993 |
| 9 | 0.87344 |
| 10 | 0.72014 |
| 11 | 0.88906 |
| 12 | 0.75743 |

For \(n > 12\), the following prediction equations are used:

\[d_n = \big( 1 + 1.60188 n^{-1} - 2.1284 n^{-2} - 5.172 n^{-3} \big)^{-1}, \quad \textrm{for odd}\; n, \]

\[d_n = \big( 1 + 3.67561 n^{-1} + 1.9654 n^{-2} + 6.987 n^{-3} - 77 n^{-4} \big)^{-1}, \quad \textrm{for even}\; n. \]
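The robustbase scheme can be sketched the same way (again, the function and table names are mine; the table values and polynomial coefficients are as listed above):

```python
# d_n values used by the R package robustbase: exact table for n <= 12,
# inverse-polynomial prediction equations for n > 12.
ROBUSTBASE_D = {2: 0.399356, 3: 0.99365, 4: 0.51321, 5: 0.84401, 6: 0.61220,
                7: 0.85877, 8: 0.66993, 9: 0.87344, 10: 0.72014, 11: 0.88906,
                12: 0.75743}

def d_robustbase(n):
    if n <= 12:
        return ROBUSTBASE_D[n]
    if n % 2 == 1:
        return 1.0 / (1 + 1.60188 / n - 2.1284 / n**2 - 5.172 / n**3)
    return 1.0 / (1 + 3.67561 / n + 1.9654 / n**2 + 6.987 / n**3 - 77 / n**4)
```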

Refined bias-correction factors

In order to obtain refined values of the bias-correction factors, we performed extensive Monte-Carlo simulations. Here are the raw results for \(n \leq 100\):

| \(n\) | \(c_n\) | \(d_n\) |
|------:|--------:|--------:|
| 2 | 0.74303 | 0.39954 |
| 3 | 1.84983 | 0.99386 |
| 4 | 0.95505 | 0.51333 |
| 5 | 1.34857 | 0.84412 |
| 6 | 0.99413 | 0.61224 |
| 7 | 1.19832 | 0.85886 |
| 8 | 1.00496 | 0.67000 |
| 9 | 1.13178 | 0.87359 |
| 10 | 1.00689 | 0.72007 |
| 11 | 1.09592 | 0.88902 |
| 12 | 1.00635 | 0.75748 |
| 13 | 1.07423 | 0.90232 |
| 14 | 1.00513 | 0.78551 |
| 15 | 1.06006 | 0.91248 |
| 16 | 1.00384 | 0.80779 |
| 17 | 1.05006 | 0.92106 |
| 18 | 1.00281 | 0.82600 |
| 19 | 1.04297 | 0.92793 |
| 20 | 1.00219 | 0.84105 |
| 21 | 1.03738 | 0.93380 |
| 22 | 1.00139 | 0.85367 |
| 23 | 1.03311 | 0.93894 |
| 24 | 1.00091 | 0.86441 |
| 25 | 1.02969 | 0.94303 |
| 26 | 1.00066 | 0.87372 |
| 27 | 1.02686 | 0.94680 |
| 28 | 1.00045 | 0.88186 |
| 29 | 1.02449 | 0.95009 |
| 30 | 1.00005 | 0.88901 |
| 31 | 1.02260 | 0.95304 |
| 32 | 0.99995 | 0.89531 |
| 33 | 1.02087 | 0.95566 |
| 34 | 0.99974 | 0.90099 |
| 35 | 1.01950 | 0.95789 |
| 36 | 0.99978 | 0.90600 |
| 37 | 1.01830 | 0.96004 |
| 38 | 0.99960 | 0.91061 |
| 39 | 1.01717 | 0.96192 |
| 40 | 0.99969 | 0.91480 |
| 41 | 1.01619 | 0.96361 |
| 42 | 0.99960 | 0.91852 |
| 43 | 1.01538 | 0.96522 |
| 44 | 0.99955 | 0.92200 |
| 45 | 1.01460 | 0.96668 |
| 46 | 0.99960 | 0.92515 |
| 47 | 1.01391 | 0.96802 |
| 48 | 0.99948 | 0.92809 |
| 49 | 1.01324 | 0.96923 |
| 50 | 0.99953 | 0.93085 |
| 51 | 1.01264 | 0.97040 |
| 52 | 0.99954 | 0.93334 |
| 53 | 1.01228 | 0.97147 |
| 54 | 0.99949 | 0.93566 |
| 55 | 1.01175 | 0.97237 |
| 56 | 0.99950 | 0.93781 |
| 57 | 1.01127 | 0.97328 |
| 58 | 0.99955 | 0.93985 |
| 59 | 1.01090 | 0.97421 |
| 60 | 0.99959 | 0.94180 |
| 61 | 1.01054 | 0.97496 |
| 62 | 0.99954 | 0.94355 |
| 63 | 1.01023 | 0.97573 |
| 64 | 0.99963 | 0.94525 |
| 65 | 1.00988 | 0.97648 |
| 66 | 0.99968 | 0.94687 |
| 67 | 1.00951 | 0.97710 |
| 68 | 0.99959 | 0.94837 |
| 69 | 1.00923 | 0.97773 |
| 70 | 0.99966 | 0.94978 |
| 71 | 1.00902 | 0.97837 |
| 72 | 0.99965 | 0.95112 |
| 73 | 1.00877 | 0.97891 |
| 74 | 0.99964 | 0.95235 |
| 75 | 1.00851 | 0.97944 |
| 76 | 0.99966 | 0.95359 |
| 77 | 1.00835 | 0.97999 |
| 78 | 0.99968 | 0.95472 |
| 79 | 1.00810 | 0.98049 |
| 80 | 0.99966 | 0.95579 |
| 81 | 1.00790 | 0.98090 |
| 82 | 0.99970 | 0.95677 |
| 83 | 1.00765 | 0.98138 |
| 84 | 0.99970 | 0.95781 |
| 85 | 1.00762 | 0.98179 |
| 86 | 0.99968 | 0.95871 |
| 87 | 1.00740 | 0.98216 |
| 88 | 0.99972 | 0.95967 |
| 89 | 1.00723 | 0.98255 |
| 90 | 0.99973 | 0.96051 |
| 91 | 1.00705 | 0.98295 |
| 92 | 0.99974 | 0.96139 |
| 93 | 1.00689 | 0.98329 |
| 94 | 0.99974 | 0.96212 |
| 95 | 1.00674 | 0.98363 |
| 96 | 0.99978 | 0.96294 |
| 97 | 1.00661 | 0.98399 |
| 98 | 0.99973 | 0.96364 |
| 99 | 1.00650 | 0.98430 |
| 100 | 0.99982 | 0.96438 |

For \(n > 100\), we suggest using the following prediction equations:

\[c_n = 1 + 0.7096 n^{-1} - 7.3604 n^{-2}, \quad \textrm{for odd}\; n, \]

\[c_n = 1 + 0.0391 n^{-1} - 6.1719 n^{-2}, \quad \textrm{for even}\; n, \]

\[d_n = 1 - 1.6022 n^{-1} + 4.7453 n^{-2}, \quad \textrm{for odd}\; n, \]

\[d_n = 1 - 3.6741 n^{-1} + 11.1030 n^{-2}, \quad \textrm{for even}\; n. \]
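For reference, the refined prediction equations in code form (a sketch; the function names are mine, the coefficients are those given above):

```python
def c_refined(n):
    """Predicted c_n for n > 100 (refined equations from this post)."""
    if n % 2 == 1:
        return 1 + 0.7096 / n - 7.3604 / n**2
    return 1 + 0.0391 / n - 6.1719 / n**2

def d_refined(n):
    """Predicted d_n for n > 100 (refined equations from this post)."""
    if n % 2 == 1:
        return 1 - 1.6022 / n + 4.7453 / n**2
    return 1 - 3.6741 / n + 11.1030 / n**2
```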

Here are the corresponding plots:

(Plots: the \(c_n\) and \(d_n\) values for \(n \leq 100\) together with the prediction equation curves.)
References

  • [Croux1992]
    Croux, Christophe, and Peter J. Rousseeuw. “Time-Efficient Algorithms for Two Highly Robust Estimators of Scale.” In Computational Statistics, edited by Yadolah Dodge and Joe Whittaker, 411–28. Heidelberg: Physica-Verlag HD, 1992.
    https://doi.org/10.1007/978-3-662-26811-7_58.
  • [Rousseeuw1993]
    Rousseeuw, Peter J., and Christophe Croux. “Alternatives to the Median Absolute Deviation.” Journal of the American Statistical Association 88, no. 424 (December 1, 1993): 1273–83.
    https://doi.org/10.1080/01621459.1993.10476408.