Fence-based outlier detectors, Part 1
In previous posts, I discussed the properties of Tukey’s fences and of an asymmetric decile-based outlier detector (Part 1, Part 2). In this post, I generalize both into a family of fence-based outlier detectors.
Notation
A Symmetric Fence-based (SF) outlier detector can be defined using the following range:
$$ SF(p, k) = [Q_p - k(Q_{1-p} - Q_p), Q_{1-p} + k(Q_{1-p} - Q_p)] $$
where $Q_s$ is an estimate of the $s^\textrm{th}$ quantile and $p \in [0, 0.5]$.
All sample elements outside this range are marked as outliers. In this notation, Tukey’s fences are $SF(0.25, k)$.
An Asymmetric Fence-based (AF) outlier detector is defined using the following range:
$$ AF(p, k) = [Q_p - 2k(Q_{0.5} - Q_{p}), Q_{1-p} + 2k(Q_{1-p} - Q_{0.5})]. $$
In this notation, the asymmetric decile-based outlier detector is $AF(0.1, k)$.
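The two definitions above can be sketched in a few lines of Python. This is only an illustration, not the code used for this post: the function names `symmetric_fences` and `asymmetric_fences` are hypothetical, and `quantile` is assumed to be any callable that maps $s$ to $Q_s$ (here, the standard normal quantile function from the standard library).

```python
from statistics import NormalDist


def symmetric_fences(quantile, p, k):
    """SF(p, k): extend [Q_p, Q_{1-p}] by k inter-quantile ranges on each side."""
    lo, hi = quantile(p), quantile(1 - p)
    spread = hi - lo
    return lo - k * spread, hi + k * spread


def asymmetric_fences(quantile, p, k):
    """AF(p, k): extend each side by 2k times its own distance to the median."""
    lo, med, hi = quantile(p), quantile(0.5), quantile(1 - p)
    return lo - 2 * k * (med - lo), hi + 2 * k * (hi - med)


# Assumption for the example: true quantiles of the standard normal distribution.
normal_quantile = NormalDist().inv_cdf

# Tukey's fences are SF(0.25, k); k = 1.5 is the classic choice.
tukey = symmetric_fences(normal_quantile, 0.25, 1.5)
decile_af = asymmetric_fences(normal_quantile, 0.1, 2.0)
```

In practice, `quantile` would be a sample quantile estimator rather than a true quantile function; the fence formulas stay the same.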
Simulation 1
Let’s perform the following experiment:
- Enumerate two types of fence-based outlier detectors: asymmetric and symmetric.
- Enumerate different $p$ values: $0.1$ (deciles) and $0.25$ (quartiles).
- Enumerate different $k$ values: $1.0$, $1.5$, $2.0$, $2.5$, $3.0$, $3.5$, $4.0$.
- Enumerate different distributions: the normal distribution, the exponential distribution, the Gumbel distribution.
- For each combination of the above parameters, compute the fence values, taking $Q_s$ to be the true $s^\textrm{th}$ quantile of the distribution. Next, calculate the probability mass of the distribution that lies outside the fences. This gives the probability that a single observation is marked as an outlier.
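The experiment above can be reproduced with a minimal sketch like the one below. It is not the original simulation code. It assumes the standard parametrization of each distribution (standard normal, unit-rate exponential, and the Gumbel distribution with CDF $e^{-e^{-x}}$); this choice is harmless because the fences shift and scale together with the distribution. The names `DISTRIBUTIONS`, `fences`, and `outlier_probability` are hypothetical.

```python
import math
from statistics import NormalDist


def exp_quantile(s):
    return -math.log(1 - s)


def exp_cdf(x):
    return 1 - math.exp(-x) if x > 0 else 0.0


def gumbel_quantile(s):
    return -math.log(-math.log(s))


def gumbel_cdf(x):
    return math.exp(-math.exp(-x))


# (quantile function, CDF) for the standard version of each distribution.
DISTRIBUTIONS = {
    "normal": (NormalDist().inv_cdf, NormalDist().cdf),
    "exponential": (exp_quantile, exp_cdf),
    "gumbel": (gumbel_quantile, gumbel_cdf),
}


def fences(kind, quantile, p, k):
    """Return (lower, upper) for kind 'SF' or 'AF' using true quantiles."""
    lo, med, hi = quantile(p), quantile(0.5), quantile(1 - p)
    if kind == "SF":
        return lo - k * (hi - lo), hi + k * (hi - lo)
    return lo - 2 * k * (med - lo), hi + 2 * k * (hi - med)


def outlier_probability(kind, dist, p, k):
    """Probability mass below the lower fence plus mass above the upper fence."""
    quantile, cdf = DISTRIBUTIONS[dist]
    lower, upper = fences(kind, quantile, p, k)
    return cdf(lower) + (1 - cdf(upper))


for dist in DISTRIBUTIONS:
    for kind in ("AF", "SF"):
        for p in (0.1, 0.25):
            for k in (1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0):
                print(dist, kind, p, k, outlier_probability(kind, dist, p, k))
```

Note that `1 - cdf(upper)` loses precision once the tail probability approaches machine epsilon, so this sketch cannot resolve the smallest entries of the normal table; a dedicated survival function would be needed for those.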
The results are below.
Normal distribution:
type | p | k | P(outlier) |
---|---|---|---|
AF | 0.10 | 1.0 | 0.00012072230941067211 |
AF | 0.10 | 1.5 | 0.00000029563871592760 |
AF | 0.10 | 2.0 | 0.00000000014767521496 |
AF | 0.10 | 2.5 | 0.00000000000001483506 |
AF | 0.10 | 3.0 | 0.00000000000000000015 |
AF | 0.10 | 3.5 | 0.00000000000000000000 |
AF | 0.10 | 4.0 | 0.00000000000000000000 |
AF | 0.25 | 1.0 | 0.04302479073838957196 |
AF | 0.25 | 1.5 | 0.00697660323928020812 |
AF | 0.25 | 2.0 | 0.00074502950319118666 |
AF | 0.25 | 2.5 | 0.00005189186759204938 |
AF | 0.25 | 3.0 | 0.00000234194246287498 |
AF | 0.25 | 3.5 | 0.00000006817407812073 |
AF | 0.25 | 4.0 | 0.00000000127585892633 |
SF | 0.10 | 1.0 | 0.00012072230941067211 |
SF | 0.10 | 1.5 | 0.00000029563871592760 |
SF | 0.10 | 2.0 | 0.00000000014767521496 |
SF | 0.10 | 2.5 | 0.00000000000001483506 |
SF | 0.10 | 3.0 | 0.00000000000000000015 |
SF | 0.10 | 3.5 | 0.00000000000000000000 |
SF | 0.10 | 4.0 | 0.00000000000000000000 |
SF | 0.25 | 1.0 | 0.04302479073838957196 |
SF | 0.25 | 1.5 | 0.00697660323928020812 |
SF | 0.25 | 2.0 | 0.00074502950319118666 |
SF | 0.25 | 2.5 | 0.00005189186759204938 |
SF | 0.25 | 3.0 | 0.00000234194246287498 |
SF | 0.25 | 3.5 | 0.00000006817407812073 |
SF | 0.25 | 4.0 | 0.00000000127585892633 |
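
In the normal table, the AF and SF rows are identical. This follows from the symmetry of the normal distribution around its median, which makes both halves of the inter-quantile range equal:

$$ Q_{0.5} - Q_p = Q_{1-p} - Q_{0.5} = \tfrac{1}{2}(Q_{1-p} - Q_p) \quad\Longrightarrow\quad AF(p, k) = SF(p, k). $$

The same holds for any symmetric distribution, so the two detectors differ only on skewed data.
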
Exponential distribution:
type | p | k | P(outlier) |
---|---|---|---|
AF | 0.10 | 1.0 | 0.00400000000 |
AF | 0.10 | 1.5 | 0.00080000000 |
AF | 0.10 | 2.0 | 0.00016000000 |
AF | 0.10 | 2.5 | 0.00003200000 |
AF | 0.10 | 3.0 | 0.00000640000 |
AF | 0.10 | 3.5 | 0.00000128000 |
AF | 0.10 | 4.0 | 0.00000025600 |
AF | 0.25 | 1.0 | 0.06250000000 |
AF | 0.25 | 1.5 | 0.03125000000 |
AF | 0.25 | 2.0 | 0.01562500000 |
AF | 0.25 | 2.5 | 0.00781250000 |
AF | 0.25 | 3.0 | 0.00390625000 |
AF | 0.25 | 3.5 | 0.00195312500 |
AF | 0.25 | 4.0 | 0.00097656250 |
SF | 0.10 | 1.0 | 0.01111111111 |
SF | 0.10 | 1.5 | 0.00370370370 |
SF | 0.10 | 2.0 | 0.00123456790 |
SF | 0.10 | 2.5 | 0.00041152263 |
SF | 0.10 | 3.0 | 0.00013717421 |
SF | 0.10 | 3.5 | 0.00004572474 |
SF | 0.10 | 4.0 | 0.00001524158 |
SF | 0.25 | 1.0 | 0.08333333333 |
SF | 0.25 | 1.5 | 0.04811252243 |
SF | 0.25 | 2.0 | 0.02777777778 |
SF | 0.25 | 2.5 | 0.01603750748 |
SF | 0.25 | 3.0 | 0.00925925926 |
SF | 0.25 | 3.5 | 0.00534583583 |
SF | 0.25 | 4.0 | 0.00308641975 |
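
The exponential values have simple closed forms, which gives a quick sanity check. Assume the unit-rate exponential distribution (the rate does not matter because the fences scale with the distribution), so $Q_s = -\ln(1-s)$ and $P(X > x) = e^{-x}$. For every $p$ and $k$ in the table the lower fence is negative, so only the upper tail contributes:

$$ P_{SF}(p, k) = p \left( \frac{p}{1-p} \right)^{k}, \qquad P_{AF}(p, k) = p \, (2p)^{2k}. $$

For example, $P_{SF}(0.25, 1) = \tfrac{1}{4} \cdot \tfrac{1}{3} = \tfrac{1}{12} \approx 0.0833$ and $P_{AF}(0.1, 1) = 0.1 \cdot 0.2^2 = 0.004$, matching the table.
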
Gumbel distribution:
type | p | k | P(outlier) |
---|---|---|---|
AF | 0.10 | 1.0 | 0.00243138782251 |
AF | 0.10 | 1.5 | 0.00036996004078 |
AF | 0.10 | 2.0 | 0.00005624389383 |
AF | 0.10 | 2.5 | 0.00000854944973 |
AF | 0.10 | 3.0 | 0.00000129954752 |
AF | 0.10 | 3.5 | 0.00000019753535 |
AF | 0.10 | 4.0 | 0.00000003002599 |
AF | 0.25 | 1.0 | 0.05225343350821 |
AF | 0.25 | 1.5 | 0.02037237984519 |
AF | 0.25 | 2.0 | 0.00849982291839 |
AF | 0.25 | 2.5 | 0.00353655486394 |
AF | 0.25 | 3.0 | 0.00146932399013 |
AF | 0.25 | 3.5 | 0.00061008683006 |
AF | 0.25 | 4.0 | 0.00025325410919 |
SF | 0.10 | 1.0 | 0.00480943027492 |
SF | 0.10 | 1.5 | 0.00103073558090 |
SF | 0.10 | 2.0 | 0.00022057403284 |
SF | 0.10 | 2.5 | 0.00004718708372 |
SF | 0.10 | 3.0 | 0.00001009397656 |
SF | 0.10 | 3.5 | 0.00000215921115 |
SF | 0.10 | 4.0 | 0.00000046187726 |
SF | 0.25 | 1.0 | 0.05920771183234 |
SF | 0.25 | 1.5 | 0.02682956730034 |
SF | 0.25 | 2.0 | 0.01231232518267 |
SF | 0.25 | 2.5 | 0.00562770388698 |
SF | 0.25 | 3.0 | 0.00256759594377 |
SF | 0.25 | 3.5 | 0.00117046709217 |
SF | 0.25 | 4.0 | 0.00053336722084 |
Simulation 2
Let’s perform the following experiment:
- Enumerate two types of fence-based outlier detectors: asymmetric and symmetric.
- Enumerate different $p$ values: $0.1$ (deciles) and $0.25$ (quartiles).
- Enumerate different $k$ values: $1.0$, $1.5$, $2.0$, $2.5$, $3.0$, $3.5$, $4.0$.
- Enumerate different distributions: the normal distribution, the exponential distribution, the Gumbel distribution.
- Enumerate different sample sizes $n$: $5$, $10$, $50$, $100$, $500$, $1000$.
- For each combination of the above parameters, compute the fence values, taking $Q_s$ to be the true $s^\textrm{th}$ quantile of the distribution. Next, calculate the probability that a sample of size $n$ contains at least one outlier.
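The last step can be sketched as follows, assuming that the $n$ observations are independent and that the fences are fixed at the true quantiles. Under those assumptions, if $q$ is the single-observation outlier probability, the probability of at least one outlier is $1 - (1 - q)^n$. The function name `prob_any_outlier` is hypothetical.

```python
import math


def prob_any_outlier(q, n):
    """P(at least one of n independent observations lies outside the fences).

    Computes 1 - (1 - q)**n via log1p/expm1 to stay accurate for tiny q.
    """
    return -math.expm1(n * math.log1p(-q))


# Example: exponential distribution, SF(0.1, 1), where q = 1/90.
for n in (5, 10, 50, 100, 500, 1000):
    print(n, round(prob_any_outlier(1 / 90, n), 5))
```

With real data the fences are estimated from the same sample, so the actual probabilities will deviate from this idealized calculation, especially for small $n$.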
The results are below.
Normal distribution, SF, p=0.1:
k/n | 5 | 10 | 50 | 100 | 500 | 1000 |
---|---|---|---|---|---|---|
1 | 0.00060 | 0.00121 | 0.00602 | 0.01200 | 0.05858 | 0.11373 |
1.5 | 0.00000 | 0.00000 | 0.00001 | 0.00003 | 0.00015 | 0.00030 |
2 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
2.5 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
3 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
3.5 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
4 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
Normal distribution, SF, p=0.25:
k/n | 5 | 10 | 50 | 100 | 500 | 1000 |
---|---|---|---|---|---|---|
1 | 0.19739 | 0.35582 | 0.88907 | 0.98770 | 1.00000 | 1.00000 |
1.5 | 0.03440 | 0.06762 | 0.29535 | 0.50347 | 0.96982 | 0.99909 |
2 | 0.00372 | 0.00743 | 0.03658 | 0.07182 | 0.31110 | 0.52541 |
2.5 | 0.00026 | 0.00052 | 0.00259 | 0.00518 | 0.02561 | 0.05057 |
3 | 0.00001 | 0.00002 | 0.00012 | 0.00023 | 0.00117 | 0.00234 |
3.5 | 0.00000 | 0.00000 | 0.00000 | 0.00001 | 0.00003 | 0.00007 |
4 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
Normal distribution, AF, p=0.1:
k/n | 5 | 10 | 50 | 100 | 500 | 1000 |
---|---|---|---|---|---|---|
1 | 0.00060 | 0.00121 | 0.00602 | 0.01200 | 0.05858 | 0.11373 |
1.5 | 0.00000 | 0.00000 | 0.00001 | 0.00003 | 0.00015 | 0.00030 |
2 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
2.5 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
3 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
3.5 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
4 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
Normal distribution, AF, p=0.25:
k/n | 5 | 10 | 50 | 100 | 500 | 1000 |
---|---|---|---|---|---|---|
1 | 0.19739 | 0.35582 | 0.88907 | 0.98770 | 1.00000 | 1.00000 |
1.5 | 0.03440 | 0.06762 | 0.29535 | 0.50347 | 0.96982 | 0.99909 |
2 | 0.00372 | 0.00743 | 0.03658 | 0.07182 | 0.31110 | 0.52541 |
2.5 | 0.00026 | 0.00052 | 0.00259 | 0.00518 | 0.02561 | 0.05057 |
3 | 0.00001 | 0.00002 | 0.00012 | 0.00023 | 0.00117 | 0.00234 |
3.5 | 0.00000 | 0.00000 | 0.00000 | 0.00001 | 0.00003 | 0.00007 |
4 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
Exponential distribution, SF, p=0.1:
k/n | 5 | 10 | 50 | 100 | 500 | 1000 |
---|---|---|---|---|---|---|
1 | 0.05433 | 0.10572 | 0.42803 | 0.67285 | 0.99625 | 0.99999 |
1.5 | 0.01838 | 0.03643 | 0.16934 | 0.31000 | 0.84359 | 0.97554 |
2 | 0.00616 | 0.01228 | 0.05990 | 0.11621 | 0.46080 | 0.70926 |
2.5 | 0.00206 | 0.00411 | 0.02037 | 0.04033 | 0.18601 | 0.33742 |
3 | 0.00069 | 0.00137 | 0.00684 | 0.01362 | 0.06629 | 0.12819 |
3.5 | 0.00023 | 0.00046 | 0.00228 | 0.00456 | 0.02260 | 0.04470 |
4 | 0.00008 | 0.00015 | 0.00076 | 0.00152 | 0.00759 | 0.01513 |
Exponential distribution, SF, p=0.25:
k/n | 5 | 10 | 50 | 100 | 500 | 1000 |
---|---|---|---|---|---|---|
1 | 0.35277 | 0.58110 | 0.98710 | 0.99983 | 1.00000 | 1.00000 |
1.5 | 0.21850 | 0.38926 | 0.91503 | 0.99278 | 1.00000 | 1.00000 |
2 | 0.13138 | 0.24551 | 0.75550 | 0.94022 | 1.00000 | 1.00000 |
2.5 | 0.07766 | 0.14928 | 0.55442 | 0.80146 | 0.99969 | 1.00000 |
3 | 0.04545 | 0.08883 | 0.37194 | 0.60554 | 0.99045 | 0.99991 |
3.5 | 0.02644 | 0.05219 | 0.23510 | 0.41493 | 0.93144 | 0.99530 |
4 | 0.01534 | 0.03044 | 0.14321 | 0.26591 | 0.78682 | 0.95455 |
Exponential distribution, AF, p=0.1:
k/n | 5 | 10 | 50 | 100 | 500 | 1000 |
---|---|---|---|---|---|---|
1 | 0.01984 | 0.03929 | 0.18160 | 0.33022 | 0.86521 | 0.98183 |
1.5 | 0.00399 | 0.00797 | 0.03923 | 0.07691 | 0.32979 | 0.55081 |
2 | 0.00080 | 0.00160 | 0.00797 | 0.01587 | 0.07689 | 0.14787 |
2.5 | 0.00016 | 0.00032 | 0.00160 | 0.00319 | 0.01587 | 0.03149 |
3 | 0.00003 | 0.00006 | 0.00032 | 0.00064 | 0.00319 | 0.00638 |
3.5 | 0.00001 | 0.00001 | 0.00006 | 0.00013 | 0.00064 | 0.00128 |
4 | 0.00000 | 0.00000 | 0.00001 | 0.00003 | 0.00013 | 0.00026 |
Exponential distribution, AF, p=0.25:
k/n | 5 | 10 | 50 | 100 | 500 | 1000 |
---|---|---|---|---|---|---|
1 | 0.27580 | 0.47554 | 0.96032 | 0.99843 | 1.00000 | 1.00000 |
1.5 | 0.14678 | 0.27202 | 0.79555 | 0.95820 | 1.00000 | 1.00000 |
2 | 0.07572 | 0.14571 | 0.54498 | 0.79296 | 0.99962 | 1.00000 |
2.5 | 0.03846 | 0.07543 | 0.32440 | 0.54357 | 0.98019 | 0.99961 |
3 | 0.01938 | 0.03838 | 0.17774 | 0.32388 | 0.85871 | 0.98004 |
3.5 | 0.00973 | 0.01936 | 0.09313 | 0.17758 | 0.62376 | 0.85844 |
4 | 0.00487 | 0.00972 | 0.04768 | 0.09308 | 0.38647 | 0.62358 |
Gumbel distribution, SF, p=0.1:
k/n | 5 | 10 | 50 | 100 | 500 | 1000 |
---|---|---|---|---|---|---|
1 | 0.02382 | 0.04707 | 0.21420 | 0.38252 | 0.91023 | 0.99194 |
1.5 | 0.00514 | 0.01026 | 0.05026 | 0.09799 | 0.40288 | 0.64345 |
2 | 0.00110 | 0.00220 | 0.01097 | 0.02182 | 0.10443 | 0.19796 |
2.5 | 0.00024 | 0.00047 | 0.00236 | 0.00471 | 0.02332 | 0.04609 |
3 | 0.00005 | 0.00010 | 0.00050 | 0.00101 | 0.00503 | 0.01004 |
3.5 | 0.00001 | 0.00002 | 0.00011 | 0.00022 | 0.00108 | 0.00216 |
4 | 0.00000 | 0.00000 | 0.00002 | 0.00005 | 0.00023 | 0.00046 |
Gumbel distribution, SF, p=0.25:
k/n | 5 | 10 | 50 | 100 | 500 | 1000 |
---|---|---|---|---|---|---|
1 | 0.26300 | 0.45683 | 0.95272 | 0.99776 | 1.00000 | 1.00000 |
1.5 | 0.12714 | 0.23812 | 0.74329 | 0.93410 | 1.00000 | 1.00000 |
2 | 0.06006 | 0.11652 | 0.46175 | 0.71029 | 0.99796 | 1.00000 |
2.5 | 0.02782 | 0.05487 | 0.24586 | 0.43128 | 0.94050 | 0.99646 |
3 | 0.01277 | 0.02538 | 0.12063 | 0.22670 | 0.72347 | 0.92353 |
3.5 | 0.00584 | 0.01164 | 0.05688 | 0.11052 | 0.44322 | 0.68999 |
4 | 0.00266 | 0.00532 | 0.02632 | 0.05195 | 0.23414 | 0.41346 |
Gumbel distribution, AF, p=0.1:
k/n | 5 | 10 | 50 | 100 | 500 | 1000 |
---|---|---|---|---|---|---|
1 | 0.01210 | 0.02405 | 0.11460 | 0.21607 | 0.70393 | 0.91235 |
1.5 | 0.00185 | 0.00369 | 0.01833 | 0.03633 | 0.16891 | 0.30929 |
2 | 0.00028 | 0.00056 | 0.00281 | 0.00561 | 0.02773 | 0.05469 |
2.5 | 0.00004 | 0.00009 | 0.00043 | 0.00085 | 0.00427 | 0.00851 |
3 | 0.00001 | 0.00001 | 0.00006 | 0.00013 | 0.00065 | 0.00130 |
3.5 | 0.00000 | 0.00000 | 0.00001 | 0.00002 | 0.00010 | 0.00020 |
4 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00002 | 0.00003 |
Gumbel distribution, AF, p=0.25:
k/n | 5 | 10 | 50 | 100 | 500 | 1000 |
---|---|---|---|---|---|---|
1 | 0.23535 | 0.41531 | 0.93167 | 0.99533 | 1.00000 | 1.00000 |
1.5 | 0.09780 | 0.18603 | 0.64269 | 0.87233 | 0.99997 | 1.00000 |
2 | 0.04178 | 0.08182 | 0.34741 | 0.57413 | 0.98599 | 0.99980 |
2.5 | 0.01756 | 0.03481 | 0.16234 | 0.29832 | 0.82991 | 0.97107 |
3 | 0.00733 | 0.01460 | 0.07088 | 0.13674 | 0.52059 | 0.77017 |
3.5 | 0.00305 | 0.00608 | 0.03005 | 0.05920 | 0.26298 | 0.45680 |
4 | 0.00127 | 0.00253 | 0.01258 | 0.02501 | 0.11895 | 0.22375 |