In the previous post, I suggested an asymmetric decile-based outlier detector as an alternative to Tukey’s fences. In this post, we run some numerical simulations to see the suggested outlier detector in action.
Let \(Q_p\) be an estimate of the \(p^\textrm{th}\) quantile for the given sample, and \(k\) be a parameter of the outlier detector. With this notation, we consider the following range:
\[[Q_{0.1} - k (Q_{0.5} - Q_{0.1}),\, Q_{0.9} + k (Q_{0.9} - Q_{0.5})] \]
All sample elements outside this range are considered outliers.
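To make the range concrete, here is a minimal Python sketch of the detector. The function names `decile_fences` and `find_outliers` are hypothetical, and the quantile estimator (`statistics.quantiles` with `method="inclusive"`) is an assumption: the post does not fix a particular way of estimating \(Q_p\), and a different estimator would give slightly different fences.

```python
import statistics


def decile_fences(sample, k):
    """Return (lower, upper) bounds of the decile-based range for a given k."""
    # Assumption: deciles come from the standard library's "inclusive" method.
    deciles = statistics.quantiles(sample, n=10, method="inclusive")
    q10, q50, q90 = deciles[0], deciles[4], deciles[8]
    lower = q10 - k * (q50 - q10)  # lower fence scales with the lower half-spread
    upper = q90 + k * (q90 - q50)  # upper fence scales with the upper half-spread
    return lower, upper


def find_outliers(sample, k):
    """Return the sample elements that fall outside the range."""
    lower, upper = decile_fences(sample, k)
    return [x for x in sample if x < lower or x > upper]


print(find_outliers(list(range(11)) + [100], k=2.0))  # [100]
```

Because the two fences use different spreads, a long right tail widens only the upper fence, which is what makes the detector asymmetric.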
Now let’s repeat the experiment that I performed for Tukey’s fences according to the following scheme:
- Enumerate different distributions. We consider the standard normal distribution, the standard Gumbel distribution, and the standard exponential distribution.
- Enumerate different \(k\) values. We consider \(k \in \{ 1.0, 1.5, 2.0, 2.5, 3.0, 3.5 \}\).
- Enumerate different sample sizes from 6 to 500.
- Generate 1000 random samples from the given distribution of the given size.
- For each sample, detect outliers using the considered \(k\) values.
- Evaluate the percentage of samples that contain at least one outlier.
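The scheme above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code behind the results in this post: the names `outlier_rates`, `DISTRIBUTIONS`, and `K_VALUES` are hypothetical, the deciles come from `statistics.quantiles` with `method="inclusive"`, the seed is arbitrary, and the Gumbel variates are drawn by inverse transform sampling.

```python
import math
import random
import statistics


def gumbel(rng):
    """Draw from the standard Gumbel distribution via inverse transform."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


# Hypothetical names; each entry draws one value from a standard distribution.
DISTRIBUTIONS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "gumbel": gumbel,
    "exponential": lambda rng: rng.expovariate(1.0),
}
K_VALUES = (1.0, 1.5, 2.0, 2.5, 3.0, 3.5)


def outlier_rates(draw, n, k_values=K_VALUES, trials=1000, seed=42):
    """Percentage of samples of size n with at least one outlier, per k."""
    rng = random.Random(seed)
    hits = {k: 0 for k in k_values}
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        # Assumption: "inclusive" deciles; the post does not fix an estimator.
        d = statistics.quantiles(sample, n=10, method="inclusive")
        q10, q50, q90 = d[0], d[4], d[8]
        for k in k_values:
            lower = q10 - k * (q50 - q10)
            upper = q90 + k * (q90 - q50)
            if any(x < lower or x > upper for x in sample):
                hits[k] += 1
    return {k: 100.0 * h / trials for k, h in hits.items()}


if __name__ == "__main__":
    for name, draw in DISTRIBUTIONS.items():
        for n in (6, 50, 500):  # a few illustrative sizes from the 6..500 range
            print(name, n, outlier_rates(draw, n))
```

Each sample is checked against every \(k\) value, so for a fixed distribution and sample size the percentages can only decrease as \(k\) grows.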
Here are the results: