Asymmetric decile-based outlier detector, Part 2


In the previous post, I suggested an asymmetric decile-based outlier detector as an alternative to Tukey’s fences. In this post, we run numerical simulations to see how the suggested detector behaves in practice.

Let \(Q_p\) be an estimate of the \(p^\textrm{th}\) quantile for the given sample, and let \(k\) be a parameter of the outlier detector. With this notation, we consider the following range:

\[[Q_{0.1} - k (Q_{0.5} - Q_{0.1}),\, Q_{0.9} + k (Q_{0.9} - Q_{0.5})] \]

All sample elements outside this range are considered outliers.
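To make the definition concrete, here is a minimal Python sketch of this range. It assumes NumPy's default quantile estimator (linear interpolation), which is not necessarily the estimator used in the experiments; the function names `decile_fences` and `find_outliers` are made up for illustration.

```python
import numpy as np


def decile_fences(sample, k):
    """Return (lower, upper) bounds of the asymmetric decile-based range.

    Assumption: quantiles are estimated with NumPy's default method
    (linear interpolation); other estimators give slightly different fences.
    """
    x = np.asarray(sample, dtype=float)
    q10, q50, q90 = np.quantile(x, [0.1, 0.5, 0.9])
    lower = q10 - k * (q50 - q10)  # Q_0.1 - k (Q_0.5 - Q_0.1)
    upper = q90 + k * (q90 - q50)  # Q_0.9 + k (Q_0.9 - Q_0.5)
    return lower, upper


def find_outliers(sample, k):
    """Return the sample elements that fall outside the range."""
    x = np.asarray(sample, dtype=float)
    lower, upper = decile_fences(x, k)
    return x[(x < lower) | (x > upper)]
```

Because the lower and upper fences use different half-ranges (\(Q_{0.5} - Q_{0.1}\) and \(Q_{0.9} - Q_{0.5}\)), the range is wider on the side where the sample has a longer tail.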

Now let’s repeat the experiment that I performed for Tukey’s fences according to the following scheme:

  • Enumerate different distributions. We consider the standard normal distribution, the standard Gumbel distribution, and the standard exponential distribution.
  • Enumerate different \(k\) values. We consider \(k \in \{ 1.0, 1.5, 2.0, 2.5, 3.0, 3.5 \}\).
  • Enumerate different sample sizes from 6 to 500.
  • For each distribution and sample size, generate 1000 random samples.
  • For each sample, detect outliers using the considered \(k\) values.
  • Evaluate the percentage of samples that contain at least one outlier.
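The scheme above can be sketched in Python roughly as follows. This is an illustration under several assumptions, not the code behind the actual experiments: it uses NumPy's default quantile estimator and random generators, a fixed seed, and the hypothetical helper names `has_outliers` and `outlier_rate`.

```python
import numpy as np


def has_outliers(sample, k):
    """Check whether the sample has at least one element outside the range."""
    # Assumption: NumPy's default (linear interpolation) quantile estimator.
    q10, q50, q90 = np.quantile(sample, [0.1, 0.5, 0.9])
    lower = q10 - k * (q50 - q10)
    upper = q90 + k * (q90 - q50)
    return bool(np.any((sample < lower) | (sample > upper)))


def outlier_rate(draw, n, k, n_samples=1000, seed=0):
    """Percentage of generated samples of size n with at least one outlier."""
    rng = np.random.default_rng(seed)
    hits = sum(has_outliers(draw(rng, n), k) for _ in range(n_samples))
    return 100.0 * hits / n_samples


# The three standard distributions considered in the experiment.
DISTRIBUTIONS = {
    "normal": lambda rng, n: rng.standard_normal(n),
    "gumbel": lambda rng, n: rng.gumbel(0.0, 1.0, n),
    "exponential": lambda rng, n: rng.standard_exponential(n),
}
K_VALUES = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]

if __name__ == "__main__":
    # One sample size only, to keep the demo quick; the experiment uses 6..500.
    for name, draw in DISTRIBUTIONS.items():
        for k in K_VALUES:
            print(name, k, outlier_rate(draw, n=50, k=k))
```

With a fixed seed, every \(k\) value is evaluated on the same generated samples, so the reported percentage can only decrease or stay the same as \(k\) grows.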

Here are the results: