Beeping Busy Beavers and the twin prime conjecture
In this post, I use Beeping Busy Beavers to show that the twin prime conjecture can be proven or disproven.
Read more
Hodges-Lehmann-Sen shift and shift confidence interval estimators
In the previous two posts (1, 2), I discussed the Hodges-Lehmann median estimator. The suggested idea of obtaining median estimates from a Cartesian product can be adapted to estimate the shift between two samples. In this post, we discuss how to build the Hodges-Lehmann-Sen shift estimator and how to obtain confidence intervals for the resulting estimates. We also perform a simulation study that checks the actual coverage percentage of these intervals.
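Below is a minimal sketch of the idea in Python. The point estimate is the median of all pairwise differences \(y_j - x_i\); the interval uses the textbook rank-based construction with a normal approximation to the Mann–Whitney statistic, so the exact indexing may differ from the procedure derived in the post.

```python
import numpy as np
from scipy import stats

def hodges_lehmann_sen_shift(x, y):
    # Point estimate: median of all pairwise differences y_j - x_i
    return np.median(np.subtract.outer(y, x))

def shift_confidence_interval(x, y, alpha=0.05):
    # Distribution-free CI for the shift based on the ordered pairwise
    # differences; k comes from the normal approximation to the
    # Mann-Whitney statistic (indexing conventions vary between textbooks)
    n, m = len(x), len(y)
    diffs = np.sort(np.subtract.outer(y, x), axis=None)
    z = stats.norm.ppf(1 - alpha / 2)
    k = int(np.floor(n * m / 2 - z * np.sqrt(n * m * (n + m + 1) / 12)))
    k = max(k, 0)
    return diffs[k], diffs[n * m - 1 - k]
```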
Read more
Statistical efficiency of the Hodges-Lehmann median estimator, Part 2
In the previous post, we evaluated the relative statistical efficiency of the Hodges-Lehmann median estimator against the sample median under the normal distribution. In this post, we extend this experiment to a set of light-tailed and heavy-tailed distributions.
Read more
Statistical efficiency of the Hodges-Lehmann median estimator, Part 1
In this post, we evaluate the relative statistical efficiency of the Hodges-Lehmann median estimator against the sample median under the normal distribution. We also compare it with the efficiency of the Harrell-Davis quantile estimator.
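A rough Monte-Carlo sketch of such a comparison (my own illustration, not the simulation code from the post; here "efficiency" is taken as the ratio of mean squared errors around the true median):

```python
import numpy as np

rng = np.random.default_rng(1729)

def hodges_lehmann_median(x):
    # Median of all pairwise (Walsh) averages (x_i + x_j) / 2, i <= j
    i, j = np.triu_indices(len(x))
    return np.median((x[i] + x[j]) / 2)

def relative_efficiency(n, iterations=10_000):
    # MSE(sample median) / MSE(Hodges-Lehmann) under the standard normal
    # distribution; the true median is 0
    samples = rng.standard_normal((iterations, n))
    mse_median = np.mean(np.median(samples, axis=1) ** 2)
    mse_hl = np.mean([hodges_lehmann_median(s) ** 2 for s in samples])
    return mse_median / mse_hl

print(relative_efficiency(n=10))
```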
Read more
Expected value of the maximum of two standard half-normal distributions
Let \(X_1, X_2\) be i.i.d. random variables that follow the standard normal distribution \(\mathcal{N}(0,1^2)\). In the previous post, I found the expected value of \(\min(|X_1|, |X_2|)\). Now it's time to find the expected value of \(Z = \max(|X_1|, |X_2|)\).
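One convenient shortcut (not necessarily the route taken in the post): since \(\min(|X_1|, |X_2|) + \max(|X_1|, |X_2|) = |X_1| + |X_2|\) for every outcome, the two expectations are linked by

\[
\mathbb{E}[\max(|X_1|, |X_2|)] = \mathbb{E}|X_1| + \mathbb{E}|X_2| - \mathbb{E}[\min(|X_1|, |X_2|)] = 2\sqrt{2/\pi} - \mathbb{E}[\min(|X_1|, |X_2|)],
\]

so the answer follows directly from the result of the previous post combined with \(\mathbb{E}|X_i| = \sqrt{2/\pi}\).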
Read more
Expected value of the minimum of two standard half-normal distributions
Let \(X_1, X_2\) be i.i.d. random variables that follow the standard normal distribution \(\mathcal{N}(0,1^2)\). One day, I wondered: what is the expected value of \(Z = \min(|X_1|, |X_2|)\)? It turned out to be a fun exercise. Let's solve it together!
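A natural starting point (a sketch of one possible route, not necessarily the one used in the post): write the expectation of a non-negative random variable through its survival function and use \(\Pr(|X_i| > z) = 2(1 - \Phi(z))\) for \(z \ge 0\):

\[
\mathbb{E}[Z] = \int_0^\infty \Pr\big(\min(|X_1|, |X_2|) > z\big)\, dz = 4 \int_0^\infty \big(1 - \Phi(z)\big)^2\, dz,
\]

where \(\Phi\) denotes the CDF of the standard normal distribution.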
Read more
Unbiased median absolute deviation for n=2
I have already covered the topic of the unbiased median absolute deviation based on the traditional sample median, the Harrell-Davis quantile estimator, and the trimmed Harrell-Davis quantile estimator. In all those posts, the values of the bias-correction factors were obtained using Monte-Carlo simulation. In this post, we calculate the exact value of the bias-correction factor for two-element samples.
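To make the setup concrete, here is the core of the \(n=2\) case for the MAD based on the traditional sample median (assuming the factor is defined so that the scaled MAD has expectation \(\sigma\) under normality; the reciprocal convention is also common). For a two-element sample, the median is \((x_1+x_2)/2\), so both absolute deviations equal \(|x_1-x_2|/2\) and

\[
\operatorname{MAD}_2 = \frac{|X_1 - X_2|}{2}, \qquad
\mathbb{E}[\operatorname{MAD}_2] = \frac{1}{2}\,\mathbb{E}|X_1 - X_2| = \frac{1}{2}\sqrt{2\sigma^2}\sqrt{\frac{2}{\pi}} = \frac{\sigma}{\sqrt{\pi}},
\]

so the exact factor for \(n=2\) is \(\sqrt{\pi} \approx 1.7725\) instead of the asymptotic \(\approx 1.4826\).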
Read more
Weighted trimmed Harrell-Davis quantile estimator
In this post, I combine ideas from two of my previous posts:
- Trimmed Harrell-Davis quantile estimator: a quantile estimator that provides an optimal trade-off between statistical efficiency and robustness
- Weighted quantile estimators: a general scheme for building weighted quantile estimators; it can be used for quantile exponential smoothing and dispersion exponential smoothing
Thus, we are going to build a weighted version of the trimmed Harrell-Davis quantile estimator based on the highest density interval of the given width.
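For context, here is a sketch of the classic (unweighted, untrimmed) Harrell-Davis estimator that both of the posts above build upon; the weighted trimmed version constructed in this post modifies the Beta-based weights, so this is only the starting point.

```python
import numpy as np
from scipy import stats

def harrell_davis_quantile(x, p):
    # Classic Harrell-Davis estimator: a weighted sum of all order
    # statistics, with weights given by increments of the Beta CDF
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    a, b = (n + 1) * p, (n + 1) * (1 - p)
    cdf = stats.beta.cdf(np.arange(n + 1) / n, a, b)
    weights = np.diff(cdf)  # W_i = I_{i/n}(a, b) - I_{(i-1)/n}(a, b)
    return float(np.dot(weights, x))
```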
Read more
Minimum meaningful statistical level for the Mann–Whitney U test
The Mann–Whitney U test is one of the most popular nonparametric null hypothesis significance tests. However, like any statistical test, it has limitations, and we should always carefully check that they are compatible with our business requirements. In this post, we discuss how to properly choose the statistical level for the Mann–Whitney U test on small samples.
Let’s say we want to compare two samples \(x = \{ x_1, x_2, \ldots, x_n \}\) and \(y = \{ y_1, y_2, \ldots, y_m \}\) using the one-sided Mann–Whitney U test. Sometimes we don’t have the opportunity to gather enough data and have to work with small samples. Imagine that both samples are of size six: \(n=m=6\). We want to set the statistical level \(\alpha\) to \(0.001\) because we really don’t want to get false-positive results. Is this a valid requirement? In fact, the minimum p-value we can observe with \(n=m=6\) is \(\approx 0.001082\). Thus, with \(\alpha = 0.001\), it’s impossible to get a positive result. Meanwhile, everything is technically correct: since we can’t get any positive results, the false positive rate is exactly zero, which is less than \(0.001\). However, this is definitely not what we want: with such a setup, the test becomes useless because it always returns a negative result regardless of the input data.
This raises an important question: what is the minimum meaningful statistical level that we can require for the one-sided Mann–Whitney U test, knowing the sample sizes?
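For concrete sample sizes, the smallest attainable p-value is easy to compute: under the null hypothesis (and assuming no ties), every arrangement of the pooled ranks is equally likely, so the single most extreme arrangement has probability \(1 / \binom{n+m}{n}\). A small sketch:

```python
from math import comb

def min_p_value(n, m):
    # Smallest attainable p-value of the one-sided Mann-Whitney U test:
    # the probability of the single most extreme arrangement of the
    # pooled ranks under the null hypothesis (no ties assumed)
    return 1 / comb(n + m, n)

print(min_p_value(6, 6))  # ~0.001082, so alpha = 0.001 is unattainable
```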
Read more
Fence-based outlier detectors, Part 2
In the previous post, I discussed various fence-based outlier detectors. In this post, I show examples of these detectors with different parameter values.
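As a reference point, here is a sketch of the simplest member of this family, Tukey's fences (the post itself explores other fences and parameter choices):

```python
import numpy as np

def tukey_fences_outliers(x, k=1.5):
    # Tukey's fences: values outside [Q1 - k * IQR, Q3 + k * IQR]
    # are flagged as outliers; k = 1.5 is the classic default
    x = np.asarray(x, dtype=float)
    q1, q3 = np.quantile(x, [0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return x[(x < lower) | (x > upper)]

print(tukey_fences_outliers([2, 3, 4, 5, 6, 50]))  # [50.]
```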
Read more