## Untied quantile absolute deviation

In the previous posts, I tried to adapt the concept of the quantile absolute deviation to samples with tied values so that this measure of dispersion never becomes zero for nondegenerate ranges. My previous attempt was the middle non-zero quantile absolute deviation (modification 1, modification 2). However, I’m not completely satisfied with the behavior of this metric. In this post, I want to consider another way to work around the problem with tied values.

## Middle non-zero quantile absolute deviation, Part 2

In one of the previous posts, I described the idea of the middle non-zero quantile absolute deviation. It’s defined as follows:

$\operatorname{MNZQAD}(x, p) = \operatorname{QAD}(x, p, q_m),$

$q_m = \frac{q_0 + 1}{2}, \quad q_0 = \frac{\max(k - 1, 0)}{n - 1}, \quad k = \sum_{i=1}^n \mathbf{1}_{Q(x, p)}(x_i),$

where $$\mathbf{1}$$ is the indicator function

$\mathbf{1}_U(u) = \begin{cases} 1 & \textrm{if}\quad u = U,\\ 0 & \textrm{if}\quad u \neq U, \end{cases}$

and $$\operatorname{QAD}$$ is the quantile absolute deviation

$\operatorname{QAD}(x, p, q) = Q(|x - Q(x, p)|, q).$

The $$\operatorname{MNZQAD}$$ approach tries to work around a problem with tied values. While it works well in the generic case, there are some corner cases where the suggested metric behaves poorly. In this post, we discuss this problem and how to solve it.
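The definitions above translate almost directly into code. The following is a minimal sketch, not the author's implementation: it assumes that $$Q$$ is the default linear-interpolation quantile of `numpy.quantile` and that ties are detected by exact floating-point equality. The function names `qad` and `mnzqad` are made up for this illustration.

```python
import numpy as np


def qad(x, p, q):
    """Quantile absolute deviation: Q(|x - Q(x, p)|, q)."""
    x = np.asarray(x, dtype=float)
    center = np.quantile(x, p)  # assumption: numpy's default quantile definition
    return float(np.quantile(np.abs(x - center), q))


def mnzqad(x, p=0.5):
    """Middle non-zero quantile absolute deviation, following the formulas above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    center = np.quantile(x, p)
    k = int(np.sum(x == center))   # number of sample elements equal to Q(x, p)
    q0 = max(k - 1, 0) / (n - 1)   # quantile position below which deviations are zero
    qm = (q0 + 1) / 2              # midpoint between q0 and 1
    return qad(x, p, qm)
```

For a sample such as `[0, 0, 0, 0, 0, 1, 2]`, `qad(x, 0.5, 0.5)` collapses to zero because more than half of the values are tied at the median, while `mnzqad(x)` stays positive.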

## The expected number of takes from a discrete distribution before observing the given element

Let’s consider a discrete distribution $$X$$ defined by its probability mass function $$p_X(x)$$. We randomly take elements from $$X$$ until we observe the given element $$x_0$$. What’s the expected number of takes in this process?

This classic statistical problem can be solved in various ways. I would like to share one of my favorite approaches, which involves the derivative of the series $$\sum_{n=0}^\infty x^n$$.
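As a sketch of how this derivative can enter, assume that the takes are independent and that $$p = p_X(x_0) > 0$$. The symbols $$N$$ (the number of takes) and $$p$$ are introduced here only for illustration. The first observation of $$x_0$$ happens on take $$n$$ with probability

$\mathbb{P}(N = n) = (1 - p)^{n-1} p, \quad n = 1, 2, \ldots$

For $$|x| < 1$$, differentiating the geometric series term by term gives

$\sum_{n=0}^\infty x^n = \frac{1}{1 - x} \quad \Longrightarrow \quad \sum_{n=1}^\infty n x^{n-1} = \frac{1}{(1 - x)^2}.$

Substituting $$x = 1 - p$$ yields

$\mathbb{E}[N] = \sum_{n=1}^\infty n (1 - p)^{n-1} p = \frac{p}{\bigl(1 - (1 - p)\bigr)^2} = \frac{1}{p}.$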

## Folded medians

In the previous post, we discussed Gastwirth’s location estimator. In this post, we continue playing with different location estimators. To be more specific, we consider an approach called folded medians. Let $$x = \{ x_1, x_2, \ldots, x_n \}$$ be a random sample with order statistics $$\{ x_{(1)}, x_{(2)}, \ldots, x_{(n)} \}$$. We build a folded sample of the following form:

$\Bigg\{ \frac{x_{(1)}+x_{(n)}}{2}, \frac{x_{(2)}+x_{(n-1)}}{2}, \ldots \Bigg\}.$

If $$n$$ is odd, the middle sample element is folded with itself. The folding operation can be applied several times. Once the folding is done, the median of the final folded sample is the folded median. A single folding operation gives us the Bickel-Hodges estimator.

In this post, we briefly check how this metric behaves in the case of the Normal and Cauchy distributions.
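The folding procedure described above can be sketched in a few lines. This is only an illustration under the stated definition: the names `fold` and `folded_median` and the `folds` parameter are hypothetical, and the final median is the ordinary sample median from `numpy.median`.

```python
import numpy as np


def fold(x):
    """One folding step: average x_(i) with x_(n-i+1) for the lower half of the sample."""
    s = np.sort(np.asarray(x, dtype=float))
    m = (len(s) + 1) // 2          # for odd n, the middle element is folded with itself
    return (s[:m] + s[::-1][:m]) / 2


def folded_median(x, folds=1):
    """Median of the sample after `folds` folding steps (folds=1: Bickel-Hodges)."""
    s = np.asarray(x, dtype=float)
    for _ in range(folds):
        s = fold(s)
    return float(np.median(s))
```

For example, `fold([1, 2, 3, 4, 10])` produces the folded sample `[5.5, 3.0, 3.0]`, so `folded_median([1, 2, 3, 4, 10])` is `3.0`.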

## Gastwirth's location estimator

Let $$x = \{ x_1, x_2, \ldots, x_n \}$$ be a random sample. Gastwirth’s location estimator is defined as follows:

$0.3 \cdot Q_{1/3}(x) + 0.4 \cdot Q_{1/2}(x) + 0.3 \cdot Q_{2/3}(x),$

where $$Q_p$$ is an estimate of the $$p^{\textrm{th}}$$ quantile (using classic sample quantiles).

This estimator is quite interesting from a practical point of view. On the one hand, it’s robust (its breakdown point is ⅓) and it has better statistical efficiency than the classic sample median. On the other hand, it has better computational efficiency than other robust and statistically efficient measures of location like the Harrell-Davis median estimator or the Hodges-Lehmann median estimator.

In this post, we conduct a short simulation study that shows its behavior for the standard Normal distribution and the Cauchy distribution.
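A minimal sketch of the estimator follows. It assumes that "classic sample quantiles" can be approximated by the default linear-interpolation definition of `numpy.quantile`; the function name `gastwirth` is chosen only for this example.

```python
import numpy as np


def gastwirth(x):
    """Gastwirth's location estimator: 0.3*Q(1/3) + 0.4*Q(1/2) + 0.3*Q(2/3)."""
    x = np.asarray(x, dtype=float)
    # assumption: numpy's default quantile definition stands in for the classic one
    q_low, q_mid, q_high = np.quantile(x, [1 / 3, 1 / 2, 2 / 3])
    return float(0.3 * q_low + 0.4 * q_mid + 0.3 * q_high)
```

Only three order-statistic lookups are needed once the sample is sorted, which is where the computational advantage over estimators based on all pairs or all order statistics comes from.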

## Dynamical System Case Study 1 (symmetric 3d system)

Let’s consider the following dynamical system:

$\begin{cases} \dot{x}_1 = f(x_3) - x_1,\\ \dot{x}_2 = f(x_1) - x_2,\\ \dot{x}_3 = f(x_2) - x_3, \end{cases}$

where $$f(x) = \alpha / (1+x^m)$$ is a Hill function. In this case study, we explore the phase portrait of this system for $$\alpha = 18,\; m = 3$$.
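To explore such a system numerically, one could start from a sketch like the one below. It encodes the right-hand side for $$\alpha = 18,\; m = 3$$ and integrates it with a fixed-step classical Runge-Kutta scheme. The integrator, the step size, and the names `hill`, `rhs`, and `rk4_trajectory` are assumptions made for this illustration, not something taken from the case study.

```python
import numpy as np

ALPHA = 18.0
M = 3


def hill(x, alpha=ALPHA, m=M):
    """Hill function f(x) = alpha / (1 + x^m)."""
    return alpha / (1.0 + x**m)


def rhs(state):
    """Right-hand side of the system: (f(x3) - x1, f(x1) - x2, f(x2) - x3)."""
    x1, x2, x3 = state
    return np.array([hill(x3) - x1, hill(x1) - x2, hill(x2) - x3])


def rk4_trajectory(state0, dt=0.01, steps=2000):
    """Fixed-step RK4 integration; returns an array of shape (steps + 1, 3)."""
    traj = np.empty((steps + 1, 3))
    traj[0] = np.asarray(state0, dtype=float)
    for i in range(steps):
        s = traj[i]
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * dt * k1)
        k3 = rhs(s + 0.5 * dt * k2)
        k4 = rhs(s + dt * k3)
        traj[i + 1] = s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return traj
```

A quick sanity check: since $$f(2) = 18 / (1 + 2^3) = 2$$, the point $$(2, 2, 2)$$ is an equilibrium, so `rhs([2.0, 2.0, 2.0])` returns a zero vector. Trajectories started from other points, for example `rk4_trajectory([1.0, 2.0, 3.0])`, can then be plotted to sketch the phase portrait.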

## Beeping Busy Beavers and twin prime conjecture

In this post, I use Beeping Busy Beavers to show that the twin prime conjecture could be proven or disproven.

## Hodges-Lehmann-Sen shift and shift confidence interval estimators

In the previous two posts (1, 2), I discussed the Hodges-Lehmann median estimator. The suggested idea of getting median estimations based on a Cartesian product can be adapted to estimate the shift between two samples. In this post, we discuss how to build the Hodges-Lehmann-Sen shift estimator and how to get confidence intervals for the obtained estimations. Also, we perform a simulation study that checks the actual coverage percentage of these intervals.
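The point estimator can be sketched as the median of all pairwise differences between the two samples. The snippet below is an illustration under that common definition; the sign convention (second sample minus first) and the name `hls_shift` are assumptions, and the confidence interval construction is not covered here.

```python
import numpy as np


def hls_shift(x, y):
    """Median of all pairwise differences y_j - x_i (Cartesian product of the samples)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diffs = np.subtract.outer(y, x).ravel()  # len(x) * len(y) differences
    return float(np.median(diffs))
```

For instance, `hls_shift([1, 2, 3], [11, 12, 13])` gives `10.0`. Note that the sketch materializes all `len(x) * len(y)` differences, so it is only practical for moderately sized samples.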

## Statistical efficiency of the Hodges-Lehmann median estimator, Part 2

In the previous post, we evaluated the relative statistical efficiency of the Hodges-Lehmann median estimator against the sample median under the normal distribution. In this post, we extend this experiment to a set of various light-tailed and heavy-tailed distributions.