## Insidious implicit statistical assumptions

Recently, I was rereading hampel1986 and I found this quote about the difference between robust and nonparametric statistics (page 9):

Robust statistics considers the effects of only approximate fulfillment of assumptions, while nonparametric statistics makes rather weak but nevertheless strict assumptions (such as continuity of distribution or independence).

This statement may sound obvious. Unfortunately, facts that are presumably obvious in general are not always so obvious in the moment. When a researcher works with specific types of distributions for a long time, the properties of these distributions may turn into implicit assumptions. This implicitness can be pretty dangerous. If an assumption is explicitly declared, it can become a starting point for a discussion on how to handle violations of this assumption. Implicit assumptions are hidden and therefore conceal potential issues when the collected data do not meet our expectations.

A switch from parametric to nonparametric methods is sometimes perceived as a rejection of all assumptions. Such a perception can be hazardous. While the original parametric assumption is indeed dropped, many researchers continue to act as if the implicit consequences of this assumption were still valid.

Since normality is the most popular parametric assumption, I would like to briefly discuss related implicit assumptions that are often perceived not as non-validated hypotheses, but as essential properties of the collected data.

## Four main books on robust statistics

Robust statistics is a practical and pragmatic branch of statistics. If you want to design reliable and trustworthy statistical procedures, the knowledge of robust statistics is essential. Unfortunately, it’s a challenging topic to learn.

In this post, I share my favorite books on robust statistics. I cannot pick my favorite one: each book is good in its own way, and all of them complement each other. I return to these books periodically to reinforce and expand my understanding of the topic.

## Multimodal distributions and effect size

When we want to express the difference between two samples or distributions, a popular family of measures is effect sizes based on the difference between means (the difference family). When the normality assumption is satisfied, this approach works well thanks to classic measures of effect size like Cohen’s d, Glass’ Δ, or Hedges’ g. With slight deviations from normality, robust alternatives may be considered. To build such a measure, it’s enough to upgrade classic measures by replacing the sample mean with a robust measure of central tendency and replacing the standard deviation with a robust measure of dispersion. However, it might not be enough in the case of large deviations from normality. In this post, I briefly discuss the problem of effect size evaluation in the context of multimodal distributions.
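To make the "upgrade" idea concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the median stands in for the robust measure of central tendency, and the median absolute deviation (MAD, scaled by 1.4826 so that it matches the standard deviation under normality) stands in for the robust measure of dispersion. The function names `cohen_d`, `mad`, and `robust_effect_size` are hypothetical, and other robust estimators could be plugged in the same way.

```python
import statistics


def cohen_d(x, y):
    """Classic Cohen's d based on means and the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    return (statistics.mean(y) - statistics.mean(x)) / pooled_var ** 0.5


def mad(x):
    """Median absolute deviation, scaled to be consistent with the SD under normality."""
    center = statistics.median(x)
    return 1.4826 * statistics.median([abs(v - center) for v in x])


def robust_effect_size(x, y):
    """Same shape as Cohen's d, but with the median and the MAD (an assumed choice)."""
    nx, ny = len(x), len(y)
    pooled = (
        ((nx - 1) * mad(x) ** 2 + (ny - 1) * mad(y) ** 2) / (nx + ny - 2)
    ) ** 0.5
    return (statistics.median(y) - statistics.median(x)) / pooled


x = [1, 2, 3, 4, 5, 6, 7]
y = [v + 2 for v in x]              # the same sample shifted by 2
y_outlier = [3, 4, 5, 6, 7, 8, 900]  # the last element is replaced by an outlier

print(cohen_d(x, y), cohen_d(x, y_outlier))
print(robust_effect_size(x, y), robust_effect_size(x, y_outlier))
```

In this toy example, a single outlier noticeably changes Cohen's d, while the median/MAD version stays the same. This kind of protection targets outliers; it does not by itself address multimodality.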

## Unobvious limitations of R *signrank Wilcoxon Signed Rank functions

In R, we have functions to calculate the density, distribution function, and quantile function of the Wilcoxon Signed Rank statistic distribution: dsignrank, psignrank, and qsignrank. All the functions use exact calculations of the target functions (see the implementation in the R 4.3.1 source code). The exact approach works excellently for small sample sizes. Unfortunately, for large sample sizes, it fails to provide the expected function values. Out of the box, there are no alternative approximation solutions that could allow us to get reasonable results. In this post, we investigate the limitations of these functions and provide sample size thresholds after which we might get invalid results.
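For readers who want to see what an exact calculation involves, below is a small Python sketch. It is not a port of the R code: it relies on the standard fact that, under the null hypothesis, the signed rank statistic is distributed like the sum of a uniformly random subset of the ranks $1, \ldots, n$. The helper names `signrank_counts`, `signrank_pmf`, and `signrank_cdf` are hypothetical, and `x` is assumed to be an integer.

```python
from fractions import Fraction


def signrank_counts(n):
    """counts[k] = number of subsets of {1, ..., n} whose elements sum to k."""
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1  # the empty subset
    for rank in range(1, n + 1):
        # Iterate downwards so that each rank is used at most once.
        for total in range(max_sum, rank - 1, -1):
            counts[total] += counts[total - rank]
    return counts


def signrank_pmf(x, n):
    """Exact P(W = x) as a fraction; each of the 2^n subsets is equally likely."""
    counts = signrank_counts(n)
    if x < 0 or x >= len(counts):
        return Fraction(0)
    return Fraction(counts[x], 2 ** n)


def signrank_cdf(x, n):
    """Exact P(W <= x) as a fraction."""
    counts = signrank_counts(n)
    if x < 0:
        return Fraction(0)
    upper = min(x, len(counts) - 1)
    return Fraction(sum(counts[: upper + 1]), 2 ** n)


print(signrank_counts(3))
print(float(signrank_pmf(3, 3)), float(signrank_cdf(2, 3)))
```

Python integers have arbitrary precision, so this sketch does not hit numeric limits, but the table has $n(n+1)/2 + 1$ entries and the counts add up to $2^n$. Both grow quickly with $n$, which gives a feeling for why exact approaches become troublesome for large samples.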

## Weighted Mann-Whitney U test, Part 1

Previously, I have discussed how to build weighted versions of various statistical methods. I have already covered weighted versions of various quantile estimators and the Hodges-Lehmann location estimator. Such methods can be useful in various tasks like the support of weighted mixture distributions or exponential smoothing. In this post, I suggest a way to build a weighted version of the Mann-Whitney U test.

## Joining modes of multimodal distributions

Multimodality of distributions is a severe issue in statistical analysis. Comparing two multimodal distributions is a tricky challenge. The degree of this challenge depends on the number of existing modes. Switching from unimodal models to multimodal ones can be a controversial decision, potentially causing more problems than it solves. Hence, if we dare to increase the complexity of the models under consideration, we should be sure that this is truly necessary. Even when we confidently detect a truly multimodal distribution, a unimodal model could be an acceptable approximation if it is sufficiently close to the true distribution. The simplicity of a unimodal model may make it preferable, even if it is less accurate. Of course, the research goals should always be taken into account when choosing a particular model.

## Understanding the pitfalls of preferring the median over the mean

A common task in mathematical statistics is to aggregate a set of numbers $\mathbf{x} = \{ x_1, x_2, \ldots, x_n \}$ to a single “average” value. Such a value is usually called central tendency. There are multiple measures of central tendency. The most popular one is the arithmetic average or the mean:

$$\overline{\mathbf{x}} = \left( x_1 + x_2 + \ldots + x_n \right) / n.$$

The mean is so popular not only thanks to its simplicity but also because it provides the best way to estimate the center of a perfectly normal distribution. Unfortunately, the mean is not a robust measure. This means that a single extreme value $x_i$ may distort the mean estimate and lead to a non-reproducible value that has nothing in common with the “expected” central tendency. Real-life distributions are never exactly normal. They can be pretty close to the normal distribution, but only to a certain extent. Even small deviations from normality may produce occasional extreme outliers, which makes the mean an unreliable measure in the general case.

When people discover the danger of the mean, they start looking for a more robust measure of the central tendency. And the first obvious alternative is the sample median $\tilde{\mathbf{x}}$. The classic sample median is easy to calculate. First, you have to sort the sample. If the sample size $n$ is odd, the median is the middle element in the sorted sample. If $n$ is even, the median is the arithmetic average of the two middle elements in the sorted sample. The median is extremely robust: it provides a reasonable estimate even if almost half of the sample elements are corrupted.
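The two definitions above can be sketched in a few lines of Python. The function names `sample_mean` and `sample_median` and the toy data are mine and serve only as an illustration.

```python
def sample_mean(x):
    """Arithmetic average of the sample."""
    return sum(x) / len(x)


def sample_median(x):
    """Classic sample median: the middle element or the average of the two middle ones."""
    s = sorted(x)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


clean = [1.0, 2.0, 3.0, 4.0, 5.0]
corrupted = [1.0, 2.0, 3.0, 4.0, 5000.0]  # a single corrupted value

print(sample_mean(clean), sample_median(clean))
print(sample_mean(corrupted), sample_median(corrupted))
```

A single corrupted element moves the mean from 3 to 1002, while the median stays at 3.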

For symmetric distributions (including the normal one), the true values of the mean and the median are the same. Once we discover the high robustness of the median, it may be tempting to always use the median instead of the mean. The median is often perceived as “something like the mean but with high resistance to outliers.” Indeed, what is the point of using the unreliable mean, if the median always provides a safer choice? Should we make the median our default option for the central tendency?

The answer is no. You should beware of any default options in mathematical statistics. All the measures are just tools, and each tool has its limitations and areas of applicability. A mindless transition from the mean to the median, regardless of the underlying distribution, is not a smart move. When we are picking a measure of central tendency to use, the first step should be reviewing the research goals: why do we need a measure of central tendency, and what are we going to do with the result? It’s impossible to make a rational decision on the statistical methods used without a clear understanding of the goals. Next, we should match the goals to the properties of available measures.

There are multiple practical issues with the median, but the most noticeable one is its statistical efficiency. Understanding this problem reveals the price of the median's high robustness. In this post, we discuss the concept of statistical efficiency, estimate the statistical efficiency of the mean and the median under different distributions, and consider the Hodges-Lehmann estimator as a measure of central tendency that provides a better trade-off between robustness and efficiency.
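As a rough preview of the efficiency idea, here is a small Monte Carlo sketch in Python. Several things in it are assumptions made for illustration: the standard normal distribution as the data source, the sample size and the number of repetitions, the one-sample Hodges-Lehmann definition as the median of pairwise averages $(x_i + x_j)/2$ for $i \le j$ (other conventions exist), and relative efficiency measured as the ratio of the variance of the mean to the variance of the given estimator.

```python
import random
import statistics


def hodges_lehmann(x):
    """Median of all pairwise averages (x_i + x_j) / 2 for i <= j (one common convention)."""
    n = len(x)
    pairwise = [(x[i] + x[j]) / 2 for i in range(n) for j in range(i, n)]
    return statistics.median(pairwise)


def estimator_variances(sample_size=20, repetitions=2000, seed=42):
    """Simulate normal samples and return the variance of each estimator across them."""
    rng = random.Random(seed)
    estimates = {"mean": [], "median": [], "hodges_lehmann": []}
    for _ in range(repetitions):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        estimates["mean"].append(statistics.mean(sample))
        estimates["median"].append(statistics.median(sample))
        estimates["hodges_lehmann"].append(hodges_lehmann(sample))
    return {name: statistics.variance(values) for name, values in estimates.items()}


variances = estimator_variances()
# Relative efficiency: how the variance of the mean compares to each estimator's variance.
relative_efficiency = {name: variances["mean"] / v for name, v in variances.items()}
for name, value in relative_efficiency.items():
    print(f"{name}: {value:.2f}")
```

Under normality, the median estimates are noticeably more dispersed than the mean estimates, so the relative efficiency of the median is well below 1. This is the price of robustness mentioned above.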

## Introducing the defensive statistics

Normal or approximately normal subjects are less useful objects of research than their pathological counterparts.

— Sigmund Freud, “The Psychopathology of Everyday Life”

In the realm of software development, reliability is crucial. This is especially true when creating systems that automatically analyze performance measurements to maintain optimal application performance. To achieve the desired level of reliability, we need a set of statistical approaches that provide accurate and trustworthy results. These approaches must work even when faced with varying input data sets and multiple violated assumptions, including malformed and corrupted values. In this blog post, I introduce “Defensive Statistics” as an appropriate methodology for tackling this challenge.

## Edgeworth expansion for the Mann-Whitney U test, Part 2: increased accuracy

In the previous post, we showed how the Edgeworth expansion can improve the accuracy of obtained p-values in the Mann-Whitney U test. However, we considered only the Edgeworth expansion to terms of order $1/m$. In this post, we explore how to improve the accuracy of this approach using the Edgeworth expansion to terms of order $1/m^2$.