## The Effect Existence, Its Magnitude, and the Goals

If you are curious whether something impacts something else, the answer is probably “yes.” Does that indicator depend on those factors? Yes, it does. If we change this thing, would it affect …? Yes, it would. If a person takes this pill, could it cause a change in the body that is not exactly zero? Yes, the presence of the pill is already a change that can always be detected with the right amount of effort.

One may argue that in some cases (assuming the list of specific cases is presented), zero effect does exist. For a moment, let us pretend that it is true. Now, let us imagine a parallel universe, which is the same as ours but with the presence of the effect. Unfortunately, the effect is so small that our tools are not sophisticated enough to detect it. Imagine being put into one of these worlds without knowing which one. How do you determine the existence of the effect? Of course, you can improve the resolution of the measurement tools via new scientific discoveries, but with the current state of technology, the absence of the effect cannot be checked. Therefore, it is always safer to assume that the effect exists, keeping in mind that it can be negligible. Let us accept this assumption and continue as if it were the absolute truth.

## Case Study: A City Social Survey

Imagine a city mayor considering a proposal to build parks in several neighborhoods. It can be a good budget investment since it can potentially increase the happiness level of the citizens. However, it is just a hypothesis: if parks do not impact happiness, it is worth considering other city renovation projects. It makes sense to perform a pilot experiment before spending the budget on all the parks. The mayor is thinking about the following plan: pick a random neighborhood, survey the citizens to measure their happiness, build a park, survey the citizens again, compare the survey results, and decide whether to build further parks in other neighborhoods. Someone is needed to design the survey and draw the conclusion.

Let us explore possible approaches to perform such a study. These artificial examples are not guidelines but rather simplified illustrations of possible mindsets presented as lists of thoughts. In this demonstration, we mainly focus on the attitude to the research process rather than on the technical details. All the examples are based on real stories.

## Simplifying Adjustments of Confidence Levels and Practical Significance Thresholds

Translating business goals into the actual parameters of a statistical procedure is a non-trivial task. The degree of non-triviality increases if we have to adjust several parameters at the same time. In this post, we consider the problem of simultaneously choosing the confidence level and the practical significance threshold. We discuss possible pitfalls and how to simplify the adjustment procedure to avoid them.

## Degrees of Practical Significance

Let’s say we have two data samples, and we want to check if there is a difference between them. If we are talking about any kind of difference, the answer is most probably yes. It’s highly unlikely that two random samples are identical. Even if they are, there are still chances that we observe such a situation by accident, and there is a difference in the underlying distributions. Therefore, the discussion about the existence of any kind of difference is not meaningful.

To obtain more meaningful insights, researchers often talk about statistical significance.
This approach can also be misleading.
If the sample size is large enough, we are almost always able to detect even a negligible difference
and obtain a statistically significant result for any pair of distributions.
On the other hand, a huge difference can be declared insignificant if the sample size is small.
While the concept is interesting and well-researched, it rarely matches the actual research goal.
I strongly believe that we should *not* test for the nil hypothesis (checking if the true difference is *exactly* zero).
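
The dependence on the sample size can be illustrated with a minimal sketch. It assumes an idealized two-sample $z$-test with a known common standard deviation and equal sample sizes; the function name and the specific numbers are hypothetical and chosen only for illustration.

```python
import math

def two_sample_z_p_value(mean_diff, sd, n):
    """Two-sided p-value of an idealized two-sample z-test.

    Assumptions (for illustration only): both samples have size n,
    the common standard deviation sd is known, and mean_diff is
    the observed difference between the sample means.
    """
    standard_error = sd * math.sqrt(2.0 / n)
    z = mean_diff / standard_error
    # For a standard normal Z: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))

# A negligible difference (1% of sd) with a huge sample: "significant"
p_tiny_large = two_sample_z_p_value(mean_diff=0.01, sd=1.0, n=1_000_000)

# The same negligible difference with a moderate sample: "not significant"
p_tiny_moderate = two_sample_z_p_value(mean_diff=0.01, sd=1.0, n=100)

# A huge difference (a full sd) with a tiny sample: "not significant"
p_huge_small = two_sample_z_p_value(mean_diff=1.0, sd=1.0, n=3)

print(p_tiny_large, p_tiny_moderate, p_huge_small)
```

With the conventional 0.05 level, only the first case is declared significant, although it has the smallest effect of the three.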

Here, we can switch from statistical significance to practical significance. We are supposed to define a threshold (e.g., in terms of minimum effect size) for the difference that is meaningful for the research. This approach has a better chance of being aligned with the research goals. However, it is also not always sufficient. We should keep in mind that hypothesis testing often arises in the context of decision-making problems. In some cases, we can do exploratory research in which we just want to have a better understanding of the world. However, in most cases, we do not perform calculations just because we are curious; we often want to make a decision based on the results. And this is the most crucial point: the decision should always be the starting point of any research project. First of all, we should clearly describe the possible decisions and their preconditions. When we start doing that, we can discover that not all the practically significant outcomes are equally significant. If different practically significant results may lead to different decisions, we should define the proper classification in advance during the research design stage. The dichotomy of “practically significant” vs. “not practically significant” may conceal important aspects of the problem and lead to a wrong decision.

In this post, I would like to discuss the degrees of practical significance and show an example of how important they are for some problems.

## Weighted Mann-Whitney U Test, Part 3

I continue building a weighted version of the Mann–Whitney $U$ test. While the previously suggested approach feels promising, I don’t like the usage of bootstrap to obtain the $p$-value. It is always better to have a deterministic and exact approach where possible. I still don’t know how to solve it in the general case, but it seems that I’ve obtained a reasonable solution for some specific cases. The current version of the approach still has issues: it requires additional correction factors in some cases and further improvements. However, it passes my minimal requirements, so it is worth continuing to develop this idea. In this post, I share the description of the weighted approach and provide numerical examples.

## Andreas Löffler's Implementation of the Exact P-Values Calculations for the Mann-Whitney U Test

The Mann-Whitney U test is one of the most popular non-parametric statistical tests. Unfortunately, most test implementations in statistical packages are far from perfect. The exact p-value calculation is time-consuming and can be impractical for large samples. Therefore, most implementations automatically switch to the asymptotic approximation, which can be quite inaccurate. Indeed, the classic normal approximation could produce enormous errors. Thanks to the Edgeworth expansion, the accuracy can be improved, but it is still not always satisfactory. I prefer using the exact p-value calculation whenever possible.

The computational complexity of the exact p-value calculation using the classic recurrent equation suggested by Mann and Whitney is $\mathcal{O}(n^2 m^2)$ in terms of time and memory. It’s not a problem for small samples, but for medium-size samples, it is slow, and it has a huge memory footprint. This gives us an unpleasant dilemma: either we use the exact p-value calculation (which is extremely time- and memory-consuming), or we use the asymptotic approximation (which gives poor accuracy).

Last week, I got acquainted with a brilliant algorithm for the exact p-value calculation suggested by Andreas Löffler in 1982. It’s much faster than the classic approach, and it requires only $\mathcal{O}(n+m)$ memory.
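
To give a feeling for how an exact calculation can avoid the two-dimensional classic recurrence, here is a minimal sketch of a one-dimensional recurrence for the number of rank arrangements with a given $U$ value. It follows from the generating function $\prod_{i=1}^{n} (1 - q^{m+i}) / (1 - q^i)$ of these counts; whether it matches Löffler's original formulation in every detail is an assumption. The function names are hypothetical, ties are assumed to be absent, and the sketch makes no attempt to reach the memory bound mentioned above.

```python
import math

def sigma(u, n, m):
    """Signed divisor sum: +d for divisors d in [1, n],
    -d for divisors d in [m + 1, m + n]."""
    total = 0
    for d in range(1, u + 1):
        if u % d == 0:
            if 1 <= d <= n:
                total += d
            if m + 1 <= d <= m + n:
                total -= d
    return total

def u_counts(n, m):
    """counts[u] = number of arrangements of n and m tie-free
    observations that give the Mann-Whitney statistic U = u."""
    max_u = n * m
    sig = [0] + [sigma(k, n, m) for k in range(1, max_u + 1)]
    counts = [0] * (max_u + 1)
    counts[0] = 1
    for u in range(1, max_u + 1):
        # u * counts[u] = sum_{i < u} counts[i] * sigma(u - i)
        s = sum(counts[i] * sig[u - i] for i in range(u))
        counts[u] = s // u  # the division is exact
    return counts

def exact_p_value_lower(u_obs, n, m):
    """One-sided exact p-value P(U <= u_obs) under the null hypothesis."""
    counts = u_counts(n, m)
    return sum(counts[: u_obs + 1]) / math.comb(n + m, n)

print(u_counts(2, 2))
print(exact_p_value_lower(3, 5, 7))
```

A practical implementation would compute the counts only up to the observed $U$ value and handle two-sided alternatives; both are omitted here for brevity.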

## Eclectic Statistics

In the world of mathematical statistics, there is a constant confrontation between adepts of different paradigms. This is a constant source of confusion for many researchers who struggle to pick the proper approach to follow. For example, how should one choose between the frequentist and Bayesian approaches? Since these paradigms may produce inconsistent results (e.g., see Lindley’s paradox), some choice has to be made. The easiest way to conduct research is to pick a single paradigm and stick to it. The right way to conduct research is to think carefully.

## Change Point Detection and Recent Changes

Change point detection (CPD) in time series analysis is an essential tool for identifying significant shifts in data patterns. These shifts, or “change points,” can signal critical transitions in various contexts. While most CPD algorithms are adept at discovering historical change points, their sensitivity in detecting recent changes can be limited, often due to a key parameter: the minimum distance between sequential change points. In this post, I share some speculations on how we can improve CPD analysis by combining two change point detectors.

## Merging Extended P² Quantile Estimators, Part 1

The P² quantile estimator is a streaming quantile estimator with an $\mathcal{O}(1)$ memory footprint and an extremely fast update procedure. Several days ago, I learned that it was adopted for the new Paint.NET GPU-based Median Sketch effect (the description is here). While P² meets the basic problem requirement (streaming median approximation without storing all the values), the algorithm performance is still not acceptable without additional adjustments. A significant performance improvement can be obtained if we split the input stream, process each part with a separate P² estimator, and merge the results. Unfortunately, the merging procedure is tricky to implement. I enjoy such challenges, so I decided to attempt to build such a merging approach. In this post, I describe my first attempt.

## Hodges-Lehmann Ratio Estimator vs. Bhattacharyya's Scale Ratio Estimator

Previously, I discussed the idea of a ratio estimator based on the Hodges-Lehmann estimator. This idea looks so simple and natural that I was sure that it must have already been proposed and studied. However, when I started to search for it, finding it turned out to be harder than I expected. Moreover, some papers attribute this idea to Bhattacharyya, which is not accurate. In this post, we discuss the difference between these two approaches.
