Andrey Akinshin's blog (Page 22)

Weighted quantile estimators

September 29, 2020 Mathematics Statistics Research [Research] Weighted quantile estimators Quantiles Harrell-Davis quantile estimator

Update 2021-07-06: the approach was updated to use Kish’s effective sample size.

In this post, I will show how to calculate weighted quantile estimates and how to use them in practice.

Let’s start with a problem from real life. Imagine that you measure the total duration of a unit test executed daily on a CI server. Every day you get a single number: the test duration for the latest revision of that day.

You collect a history of such measurements for 100 days. Now you want to describe the “actual” distribution of the performance measurements.

However, for the latest “actual” revision, you have only a single measurement, which is not enough to build a distribution. Also, you can’t build a distribution based on the last N measurements because they can contain change points that will spoil your results. So, what you really want to do is to use all the measurements, but older values should have a lower impact on the final distribution form.

Such a problem can be solved using weighted quantiles! This powerful approach can be applied to any time series regardless of the domain area. In this post, we will learn how to calculate and apply weighted quantiles.
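For reference, Kish’s effective sample size mentioned in the update note can be written down for non-negative weights as shown below. The exponential decay weighting next to it is only one illustrative way to make older measurements count less; the age t_i and the half-life t_{1/2} are assumptions introduced here, not something this excerpt specifies.

```latex
% Kish's effective sample size for non-negative weights w_1, ..., w_n
n_{\text{eff}} = \frac{\left( \sum_{i=1}^{n} w_i \right)^2}{\sum_{i=1}^{n} w_i^2}

% Hypothetical exponential decay: t_i is the age of the i-th measurement (in days),
% t_{1/2} is an assumed half-life after which a measurement counts half as much
w_i = 2^{-t_i / t_{1/2}}
```

With equal weights, the first formula gives n_eff = n, so the unweighted case is recovered; the more the weights are concentrated on a few recent measurements, the smaller n_eff becomes.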

Nonparametric Cohen's d-consistent effect size

June 25, 2020 Mathematics Statistics Research Effect Size [Research] Gamma Effect Size Median Absolute Deviation Harrell-Davis quantile estimator Perfolizer

Update: the second part of this post is available here.

The effect size is a common way to describe the difference between two distributions. When these distributions are normal, one of the most popular ways to express the effect size is Cohen’s d. Unfortunately, it doesn’t work well for non-normal distributions.

In this post, I will show a robust Cohen’s d-consistent effect size formula for nonparametric distributions.
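As a reminder, the classic Cohen’s d for two samples with means \bar{x}_1, \bar{x}_2, standard deviations s_1, s_2, and sizes n_1, n_2 is the difference of the means divided by the pooled standard deviation. This is the standard parametric definition, not the nonparametric variant that the post develops.

```latex
d = \frac{\bar{x}_2 - \bar{x}_1}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}}
```

Both the mean and the standard deviation are sensitive to outliers and heavy tails, which is why this form can be misleading for non-normal data.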

DoubleMAD outlier detector based on the Harrell-Davis quantile estimator

June 22, 2020 Mathematics Research Statistics Outliers Median Absolute Deviation Harrell-Davis Asymmetry

Outlier detection is an important step in data processing. Unfortunately, if the distribution is not normal (e.g., right-skewed and heavy-tailed), it’s hard to choose a robust outlier detection algorithm that will not be affected by tricky distribution properties. Over the last several years, I have tried many different approaches, but I was not satisfied with their results. Finally, I found an algorithm about which I have (almost) no complaints. It’s based on the double median absolute deviation and the Harrell-Davis quantile estimator. In this post, I will show how it works and why it’s better than some other approaches.
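To give a rough idea of the double MAD part, here is a minimal sketch. It makes several assumptions that do not come from this excerpt: it uses the plain sample median instead of the Harrell-Davis quantile estimator, the conventional consistency constant 1.4826, and a threshold of 3; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DoubleMadSketch
{
    // Plain sample median (an assumption: the post uses the Harrell-Davis estimator instead).
    public static double Median(IReadOnlyList<double> values)
    {
        if (values.Count == 0)
            throw new ArgumentException("Sample must not be empty", nameof(values));
        var sorted = values.OrderBy(v => v).ToArray();
        int n = sorted.Length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Returns the values lying outside [median - k * lowerMad, median + k * upperMad].
    // The lower and upper MADs are computed separately, so a skewed sample
    // gets a different fence on each side of the median.
    public static List<double> FindOutliers(IReadOnlyList<double> values, double threshold = 3.0)
    {
        const double consistency = 1.4826; // makes the MAD comparable to the standard deviation under normality
        double median = Median(values);
        double lowerMad = consistency *
            Median(values.Where(v => v <= median).Select(v => median - v).ToArray());
        double upperMad = consistency *
            Median(values.Where(v => v >= median).Select(v => v - median).ToArray());
        // Note: if a one-sided MAD is zero, every value on that side that differs
        // from the median is reported as an outlier.
        return values
            .Where(v => v < median - threshold * lowerMad || v > median + threshold * upperMad)
            .ToList();
    }

    public static void Main()
    {
        var sample = new double[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 100 };
        Console.WriteLine(string.Join(", ", FindOutliers(sample)));
    }
}
```

Because the two fences are computed independently, a long right tail widens only the upper fence and does not hide outliers on the left.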

How ListSeparator Depends on Runtime and Operating System

May 20, 2020 Programming .NET C# Rider Mono .NET Core

This blog post was originally published on the JetBrains .NET blog.

In the two previous blog posts from this series, we discussed how socket errors and sorting order depend on the runtime and operating system. For some, it may be obvious that some things are indeed specific to the operating system or the runtime, but often these issues come as a surprise and are only discovered when running our code on different systems.

An interesting example that may bite us at runtime is using ListSeparator in our code. It should give us a common separator for list elements in a string. But is it really common? Let’s start our investigation by printing ListSeparator for the Russian language:

Console.WriteLine(new CultureInfo("ru-ru").TextInfo.ListSeparator);

On Windows, you will get the same result for .NET Framework, .NET Core, and Mono: the ListSeparator is ; (a semicolon). You will also get a semicolon on Mono+Unix. However, on .NET Core+Unix, you will get a non-breaking space.
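Since a non-breaking space is invisible in console output, it can help to print the separator together with its character codes. The following is a small sketch of that idea; the class name and the ToCodePoints helper are hypothetical, and the actual output depends on the runtime and operating system, as described above.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class ListSeparatorDemo
{
    // Formats each UTF-16 code unit of the string as U+XXXX, e.g. ";" becomes "U+003B".
    public static string ToCodePoints(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    public static void Main()
    {
        var separator = new CultureInfo("ru-RU").TextInfo.ListSeparator;
        // Print both the raw separator and its codes so that invisible characters can be told apart.
        Console.WriteLine($"'{separator}' ({ToCodePoints(separator)})");
    }
}
```

A semicolon shows up as U+003B, while a non-breaking space shows up as U+00A0, which makes the difference between the platforms easy to spot in logs.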