Posts / Lowland multimodality detection


Multimodality is an essential feature of a distribution, which may create many troubles during automatic analysis. One of the best ways to work with such distributions is to detect all the modes in advance based on the given samples. Unfortunately, this problem is much harder than it looks like.

I tried many different approaches for multimodality detection, but none of them was good enough. During the past several years, my approach of choice was the mvalue-based modal test by Brendan Gregg. It works nicely in simple cases, but I was constantly stumbling over noisy samples where this algorithm doesn’t produce reliable results. Also, it has some limitations that make it inapplicable to some corner cases.

So, I needed a better approach. Here are my main requirements:

I failed to find such an algorithm anywhere, so I came up with my own! The current working title is “the lowland multimodality detector.” It takes an estimation of the probability density function (PDF) and tries to find “lowlands” (areas that are much lower than neighboring peaks). Next, it splits the plot by these lowlands and detects modes between them. For the PDF estimation, it uses the quantile-respectful density estimation based on the Harrell-Davis quantile estimator (QRDE-HD) (see also harrell1982) Let me explain how it works in detail.

The problem

How many modes do you see in the above image? It’s a trick question because it depends on the “mode” definition and on the kind of plot we have. Here are some possible options:

In this post, we will work with the most practical case. Let’s say that we have a sample, and we want to provide an estimation of the modes in the underlying distribution. There is an important nuance here. If we define the mode as the local maxima, we can easily get bunches of noisy modes that are close to each other, but have no clear separation. Typically, the true density of real-life distributions is very noisy and wobbly.

However, we are not interested in most of “close to each other noisy modes” in practice. When we work with such distributions, we tend to make the density smoother. We tend to reduce the impact of noise patterns, even if such patterns are integral parts of the original distribution. We tend to group each bunch of nearby local maxima into a single mode.

During basic distribution exploration, we are interested only in “major” modes that make sense to us from the practical point of view. Here I have some good news and some bad news. The bad news: we don’t have a strict mathematical definition, so we can’t verify and compare different approaches. The good news: we are free to choose our own definition, which will be optimal for our use cases.

During my research, I often ask my colleagues: “How many modes do you see in this plot?” Typically, I get identical answers for the same plot, which makes me think that there is a common understanding of this concept. In this post, we focus on such “practically significant” modes which often look obvious for most people.

The mvalue approach

To get a better understanding of the problem, we start with the analysis of the mvalue approach. It’s described by Brendan Gregg in Frequency Trails: Modes and Modality. It worth reading the original post, but I briefly describe the main idea. Basically, the mvalue is the total variation of a histogram or a density plot normalized by the plot height. The modal test compares the mvalue with a predefined threshold and makes a decision about multimodality. The easiest way to understand the idea is to just look at some examples.

Here is a visualization of a perfect unimodal distribution:

Here we have three local extrema: the left corner point, the central peak, and the right corner point. For each extremum, we should calculate the density value normalized by the global maxima. For the left and right corner points, this value equals zero. For the central peak, it equals one (because it’s the highest peak). The total variation is the sum of absolute differences between consecutive values. In this case, it equals (1.0 for ascent from the left corner point to the central peak) + (1.0 for descent from the central peak to the right corner point). Thus, the mvalue equals 2. It’s the minimum possible value of mvalue.

Here is a visualization of a perfect bimodal distribution:

The mvalue equals 4 (1 for ascent + 1 for descent + 1 for ascent + 1 for descent). It’s the default value of the etalon bimodal distribution.

Here is a visualization of a perfect trimodal distribution:

The mvalue equals 6 (1 for ascent + 1 for descent + 1 for ascent + 1 for descent + 1 for ascent + 1 for descent). It’s the default value of the etalon trimodal distribution.

You may think that we can get the number of modes by dividing mvalue by 2. Unfortunately, it’s not so simple. Could you guess the mvalue of the below plot?

It equals 4:

It’s hard to make an unequivocal conclusion about modality here, but it’s definitely not a bimodal distribution.

And what could you say about the next plot?

It equals 6:

This distribution is most probably unimodal. But because of the noise, the mvalue-based modal test thinks that it’s a trimodal distribution. Although mvalues allow detecting multimodal distributions in simple cases, they have some serious disadvantages:

Despite all these problems, this approach works and can be used in practice. I have been using it in BenchmarkDotNet for the past two years (it’s available since v0.10.14), but there are still cases when it doesn’t help to automatically make a reliable conclusion about multimodality. So, I decided to find another approach that would work better.

Introducing lowlands

Firstly, we will use the QRDE-HD instead of histograms and kernel density estimations. The approach provides a much better estimation of the shape of the collected data, and it doesn’t require parameter tuning. Now imagine that this function describes a mountain relief (side view):

Next, it’s starting to rain1. The rain fills the mountain hollows with water and forms a bunch of ponds. The area of the mountain is strictly below the ponds saturated with water. Now let’s introduce the following definitions:

For better understanding, let’s see how this classification works in the bimodal case:

We can do the following observations:

I hope that you got the idea. Now it’s time to formalize this schema.

Lowland detection algorithm

To find the number of modes and their locations for the given samples, we should do the following:

  1. Build the QRDE-HD
  2. Find all local maxima of the QRDE-HD except border points (peaks)
  3. Enumerate all the segments between neighboring peaks in ascending order of size
  4. For each segment, we try to fill it with “water.” The water level is determined by the lowest peak. If the water area is larger than the area under the water, we mark it as “lowland.” Once we find a lowland, we don’t touch it anymore: it can’t be flooded by a larger pond. If the water area is smaller than the area under the water, this pond can be merged with other ponds.
  5. Once we found all the lowland areas, we split the whole plot by them to not-lowland areas.
  6. In each not-lowland area, we choose the highest peek and mark them as a mode.

Extreme case

Now consider a case of a bimodal distribution where two modes are very close to each other but still strictly separated. Let’s take two huge samples from the uniform distribution $\{x_{1..1000}\}, \{y_{1..1000}\} \in \mathcal{U}(0, 1)$ and build a bimodal sample of the following form: $\{ -\delta - x^3_{1..1000}, +\delta +y^3_{1..1000} \}$. It contains two modes ($-\delta$ and $+\delta$), and it doesn’t include any sample elements between them. The distance between modes equals $2\delta$.

Let’s check out how the lowland detector works in such a case. We start with $\delta = 0.5$:

It wasn’t so hard to detect two modes here. Let’s try $\delta = 0.1$:

Not bad. Now let’s try an extreme case: $\delta = 0.01$.

We are still able to detect bimodality here! Note that it’s only possible thanks to the QRDE-HD. If we use classic histograms or kernel density estimations here, we will not be able to see multimodality:

More examples

Below you can find some additional cases of applying this algorithm to multimodal distributions.

Implementation notes

You can find a reference C# implementation of this algorithm in the latest nightly version (0.3.0-nightly.54+) of perfolizer (you need LowlandModalityDetector). As I said in the beginning, it doesn’t require any parameter tuning. It should just work without any adjustments. However, there are a few points of customization that you can use to adopt this algorithm to some very specific corner cases.

Conclusion

The suggested algorithm allows detection of multimodality phenomena based on the given sample. You can try to use it via perfolizer or implement it yourself. Here are some of the main advantages of this approach:

If you decide to try this algorithm on your data sets, I will be happy to get your feedback!


  1. Inspired by the problem “Rain” from XIV All-Russian Olympiad for schoolchildren in computer science a.k.a. ROI-2002 (Day 1, Problem 2). The problem description can be found here (In Russian). The problem solution can be found here (In Russian). ↩︎


References (5)