## Preprint announcement: 'Quantile-Respectful Density Estimation Based on the Harrell-Davis Quantile Estimator'

I have just published a preprint of my paper ‘Quantile-Respectful Density Estimation Based on the Harrell-Davis Quantile Estimator’. It is based on a series of my research notes.

The paper preprint is available on arXiv: arXiv:2404.03835 [stat.ME]. The paper source code is available on GitHub: AndreyAkinshin/paper-qrdehd. You can cite it as follows:

- Andrey Akinshin (2024) “Quantile-Respectful Density Estimation Based on the Harrell-Davis Quantile Estimator” arXiv:2404.03835

Abstract:

Traditional density and quantile estimators are often inconsistent with each other. Their simultaneous usage may lead to inconsistent results. To address this issue, we propose a novel smooth density estimator that is naturally consistent with the Harrell-Davis quantile estimator. We also provide a jittering implementation to support discrete-continuous mixture distributions.
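For readers unfamiliar with the Harrell-Davis quantile estimator that the paper builds on, here is a minimal Python sketch (the function name and the `scipy`-based implementation are my own choices for illustration, not the paper's code): the p-th quantile is estimated as a weighted sum of all order statistics, with weights taken from a Beta((n+1)p, (n+1)(1−p)) distribution.

```python
import numpy as np
from scipy.stats import beta

def harrell_davis_quantile(x, p):
    """Estimate the p-th quantile as a weighted sum of all order
    statistics; weights come from a Beta((n+1)p, (n+1)(1-p)) CDF."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    a, b = (n + 1) * p, (n + 1) * (1 - p)
    # Weight of the i-th order statistic: Beta mass on [(i-1)/n, i/n]
    w = np.diff(beta.cdf(np.arange(n + 1) / n, a, b))
    return float(w @ x)

print(harrell_davis_quantile([1.0, 2.0, 3.0, 4.0, 5.0], 0.5))  # ≈ 3.0
```

Because every observation contributes to the estimate, the result is smooth in p, which is the property the proposed density estimator is consistent with.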

## Lowland multimodality detection and jittering

In A better jittering approach for discretization acknowledgment in density estimation, I discussed the jittering approach that improves Quantile-Respectful Density Estimation for discrete distributions and continuous-discrete mixtures. In this post, I will show a brief example of how such an approach improves the accuracy of the Lowland multimodality detection.

## Quantile-Respectful Density Estimation and Trimming

I continue the topic of Quantile-Respectful Density Estimation in the context of Multimodality Detection. In this post, we briefly discuss the handling of the QRDE boundary spikes in order to correctly detect the near-border modes.

## A better jittering approach for discretization acknowledgment in density estimation

In How to build a smooth density estimation for a discrete sample using jittering, I proposed a jittering approach. It turned out that this approach does not always work well: it cannot always preserve the original distribution shape and avoid gaps. In this post, I would like to propose a better strategy.
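For context, the naive jittering idea from that earlier post can be sketched roughly as follows (a simplified illustration; the function name and the fixed jitter width `h` are my own assumptions, not the post's actual scheme): each discrete value is spread uniformly over a small interval so that a smooth density estimator no longer sees exact ties.

```python
import random

def jitter(sample, h):
    """Naive jittering: spread each discrete value uniformly over
    [x - h/2, x + h/2] to break exact ties before density estimation."""
    return [x + random.uniform(-h / 2, h / 2) for x in sample]

random.seed(42)
discrete = [1, 1, 1, 2, 2, 3]
jittered = jitter(discrete, 0.5)
print(jittered)
```

With a fixed `h`, the jittered values may fail to cover the gaps between discrete levels or may distort the shape, which is exactly the weakness the post sets out to fix.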

## Effect Sizes and Asymmetry

Cohen’s d is one of the most popular measures of effect size. Unfortunately, it was designed for the normal distribution, which may make it misleading in the non-normal case. And real distributions are never normal. When we discuss deviations from normality, we should treat the illusion of normality not as an atomic mental construct, but rather as a set of independent assumptions, each of which may be violated independently. In this post, I take a look at the kinds of issues we may face when the symmetry assumption is heavily violated.
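As a reminder, the classic definition of Cohen's d can be sketched in a few lines of Python (a simplified illustration; the function name is mine): the difference of means divided by the pooled standard deviation, which implicitly assumes roughly normal, equal-variance samples.

```python
import math
import statistics

def cohens_d(x, y):
    """Classic Cohen's d: mean difference over the pooled standard
    deviation (implicitly assumes near-normal, equal-variance samples)."""
    nx, ny = len(x), len(y)
    pooled_sd = math.sqrt(((nx - 1) * statistics.variance(x) +
                           (ny - 1) * statistics.variance(y)) / (nx + ny - 2))
    return (statistics.mean(x) - statistics.mean(y)) / pooled_sd

print(cohens_d([1, 2, 3], [4, 5, 6]))  # → -3.0
```

When a distribution is heavily skewed, neither the mean nor the standard deviation describes it well, so both the numerator and the denominator of this ratio become questionable.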

## Pragmatic Statistics Manifesto

Statistics is one of the most confusing, controversial, and depressing disciplines I know. So many different approaches, so many different opinions, so many arguments, so many person-years of wasted time, and so many flawed peer-reviewed papers.

What we want from statistics is an easy-to-use tool that would nudge us toward asking the right questions and then straightforwardly guide us on how to design proper and relevant statistical procedures. What we have is a bunch of vaguely described sets of strange equations, a few arbitrarily chosen magical numbers as thresholds, and no clear understanding of what to do.

In the scientific community, there are a lot of adherents of
*Frequentist* statistics (both Neyman-Pearson and Fisherian),
*Bayesian* statistics,
*Likelihood* statistics,
*Nonparametric* statistics,
*Robust* statistics,
and many other statistics.
And almost no one discusses *Pragmatic* statistics.
I feel like we really need something called *Pragmatic* statistics.
However, it should not be just a set of “blessed” approaches but rather a mindset.

Let me make an attempt to speculate on the principles
that should form the foundation of the *Pragmatic* statistics approach.
In future posts, I will show how to apply these principles to solve real-world problems.

## The Effect Existence, Its Magnitude, and the Goals

If you are curious if something impacts something else, the answer is probably “yes.” Does that indicator depend on those factors? Yes, it does. If we change this thing, would it affect …? Yes, it would. If a person takes this pill, could it cause a non-exactly-zero change in the body? Yes, the presence of the pill is already a change that can always be detected with the right amount of effort.

One may argue that in some cases (assuming a list of specific cases is presented), a zero effect does exist. For a moment, let us pretend that it is true. Now, let us imagine a parallel universe that is the same as ours except that the effect is present. Unfortunately, the effect is so small that our tools are not sophisticated enough to detect it. Imagine being put into one of these two worlds without knowing which one. How do you determine the existence of the effect? Of course, you can improve the resolution of the measurement tools via new scientific discoveries, but with the current state of technology, the absence of the effect cannot be verified. Therefore, it is always safer to assume that the effect exists, keeping in mind that it can be negligible. Let us accept this assumption and continue as if it were absolute truth.

## Case Study: A City Social Survey

Imagine a city mayor considering a proposal to build parks in several neighborhoods. It can be a good budget investment since it can potentially increase the happiness level of the citizens. However, it is just a hypothesis: if parks do not impact happiness, it is worth considering other city renovation projects. It makes sense to perform a pilot experiment before spending the budget on all the parks. The mayor is thinking about the following plan: pick a random neighborhood, survey the citizens to measure their happiness, build a park, survey the citizens again, compare the survey results, and then decide about further parks in other neighborhoods. Someone is needed to design the survey and draw the conclusions.

Let us explore possible approaches to perform such a study. These artificial examples are not guidelines but rather simplified illustrations of possible mindsets presented as lists of thoughts. In this demonstration, we mainly focus on the attitude to the research process rather than on the technical details. All the examples are based on real stories.

## Simplifying adjustments of confidence levels and practical significance thresholds

Translating the business goals into the actual parameters of a statistical procedure is a non-trivial task. The degree of non-triviality increases if we have to adjust several parameters at the same time. In this post, we consider the problem of simultaneously choosing the confidence level and the practical significance threshold. We discuss possible pitfalls and how to simplify the adjustment procedure to avoid them.

## Degrees of practical significance

Let’s say we have two data samples, and we want to check if there is a difference between them. If we are talking about any kind of difference, the answer is most probably yes. It’s highly unlikely that two random samples are identical. Even if they are identical, there is still a chance that we observe such a situation by accident while the underlying distributions actually differ. Therefore, the discussion about the existence of any kind of difference is not meaningful.

To make more meaningful insights, researchers often talk about statistical significance.
The approach can also be misleading.
If the sample size is large enough, we are almost always able to detect even a negligible difference
and obtain a statistically significant result for any pair of distributions.
On the other hand, a huge difference can be declared insignificant if the sample size is small.
While the concept is interesting and well-researched, it rarely matches the actual research goal.
I strongly believe that we should *not* test for the nil hypothesis (checking if the true difference is *exactly* zero).
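The point about sample size can be illustrated with a quick simulation (a hedged sketch using `scipy.stats.ttest_ind`; the specific shift of 0.01σ and the sample sizes are arbitrary choices for illustration): the same negligible true difference is "insignificant" for a small sample and "significant" for a large one.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

# Two populations whose true means differ by a negligible 0.01 sigma.
for n in (100, 1_000_000):
    a = rng.normal(0.00, 1.0, n)
    b = rng.normal(0.01, 1.0, n)
    print(n, ttest_ind(a, b).pvalue)
```

Nothing about the underlying effect changes between the two runs; only the sample size does, which is why statistical significance alone rarely answers the actual research question.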

Here, we can switch from statistical significance to practical significance. We are supposed to define a threshold (e.g., in terms of a minimum effect size) for a difference that is meaningful for the research. This approach has a better chance of being aligned with the research goals. However, it is not always satisfactory either.

We should keep in mind that hypothesis testing often arises in the context of decision-making problems. In some cases, we conduct exploratory research in which we just want to have a better understanding of the world. In most cases, however, we do not perform calculations just because we are curious; we want to make a decision based on the results. And this is the most crucial moment. It should always be the starting point of any research project.

First of all, we should clearly describe the possible decisions and their preconditions. When we start doing that, we may discover that not all practically significant outcomes are equally significant. If different practically significant results may lead to different decisions, we should define the proper classification in advance, during the research design stage. The dichotomy of “practically significant” vs. “not practically significant” may conceal important aspects of the problem and lead to a wrong decision.

In this post, I would like to discuss the degrees of practical significance and show an example of how important it is for some problems.
