Embracing model misspecification


When researchers focus on model design, they often worry about whether the model is correct. I believe that we should accept the fact that all models are wrong. The world is too complex to be captured by a single model: we can never account for all the variables. Therefore, the answer to the question “Is the model correct?” is always “No”. This should not bother us: from the pragmatic perspective, it is irrelevant whether the model is correct or not. If we embrace model misspecification, we can switch our attention to the question “What is the impact of deviations from the model on decision-making?”

Recently, I was reading cerreia2020. I am still in the process of understanding the technical part, but I was charmed by the Introduction, so I want to share the quotes I liked from this paper and from the referenced box1976 and chatfield1995.



Preprint announcement: 'Quantile-Respectful Density Estimation Based on the Harrell-Davis Quantile Estimator'


I have just published a preprint of a paper ‘Quantile-Respectful Density Estimation Based on the Harrell-Davis Quantile Estimator’. It is based on a series of my research notes.

The paper preprint is available on arXiv: arXiv:2404.03835 [stat.ME]. The paper source code is available on GitHub: AndreyAkinshin/paper-qrdehd. You can cite it as follows:

  • Andrey Akinshin (2024) “Quantile-Respectful Density Estimation Based on the Harrell-Davis Quantile Estimator” arXiv:2404.03835

Abstract:

Traditional density and quantile estimators are often inconsistent with each other. Their simultaneous usage may lead to inconsistent results. To address this issue, we propose a novel smooth density estimator that is naturally consistent with the Harrell-Davis quantile estimator. We also provide a jittering implementation to support discrete-continuous mixture distributions.



Lowland multimodality detection and jittering


In A better jittering approach for discretization acknowledgment in density estimation, I discussed the jittering approach that improves Quantile-Respectful Density Estimation for discrete distributions and continuous-discrete mixtures. In this post, I will show a brief example of how such an approach improves the accuracy of the Lowland multimodality detection.



Quantile-Respectful Density Estimation and Trimming


I continue the topic of Quantile-Respectful Density Estimation in the context of Multimodality Detection. In this post, we briefly discuss the handling of the QRDE boundary spikes in order to correctly detect the near-border modes.



A better jittering approach for discretization acknowledgment in density estimation


In How to build a smooth density estimation for a discrete sample using jittering, I proposed a jittering approach. It turned out that it does not always work well: it is not always capable of preserving the original distribution shape and avoiding gaps. In this post, I would like to propose a better strategy.



Effect Sizes and Asymmetry


Cohen’s d is one of the most popular measures of effect size. Unfortunately, it was designed for the normal distribution, which may make it a misleading measure in the non-normal case. And real distributions are never normal. When we discuss deviations from normality, we should treat the illusion of normality not as an atomic mental construction, but rather as a set of separate assumptions, each of which may be violated independently. In this post, I take a look at the kinds of issues we may have when the symmetry assumption is heavily violated.
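
For reference, here is a minimal sketch of the classic Cohen’s d based on the pooled standard deviation. The function name `cohens_d` and the sign convention (second sample minus first) are assumptions made for illustration; they are not taken from the post itself.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(x, y):
    """Classic Cohen's d: difference of means over the pooled standard deviation.

    Hypothetical helper for illustration; each sample needs at least two values.
    """
    nx, ny = len(x), len(y)
    # Pooled variance: weighted average of the unbiased sample variances
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    # Sign convention (an assumption): positive when y is larger on average
    return (mean(y) - mean(x)) / sqrt(pooled_var)
```

Since the pooled standard deviation summarizes spread with a single number, it carries no information about the direction or the degree of skewness.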



Pragmatic Statistics Manifesto


Statistics is one of the most confusing, controversial, and depressing disciplines I know. So many different approaches, so many different opinions, so many arguments, so many person-years of wasted time, and so many flawed peer-reviewed papers.

What we want from statistics is an easy-to-use tool that would nudge us toward asking the right questions and then straightforwardly guide us on how to design proper and relevant statistical procedures. What we have is a bunch of vaguely described sets of strange equations, a few arbitrarily chosen magical numbers as thresholds, and no clear understanding of what to do.

In the scientific community, there are a lot of adherents of Frequentist statistics (both Neyman-Pearson and Fisherian), Bayesian statistics, Likelihood statistics, Nonparametric statistics, Robust statistics, and many other statistics. And almost no one discusses Pragmatic statistics. I feel like we really need something called Pragmatic statistics. However, it should not be just a set of “blessed” approaches but rather a mindset.

Let me make an attempt to speculate on the principles that should form the foundation of the Pragmatic statistics approach. In future posts, I will show how to apply these principles to solve real-world problems.



The Effect Existence, Its Magnitude, and the Goals


If you are curious whether something impacts something else, the answer is probably “yes.” Does that indicator depend on those factors? Yes, it does. If we change this thing, would it affect …? Yes, it would. If a person takes this pill, could it cause a non-exactly-zero change in the body? Yes, the presence of the pill is already a change that can always be detected with the right amount of effort.

One may argue that in some cases (assuming the list of specific cases is presented), zero effect does exist. For a moment, let us pretend that it is true. Now, let us imagine a parallel universe, which is the same as ours but with the presence of the effect. Unfortunately, the effect is so small that our tools are not sophisticated enough to detect it. Imagine being put into one of these worlds without knowing which one. How do you determine the existence of the effect? Of course, you can improve the resolution of the measurement tools via new scientific discoveries, but with the current state of technology, the absence of the effect cannot be checked. Therefore, it is always safer to assume that the effect exists, keeping in mind that it can be negligible. Let us accept this assumption and continue as if it is the absolute truth.



Case Study: A City Social Survey


Imagine a city mayor considering a project proposal to build parks in several neighborhoods. It can be a good budget investment since it can potentially increase the happiness level of the citizens. However, it is just a hypothesis: if parks do not impact happiness, it is worth considering other city renovation projects. It makes sense to perform a pilot experiment before spending the budget on all the parks. The mayor is thinking about the following plan: pick a random neighborhood, survey the citizens to measure their happiness, build a park, survey the citizens again, compare the survey results, and make a decision about further parks in other neighborhoods. Someone is needed to design the survey and draw the conclusion.

Let us explore possible approaches to perform such a study. These artificial examples are not guidelines but rather simplified illustrations of possible mindsets presented as lists of thoughts. In this demonstration, we mainly focus on the attitude to the research process rather than on the technical details. All the examples are based on real stories.



Simplifying adjustments of confidence levels and practical significance thresholds


Translating business goals into the actual parameters of a statistical procedure is a non-trivial task. The degree of non-triviality increases if we need to adjust several parameters at the same time. In this post, we consider the problem of simultaneously choosing the confidence level and the practical significance threshold. We discuss possible pitfalls and how to simplify the adjustment procedure to avoid them.
