Posts about R

Normality is a myth

In many statistical papers, you can find the following phrase: "assuming that we have a normal distribution." You have probably seen plots of the normal distribution's density function in statistics textbooks; it is the familiar symmetric, bell-shaped curve.

The normal distribution is a user-friendly mental model when we interpret statistical metrics such as the mean and standard deviation. However, it can also be an insidious and misleading model when your distribution is not normal. There is a great sentence in the paper "Testing for normality" by R.C. Geary, 1947 (the quote was found here):

Normality is a myth; there never was, and never will be, a normal distribution.

I agree with this statement 100%. At least if you work with performance distributions (distributions built from multiple iterations of benchmarks that measure the performance metrics of your applications), you should forget about normality. This is what a typical performance distribution looks like (I built the picture below from a real benchmark that measures the load time of assemblies when we open the Orchard solution in Rider on Linux):

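The claim that the mean and standard deviation can mislead for non-normal data can be checked with a small simulation. The sketch below is not from the original post: it uses a synthetic, hypothetical two-mode sample (the helper names `bimodal_sample` and `share_within`, and all the numbers, are made up for illustration) and compares what a normal model with the same mean and standard deviation predicts against what the data actually shows.

```python
import random
import statistics
from statistics import NormalDist


def bimodal_sample(n=1000, seed=42):
    """Synthetic two-mode sample (hypothetical "fast" and "slow" runs, in ms)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Each iteration lands in one of two clusters with equal probability.
        center = 100.0 if rng.random() < 0.5 else 200.0
        data.append(rng.gauss(center, 5.0))
    return data


def share_within(data, center, width):
    """Observed fraction of values within +/- width of center."""
    return sum(abs(x - center) <= width for x in data) / len(data)


data = bimodal_sample()
mean = statistics.mean(data)
sd = statistics.stdev(data)

# What a normal model with the same mean and sd would predict for +/- 10 ms.
model = NormalDist(mean, sd)
predicted = model.cdf(mean + 10) - model.cdf(mean - 10)
observed = share_within(data, mean, 10)

print(f"mean={mean:.1f} ms, sd={sd:.1f} ms, median={statistics.median(data):.1f} ms")
print(f"share within 10 ms of the mean: predicted {predicted:.3f}, observed {observed:.3f}")
```

With two well-separated clusters, the mean lands in the empty gap between them: the normal model expects about 16% of the runs near the mean, while almost none of the simulated runs are actually there.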

Analyzing distribution of Mono GC collections

Sometimes I want to quickly understand how the GC affects an application's performance. I know that there are many powerful diagnostic tools and approaches, but I'm a fan of the "right tool for the job" idea. In simple cases, I prefer simple, noninvasive approaches that give a quick overview of the current situation (if everything is terrible, I can always switch to an advanced approach). Today I want to share my favorite way to quickly get statistics of GC pauses in Mono and to generate nice plots of their distribution.
