## BenchmarkDotNet v0.10.10

BenchmarkDotNet v0.10.10 has been released! This release includes many new features like Disassembly Diagnoser, ParamsSources, .NET Core x86 support, Environment variables, and more!

## Reflecting on performance testing

Performance is an important feature for many projects. Unfortunately, it is all too common for a developer to accidentally degrade performance while adding new code. After a series of such incidents, people often start to think about performance regression testing.

As developers, we write unit tests all the time. These tests check that our business logic works as designed and that new features don’t break existing code. It looks like a good idea to write some perf tests as well, which will verify that we don’t have any performance regressions.

It turns out this is harder than it sounds. A lot of developers don’t write perf tests at all. Some teams do write perf tests, but almost all of them use their own infrastructure for analysis (which is not a bad thing in general because it’s usually designed for specific projects and requirements). There are a lot of books about test-driven development (TDD), but there are no books about performance-driven development (PDD). There are well-known libraries for unit testing (like xUnit/NUnit/MSTest for .NET), but there are almost no libraries for performance regression testing. Of course, there are some libraries you can use, but there are no well-known, widely recognized libraries, approaches, or tools. Ask your colleagues about it: some of them will give you different answers; the rest will start Googling it.

There is no common understanding of what performance testing should look like. This situation exists because it’s really hard to develop a solution which solves all problems for all kinds of projects. However, that doesn’t mean we shouldn’t try. We should try, we should share our experience, and we should discuss best practices.

## Measuring Performance Improvements in .NET Core with BenchmarkDotNet (Part 1)

A few days ago Stephen Toub published a great post on the Microsoft .NET Blog: Performance Improvements in .NET Core. He showed some significant performance changes in .NET Core 2.0 Preview 1 (compared with .NET Framework 4.7). .NET Core uses RyuJIT for generating assembly code. When I first tried RyuJIT (in the CTP2/CTP5 days, back in 2014), I wasn’t excited about it: the preview versions had some bugs, and it worked slowly on my applications. However, the idea of a rethought and open-source JIT compiler was a huge step forward and an investment in the future. RyuJIT has been developed very actively in recent years: not only by Microsoft but with the help of the community. I’m still not happy about the generated assembly code in some methods, but I have to admit that RyuJIT (as a part of .NET Core) works pretty well today: it shows a good performance level not only on artificial benchmarks but also on real user code. Also, there are a lot of changes not only in dotnet/coreclr (the .NET Core runtime), but also in dotnet/corefx (the .NET Core foundational libraries). It’s very nice to watch how the community helps to optimize well-used classes which have not changed for years.

Now let’s talk about benchmarks. For the demonstration, Stephen wrote a set of handwritten benchmarks. A few people (in the comments and on Hacker News) asked about BenchmarkDotNet regarding these samples (as a better tool for performance measurements). So, I decided to port all these benchmarks to BenchmarkDotNet.

In this post, we will discuss how BenchmarkDotNet can help in such performance investigations, which benchmarking approaches are better to use (and when), and how we can improve these measurements.
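To make the comparison concrete, here is a minimal sketch of what porting one of the handwritten benchmarks to BenchmarkDotNet looks like. The benchmarked method (`FormatInt`) is a hypothetical stand-in for one of Stephen’s examples, not his actual code; the attributes and the runner are the standard BenchmarkDotNet API.

```cs
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class FormattingBenchmarks
{
    private readonly int value = 12345;

    // BenchmarkDotNet handles warmup, iteration counts, and statistics,
    // so the benchmark body contains only the measured operation.
    [Benchmark]
    public string FormatInt() => value.ToString();
}

public class Program
{
    public static void Main(string[] args)
        => BenchmarkRunner.Run<FormattingBenchmarks>();
}
```

Compared with a handwritten `Stopwatch` loop, this removes the boilerplate and the easy-to-get-wrong parts (warmup, dead-code elimination, statistical processing) from the benchmark author’s hands.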

## BenchmarkDotNet v0.10.7

BenchmarkDotNet v0.10.7 has been released. In this post, I will briefly cover the following features:

• Filters and categories
• Updated Setup/Cleanup attributes
• Better Value Types support
• Building Sources on Linux

## 65535 interfaces ought to be enough for anybody

It was a bright, sunny morning. There were no signs of trouble. I came to work, opened Slack, and received many messages from my coworkers about failed tests.

After a few hours of investigation, the situation became clear:

• I’m responsible for the unit tests subsystem in Rider, and only tests from this subsystem were failing.
• I didn’t commit anything to the subsystem for a week because I worked with a local branch. Other developers also didn’t touch this code.
• The unit tests subsystem is completely independent. It’s hard to imagine a situation when only the corresponding tests would fail, thousands of other tests pass, and there are no changes in the source code.
• git blame helped to find the “bad commit”: it didn’t include anything suspicious, only a few additional classes in other subsystems.
• Only tests on Linux and macOS were red. On Windows, everything was ok.
• Stack traces in failed tests were completely random. We had a new stack trace in each test from different subsystems. There was no connection between these stack traces, the unit tests source code, and the changes in the “bad commit.” There was no clue where we should look for a problem.

## A bug story about named mutex on Mono

When you write some multithreading magic on .NET, you can use a cool synchronization primitive called Mutex:

```cs
var mutex = new Mutex(false, "Global\\MyNamedMutex");
```


You can also make it named (and share the mutex between processes), which works perfectly on Windows:
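A minimal sketch of this cross-process scenario (the mutex name is a placeholder): two instances of this program share the same OS-level mutex, so the critical section runs in only one process at a time.

```cs
using System;
using System.Threading;

public class Program
{
    public static void Main()
    {
        // The "Global\\" prefix makes the mutex visible to all sessions;
        // any process that opens the same name gets the same OS object.
        using (var mutex = new Mutex(false, "Global\\MyNamedMutex"))
        {
            mutex.WaitOne(); // blocks while another process holds the mutex
            try
            {
                Console.WriteLine("Inside the critical section");
            }
            finally
            {
                mutex.ReleaseMutex(); // let the other process proceed
            }
        }
    }
}
```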

However, today .NET is cross-platform, so this code should work on any operating system. What will happen if you use a named mutex on Linux or macOS with the help of Mono or CoreCLR? Is it possible to create some tricky bug based on this case? Of course, it is. Today I want to tell you a story about such a bug in Rider which was a headache for several weeks.

## InvalidDataException in Process.GetProcesses

Consider the following program:

```cs
using System;
using System.Diagnostics;

public static void Main(string[] args)
{
    try
    {
        Process.GetProcesses();
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
    }
}
```


It seems that all exceptions should be caught. However, sometimes I got the following exception on Linux with dotnet cli-1.0.0-preview2:

```
$ dotnet run
System.IO.InvalidDataException: Found invalid data while decoding.
   at System.IO.StringParser.ParseNextChar()
   at System.Diagnostics.ProcessManager.GetProcessInfos(String machineName)
   at System.Diagnostics.Process.GetProcesses(String machineName)
   at System.Diagnostics.Process.GetProcesses()
   at DotNetCoreConsoleApplication.Program.Main(String[] args) in /home/akinshin/Program.cs:line 12
```


How is that possible?

## Why is NuGet search in Rider so fast?

I’m the guy who develops the NuGet manager in Rider. It’s not ready yet, there are some bugs here and there, but it already works pretty well. The feature I am most proud of is the smart and fast search.

Today I want to share with you some technical details about how it was implemented.

## NuGet2 and a DirectorySeparatorChar bug

In Rider, we care a lot about performance. I like to improve the application responsiveness and do interesting optimizations all the time. Rider is already well-optimized, and it’s often hard to make significant performance improvements, so usually I do micro-optimizations which do not have a very big impact on the whole application. However, sometimes it’s possible to improve the speed of a feature 100 times with just a few lines of code.

Rider is based on ReSharper, so we have a lot of cool features out of the box. One of these features is Solution-Wide Analysis, which lets you constantly keep track of issues in your solution. Sometimes, solution-wide analysis takes a lot of time to run because there are many files which should be analyzed. Of course, it works super fast on small and medium projects.

• Repro: Open Rider, create a new “ASP .NET MVC Application”, enable solution-wide analysis.
• Expected: The analysis should take 1 second.
• Actual: The analysis takes 1 second on Windows and 2 minutes on Linux and MacOS.
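To illustrate the class of bug the title refers to (this is a hypothetical sketch, not the actual NuGet2 code): a path check with a hardcoded `'\\'` separator works on Windows but never matches on Linux and macOS, where the separator is `'/'`, so code downstream can fall back to a dramatically slower path.

```cs
using System.IO;

public static class PathCheck
{
    // Windows-only assumption: '\\' is hardcoded, so on Linux/macOS
    // this never matches and the caller takes the slow fallback branch.
    public static bool IsInPackagesBroken(string path) =>
        path.Contains("\\packages\\");

    // Portable version: build the fragment from the platform separator.
    public static bool IsInPackagesPortable(string path) =>
        path.Contains(Path.DirectorySeparatorChar + "packages" + Path.DirectorySeparatorChar);
}
```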

## Performance exercise: Division

In the previous post, we discussed the performance space of the minimum function which was implemented via a simple ternary operator and with the help of bit magic. Now we continue to talk about performance and bit hacks. In particular, we will divide a positive number by three:

```cs
uint Div3Simple(uint n)   => n / 3;
uint Div3BitHacks(uint n) => (uint)((n * (ulong)0xAAAAAAAB) >> 33);
```
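Why does the bit-hacks version work? The constant 0xAAAAAAAB is ⌈2³³/3⌉ = 2863311531, so `(n * 0xAAAAAAAB) >> 33` computes ⌊n·⌈2³³/3⌉/2³³⌋, and the rounding error is small enough that the result equals ⌊n/3⌋ for every 32-bit `n`. A small sketch checking the equivalence on a few values:

```cs
using System;

public class Div3Check
{
    static uint Div3Simple(uint n)   => n / 3;
    static uint Div3BitHacks(uint n) => (uint)((n * (ulong)0xAAAAAAAB) >> 33);

    public static void Main()
    {
        // 0xAAAAAAAB == ceil(2^33 / 3); the multiply-and-shift is exact
        // for all uint inputs, including the boundary value uint.MaxValue.
        foreach (uint n in new uint[] { 0, 1, 2, 3, 100, 12345, uint.MaxValue })
            Console.WriteLine($"{n}: {Div3Simple(n)} vs {Div3BitHacks(n)}");
    }
}
```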


As usual, it’s hard to say in advance which method is faster because the performance depends on the environment. Here are some interesting results:

|               | Simple | BitHacks |
|---------------|--------|----------|
| LegacyJIT-x86 | ≈8.3ns | ≈2.6ns   |
| LegacyJIT-x64 | ≈2.6ns | ≈1.7ns   |
| RyuJIT-x64    | ≈6.9ns | ≈1.5ns   |
| Mono4.6.2-x86 | ≈8.5ns | ≈14.4ns  |
| Mono4.6.2-x64 | ≈8.3ns | ≈2.8ns   |