BenchmarkDotNet v0.10.12

by Andrey Akinshin · 2018-01-15

BenchmarkDotNet v0.10.12 has been released! This release includes:

  • Improved DisassemblyDiagnoser: BenchmarkDotNet contains an embedded disassembler so that it can print assembly code for all benchmarks; it’s not easy, but the disassembler evolves in every release.
  • Improved MemoryDiagnoser: it has a better precision level, and it takes less time to evaluate memory allocations in a benchmark.
  • New TailCallDiagnoser: now you get notifications when the JIT applies tail call optimizations to your methods.
  • Better environment info: when you share performance results, it’s very important to share information about your environment. The library generates the environment summary for you by default. Now it contains the number of physical CPUs, physical cores, and logical cores. If you run a benchmark on a virtual machine, you will get the name of the hypervisor (e.g., Hyper-V, VMware, or VirtualBox).
  • Better summary table: one of the greatest features of BenchmarkDotNet is the summary table. It shows all important information about results in a compact and understandable form. Now it has better customization options: you can display relative performance of different environments (e.g., compare .NET Framework and .NET Core) and group benchmarks by categories.
  • New GC settings: now we support NoAffinitize, HeapAffinitizeMask, HeapCount.
  • Other minor improvements and bug fixes

Diagnosers

Diagnosers are helpers which print additional information about your benchmarks.

Improved DisassemblyDiagnoser

DisassemblyDiagnoser prints an assembly listing for your source code. We already had this feature, but we continue to improve it. Our goal is not just to provide raw information about your code, but to provide a comfortable way to explore the program internals. In v0.10.12, Adam Sitnik (@adamsitnik) implemented advanced support of labels for jump targets.

  • When the user hovers over a label, the mouse cursor changes to a pointer and the label gets highlighted
  • When the user clicks a label, all of its usages get highlighted
  • When the user presses F3, we jump to the next usage of the given label

Improved MemoryDiagnoser

MemoryDiagnoser shows the memory traffic for each benchmark and the GC collection count for each generation. In this release, we improved its accuracy and reduced the total time you have to spend to get the results.

New TailCallDiagnoser

@GeorgePlotnikov implemented TailCallDiagnoser, which detects tail call optimizations and prints information about them. This feature should be useful for F# developers. Currently, it has some restrictions: it works only for x64 programs, and it’s Windows-only.

Demo:

[Diagnostics.Windows.Configs.TailCallDiagnoser]
[LegacyJitX86Job, LegacyJitX64Job, RyuJitX64Job]
public class Jit_TailCalling
{
    [Benchmark]
    public long Calc()
        => FactorialWithoutTailing(7) - FactorialWithTailing(7);

    private static long FactorialWithoutTailing(int depth)
        => depth == 0 ? 1 : depth * FactorialWithoutTailing(depth - 1);

    private static long FactorialWithTailing(int pos, int depth)
        => pos == 0 ? depth : FactorialWithTailing(pos - 1, depth * pos);

    // Start with pos = depth and an accumulator of 1, so the result is depth!
    private static long FactorialWithTailing(int depth)
        => FactorialWithTailing(depth, 1);
}

TailCallDiagnoser prints the following lines:

// * Diagnostic Output - TailCallDiagnoser *
--------------------

--------------------
Jit_TailCalling.Calc: LegacyJitX64(Jit=LegacyJit, Platform=X64, Runtime=Clr)
--------------------

--------------------
Jit_TailCalling.Calc: LegacyJitX86(Jit=LegacyJit, Platform=X86, Runtime=Clr)
--------------------

--------------------
Jit_TailCalling.Calc: RyuJitX64(Jit=RyuJit, Platform=X64)
--------------------
Caller: <null>.<null> - <null>
Callee: BenchmarkDotNet.Samples.JIT.Jit_TailCalling.FactorialWithTailing - int64  (int32,int32)
Tail prefix: False
Tail call type: RecursiveLoop
-------------------

Better environment info

One of the most important parts of any performance report is the environment information. People should understand what kind of machine you used for your benchmarks.

Irina Ananyeva (@morgan-kn) implemented a cool feature which displays the number of physical CPUs, logical cores, and physical cores (for example: 1 CPU, 8 logical cores and 4 physical cores). Now the environment info section looks like this (it works on Windows/Linux/macOS; .NET Framework/.NET Core/Mono):

BenchmarkDotNet=v0.10.12, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.192)
Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical cores and 4 physical cores
Frequency=2531249 Hz, Resolution=395.0619 ns, Timer=TSC
.NET Core SDK=2.0.3
  [Host] : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
  Clr    : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2600.0
  Core   : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
  Mono   : Mono 5.4.0 (Visual Studio), 64bit

Some people run benchmarks on virtual machines instead of real hardware, and it’s also an important fact. With a new feature by Łukasz Pyrzyk (@lukasz-pyrzyk), a special label (like VM=VirtualBox) will be automatically added to the result.

Better summary table

The summary table helps you understand performance data quickly. Older versions of BenchmarkDotNet already let you mark a method as a baseline and get “scaled” performance values for all other methods. In v0.10.12 (thanks to Marc Gravell (@mgravell) for the idea), you can introduce several baselines in a class (if you are using benchmark categories) or mark a job as a baseline (which allows evaluating the relative performance of different environments).

Let’s look at a few examples.

Example 1: Methods

You can mark a method as a baseline with the help of [Benchmark(Baseline = true)].

using System.Threading;
using BenchmarkDotNet.Attributes;

public class Sleeps
{
    [Benchmark]
    public void Time50() => Thread.Sleep(50);

    [Benchmark(Baseline = true)]
    public void Time100() => Thread.Sleep(100);

    [Benchmark]
    public void Time150() => Thread.Sleep(150);
}

As a result, you will have an additional Scaled column in the summary table:

|  Method |      Mean |     Error |    StdDev | Scaled |
|-------- |----------:|----------:|----------:|-------:|
|  Time50 |  50.46 ms | 0.0779 ms | 0.0729 ms |   0.50 |
| Time100 | 100.39 ms | 0.0762 ms | 0.0713 ms |   1.00 |
| Time150 | 150.48 ms | 0.0986 ms | 0.0922 ms |   1.50 |
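
The Scaled values above can be approximated by hand. The following is a minimal sketch, assuming Scaled is simply the mean of a benchmark divided by the mean of the baseline; BenchmarkDotNet may compute it differently internally, and the `ScaledDemo` class and its methods are hypothetical names used only for illustration.

```csharp
using System;
using System.Globalization;

public static class ScaledDemo
{
    // Assumption: Scaled is approximated as mean / baseline mean.
    public static double Scaled(double meanMs, double baselineMeanMs)
        => meanMs / baselineMeanMs;

    // Two decimal places with a dot separator, as in the summary table.
    public static string Format(double value)
        => value.ToString("F2", CultureInfo.InvariantCulture);

    public static void Main()
    {
        const double baselineMeanMs = 100.39; // mean of Time100 from the table

        Console.WriteLine(Format(Scaled(50.46, baselineMeanMs)));  // 0.50
        Console.WriteLine(Format(Scaled(100.39, baselineMeanMs))); // 1.00
        Console.WriteLine(Format(Scaled(150.48, baselineMeanMs))); // 1.50
    }
}
```

The three printed values match the Scaled column for Time50, Time100, and Time150.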

Example 2: Methods with categories

The only way to have several baselines in the same class is to separate them by categories and mark the class with [GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)].

[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)]
[CategoriesColumn]
public class Sleeps
{
    [BenchmarkCategory("Fast"), Benchmark(Baseline = true)]        
    public void Time50() => Thread.Sleep(50);

    [BenchmarkCategory("Fast"), Benchmark]
    public void Time100() => Thread.Sleep(100);
    
    [BenchmarkCategory("Slow"), Benchmark(Baseline = true)]        
    public void Time550() => Thread.Sleep(550);

    [BenchmarkCategory("Slow"), Benchmark]
    public void Time600() => Thread.Sleep(600);
}

Results:

|  Method | Categories |      Mean |     Error |    StdDev | Scaled |
|-------- |----------- |----------:|----------:|----------:|-------:|
|  Time50 |       Fast |  50.46 ms | 0.0745 ms | 0.0697 ms |   1.00 |
| Time100 |       Fast | 100.47 ms | 0.0955 ms | 0.0893 ms |   1.99 |
|         |            |           |           |           |        |
| Time550 |       Slow | 550.48 ms | 0.0525 ms | 0.0492 ms |   1.00 |
| Time600 |       Slow | 600.45 ms | 0.0396 ms | 0.0331 ms |   1.09 |
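
To illustrate what grouping by category means for the Scaled column, here is a rough sketch that reproduces the numbers above from the Mean column. It assumes each row is divided by the mean of the single baseline in its own category; the `GroupedScaledDemo` class and the tuple layout are hypothetical and are not part of the BenchmarkDotNet API.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class GroupedScaledDemo
{
    // Returns (method, ratio) pairs, where each ratio is relative to the
    // baseline of the method's own category (assumed: one baseline per category).
    public static List<(string Method, double Scaled)> ScaledByCategory(
        IEnumerable<(string Method, string Category, double MeanMs, bool IsBaseline)> rows)
    {
        var result = new List<(string Method, double Scaled)>();
        foreach (var group in rows.GroupBy(r => r.Category))
        {
            // Single() throws if a category has zero or several baselines.
            double baselineMeanMs = group.Single(r => r.IsBaseline).MeanMs;
            foreach (var row in group)
                result.Add((row.Method, row.MeanMs / baselineMeanMs));
        }
        return result;
    }

    public static void Main()
    {
        // Means copied from the results table above.
        var rows = new[]
        {
            (Method: "Time50",  Category: "Fast", MeanMs: 50.46,  IsBaseline: true),
            (Method: "Time100", Category: "Fast", MeanMs: 100.47, IsBaseline: false),
            (Method: "Time550", Category: "Slow", MeanMs: 550.48, IsBaseline: true),
            (Method: "Time600", Category: "Slow", MeanMs: 600.45, IsBaseline: false),
        };

        foreach (var (method, scaled) in ScaledByCategory(rows))
            Console.WriteLine(string.Format(
                CultureInfo.InvariantCulture, "{0,-8} {1:F2}", method, scaled));
    }
}
```

Because every category has its own baseline, Time600 is compared with Time550 (about 1.09) rather than with Time50.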

Example 3: Jobs

If you want to compare several runtime configurations, you can mark one of your jobs as the baseline by passing isBaseline: true to its job attribute.

[ClrJob(isBaseline: true)]
[MonoJob]
[CoreJob]
public class RuntimeCompetition
{
    [Benchmark]
    public int SplitJoin() => string.Join(",", new string[1000]).Split(',').Length;
}

Results:

    Method | Runtime |     Mean |     Error |    StdDev | Scaled | ScaledSD |
---------- |-------- |---------:|----------:|----------:|-------:|---------:|
 SplitJoin |     Clr | 19.42 us | 0.2447 us | 0.1910 us |   1.00 |     0.00 |
 SplitJoin |    Core | 13.00 us | 0.2183 us | 0.1935 us |   0.67 |     0.01 |
 SplitJoin |    Mono | 39.14 us | 0.7763 us | 1.3596 us |   2.02 |     0.07 |

New GC Settings

BenchmarkDotNet allows configuring GC settings for each job. Now we support a few additional settings: NoAffinitize, HeapAffinitizeMask, and HeapCount. If you set them, the library generates an app.config like this:

<configuration>
   <runtime>
      <GCHeapCount enabled="6"/>
      <GCNoAffinitize enabled="true"/>
      <GCHeapAffinitizeMask enabled="144"/>
   </runtime>
</configuration>

See the MSDN page for details.

Milestone details

In the v0.10.12 scope, 14 issues were resolved and 10 pull requests were merged. This release includes 42 commits by 9 contributors.

Resolved issues (14)

Merged pull requests (10)

Commits (42)

Contributors (9)

Thank you very much!