NuGet2 and a DirectorySeparatorChar bug



In Rider, we care a lot about performance. I like to improve the application responsiveness and do interesting optimizations all the time. Rider is already well-optimized, and it's often hard to make significant performance improvements, so usually I do micro-optimizations which do not have a very big impact on the whole application. However, sometimes it's possible to improve the speed of a feature 100 times with just a few lines of code.

Rider is based on ReSharper, so we have a lot of cool features out of the box. One of these features is Solution-Wide Analysis which lets you constantly keep track of issues in your solution. Sometimes, solution-wide analysis takes a lot of time to run because there are many files which should be analyzed. Of course, it works super fast on small and projects.

Let's talk about a performance bug (#RIDER-3742) that we recently had.

  • Repro: Open Rider, create a new "ASP .NET MVC Application", enable solution wide-analysis.
  • Expected: The analysis should take 1 second.
  • Actual: The analysis takes 1 second on Windows and 2 minutes on Linux and MacOS.

Read more    Comments


Performance exercise: Division



In the previous post, we discussed the performance space of the minimum function which was implemented via a simple ternary operator and with the help of bit magic. Now we continue to talk about performance and bit hacks. In particular, we will divide a positive number by three:

uint Div3Simple(uint n)   => n / 3;
uint Div3BitHacks(uint n) => (uint)((n * (ulong)0xAAAAAAAB) >> 33);

As usual, it's hard to say which method is faster in advanced because the performance depends on the environment. Here are some interesting results:

Simple BitHacks
LegacyJIT-x86 ≈8.3ns ≈2.6ns
LegacyJIT-x64 ≈2.6ns ≈1.7ns
RyuJIT-x64 ≈6.9ns ≈1.5ns
Mono4.6.2-x86 ≈8.5ns ≈14.4ns
Mono4.6.2-x64 ≈8.3ns ≈2.8ns

Read more    Comments


Performance exercise: Minimum



Performance is tricky. Especially, if you are working with very fast operations. In today benchmarking exercise, we will try to measure performance of two simple methods which calculate minimum of two numbers. Sounds easy? Ok, let's do it, here are our guinea pigs for today:

int MinTernary(int x, int y)  => x < y ? x : y;
int MinBitHacks(int x, int y) => x & ((x - y) >> 31) | y & (~(x - y) >> 31);

And here are some results:

Random Const
Ternary BitHacks Ternary BitHacks
LegacyJIT-x86 ≈643µs ≈227µs ≈160µs ≈226µs
LegacyJIT-x64 ≈450µs ≈123µs ≈68µs ≈123µs
RyuJIT-x64 ≈594µs ≈241µs ≈180µs ≈241µs
Mono-x64 ≈203µs ≈283µs ≈204µs ≈282µs

What's going on here? Let's discuss it in detail.

Read more    Comments


Stopwatch under the hood



In the previous post, we discussed DateTime. This structure can be used in situations when you don't need a good level of precision. If you want to do high-precision time measurements, you need a better tool because DateTime has a small resolution and a big latency. Also, time is tricky, you can create wonderful bugs if you don't understand how it works (see Falsehoods programmers believe about time and More falsehoods programmers believe about time).

In this post, we will briefly talk about the Stopwatch class:

  • Which kind of hardware timers could be a base for Stopwatch
  • High precision timestamp API on Windows and Linux
  • Latency and Resolution of Stopwatch in different environments
  • Common pitfalls: which kind of problems could we get trying to measure small time intervals

If you are not a .NET developer, you can also find a lot of useful information in this post: mainly we will discuss low-level details of high-resolution timestamping (probably your favorite language also uses the same API). As usual, you can also find useful links for further reading.

Read more    Comments


DateTime under the hood



DateTime is a widely used .NET type. A lot of developers use it all the time, but not all of them really know how it works. In this post, I discuss DateTime.UtcNow: how it's implemented, what the latency and the resolution of DateTime on Windows and Linux, how the resolution can be changed, and how it can affect your application. This post is an overview, so you probably will not see super detailed explanations of some topics, but you will find a lot of useful links for further reading.

Read more    Comments


LegacyJIT-x86 and first method call



Today I tell you about one of my favorite benchmarks (this method doesn't return a useful value, we need it only as an example):

[Benchmark]
public string Sum()
{
    double a = 1, b = 1;
    var sw = new Stopwatch();
    for (int i = 0; i < 10001; i++)
        a = a + b;
    return string.Format("{0}{1}", a, sw.ElapsedMilliseconds);
}

An interesting fact: if you call Stopwatch.GetTimestamp() before the first call of the Sum method, you improve Sum performance several times (works only with LegacyJIT-x86).

Read more    Comments


Visual Studio and ProjectTypeGuids.cs



It's a story about how I tried to open a project in Visual Studio for a few hours. The other day, I was going to do some work. I pulled last commits from a repo, opened Visual Studio, and prepared to start coding. However, one of a project in my solution failed to open with a strange message:

error  : The operation could not be completed.

In the Solution Explorer, I had "load failed" as a project status and the following message instead of the file tree: "The project requires user input. Reload the project for more information." Hmm, ok, I reloaded the project and got a few more errors:

error  : The operation could not be completed.
error  : The operation could not be completed.

Read more    Comments


Blittable types



Challenge of the day: what will the following code display?

[StructLayout(LayoutKind.Explicit)]
public struct UInt128
{
    [FieldOffset(0)]
    public ulong Value1;
    [FieldOffset(8)]
    public ulong Value2;
}
[StructLayout(LayoutKind.Sequential)]
public struct MyStruct
{
    public UInt128 UInt128;
    public char Char;
}
class Program
{
    public static unsafe void Main()
    {
        var myStruct = new MyStruct();
        var baseAddress = (int)&myStruct;
        var uInt128Adress = (int)&myStruct.UInt128;
        Console.WriteLine(uInt128Adress - baseAddress);
        Console.WriteLine(Marshal.OffsetOf(typeof(MyStruct), "UInt128"));
    }
}

A hint: two zeros or two another same values are wrong answers in the general case. The following table shows the console output on different runtimes:

MS.NET-x86MS.NET-x64Mono
uInt128Adress - baseAddress 480
Marshal.OffsetOf(typeof(MyStruct), "UInt128")000

If you want to know why it happens, you probably should learn some useful information about blittable types.

Read more    Comments


RyuJIT RC and constant folding



Update: The below results are valid for the release version of RyuJIT.

The challenge of the day: which method is faster?

public double Sqrt13()
{
    return Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + Math.Sqrt(4) + Math.Sqrt(5) + 
           Math.Sqrt(6) + Math.Sqrt(7) + Math.Sqrt(8) + Math.Sqrt(9) + Math.Sqrt(10) + 
           Math.Sqrt(11) + Math.Sqrt(12) + Math.Sqrt(13);
}
public double Sqrt14()
{
    return Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + Math.Sqrt(4) + Math.Sqrt(5) + 
           Math.Sqrt(6) + Math.Sqrt(7) + Math.Sqrt(8) + Math.Sqrt(9) + Math.Sqrt(10) + 
           Math.Sqrt(11) + Math.Sqrt(12) + Math.Sqrt(13) + Math.Sqrt(14);
}

I have measured the methods performance with help of BenchmarkDotNet for RyuJIT RC (a part of .NET Framework 4.6 RC) and received the following results:

// BenchmarkDotNet=v0.7.4.0
// OS=Microsoft Windows NT 6.2.9200.0
// Processor=Intel(R) Core(TM) i7-4702MQ CPU @ 2.20GHz, ProcessorCount=8
// CLR=MS.NET 4.0.30319.0, Arch=64-bit  [RyuJIT]
Common:  Type=Math_DoubleSqrtAvx  Mode=Throughput  Platform=X64  Jit=RyuJit  .NET=Current  

Method | AvrTime | StdDev | op/s | ------- |--------- |---------- |------------- | Sqrt13 | 55.40 ns | 0.571 ns | 18050993.06 | Sqrt14 | 1.43 ns | 0.0224 ns | 697125029.18 |

How so? If I add one more Math.Sqrt to the expression, the method starts work 40 times faster! Let's examine the situation..

Read more    Comments


Unrolling of small loops in different JIT versions



Challenge of the day: what will the following code display?

struct Point
{
    public int X;
    public int Y;
}
static void Print(Point p)
{
    Console.WriteLine(p.X + " " + p.Y);
}
static void Main()
{
    var p = new Point();
    for (p.X = 0; p.X < 2; p.X++)
        Print(p);
}

The right answer: it depends. There is a bug in CLR2 JIT-x86 which spoil this wonderful program. This story is about optimization that called unrolling of small loops. This is a very interesting theme, let's discuss it in detail.

Read more    Comments



More posts in Russian

Subscribe: RSS Atom