How ListSeparator Depends on Runtime and Operating System

Andrey Akinshin · 2020-05-20

This blog post was originally published on the JetBrains .NET blog.

In the two previous blog posts from this series, we discussed how socket errors and sorting order depend on the runtime and operating system. For some, it may be obvious that certain things are specific to the operating system or the runtime, but these issues often come as a surprise and are only discovered when we run our code on a different system. An interesting example that may bite us at runtime is using ListSeparator in our code. It should give us a common separator for list elements in a string. But is it really common? Let’s start our investigation by printing ListSeparator for the Russian language:

Console.WriteLine(new CultureInfo("ru-ru").TextInfo.ListSeparator);

On Windows, you will get the same result for .NET Framework, .NET Core, and Mono: the ListSeparator is ; (a semicolon). You will also get a semicolon on Mono+Unix. However, on .NET Core+Unix, you will get a non-breaking space.
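
A non-breaking space is hard to tell apart from a regular space in console output, so it can help to print the code points instead of the raw string. The following is a minimal sketch of that idea; the SeparatorInspector class and the Describe helper are hypothetical names introduced for this illustration. A semicolon would show up as U+003B and a non-breaking space as U+00A0, and the actual output depends on the runtime and operating system you run it on.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class SeparatorInspector
{
    // Renders every UTF-16 code unit as U+XXXX so that invisible
    // characters (such as a non-breaking space) become visible.
    public static string Describe(string value) =>
        string.Join(" ", value.Select(c => "U+" + ((int)c).ToString("X4")));

    public static void Main()
    {
        var culture = new CultureInfo("ru-RU");
        Console.WriteLine(Describe(culture.TextInfo.ListSeparator));
    }
}
```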

The Mono approach

On Windows, it’s possible to fetch the ListSeparator value from the operating system’s regional settings. Unfortunately, there is no such option on Linux and macOS. So, how is this problem solved in Mono? The missing information about cultures is collected in advance using the locale-builder tool. Some of this data is filled in from the Unicode CLDR. The rest is hardcoded. As for TextInfo (the class that contains the ListSeparator value), it’s defined in Patterns.cs:

var entry_te = Text[lcid];
var te = ci.TextInfoEntry;
te.ANSICodePage = entry_te[0];
te.EBCDICCodePage = entry_te[1];
te.IsRightToLeft = entry_te[2] == "1" ? true : false;
te.ListSeparator = entry_te[3];
te.MacCodePage = entry_te[4];
te.OEMCodePage = entry_te[5];

The lcid value (Language Code Identifier) for Russian is 1049, or 0x0419 in hexadecimal. (The values for a number of other languages can be found here.) The predefined values can also be found in Patterns.cs. Here is the corresponding entry for Russian:

{ 0x0419, new [] { "1251", "20880", "0", ";", "10007", "866" } },

Thus, entry_te[3] equals ";". That’s how Mono knows the ListSeparator value even if it’s not defined in the current operating system.
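
The lookup described above can be sketched as a small standalone program. This is not Mono's actual code: the TextInfoTable class, the GetListSeparator method, and the "," fallback for unknown LCIDs are assumptions made for illustration. Only the Russian row is taken from the entry shown above.

```csharp
using System;
using System.Collections.Generic;

public static class TextInfoTable
{
    // Hypothetical miniature of the predefined table, keyed by LCID.
    // Only the Russian row (0x0419) comes from the article.
    private static readonly Dictionary<int, string[]> Text = new Dictionary<int, string[]>
    {
        { 0x0419, new[] { "1251", "20880", "0", ";", "10007", "866" } },
    };

    // Position of ListSeparator inside each row, matching entry_te[3].
    private const int ListSeparatorIndex = 3;

    // Returns the predefined separator, or the (assumed) fallback
    // when the LCID is not present in the table.
    public static string GetListSeparator(int lcid, string fallback = ",") =>
        Text.TryGetValue(lcid, out var entry) ? entry[ListSeparatorIndex] : fallback;

    public static void Main()
    {
        Console.WriteLine(GetListSeparator(1049)); // 1049 == 0x0419, prints ";"
    }
}
```

Because the value comes from a table compiled into the runtime, the result does not depend on what the operating system knows about the culture.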

The .NET Core approach

Unfortunately, .NET Core doesn’t have predefined values for ListSeparator. There’s some fairly strange logic in the source code of .NET Core 3.1.3:

case LocaleString_ListSeparator:
    // fall through
case LocaleString_ThousandSeparator:
    status = GetLocaleInfoDecimalFormatSymbol(locale, UNUM_GROUPING_SEPARATOR_SYMBOL, value, valueLength);
    break;

It looks like .NET Core always uses the ThousandSeparator value instead of ListSeparator on Linux and macOS. This doesn’t feel right, so we filed an issue: dotnet/runtime#536. Hopefully, this behavior will be improved in the future.
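
One way to observe this behavior on your own machine is to compare the two properties for a few cultures. The sketch below assumes nothing beyond the standard CultureInfo API; the SeparatorCheck class and the SharesGroupSeparator method are hypothetical names. Equality is only a hint, not proof: for cultures such as en-US, both values are legitimately a comma.

```csharp
using System;
using System.Globalization;

public static class SeparatorCheck
{
    // True when ListSeparator and NumberGroupSeparator are identical,
    // which is what the code path shown above would produce.
    public static bool SharesGroupSeparator(CultureInfo culture) =>
        culture.TextInfo.ListSeparator == culture.NumberFormat.NumberGroupSeparator;

    public static void Main()
    {
        foreach (var name in new[] { "ru-RU", "de-DE", "fr-FR", "en-US" })
        {
            var culture = new CultureInfo(name);
            Console.WriteLine(
                $"{name}: list='{culture.TextInfo.ListSeparator}' " +
                $"group='{culture.NumberFormat.NumberGroupSeparator}' " +
                $"same={SharesGroupSeparator(culture)}");
        }
    }
}
```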

Practical recommendations

If you are using some CultureInfo properties that are not supported by one of your target operating systems, it’s better to provide some fallback values. Here is an example of how this problem has been solved in BenchmarkDotNet:

public static string GetActualListSeparator([CanBeNull] this CultureInfo cultureInfo)
{
    cultureInfo = cultureInfo ?? DefaultCultureInfo.Instance;
    string listSeparator = cultureInfo.TextInfo.ListSeparator;

    // On .NET Core + Linux, TextInfo.ListSeparator returns NumberFormat.NumberGroupSeparator
    // To work around this behavior, we patch empty ListSeparator with ";"
    // See also: https://github.com/dotnet/runtime/issues/536
    if (string.IsNullOrWhiteSpace(listSeparator))
        listSeparator = ";";

    return listSeparator;
}
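
To try the same idea outside of BenchmarkDotNet, here is a minimal self-contained sketch. The ListSeparatorFallback class and the OrDefault method are hypothetical names, and the annotation and default-culture helper from the snippet above are left out on purpose. It relies on the fact that string.IsNullOrWhiteSpace treats a non-breaking space (U+00A0) as white space.

```csharp
using System;
using System.Globalization;

public static class ListSeparatorFallback
{
    // Replaces a missing or white-space-only separator with the fallback.
    public static string OrDefault(string listSeparator, string fallback = ";") =>
        string.IsNullOrWhiteSpace(listSeparator) ? fallback : listSeparator;

    public static void Main()
    {
        var culture = new CultureInfo("ru-RU");
        string separator = OrDefault(culture.TextInfo.ListSeparator);

        // Join a few values with the patched separator.
        Console.WriteLine(string.Join(separator, new[] { "a", "b", "c" }));
    }
}
```

Note that this check only catches white-space separators. If a runtime returned a visible group separator such as a period in place of ListSeparator, the value would pass through unchanged, so a stricter check may be needed depending on the cultures you support.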

Having fallback values in place protects us from unexpected issues at runtime and lets us run the same code across multiple platforms without surprises.

References