How ListSeparator Depends on Runtime and Operating System
This blog post was originally published on the JetBrains .NET blog.
In the two previous blog posts from this series, we discussed how socket error codes and sorting order depend on the runtime and the operating system. For some, it may be obvious that certain things are specific to the operating system or the runtime, but these issues often come as a surprise and are only discovered when we run our code on different systems.
An interesting example that may bite us at runtime is using ListSeparator in our code. It should give us a common separator for list elements in a string. But is it really common?
Let’s start our investigation by printing ListSeparator for the Russian language:
Console.WriteLine(new CultureInfo("ru-ru").TextInfo.ListSeparator);
On Windows, you will get the same result for .NET Framework, .NET Core, and Mono: the ListSeparator is ; (a semicolon). You will also get a semicolon on Mono+Unix. However, on .NET Core+Unix, you will get a non-breaking space.
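A non-breaking space looks like nothing at all in console output, so it helps to print the code points of the separator as well. The following is a minimal sketch; the names SeparatorInspector and Describe are hypothetical helpers and not part of any library.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class SeparatorInspector
{
    // Formats every UTF-16 code unit of the string as U+XXXX, so that invisible
    // characters such as a non-breaking space (U+00A0) become visible.
    public static string Describe(string value) =>
        string.Join(" ", value.Select(c => $"U+{(int)c:X4}"));

    public static void Main()
    {
        string separator = new CultureInfo("ru-RU").TextInfo.ListSeparator;
        // The actual output depends on the runtime and the operating system.
        Console.WriteLine($"ListSeparator: '{separator}' ({Describe(separator)})");
    }
}
```

A semicolon is reported as U+003B, while a non-breaking space is reported as U+00A0.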
The Mono approach
On Windows, it’s possible to fetch the ListSeparator value from the operating system’s regional settings. Unfortunately, there is no such option on Linux and macOS. So, how is this problem solved in Mono?
The missing information about cultures is collected in advance using the locale-builder tool. Some of this data is taken from the Unicode CLDR; the rest is hardcoded. The data for TextInfo (the class that contains the ListSeparator value) is filled in Patterns.cs:
var entry_te = Text[lcid];
var te = ci.TextInfoEntry;
te.ANSICodePage = entry_te[0];
te.EBCDICCodePage = entry_te[1];
te.IsRightToLeft = entry_te[2] == "1" ? true : false;
te.ListSeparator = entry_te[3];
te.MacCodePage = entry_te[4];
te.OEMCodePage = entry_te[5];
The lcid value (Language Code Identifier) for Russian is 1049 or 0x419. (The values for a number of other languages can be found here.)
The predefined values can also be found in Patterns.cs. Here is the corresponding entry for Russian:
{ 0x0419, new [] { "1251", "20880", "0", ";", "10007", "866" } },
Thus, entry_te[3] equals ";". That’s how Mono knows the ListSeparator value even if it’s not defined in the current operating system.
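The idea of a predefined table can be sketched in a few lines. This is only an illustration under stated assumptions, not Mono’s actual implementation: the class TextInfoTable and the method GetListSeparator are hypothetical, and the table contains only the Russian entry quoted above.

```csharp
using System;
using System.Collections.Generic;

public static class TextInfoTable
{
    // Position of the list separator inside each entry, matching entry_te[3] above.
    private const int ListSeparatorIndex = 3;

    // Only the Russian entry quoted above; a real table would cover many more cultures.
    private static readonly Dictionary<int, string[]> Entries = new Dictionary<int, string[]>
    {
        { 0x0419, new[] { "1251", "20880", "0", ";", "10007", "866" } },
    };

    // Returns the predefined separator, or null when the LCID is not in this table.
    public static string GetListSeparator(int lcid) =>
        Entries.TryGetValue(lcid, out var entry) ? entry[ListSeparatorIndex] : null;

    public static void Main()
    {
        // 1049 is the decimal form of 0x0419, so this prints ";".
        Console.WriteLine(GetListSeparator(1049));
    }
}
```

Because the lookup never touches the operating system, it returns the same value on every platform.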
The .NET Core approach
Unfortunately, .NET Core doesn’t have predefined values for ListSeparator. Instead, the source code of .NET Core 3.1.3 contains some fairly strange logic:
case LocaleString_ListSeparator:
    // fall through
case LocaleString_ThousandSeparator:
    status = GetLocaleInfoDecimalFormatSymbol(locale, UNUM_GROUPING_SEPARATOR_SYMBOL, value, valueLength);
    break;
It looks like .NET Core always uses the ThousandSeparator value instead of ListSeparator on Linux and macOS. This doesn’t feel right, so we filed an issue: dotnet/runtime#536. Hopefully, this behavior will be improved in the future.
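One way to observe this on your own machine is to compare ListSeparator with NumberFormat.NumberGroupSeparator for a few cultures. The sketch below uses the hypothetical names SeparatorComparison and UsesGroupSeparatorAsListSeparator. Keep in mind that equality alone is not proof of the fallback: for some cultures (en-US or the invariant culture, for example) both values are legitimately a comma. That is why the sample uses cultures where the two are expected to differ.

```csharp
using System;
using System.Globalization;

public static class SeparatorComparison
{
    // True when the culture reports the same string for both separators.
    public static bool UsesGroupSeparatorAsListSeparator(CultureInfo culture) =>
        culture.TextInfo.ListSeparator == culture.NumberFormat.NumberGroupSeparator;

    public static void Main()
    {
        // The result depends on the runtime and the operating system.
        foreach (string name in new[] { "ru-RU", "de-DE" })
        {
            var culture = new CultureInfo(name);
            Console.WriteLine($"{name}: {UsesGroupSeparatorAsListSeparator(culture)}");
        }
    }
}
```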
Practical recommendations
If you are using some CultureInfo properties that are not supported by one of your target operating systems, it’s better to provide some fallback values. Here is an example of how this problem has been solved in BenchmarkDotNet:
public static string GetActualListSeparator([CanBeNull] this CultureInfo cultureInfo)
{
    cultureInfo = cultureInfo ?? DefaultCultureInfo.Instance;
    string listSeparator = cultureInfo.TextInfo.ListSeparator;
    // On .NET Core + Linux, TextInfo.ListSeparator returns NumberFormat.NumberGroupSeparator
    // To work around this behavior, we patch empty ListSeparator with ";"
    // See also: https://github.com/dotnet/runtime/issues/536
    if (string.IsNullOrWhiteSpace(listSeparator))
        listSeparator = ";";
    return listSeparator;
}
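The method above relies on BenchmarkDotNet-specific pieces (DefaultCultureInfo and the [CanBeNull] annotation). A self-contained variant of the same idea might look like the sketch below. All names in it (ListSeparatorFallback, Resolve, GetListSeparatorOrFallback) are hypothetical, and the invariant culture is used as an assumed default in place of DefaultCultureInfo.Instance.

```csharp
using System;
using System.Globalization;

public static class ListSeparatorFallback
{
    // Assumed default for illustration; pick whatever suits your output format.
    private const string FallbackSeparator = ";";

    // Replaces a null, empty, or whitespace-only separator with the fallback.
    // A non-breaking space (U+00A0) counts as whitespace for IsNullOrWhiteSpace.
    public static string Resolve(string listSeparator) =>
        string.IsNullOrWhiteSpace(listSeparator) ? FallbackSeparator : listSeparator;

    // Uses the invariant culture when no culture is given (an assumption of this sketch).
    public static string GetListSeparatorOrFallback(CultureInfo culture) =>
        Resolve((culture ?? CultureInfo.InvariantCulture).TextInfo.ListSeparator);

    public static void Main()
    {
        string separator = GetListSeparatorOrFallback(new CultureInfo("ru-RU"));
        Console.WriteLine(string.Join(separator, new[] { "a", "b", "c" }));
    }
}
```

Note that this check only catches separators that are empty or consist of whitespace. If a culture’s group separator is a visible character, such as a period, the value passes through unchanged, so a stricter policy may be needed in that case.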
Having fallback values in place protects us from unexpected behavior and lets us run the same code safely across multiple platforms.