Posts about Encodings

About UTF-8 conversions in Mono



This post is a logical continuation of the Jon Skeet's blog post “When is a string not a string?”. Jon showed very interesting things about behavior of ill-formed Unicode strings in .NET. I wondered about how similar examples will work on Mono. And I have got very interesting results.

Experiment 1: Compilation

Let's take the Jon's code with a small modification. We will just add text null check in DumpString:

using System;
using System.ComponentModel;
using System.Text;
using System.Linq;
[Description(Value)]
class Test
{
    const string Value = "X\ud800Y";
    static void Main()
    {
        var description = (DescriptionAttribute)typeof(Test).
            GetCustomAttributes(typeof(DescriptionAttribute), true)[0];
        DumpString("Attribute", description.Description);
        DumpString("Constant", Value);
    }
    static void DumpString(string name, string text)
    {
        Console.Write("{0}: ", name);
        if (text != null)
        {
            var utf16 = text.Select(c => ((uint) c).ToString("x4"));
            Console.WriteLine(string.Join(" ", utf16));
        }
        else
            Console.WriteLine("null");
    }
}

Read more    Comments