Posts about Encodings

About UTF-8 conversions in Mono

This post is a logical continuation of the Jon Skeet's blog post “When is a string not a string?”. Jon showed very interesting things about behavior of ill-formed Unicode strings in .NET. I wondered about how similar examples will work on Mono. And I have got very interesting results.

Experiment 1: Compilation

Let's take the Jon's code with a small modification. We will just add text null check in DumpString:

using System;
using System.ComponentModel;
using System.Text;
using System.Linq;
class Test
    const string Value = "X\ud800Y";
    static void Main()
        var description = (DescriptionAttribute)typeof(Test).
            GetCustomAttributes(typeof(DescriptionAttribute), true)[0];
        DumpString("Attribute", description.Description);
        DumpString("Constant", Value);
    static void DumpString(string name, string text)
        Console.Write("{0}: ", name);
        if (text != null)
            var utf16 = text.Select(c => ((uint) c).ToString("x4"));
            Console.WriteLine(string.Join(" ", utf16));

Read more    Comments