Kostiantyn's Blog

i18n and MessageFormat

Tea and cookies
Photo by Adeolu Eletu on Unsplash

Recently, I made a contribution to the open source project MessageFormat.net. The library implements the ICU MessageFormat specification for .NET. I figured this is a good opportunity to talk about how programmers and translators use a standardized format to work together and localize their product.

And to show off my contribution, of course ;). I will talk about the technical details of my contribution in a separate blog post, though.

Let's start with the basics first.

Table of Contents

  1. What is Message Format?
  2. Internationalization with MessageFormat
  3. What are Plural Rules?
  4. Working with translators
  5. Conclusion

What is Message Format?

As programmers, we format strings for a living. The output is the core value of the programs we write. ICU's MessageFormat defines a syntax for message patterns. These patterns let the program output different things depending on the values of some variables.

Let's imagine that you are a programmer for some tea/coffee shop. Your task is to work on a cookie promotion. Who doesn't love some cookies with their beverages?

Firstly, you can just embed a variable in a string:

Would you like some cookies with your {drink}?

Now you can advertise extra cookies alongside the drink a user is ordering. For example, your users might see the message "Would you like some cookies with your tea?".
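To make the substitution concrete, here is a minimal Python sketch of what happens to the {drink} placeholder. The helper name format_simple is made up for this post, and Python is used only for brevity; this is not how MessageFormat.net is implemented.

```python
from typing import Mapping


def format_simple(pattern: str, values: Mapping[str, object]) -> str:
    """Replace each {name} placeholder with the matching value (hypothetical helper)."""
    result = pattern
    for name, value in values.items():
        result = result.replace("{" + name + "}", str(value))
    return result


message = format_simple("Would you like some cookies with your {drink}?", {"drink": "tea"})
print(message)  # Would you like some cookies with your tea?
```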

Internationalization with MessageFormat

Imagine that you have received some feedback about the British version of the site. An anonymous user told you that "biscuits" would be more appropriate than "cookies" for that area. This is a localization concern, and MessageFormat is perfect for localization! Now you can just specify a different string for the GB version, without changing the template substitution logic.

"en-us": "Would you like some cookies...
"en-gb": "Would you like some biscuits...

This may sound like overengineering for some minor variations in English, but imagine that you need to create French, German, Ukrainian, etc. versions of your site, and it starts to make sense.
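A rough Python sketch of that idea: the strings live in a per-locale catalog, and the substitution code stays the same. The names MESSAGES and offer_cookies, the full wording of the strings, and the fallback to "en-us" are all assumptions made for this illustration.

```python
# Hypothetical message catalog: one pattern per locale, keyed by locale code.
MESSAGES = {
    "en-us": "Would you like some cookies with your {drink}?",
    "en-gb": "Would you like some biscuits with your {drink}?",
}


def offer_cookies(locale: str, drink: str) -> str:
    """Pick the pattern for the locale, then substitute the variable."""
    # Falling back to en-us for unknown locales is an assumption of this sketch.
    pattern = MESSAGES.get(locale, MESSAGES["en-us"])
    return pattern.replace("{drink}", drink)


print(offer_cookies("en-us", "tea"))  # Would you like some cookies with your tea?
print(offer_cookies("en-gb", "tea"))  # Would you like some biscuits with your tea?
```

Adding another language then means adding one more entry to the catalog, not touching offer_cookies.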

Well, you may say, that is not profoundly useful. Can't we just use a templating engine, such as the ones available for Express in JS or Razor in C#, to do that? And you would be absolutely right!

What are Plural Rules?

The previous example is pretty simple, but most localization challenges aren't. Imagine that you implemented a slider for the number of cookies a user might want. There you can specify that you want 1, 2, 3 or more cookies. Our localized string starts to get more complex, as we need to include numbers there too. An obvious approach to this problem would be to include the variable:

"Would you like {count} cookies...

This approach would work fine for multiple cookies ("Would you like 2 cookies...") but will break for 1 ("Would you like 1 cookies...").

This is a very common problem, and the rules for pluralization differ widely between languages. Thankfully, MessageFormat provides a way to solve this problem.

The plural operator asks you to provide a format for each kind of pluralization it knows. The engine will then determine which one to pick depending on the number provided.

A revised MessageFormat string would look like this:

"Would you like {count, plural, one {# cookie} few {# cookies} many {# cookies} other {# cookies} }...

What are these "one", "few", "many" and "other" things I included in my format string? These are the "kinds of pluralization" that I mentioned earlier. They are officially called "categories", and they sort all numbers into classes. English has two classes: "one" for single items, and "other" for everything else, so the "few" and "many" branches above are never selected for English. Other languages, like Ukrainian, use more classes to fully describe their grammar rules for pluralization.
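To give a feeling for how numbers end up in categories, here is a simplified Python sketch for English and Ukrainian. It only handles non-negative whole numbers and hard-codes the rules for these two languages; a real engine reads the rules for every language from the CLDR data. The function names are made up for this post.

```python
def english_category(n: int) -> str:
    """English cardinal plurals for whole numbers: 'one' or 'other'."""
    return "one" if n == 1 else "other"


def ukrainian_category(n: int) -> str:
    """Ukrainian cardinal plurals for whole numbers: 'one', 'few' or 'many'."""
    last_digit, last_two = n % 10, n % 100
    if last_digit == 1 and last_two != 11:
        return "one"
    if 2 <= last_digit <= 4 and not 12 <= last_two <= 14:
        return "few"
    # 'other' is only reached by fractions, which this sketch ignores.
    return "many"


for n in (1, 2, 5, 11, 21, 22):
    print(n, english_category(n), ukrainian_category(n))
```

Note how 21 is "other" in English but "one" in Ukrainian: the category names describe grammatical classes, not literal quantities.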

See the Plural Rules Table if you are interested in which categories your language uses. I will talk about how these categories work on a more technical level in another blog post.

Working with translators

How would one approach distributing the work of translating into various languages? MessageFormat is pretty human-readable, so one can imagine the following approach.

  1. Define the localization string for your language:

"Would you like {count, plural, one {# cookie} few {# cookies} many {# cookies} other {# cookies}} with your {drink}?"

  2. Prepare a table with one row per language and a column for the translation.

     Language  Translation
     en-us     Would you like {count, plural, one {# cookie} few {# cookies} many {# cookies} other {# cookies}} with your {drink}?
     en-gb
     uk-ua

  3. Ask the translators to fill out the table, following your example.

     Language  Translation
     en-us     Would you like {count, plural, one {# cookie} few {# cookies} many {# cookies} other {# cookies}} with your {drink}?
     en-gb     Would you like {count, plural, one {# biscuit} few {# biscuits} many {# biscuits} other {# biscuits}} with your {drink}?
     uk-ua     Чи хочеш ти {count, plural, one {# печеньку} few {# печеньки} many {# печеньок} other {# печеньок}} до твого {drink}?

  4. PROFIT!

Now, each translator can work on the translation for the language they are most proficient in. You, as a programmer, will just use these translations "as a black box", substituting the variables without caring about the specifics of each language.
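To show what "black box" means in practice, here is a toy Python formatter that understands just enough of the syntax for the strings in the table: plain {name} placeholders and a single level of {name, plural, ...}. Everything in it (format_message, plural_category, TRANSLATIONS, the simplified whole-number rules) is an assumption made for this post. It ignores quoting, nested arguments and fractions, and it is not the MessageFormat.net API. For Ukrainian, only the plural part of the table row is used.

```python
import re


def plural_category(locale: str, n: int) -> str:
    """Simplified whole-number rules for the languages in the table."""
    if locale.startswith("uk"):
        if n % 10 == 1 and n % 100 != 11:
            return "one"
        if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
            return "few"
        return "many"
    return "one" if n == 1 else "other"


def find_closing_brace(text: str, start: int) -> int:
    """Return the index of the '}' that matches the '{' at text[start]."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces in pattern")


def format_message(pattern: str, locale: str, values: dict) -> str:
    out, i = [], 0
    while i < len(pattern):
        if pattern[i] != "{":
            out.append(pattern[i])
            i += 1
            continue
        end = find_closing_brace(pattern, i)
        parts = pattern[i + 1:end].split(",", 2)
        name = parts[0].strip()
        if len(parts) == 3 and parts[1].strip() == "plural":
            # Collect "category {text}" pairs; branch texts must not nest braces.
            branches = dict(re.findall(r"(\w+)\s*\{([^{}]*)\}", parts[2]))
            category = plural_category(locale, values[name])
            text = branches.get(category, branches["other"])
            out.append(text.replace("#", str(values[name])))
        else:
            out.append(str(values[name]))
        i = end + 1
    return "".join(out)


TRANSLATIONS = {
    "en-us": "Would you like {count, plural, one {# cookie} few {# cookies} many {# cookies} other {# cookies}} with your {drink}?",
    "en-gb": "Would you like {count, plural, one {# biscuit} few {# biscuits} many {# biscuits} other {# biscuits}} with your {drink}?",
    "uk-ua": "{count, plural, one {# печеньку} few {# печеньки} many {# печеньок} other {# печеньок}}",
}

print(format_message(TRANSLATIONS["en-us"], "en-us", {"count": 1, "drink": "tea"}))
print(format_message(TRANSLATIONS["en-gb"], "en-gb", {"count": 3, "drink": "tea"}))
print(format_message(TRANSLATIONS["uk-ua"], "uk-ua", {"count": 22}))
```

The calling code is identical for every row of the table; only the locale code and the string change.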

Conclusion

MessageFormat is a nice way to define your formatting logic separately from the code. This formatting logic is pretty smart: it can work with text and numbers, applying language-specific rules to the format of the numbers. This allows you to use the same code with various format strings, one for each language or locale. Furthermore, you can scale this approach by working with translators.