Generic Translator for (.NET) Applications

It took the whole day, but I’ve finally made exactly what I needed to translate our product at DataWeb.cz:

[Screenshot: DataWeb.cz Translator 2.1.0.0]

When it comes to translation of .NET applications, we can choose from the following options:

  • Write our own translation tool from scratch
  • Use Microsoft’s proposed model of .resx files
  • Use some third-party solution, either free or commercial
  • Hardcode all the translations

How useful each option is depends on the situation. I haven’t sorted this list in any way, but everyone will probably agree that the last option is the worst of all (unless it is a really tiny, single-purpose application or a one-off school project…). But what about the others?

When I was recently working as an employed developer, our company once decided to localize a product with .resx files. This was great … until the customer started to require Excel files as the translation source (containing all target languages). Then the whole localization became pure hell. Translators (especially the Hungarian ones) were careless with the Excel source and often corrupted it, the conversion into resource files sometimes failed for no apparent reason, and there was a constant need for merging, updating and rebuilding the project.

Based on this experience, and on an earlier one with a simpler tool I developed for ImagingShop 1.32, I put together several requirements for a translation model that would be optimal for us:

  • It must be really simple, so that any user understands it without much learning
  • The source file and the target localizations must be separate, so that non-programmers can create pluggable “language packs”
  • The user should see the translation source and any number of localizations side by side, so that he can compare two language versions to understand the context and write down the desired equivalent
  • Localization files should be human-readable text files that allow manual corrections
  • It should avoid merging operations and support incomplete translations seamlessly
  • On-the-fly translation in the application should be fast (lookup in O(log n) time, if possible)
  • The tool should be foolproof and highlight missing equivalents

I wrote something that fulfills these requirements: our DataWeb.cz Translator. The Translator’s workflow consists of four simple steps, which use the three buttons in the main window (in order from left to right):

  1. Open a translation file (this is the file with the original texts sorted into groups)
  2. Add one or more languages (either by choosing a new language from the list or by opening an existing language file)
  3. Do the translation by filling in cells as in Excel (adding a row by clicking the last row, in-place editing and deleting are all supported)
  4. Save the translation (this will be discussed below)

Opening a source file just fills the two tables, and so does adding a new language. When adding a brand-new language, the user is prompted to choose from all the culture-specific languages that .NET knows. A specific culture means that the language is identified not only by the language itself but also by the country or region whose variant it is (e.g. Luxembourg French or Caribbean English). These culture names are abbreviated in a well-specified way (e.g. fr-LU, en-029, cs-CZ).

Saving is a little bit trickier. When saving a translation, the tool saves not only the source translation file but also all the opened localizations, each in a separate file. Let’s suppose we have opened a translation source:

MyShinyApp.xml

and then added localizations for Belgian Dutch and Punjabi. After saving the translation, three files are actually written:

MyShinyApp.xml
MyShinyApp.nl-BE.xml
MyShinyApp.pa-IN.xml
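The naming pattern above can be sketched in a few lines. This is only an illustration in Python, not the Translator’s own .NET code; the function names `localization_path` and `files_to_save` are hypothetical.

```python
from pathlib import Path


def localization_path(source: Path, culture: str) -> Path:
    """Return the sibling language file, e.g. MyShinyApp.xml -> MyShinyApp.nl-BE.xml."""
    # Insert the culture name between the base name and the extension.
    return source.with_name(f"{source.stem}.{culture}{source.suffix}")


def files_to_save(source: Path, cultures: list[str]) -> list[Path]:
    """The source file itself plus one file per opened localization."""
    return [source] + [localization_path(source, c) for c in cultures]


saved = files_to_save(Path("MyShinyApp.xml"), ["nl-BE", "pa-IN"])
print([p.name for p in saved])
# ['MyShinyApp.xml', 'MyShinyApp.nl-BE.xml', 'MyShinyApp.pa-IN.xml']
```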

The first file is just the updated source, while the two new files are localization files that can be separated from the source and distributed on their own. How does the separation actually work? The application works with so-called “translation items”, which come in two types: source and equivalent. Source items hold only the text, while equivalents hold a text and a code. The code is a hash code of the English source text. I’ve decided to use an MD5 hash truncated to 64 bits and converted to a System.UInt64 number. This is much more robust than a plain System.String.GetHashCode(), since the result of that method varies with the specific CLR implementation and it returns only a System.Int32 number. A numeric hash also has the advantage of fast searching in the System.Collections.Generic.SortedDictionary structure that holds the text equivalents inside our translation library.
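To make the idea of a truncated MD5 code concrete, here is a minimal sketch in Python (again not the actual .NET implementation). The text encoding, which 8 of the 16 digest bytes are kept, and the byte order are all assumptions of this sketch, so its output will not necessarily match the codes stored in real language files. The name `translation_code` is hypothetical.

```python
import hashlib


def translation_code(source_text: str) -> int:
    """Map a source string to an unsigned 64-bit code via truncated MD5.

    Assumptions (not confirmed by the post): UTF-8 encoding, the first
    8 bytes of the digest, little-endian byte order.
    """
    digest = hashlib.md5(source_text.encode("utf-8")).digest()  # 16 bytes
    return int.from_bytes(digest[:8], byteorder="little")


code = translation_code("snow")
print(code)  # a stable number in the range 0 .. 2**64 - 1
```

Unlike a runtime-specific string hash, the result depends only on the bytes of the text, so a code written to a language file on one machine can be recomputed on any other.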

The source file looks like this:

<?xml version="1.0" encoding="utf-8"?>
<Translation version="2.1.0.0">
  <TranslationGroup name="First Group">
    <TranslationItem text="snow" />
    <TranslationItem text="first" />
  </TranslationGroup>
  <TranslationGroup name="Second Group">
    <TranslationItem text="skin" />
    <TranslationItem text="second" />
  </TranslationGroup>
</Translation>

…and the localization file has this form (notice there are no original texts):

<?xml version="1.0" encoding="utf-8"?>
<Language version="2.1.0.0" code="1029">
 <TranslationItem code="8312450513718967627" text="kůže" />
 <TranslationItem code="14715597452973696560" text="první" />
 <TranslationItem code="17172999427304075786" text="sníh" />
</Language>
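A lookup over such a file can be sketched as follows. This is a Python illustration under stated assumptions, not the library’s real code: the class name `Localization` is hypothetical, a sorted list with binary search stands in for the SortedDictionary to keep the O(log n) lookup, and falling back to the source text when a code is missing is my guess at what “supporting incomplete translations” means in practice. That the last code belongs to “snow” is inferred from its Czech text “sníh”.

```python
import bisect
import xml.etree.ElementTree as ET

LANGUAGE_XML = """<?xml version="1.0" encoding="utf-8"?>
<Language version="2.1.0.0" code="1029">
 <TranslationItem code="8312450513718967627" text="kůže" />
 <TranslationItem code="14715597452973696560" text="první" />
 <TranslationItem code="17172999427304075786" text="sníh" />
</Language>"""


class Localization:
    """Sorted (code, text) pairs searched by bisection, O(log n) per lookup."""

    def __init__(self, xml_bytes: bytes) -> None:
        root = ET.fromstring(xml_bytes)
        pairs = sorted(
            (int(item.get("code")), item.get("text"))
            for item in root.iter("TranslationItem")
        )
        self.codes = [code for code, _ in pairs]
        self.texts = [text for _, text in pairs]

    def translate(self, code: int, source_text: str) -> str:
        """Return the equivalent for code, or the source text if it is missing."""
        i = bisect.bisect_left(self.codes, code)
        if i < len(self.codes) and self.codes[i] == code:
            return self.texts[i]
        return source_text  # incomplete translation: fall back to the original


czech = Localization(LANGUAGE_XML.encode("utf-8"))
print(czech.translate(17172999427304075786, "snow"))  # sníh
print(czech.translate(42, "second"))  # second (no equivalent stored for code 42)
```

In Python a plain dict would do the same job; the sorted list is used here only to mirror the ordered structure described above.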

I’m still missing just two lovely features:

  • localization of the Translator itself
  • small, crisp flag icons to make the languages easier to tell apart

If you like this app, send me an e-mail. This tool will be published as freeware alongside the ImagingShop package to allow users to create unofficial translations of our main product.

This entry was posted on Wednesday, November 11th, 2009 at 04:53 and is filed under Jiné.