How to prepare a software string for internationalization

Jourik Ciesielski
Jan 13, 2021

Updated: Feb 22, 2021

If you’re in the localization business, you undoubtedly know that translation involves a lot more than transferring words from language A to language B. However, translation buyers aren’t always aware of the complications that could jeopardize the quality and the cost of a project. Software localization in particular has a couple of characteristics that set it apart from other types of localization. It’s the only discipline where traditional translation memory matches are subordinate to another match type, namely key-based (or ID-based) matches, and it breaks with the unwritten law of having a unique translation for each source string. In this article we’ll discuss best practices that apply to software developers as well as corporate localization teams and LSPs, and we’ll map the different steps to one specific translation management system (TMS) that we think is right for the job: XTM Cloud.

1. Get geared up before you start

If approved legacy data is available, you should build resources from it. If the data consists of structured files such as Java Properties, XLIFF or YAML, you can use it for so-called “structural” alignments in XTM Cloud. In a structural alignment, strings are matched based on their key. Aligned string pairs can subsequently be reused in the translation phase to pre-translate approved strings, lock them and isolate them from new strings.
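The idea of key-based alignment can be sketched in a few lines of Python. This is only an illustration of the principle, not XTM Cloud's implementation: the function names `align_by_key` and `pretranslate`, the sample keys and the rule that a pair is reused only when both key and source text match are all assumptions made for the example.

```python
def align_by_key(source: dict[str, str], target: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Pair legacy source and target strings that share the same key."""
    return {key: (source[key], target[key]) for key in source if key in target}


def pretranslate(new_source: dict[str, str], aligned: dict[str, tuple[str, str]]):
    """Split new strings into pre-translated ones and ones that still need work.

    Assumption: an aligned pair is reused only if the key AND the source text
    are unchanged, so edited strings are isolated as new work.
    """
    translated, pending = {}, {}
    for key, text in new_source.items():
        pair = aligned.get(key)
        if pair is not None and pair[0] == text:
            translated[key] = pair[1]
        else:
            pending[key] = text
    return translated, pending


# Hypothetical legacy data, e.g. loaded from two Java Properties files.
legacy_en = {"btn.save": "Save", "btn.cancel": "Cancel", "msg.bye": "Goodbye"}
legacy_fr = {"btn.save": "Enregistrer", "btn.cancel": "Annuler"}

aligned = align_by_key(legacy_en, legacy_fr)
done, todo = pretranslate({"btn.save": "Save", "btn.cancel": "Dismiss", "btn.new": "New"}, aligned)
```

Here `done` holds the strings that could be locked, while `todo` holds the changed and new strings that go to translators.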

If you want to give an extra dimension to your legacy data, you should have a look at XTM’s Inter-Language Vector Space (ILVS) model. This newly released AI-based technology offers algorithm-driven linguistic automation to increase the productivity of linguists and improve their user experience. It supports auto-placement of inline elements in XTM Workbench and allows you to enrich structural alignments with spot-on bilingual terminology extractions. Inter-Language Vector Space is the first major implementation of neural network-based technology in the localization industry since neural machine translation in 2017. Excellent work.

2. Treat the source language as a target language

This may sound like a contradiction. When software developers submit source strings, usually in English, those strings are often not yet fit to be translated. We refer to strings in this state as “developer English” strings. It is highly recommended to engage a second pair of eyes for a thorough review of the source strings. The reviewer, preferably someone who knows the software inside out, is supposed to detect ambiguity, anticipate potential problems such as length issues and enrich strings with comments and screenshots. Additionally, strings can be adapted to a specific locale (United Kingdom, United States, etc.) in the review step. Reviewed strings or “customer English” strings have a dual purpose: they are the ones that will be released, and they serve as source strings for the actual target languages.

XTM Cloud allows you to integrate a “pre-processing” step in the workflow. With this step, a project can be kicked off with a preparatory review step from English into English as if it were a source-target combination. When the pre-processing is done, the reviewed strings will automatically be leveraged as source strings in the translation step. The “developer English” vs. “customer English” segments will also be stored in a monolingual translation memory together with their corresponding keys so they can be reused in a continuous flow.

3. Don’t forget about pseudo translation

Pseudo translation is the process of replacing source characters with random characters to check if all translatable strings are imported properly into the TMS. It might also pick up other issues, such as characters that should or shouldn’t be escaped.
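A minimal pseudo-translation sketch in Python follows. It deliberately swaps in a fixed accented-character mapping instead of truly random characters, so that runs are reproducible, and it wraps each string in brackets so truncated strings are easy to spot. The placeholder pattern (`%n`-style and `{name}`-style variables) and the function name are assumptions for the example, not a description of any TMS feature.

```python
import re

# Fixed mapping from plain letters to accented look-alikes (an assumption;
# real pseudo-translation tools offer many other schemes).
ACCENTED = str.maketrans("aeiouAEIOUcn", "áéíóúÁÉÍÓÚçñ")

# Placeholders that must survive untouched: %n-style and {name}-style variables.
PLACEHOLDER = re.compile(r"(%\w|\{\w+\})")


def pseudo_translate(text: str) -> str:
    """Accent the translatable text, keep placeholders, add visible boundaries."""
    parts = PLACEHOLDER.split(text)
    # With a single capturing group, odd-indexed parts are the placeholders.
    converted = [part if index % 2 else part.translate(ACCENTED)
                 for index, part in enumerate(parts)]
    return "[" + "".join(converted) + "]"


print(pseudo_translate("%n items"))      # [%n ítéms]
print(pseudo_translate("Hello {name}"))  # [Hélló {name}]
```

Any string that shows up in the built product without brackets or accents was either hard-coded or never imported.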

Note that pseudo translation could be traded in for machine translation, since an MT sample into an exotic language will reveal the above-mentioned issues as well. Furthermore, if you have the technology to generate quality MT output, you could integrate machine translation combined with post-editing in the workflow to tackle cost and/or time issues.

Besides pseudo translation, XTM Cloud supports machine translation from 15+ providers including Google Translate, Microsoft Translator and SYSTRAN. One of the MT mechanisms to follow closely is Intento (inten.to), an AI-driven middleware provider that automatically selects the best MT provider for your source content in a given language pair with a single API call. It integrates 30+ providers, from Amazon and DeepL to Alibaba, ModernMT, Yandex and many more.

4. Be careful with ICU message syntax

Software files contain many technical particularities such as variables, embedded HTML, embedded JSON or character limitations. Those particularities need to be parsed correctly to avoid corrupting the strings. One of the biggest challenges in software localization, both from a linguistic and technical point of view, is pluralization. While certain languages have only one form for both singular and plural (e.g. Chinese), the translation of “%n items” in some Slavic languages depends on the value of “%n” (for example, 1, 2 and 5 each require a different form).
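To make the difference concrete, here is a simplified Python sketch of plural category selection for three locales. The rules follow the Unicode CLDR cardinal plural rules as far as non-negative integers are concerned; decimals are ignored, and the function name `plural_category` and the Russian sample strings are assumptions for the illustration.

```python
def plural_category(locale: str, n: int) -> str:
    """Return the plural category for a non-negative integer n.

    Simplified: integers only, three locales only.
    """
    if locale == "zh":
        return "other"  # Chinese uses one form for every quantity
    if locale == "en":
        return "one" if n == 1 else "other"
    if locale == "ru":
        if n % 10 == 1 and n % 100 != 11:
            return "one"   # 1, 21, 31, ...
        if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
            return "few"   # 2-4, 22-24, ...
        return "many"      # 0, 5-20, 25-30, ...
    raise ValueError(f"no rule defined for locale {locale!r}")


# One source string, three Russian translations.
RUSSIAN_FORMS = {"one": "%n элемент", "few": "%n элемента", "many": "%n элементов"}

for count in (1, 2, 5, 11, 21):
    form = RUSSIAN_FORMS[plural_category("ru", count)]
    print(form.replace("%n", str(count)))
```

A single English source pair (“%n item” / “%n items”) therefore fans out into three Russian strings, which is why the file format has to carry the category alongside the text.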

In XML, pluralized entries are usually stored in dedicated strings. The quantity of a variable is defined in the key.
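As a rough illustration of this convention, the Python sketch below reads a made-up XML file in which the plural quantity is the last segment of each key. The element name `string`, the attribute name `key` and the dotted suffix convention are hypothetical; real software files use a variety of layouts.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical resource file: the quantity is encoded as a key suffix.
SAMPLE = """<resources>
  <string key="cart.items.one">%n item</string>
  <string key="cart.items.other">%n items</string>
  <string key="cart.title">Your cart</string>
</resources>"""

QUANTITIES = {"zero", "one", "two", "few", "many", "other"}


def group_plurals(xml_text: str) -> dict[str, dict[str, str]]:
    """Group dedicated plural strings under their shared base key."""
    plurals: dict[str, dict[str, str]] = defaultdict(dict)
    for node in ET.fromstring(xml_text).iter("string"):
        base, _, suffix = node.get("key").rpartition(".")
        if base and suffix in QUANTITIES:
            plurals[base][suffix] = node.text
    return dict(plurals)


print(group_plurals(SAMPLE))
```

Because each quantity lives in its own string, a translator working into a language with more plural categories needs additional keys to be created for the target file.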

The accompanying key of every source string can subsequently be consulted in XTM Workbench.

In other file formats like JSON, pluralization is based on the ICU message syntax.

If your translation management system doesn’t support ICU messages, you’ll be forced to regex the daylights out of them, but the fact of the matter is that regular expressions might fall short when it comes to extensive ICU messages. XTM Cloud is one of the few translation management systems that is able to parse ICU plurals. It breaks the syntax down into multiple translation units to make sure that only the appropriate pieces of content get translated.
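To show why a real parser beats regular expressions here, the Python sketch below splits one ICU plural message, stored in a JSON file, into separate translation units by counting braces. It is a simplified illustration and not how XTM Cloud works internally: the JSON key, the function name `split_plural` and the restriction to a message consisting of a single plural argument are assumptions, and apostrophe escaping and `offset:` are not handled.

```python
import json

# Hypothetical JSON resource with one ICU plural message.
RESOURCE = '{"cart.items": "{count, plural, one {# item} other {# items}}"}'


def split_plural(message: str) -> dict[str, str]:
    """Return {selector: text} for a message that is a single plural argument."""
    inner = message.strip()[1:-1]                # drop the outer braces
    _, keyword, body = inner.partition("plural,")
    if not keyword:
        raise ValueError("not an ICU plural message")
    units: dict[str, str] = {}
    i = 0
    while i < len(body):
        if body[i].isspace():
            i += 1
            continue
        open_at = body.index("{", i)             # start of this selector's text
        selector = body[i:open_at].strip()       # e.g. "one", "other", "=0"
        depth, j = 1, open_at + 1
        while depth:                             # walk to the matching brace
            if body[j] == "{":
                depth += 1
            elif body[j] == "}":
                depth -= 1
            j += 1
        units[selector] = body[open_at + 1:j - 1]
        i = j
    return units


messages = json.loads(RESOURCE)
print(split_plural(messages["cart.items"]))
# {'one': '# item', 'other': '# items'}
```

The brace counter is what lets nested arguments such as `{name}` inside a plural branch stay intact, which is exactly where a flat regular expression tends to break.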

On a more critical note, we’d like to encourage TMS providers to give the ICU message syntax special attention because it can be used for more than just pluralization.

At C-Jay we think of it as a feature that could be supported similarly to how basically every TMS supports embedded HTML in a wide variety of file formats.

5. Provide sufficient context

Software strings are usually exported into structured data formats like JSON and XML, or plain text files such as iOS strings. Translation vendors and especially subject matter experts (SMEs) love to work in WYSIWYG (in-context) environments, but not a lot of translation management systems are equipped to generate those for the aforementioned file formats. Nevertheless, contextual information is crucial in software localization since strings usually consist of small text portions, which doesn’t make the life of translators easier. Every little bit helps.

XTM Cloud offers a couple of features that can be used to provide as much context as possible. One of the options that deserves some attention in this article is the possibility to attach segment ID images to a project. Those images, created by the source string reviewer in step #2 of this article, will be displayed in XTM Workbench if their names match the keys of particular strings.
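Before uploading screenshots, it can help to check which strings actually have a matching image. The Python sketch below assumes that “matching” means the file name without its extension equals the string key; that rule, the function name and the sample file names are assumptions for the example, so check the TMS documentation for the exact convention.

```python
from pathlib import PurePath


def images_by_key(keys: list[str], image_names: list[str]) -> dict[str, str]:
    """Map each string key to the screenshot whose file stem equals the key."""
    stems = {PurePath(name).stem: name for name in image_names}
    return {key: stems[key] for key in keys if key in stems}


keys = ["btn_save", "btn_cancel", "msg_welcome"]
screenshots = ["btn_save.png", "msg_welcome.jpg", "old_dialog.png"]

matched = images_by_key(keys, screenshots)
missing = [key for key in keys if key not in matched]
print(matched)   # {'btn_save': 'btn_save.png', 'msg_welcome': 'msg_welcome.jpg'}
print(missing)   # ['btn_cancel']
```

The `missing` list gives the source string reviewer a quick to-do list of strings that still lack visual context.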

6. Automate like there was no tomorrow

Automation sounds like music to the ears of translation buyers, localization teams and LSPs. Software localization is characterized by low word counts, tight deadlines and frequent sprints, a combination that has a very negative impact on the profitability of a project. If you want to implement automation, you still need to invest money and time in it first, but those investments will pay for themselves quickly. The ultimate goal is to establish a concept called “continuous localization”. Note that automation is possible on different levels:

File preparation: If you need to perform specific actions on the source strings (search & replace operations, encoding adjustments, etc.) to make them ready for translation, make sure to gather those actions in a script that can be executed from e.g. the command line.
File import: If you manage your software strings in a Git-based platform like GitHub, check whether you can establish a connection between the source repository and the TMS. XTM Cloud has many out-of-the-box connectors that allow you to scan repositories for new or updated strings, push them to XTM Cloud for translation and send the translated strings back without any manual intervention.
Workflow management: It is absolutely crucial to add your translation vendors, including their allocation method, to a dedicated user group and to organize your default workflow in a top-notch template. If those conditions are met, XTM Cloud can take over and distribute the work among the different stakeholders, push strings from one workflow step into another and take care of the necessary communication through automated email notifications.
Project administration: Small and frequent projects are a tough nut to crack, especially for LSPs. They require the same amount of project management time as any other project, which makes them extremely expensive to manage. If this sounds familiar, you should have a look at BeLazy. BeLazy is a middleware provider that connects translation management systems (like XTM Cloud), business management systems (like XTRF) and vendor portals. It fills the gaps in the supply chain that prevent full-cycle automation and enables what many companies seek to implement: continuous localization.
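As an illustration of the file preparation level, the sketch below gathers a few clean-up actions in one small Python script. The specific replacements (Windows line endings, non-breaking spaces), the default source encoding and all function names are assumptions chosen for the example; a real project would list its own actions.

```python
import sys
from pathlib import Path

# Hypothetical search & replace operations, applied in order.
REPLACEMENTS = [
    ("\r\n", "\n"),      # normalize Windows line endings
    ("\u00a0", " "),     # replace non-breaking spaces with regular spaces
]


def prepare(text: str) -> str:
    """Apply every search & replace operation to the text."""
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    return text


def prepare_file(src: Path, dst: Path, source_encoding: str = "utf-8-sig") -> None:
    """Decode src (dropping a UTF-8 BOM if present), clean it, write UTF-8 to dst."""
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(prepare(text).encode("utf-8"))


def main(argv: list[str]) -> int:
    """Command-line entry point: <source file> <destination file>."""
    if len(argv) != 2:
        print("usage: prepare_strings.py SOURCE DESTINATION", file=sys.stderr)
        return 2
    prepare_file(Path(argv[0]), Path(argv[1]))
    return 0
```

Saved under a name such as `prepare_strings.py` and wired to `main(sys.argv[1:])`, the script can run as the first step of a build job, so every file reaches the TMS in the same state.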