Jourik Ciesielski

Five steps towards MT within the LSP walls

Neural machine translation has turned the language industry upside down since its introduction in 2017. It has gained critical mass in the localization industry, and new business models built around MT keep emerging.

From a buyer’s perspective, the sky’s the limit with MT, as it opens many multilingual doors: think of chatbots, emails and support articles. Translation buyers also like the idea of combining MT with post-editing to translate thousands of words faster and at a lower cost.

Things are more complex from the language service provider’s (LSP’s) perspective. LSPs must get their clients to the table and set realistic expectations in terms of content enablement, ROI and time-to-market. At the same time, they need to engage and retain linguists with fair compensation schemes. In this article we’ll discuss five steps LSPs should take on the road to a successful machine translation program.

1. Define the business goals

The first step is to select the types of content that will be processed with machine translation. Two content categories matter here. Some content is intended to inform or instruct (e.g. operation manuals), while other text types are meant to convince or influence the reader (e.g. marketing materials). The closer a text is to the first category, the more suitable it is for MT. Texts in the second category are often transcreated, which is not a good fit for MT.

Next, determine why machine translation should be adopted in the process. MT is useful when turnaround times have to be shortened, when high volumes of content must be processed with few resources, or when content needs to be translated for a low-priority market. The goal of implementing MT should never be to obtain a massive price reduction.

2. Select a provider

An important factor in choosing a machine translation engine is whether it can be trained on your own data. The main asset of neural MT is at the same time its biggest danger: stock engines produce excellent generic quality, but may not perform well on specialized content. Trained MT engines use the appropriate tone of voice, apply the correct terminology and help achieve better overall accuracy.

The current market offers plenty of choice, which enables every LSP to select the right provider for its budget and/or business case. If you attach great value to quick support when something breaks down, consider working with a smaller organization like Globalese or Kantan. If you need a secure on-premise setup, Systran is an interesting option. If the focus is on Asian languages, it is worth looking at Asian providers like Rosetta. Amazon, Google and Microsoft are classics because they offer great generic quality for a large number of language combinations. Intento is your weapon of choice if you wish to use multiple providers through a single API. Services like ModernMT and PangeaMT are also worth following.

3. Think about the pricing model

While translation management system (TMS) providers have made it very easy to use machine translation in CAT tools, there is still a lot of uncertainty about pricing models that work for every stakeholder. There are currently three methods for monetizing MT savings:

  • Effort-based statistics (often referred to as “edit distance”) have become quite common, but they don’t always reflect the actual time and effort needed to post-edit a machine translation. Furthermore, the effort-based approach might tempt post-editors to edit more than necessary.

  • Hourly rates are increasingly being applied, but the question is whether the localization industry is mature enough to come up with a fair hourly alternative to the traditional per-word rates.

  • A per-word rate still seems the way to go. Translation memory analysis results keep their usual fuzzy-match discounts, and an additional MT discount is applied to no-match segments.

Note that pricing models based on a combination of different methods are also used, but mainly at larger LSPs.
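To make the per-word model concrete, here is a minimal Python sketch of how a TM analysis could be turned into a project price. The match bands, the discount percentages and the `mt_factor` applied to no-match words are hypothetical values chosen for illustration, not figures from any LSP or tool.

```python
# Hypothetical match bands and the share of the full per-word rate paid for each.
# These percentages are illustrative assumptions, not industry or vendor figures.
RATE_FACTORS = {
    "context_match": 0.10,
    "100%": 0.20,
    "95-99%": 0.30,
    "85-94%": 0.50,
    "75-84%": 0.70,
    "no_match": 1.00,
}


def weighted_words(analysis: dict[str, int], mt_factor: float) -> float:
    """Turn a TM analysis (band -> word count) into billable weighted words.

    mt_factor is an assumed share of the full rate paid for no-match words
    that are machine translated and post-edited; fuzzy bands keep their
    usual TM discounts.
    """
    total = 0.0
    for band, words in analysis.items():
        factor = RATE_FACTORS[band]
        if band == "no_match":
            factor *= mt_factor  # the MT discount applies to no matches only
        total += words * factor
    return total


def project_price(analysis: dict[str, int], full_rate: float, mt_factor: float) -> float:
    """Price = weighted words x full per-word rate, rounded to two decimals."""
    return round(weighted_words(analysis, mt_factor) * full_rate, 2)


analysis = {"100%": 1000, "85-94%": 2000, "no_match": 7000}
print(weighted_words(analysis, mt_factor=0.7))  # 6100.0
print(project_price(analysis, full_rate=0.12, mt_factor=0.7))  # 732.0
```

With 1,000 exact matches, 2,000 words in the 85-94% band and 7,000 no-match words, the weighted count is 200 + 1,000 + 4,900 = 6,100 words. The price is still known before the project starts, which is the main appeal of this model.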

One thing we know for sure is that post-editing will break with the tradition of determining prices before the start of a project. Effort-based compensation can only be calculated after a project is closed. At RWS, the largest LSP in the world, per-word discounts are granted based on TER scores and fuzzy match tables once the first 10,000 words have been post-edited.
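The sketch below shows one way effort-based figures could feed a post-project discount. It computes a simplified, TER-style score: the word-level edit distance (insertions, deletions and substitutions) between the raw MT output and the post-edited text, divided by the number of post-edited words. Unlike real TER, it ignores block shifts. The discount grid and the way the 10,000-word threshold is used here are illustrative assumptions, not RWS’s actual formula.

```python
# Hypothetical discount grid: (maximum TER, share of the full per-word rate paid).
DISCOUNT_BANDS = [(0.10, 0.50), (0.25, 0.70), (0.40, 0.85)]
CALIBRATION_WORDS = 10_000  # assumed: no MT discount until this many words are post-edited


def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute; no shifts)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete hypothesis word h
                curr[j - 1] + 1,         # insert reference word r
                prev[j - 1] + (h != r),  # substitute, or keep on a match
            )
        prev = curr
    return prev[-1]


def simplified_ter(segments: list[tuple[str, str]]) -> float:
    """Total edits / total post-edited words over (mt_output, post_edited) pairs."""
    edits = ref_words = 0
    for mt_output, post_edited in segments:
        ref = post_edited.split()
        edits += word_edit_distance(mt_output.split(), ref)
        ref_words += len(ref)
    return edits / ref_words if ref_words else 0.0


def mt_rate_factor(ter: float, words_post_edited: int) -> float:
    """Share of the full rate to pay, given the measured TER-style score."""
    if words_post_edited < CALIBRATION_WORDS:
        return 1.0
    for max_ter, factor in DISCOUNT_BANDS:
        if ter <= max_ter:
            return factor
    return 1.0  # heavy editing: no discount


score = simplified_ter([("The printer is power off.", "The printer is powered off.")])
print(score)  # 0.2
print(mt_rate_factor(score, words_post_edited=12_000))  # 0.7
```

One substitution over five post-edited words gives 0.2, which falls in the second hypothetical band. Note that the rate can only be fixed after post-editing, which is exactly why effort-based pricing breaks with up-front quotes.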

4. Conduct a pilot

It is crucial to run a pilot to preview the results of using machine translation in production. Ideally, the pilot consists of a set of sample sentences for which human translations already exist. This enables LSPs to measure the distance between the machine-translated sentences and the reference translations, estimate the post-editing effort and assess the overall quality of the MT output. If the pilot reveals unanticipated problems, they can still be resolved before the engines go live.

Note that existing quality metrics such as BLEU, hLEPOR or TER don’t have to be ignored, but these scores alone are not representative enough in a pilot.
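To see what one of these automatic scores actually measures, here is a minimal corpus-level BLEU sketch for a pilot set: one reference per sentence, whitespace tokenization and no smoothing. It combines clipped n-gram precision (n = 1 to 4) with a brevity penalty. For numbers you report to a client, an established implementation such as sacreBLEU is the safer choice, and the score should still be read alongside human review.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: list[str], references: list[str], max_n: int = 4) -> float:
    """Corpus BLEU (0-100), one reference per sentence, whitespace tokens, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in ngrams(h, n).items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or 0 in matches:
        return 0.0  # without smoothing, any n-gram order with no match gives 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only output shorter than the references is penalized.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 100.0
print(round(corpus_bleu(["the cat sat on the"], ["the cat sat on the mat"]), 2))  # 81.87
```

In the second example every n-gram matches, yet the truncated output still loses points through the brevity penalty. A single number like this says nothing about terminology or tone of voice, which is why it is not enough on its own in a pilot.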

5. Improve continuously

Once the machine translation engines are ready to be deployed, you also have to think about a maintenance strategy. Every MT batch requires resources like glossaries to be reviewed, and the MT engines themselves have to be retrained on a regular basis. That way, all resources remain up to date, which ensures that the benefits of using MT are retained.
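Part of the glossary review for each MT batch can be automated. The sketch below flags segments where a glossary source term occurs but the approved target term is missing from the MT output. The glossary format, the function names and the whole-word matching rule are assumptions for illustration. Because it matches exact strings only, inflected forms (such as German plurals) will show up as false positives, so the output is a worklist for a reviewer, not a verdict.

```python
import re
from dataclasses import dataclass


@dataclass
class TermIssue:
    segment_id: int
    source_term: str
    expected_target: str


def contains_term(text: str, term: str) -> bool:
    """Case-insensitive match of the whole term, not as part of a longer word."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def check_terminology(segments: list[tuple[str, str]],
                      glossary: dict[str, str]) -> list[TermIssue]:
    """Flag (source, mt_output) pairs where a glossary source term occurs
    but its approved target term is missing from the MT output."""
    issues = []
    for seg_id, (source, mt_output) in enumerate(segments):
        for src_term, tgt_term in glossary.items():
            if contains_term(source, src_term) and not contains_term(mt_output, tgt_term):
                issues.append(TermIssue(seg_id, src_term, tgt_term))
    return issues


glossary = {"printer": "Drucker", "power button": "Netzschalter"}
batch = [
    ("Press the power button.", "Drücken Sie den Netzschalter."),
    ("Restart the printer.", "Starten Sie das Gerät neu."),
]
for issue in check_terminology(batch, glossary):
    print(issue)  # TermIssue(segment_id=1, source_term='printer', expected_target='Drucker')
```

Recurring hits for the same term across batches are a useful signal that the glossary or the engine training data needs attention at the next retraining cycle.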

The fact that engine training and maintenance is a continuous process has given rise to a new role in the localization industry: the machine translation specialist. Large LSPs like Acolad already have dedicated MT departments, and it is not inconceivable that every LSP will eventually have one or more MT specialists.