NCTA, Translation

Understanding Machine-Translation Post-Editing (MTPE)

by Marco Díaz

Have you ever been asked to do machine translation post-editing (MTPE)? Do you understand how it works and what it entails? Do you know how to respond to clients who request this service, and how much to charge? Have you noticed that, as new machine translation (MT) technologies emerge and improve, many language service providers (LSPs) are moving away from human translation and sending out more MTPE requests?

In February 2024, Carola Berger offered her insights and answers to these and many other questions in her two-part workshop, “MTPE In Theory and In Practice.” Carola F. Berger, PhD, CT, is an ATA-certified patent and technical translator who translates and post-edits between German and English. She has not only been post-editing machine-translated patents for several years, but has also taken courses on MTPE and the inner workings of neural machine translation, and has even coded her own MT engine. In 2022, she co-authored a presentation on quality metrics for the MT Summit 2022 by AMTA (the Association for Machine Translation in the Americas). She has given numerous presentations on artificial intelligence, MT, and post-editing for ATA, NCTA, and other organizations for translators.

In her workshop, Carola walked us through the history of MT, from rule-based MT in the 1950s to statistical MT in the 1980s and 1990s to neural MT and generative AI today. She then explained what MT is and how MT engines work internally. Briefly, neural MT works by picking the most likely path (the most probable sequence of words) as computed by a model with countless parameters. The major takeaway from understanding these mechanisms is that computers do not think, and artificial intelligence is not intelligent: it makes decisions based on patterns in data. Furthermore, we should not confuse fluency with accuracy. The newer technologies can string together wonderful-sounding sentences, but what use are those sentences if they are inaccurate or utter nonsense?
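To make the idea of “picking the most likely path” concrete, here is a toy sketch in Python. It assumes a tiny, hand-made table of next-word probabilities (`NEXT_WORD_PROBS` is entirely hypothetical); a real neural MT engine computes such probabilities with a network of millions or billions of parameters and usually explores several candidate paths at once (beam search) rather than the simple greedy choice shown here. The only point is that each word is chosen because it is probable, not because the system understands it.

```python
# Toy illustration only: hypothetical next-word probabilities, not a real MT model.
NEXT_WORD_PROBS: dict[str, dict[str, float]] = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"patent": 0.5, "claim": 0.3, "<end>": 0.2},
    "a": {"claim": 0.7, "<end>": 0.3},
    "patent": {"describes": 0.8, "<end>": 0.2},
    "claim": {"describes": 0.4, "<end>": 0.6},
    "describes": {"<end>": 1.0},
}


def greedy_decode(probs: dict[str, dict[str, float]], max_len: int = 10) -> list[str]:
    """Pick the single most probable next word at each step (greedy search)."""
    words: list[str] = []
    current = "<start>"
    for _ in range(max_len):
        # Choose the continuation with the highest probability; no meaning is involved.
        current = max(probs[current], key=probs[current].get)
        if current == "<end>":
            break
        words.append(current)
    return words


print(" ".join(greedy_decode(NEXT_WORD_PROBS)))
```

The output is fluent simply because each step follows the highest-probability continuation in the table; nothing in the procedure checks whether the result is accurate.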

Carola pointed out the difference between full and light post-editing. Full post-editing aims to come as close as possible to human translation, while light post-editing tolerates awkward-sounding sentences and even grammatical or spelling errors as long as the final text is “understandable.” She also explained that a final product involving MTPE should be equivalent to a text that has gone through the standard stages of translation, editing, and proofreading (TEP), with MTPE replacing the T of translation; however, some LSPs are skipping the E and offering only raw MT, PE, and P, which falls far short of TEP in quality.

She delved into the skill set and tools needed to master MTPE inside and out. Subject-matter expertise is necessary because neural MT output reads so fluently that we might overlook incorrect terminology. Terminology management skills are also essential, because neural MT frequently mixes and matches terms and is notorious for inconsistent terminology. Another critical component is technology knowledge, especially CAT tool functionality, including number, terminology, spelling, punctuation, unit, and tag checks, filters, and shortcuts, supplemented by external spelling and grammar checkers. Lastly, although this might not always be required, it is beneficial to know MT error typologies and quality metrics and to be able to give proper QA feedback to MT engineers.

Another key point in this webinar was that not every document is suitable for MT. Documents that commonly suit MT include repetitive texts such as user manuals, formulaic texts (patents, certain legal documents, and financial reports), documents for internal use, and other texts whose purpose is to provide an overview. In contrast, documents poorly suited to MT include those with errors in the source text, creative texts (such as those requiring transcreation), texts that must read exceptionally well in the target language, and documents with complex layouts or formats that are not easily translatable.

One of the most important aspects of MTPE is knowing how much it actually increases our productivity, because that determines our post-editing speed (e.g., words per hour), a significant factor in estimating how much to charge for this service. Money matters, especially when running a business and making a living from language services. Carola explained that some MT providers claim their products increase productivity by 50%, but in reality the gain is only between 15% and 25% and depends on the particular MT engine. She also shed light on the concept of edit distance, which measures how much of the MT output you have changed, and recommended that we steer clear of work priced on this basis, because edit distance is not proportional to the amount of work needed for proper editing.
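Edit distance is commonly computed as a Levenshtein distance: the minimum number of insertions, deletions, and substitutions needed to turn the raw MT output into the post-edited text. The Python sketch below assumes that definition and counts at the word level; commercial tools may compute it differently (for example, at the character level or as a normalized percentage), and the example sentences are hypothetical. It illustrates why the metric can understate effort: changing one term counts as a single edit, however much research it took to confirm that term.

```python
def word_edit_distance(mt_output: str, post_edited: str) -> int:
    """Word-level Levenshtein distance: minimum insertions, deletions, and substitutions."""
    src = mt_output.split()
    tgt = post_edited.split()
    # prev[j] = distance between the source words processed so far and the first j target words.
    prev = list(range(len(tgt) + 1))
    for i, src_word in enumerate(src, start=1):
        curr = [i] + [0] * len(tgt)
        for j, tgt_word in enumerate(tgt, start=1):
            cost = 0 if src_word == tgt_word else 1
            curr[j] = min(
                prev[j] + 1,         # delete a word from the MT output
                curr[j - 1] + 1,     # insert a word
                prev[j - 1] + cost,  # substitute a word (free if unchanged)
            )
        prev = curr
    return prev[-1]


# Hypothetical example: one terminology fix in an eight-word segment.
mt = "the device comprises a housing and a lid"
pe = "the device comprises a housing and a cover"
print(word_edit_distance(mt, pe))  # 1
```

A distance of 1 out of 8 words looks like a trivial edit, yet confirming that “cover” is the right term may have required checking the source, the drawings, and the client’s terminology.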

Moreover, Carola outlined the factors to consider when evaluating MT engines. First and foremost, we need to ask for a sample and post-edit it to gauge the MT engine’s quality. We must also estimate the number of unedited segments or portions, repetitions, fuzzy matches, and tags, and assess the complexity of the sentence structure. The more often we ask for samples, the more likely this is to become standard practice in our field.

Finally, Carola indicated that generative AI technologies like ChatGPT raise many privacy concerns because they use all data entered for training (at least in free accounts; it is unclear whether the same applies to paid accounts). In addition, because neural MT bases its output statistically on large amounts of data, if the training data is biased regarding gender, culture, or other characteristics, the results will be biased too.

In summary, I’m grateful to have attended this workshop because I gained the following insights:

  • Take everything with a grain of salt when it comes to MT and generative AI.
  • Lack of fluency does not necessarily indicate that a text is machine-translated.
  • Fluency should not be confused with accuracy.
  • Computers and AI lack critical thinking skills because their outputs are based on probability and patterns in data.
  • Now more than ever, we need experts rather than merely humans in the loop, especially regarding high-stakes texts.
  • MTPE is an entirely different, independent service from traditional TEP and requires its own tools, skills, and pricing parameters.
  • Not every document can be handled with MT.
  • Always consider the privacy concerns and biases from MT and generative AI.
  • MTPE is still not, and most likely never will be, at the level of a highly skilled human translator.

It occurs to me that these takeaways could serve as selling points or differentiators when offering our services if we prefer to stick to highly skilled human translation, or could help us better understand what we need to know if we decide to test the waters of MTPE. How would you approach it?

Marco Díaz is an ATA-certified English-to-Spanish translator specializing in legal & business, health care, and technical translations. He is originally from Guatemala City and has lived in the United States for several years. He obtained a B.S. in Mechanical/Electrical Engineering at ITESM in Monterrey, Mexico, and a Specialized Certificate in Translation from the University of California San Diego Extended Studies. He currently serves as the president of ATISDA and the web administrator of the ATA Spanish Language Division.