“Hybrid” is the way to go in machine translation systems
BY BARBARA GUGGEMOS
At the NCTA’s general meeting on May 1, a packed room heard three perspectives on current uses of machine translation (MT) and the role of professional translators in this rapidly evolving field.
Great enthusiasm for MT
According to Raymond Flournoy, Senior Program Manager for MT initiatives at Adobe Systems, Adobe currently uses unedited machine translation for internal purposes and certain customer-facing applications, and is in the process of integrating edited MT into the translation workflow for various languages. In the area of localization, expected benefits are lower cost, faster turnaround and an ability to cover more languages.
Adobe uses two different MT engines: one primarily “rule-based,” and the other primarily “statistical.” Rule-based MT uses sets of grammar rules and knowledge bases such as dictionaries to analyze source text and convert it into target text. Statistical MT, an approach developed in the 1990s, proposes translations based on statistical comparisons between the source text and a large database of previously translated material (i.e., large translation memories). According to Ray, most companies that use MT now try to combine the benefits of the two approaches in a “hybrid MT system.”
Adobe’s translation tasks differ greatly in regard to turnaround time and accuracy requirements. In cases where high speed is essential and expectations of accuracy are low—web browsing, chat translations, and email translations were cited as examples—Adobe has found that unedited machine translation is an acceptable alternative to no translation at all.
In 2009, Adobe conducted a pilot study to investigate the cost-effectiveness and efficiency of integrating post-edited machine translation into its localization workflow. Following encouraging initial results, Adobe decided to expand the pilot program. Post-edited MT is now used at Adobe for four languages, and plans are underway to add two more.
Adobe finds post-editors through language service providers (LSPs). Most LSPs do not have an established process for tracking efficiency gains and setting discount rates for post-editing, so Adobe provides guidance. Usually after one week (or approximately 10,000 words), it is possible to get a good idea of the quality of the MT output and the appropriate discount rate.
What Adobe has learned from its pilot studies:
- Productivity estimates can be overly optimistic. Time trials are misleading. Actual post-editing has lower productivity.
- Productivity varies from file to file.
- The industry needs a better way to establish post-editing rates.
- Post-editing requires more experience than anticipated. Post-editors must match the quality and style level of regular translations. New translators are not as good at it as experienced translators.
- Globalization Management Systems and MT engines were not built with the enterprise MT user in mind. Many bugs remain.
- MT works best on short sentences (< 20 words).
- MT is good for enforcing terminology consistency.
- Placeholder handling is a big time-saver.
- Some content is really boring and repetitive, and is ideally suited for MT.
Perils and pleasures
Ellen Fernandez, an English>German translator who specializes in manuals for medical devices, industrial machinery and IT, began her light-hearted discourse on the perils, pleasures and profits of post-editing by noting two important facts:
- Machine translation has greatly improved in recent years: According to a January 2009 entry in Corinne McKay’s blog (http://thoughtsontranslation.com/), when three ATA exam tests were run through an MT system, one actually passed.
- Use of machine translation is already quite widespread: For example, almost all Microsoft articles in German are now MT products.
According to Ellen, the essential prerequisite for a good post-editing experience is a good client. The client cannot use Google MT. In addition to good MT software, highly focused translation memories, dictionaries and rules, very specific to the client and to the type of text being translated, are essential for good MT output. If the client does not weed out incorrect and irrelevant entries from its translation memories and dictionaries, then the post-editor will be stuck with the boring, time-consuming (and unprofitable) task of correcting the same machine-generated mistranslations over and over.
Post-editing a machine translation is similar to working with a CAT program such as Trados. Typically, the spelling is great, and numbers do not need to be transposed. However, the grammar is often weird. For example, MT has problems with English verbs ending in -ing. In Ellen’s experience, the bad parts are usually very obvious and easily corrected.
For post-editing jobs, Ellen gets 55% of her base rate for regular translation work. She has found that it works out to the same rate of compensation because of the greater productivity: with a client who provides good MT material, it is possible to edit 1000 words per hour or 5000 words per day.
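The arithmetic behind that claim can be sketched as follows. This is only an illustration: the 55% rate and the 1000 words per hour come from Ellen’s talk, while the base rate of 0.10 per word and the function names are hypothetical. Under these figures, post-editing pays the same per hour as regular translation done at about 550 words per hour.

```python
# Figures from the talk.
POST_EDIT_PERCENT = 55          # post-editing pays 55% of the base rate
POST_EDIT_WORDS_PER_HOUR = 1000  # achievable with good MT material

# Hypothetical base rate per word, chosen only for illustration.
BASE_RATE_PER_WORD = 0.10


def hourly_earnings(rate_per_word, words_per_hour):
    """Earnings per hour at a given per-word rate and throughput."""
    return rate_per_word * words_per_hour


def break_even_translation_speed(post_edit_speed, post_edit_percent):
    """Regular-translation speed at which both kinds of work pay the same per hour."""
    return post_edit_speed * post_edit_percent / 100


post_edit_rate = BASE_RATE_PER_WORD * POST_EDIT_PERCENT / 100
post_edit_hourly = hourly_earnings(post_edit_rate, POST_EDIT_WORDS_PER_HOUR)
break_even_speed = break_even_translation_speed(
    POST_EDIT_WORDS_PER_HOUR, POST_EDIT_PERCENT
)
translation_hourly = hourly_earnings(BASE_RATE_PER_WORD, break_even_speed)

print(f"Break-even translation speed: {break_even_speed:.0f} words per hour")
print(f"Post-editing earnings per hour: {post_edit_hourly:.2f}")
print(f"Translation earnings per hour:  {translation_hourly:.2f}")
```

A translator whose regular throughput is well above the break-even speed would earn less per hour from post-editing at this rate, which is one reason the quality of the client’s MT material matters so much.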
Ellen’s advice for translators who are considering post-editing work: Pick your clients carefully and be prepared to refuse a job if the client does not maintain its TM, dictionaries, etc.
Guidance for new MT owners
Andy Bell’s company, Welocalize, assists other companies in the introduction and management of MT systems. Andy considers Systran to be the best corporate MT program, but right out of the box, even a very good MT system has the linguistic intelligence of a bilingual 7th grader: it has trouble processing long, complex sentences, it doesn’t understand “corporate compliance,” and its vocabulary is limited.
To make an MT system work for a particular type of content, the MT owner must continue to invest in the system. Specifically, it must:
- Customize the MT engine to the planned content
- Simplify source documents (for example, by limiting the complexity and length of sentences)
- Post-edit the MT results
Sufficient investment in the first two items reduces post-editing costs.
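As a rough illustration of the second item, a source-text check for sentence length might look like the sketch below. It is not a tool any of the speakers described: the function names are hypothetical, the sentence splitter is deliberately naive, and the 20-word limit is an assumption borrowed from Adobe’s observation that MT works best on sentences shorter than 20 words.

```python
import re

# Assumed limit, borrowed from "MT works best on short sentences (< 20 words)".
MAX_WORDS = 20


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


def flag_long_sentences(text, max_words=MAX_WORDS):
    """Return (word_count, sentence) pairs for sentences at or over the limit."""
    flagged = []
    for sentence in split_sentences(text):
        word_count = len(sentence.split())
        if word_count >= max_words:
            flagged.append((word_count, sentence))
    return flagged


if __name__ == "__main__":
    sample = (
        "Press the power button. "
        "If the indicator light does not turn on after the device has been "
        "connected to the mains supply for at least ten minutes, disconnect "
        "the device and contact the service department."
    )
    for count, sentence in flag_long_sentences(sample):
        print(f"{count} words: {sentence}")
```

Sentences flagged this way are candidates for rewriting before they reach the MT engine. A real controlled-language check would also need to handle abbreviations, lists, and sentence complexity, which this sketch ignores.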
The MT owner is responsible for “everything with respect to the final result,” including accuracy. The owner must maintain its assets (translation memories, glossaries, etc.), pay for reiteration, and provide good source material in the form of controlled language texts. Potential risks for the MT owner include legal risks (if the translation is inaccurate), additional time to market (if the MT doesn’t work), underbudgeting, and loss of credibility (brand damage if final translation results are poor).
The freelance post-editor must get appropriate training, provide feedback to clients, and understand the level of quality that is required. Poor MT output because of inadequate system maintenance poses a potential financial risk for post-editors (because of lower productivity).
NCTA thanks all three speakers for their interesting and informative presentations. BG