How close are we to the reality of a Star Trek-type Universal Communicator? The U.S. Government is making a serious effort to get us there. BY HANY FARAG

In the December 2007 issue of Translorial, Paula Dieli concluded in her article on Machine Translation (MT) that MT is no longer the funny substitution of words in one language for words in another: MT, according to Google, is based on a data-driven approach, and it also combines linguistic typology, phrase recognition, translation of idioms, and isolation of anomalies. However, MT does not have to be limited to one form of implementation centered on documents and word files. If it is combined with a system that converts natural speech in one language into text, feeds that text into MT for translation into another language, and then converts the translated text back into speech, we have an automated speech translation system. This is, as I prefer to call it, an Interpreter Machine (IM).
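A minimal sketch of that three-stage pipeline, with toy stand-ins for each stage. Every name here (the functions, the lexicon, even the word-for-word "translation") is invented for illustration; a real Interpreter Machine would plug in actual speech recognition, MT, and speech synthesis components:

```python
# Toy stand-ins for the three stages; all names are invented
# for illustration, not a real ASR/MT/TTS API.

def speech_to_text(audio):
    # ASR stand-in: pretend the "audio" is already a transcript string.
    return audio

def machine_translate(text, lexicon):
    # MT stand-in: the naive word-for-word substitution the article
    # says real MT has moved beyond -- enough to show the data flow.
    return " ".join(lexicon.get(word, word) for word in text.split())

def text_to_speech(text):
    # TTS stand-in: tag the text instead of synthesizing audio.
    return f"<spoken>{text}</spoken>"

def interpret(audio, lexicon):
    """The 'Interpreter Machine': chain ASR -> MT -> TTS end to end."""
    return text_to_speech(machine_translate(speech_to_text(audio), lexicon))

en_fr = {"hello": "bonjour", "world": "monde"}
interpret("hello world", en_fr)  # -> "<spoken>bonjour monde</spoken>"
```

The point of the sketch is the composition, not the stages: each stage can be swapped out independently, which is exactly why an IM can ride on progress in MT.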

Automated speech translation, as seen in films like Star Wars, is interesting but fictitious. So why do linguists need to know about the Interpreter Machine?

Googling Machine Translation

By Paula Dieli

Mention the words “machine translation,” and a translator’s thoughts will range from job security to the ridiculously funny translations we’re able to produce with so-called online translation tools. Should we be worried that machines will take over our jobs? Paula Dieli thinks not, and explains why in this report.

I recently attended a presentation on “Challenges in Machine Translation,” sponsored by the International Macintosh Users Group (IMUG), at which Dr. Franz Josef Och, Senior Staff Research Scientist at Google Research, presented some of the challenges Google is facing in its machine translation (MT) research, and how some of these challenges are being addressed. Excitement about machine translation research first peaked back in 1954, with press reports on the Georgetown University/IBM experiment, which had used a computer to translate Russian into English. Over the 50 years since, we have continued to read about the great advances that will be possible in “the next 20 years,” but these great advances never came to pass. When the Internet came of age, online translation tools surfaced, and we translators amused ourselves by seeing what crazy translations we could come up with by entering seemingly simple phrases.

The linguistics of MT

So why did the research never produce anything really viable? It was based on a linguistic approach; that is, an analysis of the structure of a language followed by an attempt to map it into machine language such that one could input a source language text and out would come a wonderful translation in the target language, albeit with a few minor errors. As we all know, a language is filled with so many cultural, contextual, idiomatic, and exceptional uses that this task became virtually impossible, and no real progress has been made with this approach in the past 50 years.

Dr. Geoffrey Nunberg, adjunct full professor at UC Berkeley, linguist, researcher, and consulting professor at Stanford University, had this to say at a recent NCTA presentation: “I asked a friend of mine, who is the dean of this [MT] field, once, ‘if you asked people working in machine translation how long it will be until we have perfect, idiomatic machine translation of text …?’, they would all say about 25 years. And that’s been a constant since 1969.”

The data-driven approach

In recent years, MT researchers have begun to take a different approach, which can be loosely compared to the work you do as a translator when you use a tool such as SDL Trados WinAlign or Translator’s Workbench. That is, you use a data-driven methodology. As you translate, you store your translations in a translation memory (TM), so that if that same or a similar translation appears again, the tool will notify you and let you use that translation as is, or modify it slightly to match the source text. The more you translate similar texts in a particular domain, the more likely it is that you will find similar translations already in your TM.

Similarly, if before you began to translate a weekly online newsletter of real estate announcements, for example, you searched the Internet for already existing translations in your language pair and then aligned them and input them, via WinAlign, into your TM, you might find that much of the work had already been done for you. Imagine now if you were to input 47 billion words’ worth of these translations. Your chances of being able to “automatically” translate much of your source text would certainly increase. This is the approach that Google is taking.
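The fuzzy-matching idea behind a TM lookup can be sketched with Python’s standard-library SequenceMatcher. The segments and the 75 percent threshold below are invented for illustration; commercial CAT tools use their own, more sophisticated similarity measures:

```python
from difflib import SequenceMatcher

# A toy translation memory: source segments mapped to stored translations.
# All segment pairs here are invented for illustration.
tm = {
    "Three-bedroom house for sale": "Maison de trois chambres à vendre",
    "Two-bedroom apartment for rent": "Appartement de deux chambres à louer",
}

def best_match(segment, memory, threshold=0.75):
    """Return the stored (source, target, percent) triple most similar to
    `segment`, or None if nothing clears the fuzzy-match threshold."""
    scored = [(SequenceMatcher(None, segment.lower(), src.lower()).ratio(), src, tgt)
              for src, tgt in memory.items()]
    score, src, tgt = max(scored)
    return (src, tgt, round(score * 100)) if score >= threshold else None

# A near-match is offered to the translator for light editing.
best_match("Three-bedroom house for rent", tm)
```

When a near-match is found, the tool shows the stored translation with its match percentage and lets you edit just the parts that differ; below the threshold, you translate from scratch.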

Google’s goal, as stated by Dr. Och, is “to organize the world’s information and make it universally accessible and useful.” Now, before you go thinking you’re out of a job, their data-driven approach has proven successful only for certain language pairs, and only in certain specialized domains. They have achieved success in what they call “hard” languages, that is, from Chinese to English and from Arabic to English, in domains such as blogging, online FAQs, and interviews by journalists.

Dr. Och reported that their progress was due to “learning from examples rather than from a rule-based approach.” He admits that “more data is better data.” He went on to say that adding 2 trillion words to their data store would result in a 1 percent improvement for specific uses such as the ones described above, and that they see a year-to-year improvement of 4 percent by doubling the amount of data in their data store, or “corpus.” The progress reported by Dr. Och is supported by a study conducted by NIST (the National Institute of Standards and Technology) in 2005, in which Google received the highest BLEU (Bilingual Evaluation Understudy) scores using its MT technology to translate 100 news articles in the language pairs mentioned above. A BLEU score ranges from 0 (lowest) to 1 (highest) and is calculated by comparing machine-translated segments against human reference translations (a brevity penalty is applied to overly short output, which would otherwise artificially inflate the score).
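As a rough illustration of how a BLEU-style score works, here is a deliberately simplified version: unigram and bigram precision combined with a brevity penalty. Real BLEU uses n-grams up to 4, smoothing, and possibly multiple references, so treat this as a sketch of the idea rather than the official metric:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Simplified BLEU: geometric mean of modified n-gram precisions,
    times a brevity penalty for candidates shorter than the reference."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # "Modified" precision: clip each n-gram count by its count
        # in the reference, so repeating a word can't inflate the score.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0
        precisions.append(overlap / total)
    # Brevity penalty: too-short candidates are scaled down.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

An identical candidate and reference score 1.0; a fragmentary candidate that copies a few correct words scores somewhere in between, pulled down by the brevity penalty; a translation sharing no words with the reference scores 0.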

Challenges and limitations

So what are the limitations of this data-driven approach? When asked by a member of the audience if Google’s technology could be used to translate a logo, Dr. Och instantly replied that such a translation would require a human translator. It’s clear that Google’s approach handles a very specific type of translation. Similar data-driven MT implementations can be used to translate highly specialized or technical documents with a limited vocabulary; the results wouldn’t be 100 percent correct, but they would be readable enough to determine whether a document is of interest. In that case, a human translator would be needed to “really” translate it.

The Google approach described above deals with a tremendous amount of data and a very targeted use. It works only for some languages—German, for example, has been problematic—and in order to improve in more than just small increments, human intervention is required to make corrections to errors generated by this approach. One example that Dr. Och provided—the number “1,173” was consistently incorrectly translated into the word “Swedes”—confirms that a machine can’t do it all.

And if you think for a minute about the amount of Internet-based data being generated every hour, it’s great to have machines around to handle some of the repetitive (read: uninteresting) work and let us translators handle the rest. That still leaves plenty of work for us humans.

Alternative technologies

There are other approaches to MT, including example-based technology, which relies on a combination of existing translations (such as you have in your translation memory) and a linguistic approach that analyzes any unmatched segment against a set of heuristics, or rules, based on the grammar of the target language. Some proponents of this approach concede that large amounts of data would be needed to make it successful, and have all but abandoned their research. Once again, we can see that any approach that relies even partially on linguistics has not met with a reasonable level of success.

Other advances occurring in the MT arena include gisting and post-editing. MT can be used successfully in some settings where the gist of a document is all that is needed in order to determine if it is of enough interest to warrant a human translation. There are also MT systems on the market that produce translations that require post-editing by human translators who spend (often painful) time “fixing” these translations, correcting the linguistic errors that such a system invariably produces. While this may not be the translation work you’re looking for, I know of at least one large translation agency that provides specific training for this type of post-editing to linguists willing to do this kind of work. This is another example that shows that while machines play a part, there is still a role for human translators in the overall process.

Still other advancements include the licensing of machine translation technology based on a data-driven approach, which can be tailored to work with the existing translations and terminology databases at a specific company. As with the Google solution, such technologies typically work on a limited set of languages. But even if they can help translate some of the less interesting, repetitive information out there, with more information being produced at a continually increasing rate, have no fear: there will still be plenty of work for human translators to do!

The road ahead

Where does that leave us? From the typewriter to word processors to CAT (Computer-Assisted or Computer-Aided Translation) tools and the pervasiveness of the Internet, our livelihood has been transformed for the better. We are more productive and able to work on more interesting translations than ever before.

I encourage you to embrace technology; understand how it is helping to make information accessible, and learn how technology can help translators do the work that only humans can do.

more information

The calendar of upcoming International Macintosh Users Group (IMUG) presentations can be found at

You can get the official results of the 2005 Machine Translation Evaluation from the National Institute of Standards and Technology (NIST) at

Tiziana Perinotti Takes a Global Perspective

By Anna Schlegel

Tiziana Perinotti is the founder of TGP Consulting and creator of the award-winning Silicon Valley Localization Forum website and services. She has over 15 years of successful software development and product marketing experience with companies such as Olivetti, Microsoft, PowerUp! (acquired by The Learning Company), Radius, Verity, and Palm Computing, to name a few.

In 1996, she founded TGP Consulting and helped the original founders of Palm Computing (also founders of Handspring) develop what is now the very successful Palm handheld. Tina has developed and offered training and courseware material for enterprises as well as freelance translators and translation companies.

Where did you grow up, and when did you come to the U.S.?
I was born and raised in Turin, northwest of Milan, near the Italian Alps. I developed a desire to move to the U.S.—Silicon Valley, in particular—when I was in college. So, as a student in Turin, I decided to travel and take summer courses in communications and computing in the U.S. during visits to an elderly uncle who lived in Pittsburgh.

How did you start in the localization field?
Right after my Computer Science degree and Master’s in Linguistics, I was recruited by Olivetti, the large computer conglomerate located in Ivrea, near Turin, Italy. At the time, Olivetti was very active in the research field of software office automation, not just for the stylish typewriters the company was manufacturing, but also for the first PC lines.
There was a need to localize Olivetti Italian hardware and software products into English-ready products for all English-speaking markets around the world. In 1987, when the joint venture/OEM project between Olivetti and Microsoft was established, I was sent to Microsoft headquarters in Redmond, Washington, to work on Windows 2.0 and the Windows version for the first 386 machines. We developed all the device drivers for Olivetti that were included in Windows and completed the first localized versions (Dutch, French, German, Italian, Portuguese, Spanish, and Swedish).

What localization challenges do corporations face today?
The “fast and cheap” localization mantra has reached new levels, and the challenge is how to add “quality” to that equation in a process that has outsourced all skills, including engineering, testing, management, and customer support. Cultural and communication barriers among product team members in very different locales are another big challenge, as is the lack of training that would let IT, engineering, customer support, marketing/sales, and project management staff operate at the best of their abilities in a stressful, multicultural environment under strict deadlines.

What are the new trends you see in localization?
Because of the new challenge of introducing products less expensively, more and more localizers are relying on machine translation tools, online terminology tools, and project management tools to expedite the localization process and achieve consistency. Localization has also expanded beyond the traditional computer and electronics industry; for example, biotech, pharmaceutical, medical device companies, and the government are in need of more localization.

How does English influence other language localization?
In the U.S., in general, my experience has been that corporations tend to be biased towards the English language. Products still tend to be first architected and developed in an English context, before they go through some internationalization process. Part of the problem is that we ask engineering to make certain product development decisions that would be better made by professionals who have the training and experience of designing for a global audience. The outcome of this approach may be a poorly localized product and unsatisfied customers who are forced to use an English-based product with a translated user interface that is less than optimal for them. This is an obvious cost to the company in terms of missed sales revenues and market opportunities.

Have you experimented with machine translation?
Yes, since the very beginning of my career I have used and tested many tools and systems, from the most sophisticated to the very basic. I am very pleased with the progress and advancement in this field, and in other areas such as voice recognition and search-and-retrieval engines; all the signs are there that these tools will become better and better and will be employed in more aspects of our lives.

What would you like to see changed in localization?
The mentality, meaning that when corporations need to cut their budgets, one of the first things they drop off their priority list is internationalization and localization. That’s a symptom of not understanding the investment opportunity and added value of the internationalization and localization product cycles.