The Tool Kit is an online newsletter that comes to its subscribers’ mailboxes twice a month. In Translorial, we offer a quarterly digest of Jost’s most helpful tips from the past season.
BY JOST ZETZSCHE © 2009 INTERNATIONAL WRITERS’ GROUP, COMPILED BY YVES AVÉROUS
GOOGLE CHROME SHINES
I tend to use the software that I have just translated; after all, I know all the tricks once the translation is finished. Here are some things I recently learned that way about Google Chrome. My new favorite feature is a way to turn web-based applications into stand-alone applications in Chrome. This means that you can run any website not within the tabbed browser interface but in a window that contains nothing but the actual application. I really like this because it prevents you from accidentally closing an important application that you’re working in by closing your browser or browser tabs, and it lets you focus completely on your task. This is great for things like browser-based translation interfaces or other important tasks that do not require you to link continuously to other webpages.
Other likeable features in Chrome are the ability to change interface languages on the fly (under Tools > Options > Under the Hood > Change font and language settings) and the versatility of its address bar, which can be used as a search field as well.
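For readers who like to script their setup, the stand-alone application window described above can also be requested from the command line. The following is only a minimal sketch: it relies on Chrome's `--app=` command-line switch, and the browser name `chrome`, the helper name `app_mode_command`, and the example URL are all assumptions made for illustration. The executable name in particular differs from system to system.

```python
import shlex


def app_mode_command(browser, url):
    """Build the argument list that asks the browser to open `url`
    in a stand-alone application window (no tabs, no address bar)."""
    # Assumption: we only accept full web addresses for this sketch.
    if not url.startswith(("http://", "https://")):
        raise ValueError("expected a full http(s) URL")
    # --app=<url> is the Chrome switch for application mode.
    return [browser, f"--app={url}"]


# Hypothetical example: a browser-based translation interface.
cmd = app_mode_command("chrome", "https://example.com/translate")
print(shlex.join(cmd))  # show the command without running it
```

The sketch only prints the command. To actually open the window, the same list could be passed to `subprocess.Popen(cmd)`, provided the first element is the real path to Chrome on your machine.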
THE RESOURCE-FULL TRANSLATOR
Recently, the Canadian government opened up TERMIUM Plus, the Translation Bureau’s terminology database, with nearly four million terms in English, French, and Spanish. My feeling is that most English <> French translators have had access to it in some way or another for a while, but now it’s free and open to all.
Here is another resource that has turned from so-so to fabulous overnight: the Translation Automation User Society (TAUS) Data Association’s search engine. The TAUS Data Association, or TDA, is, as its name says, an association of mostly large corporate translation buyers who originally came together to pool their translation memory data to better train their machine translation engines.
They decided to open the data up to the public, not as translation memory (TM) data but as a terminology resource. If you want to get to the data as TM data, you have to become a TDA member and contribute your own data. Since membership is financially out of reach for many of us, we can at least use the data as a terminology resource.
In the latest incarnation of the TDA search engine, it is now possible to filter the data not only by language combination and broad industry sector, but also by owner (originating company) and content type (user documentation, software, websites, etc.). The companies that so far have contributed to the total of almost one billion words are ABBYY, Adobe, Avocent, Dell, eBay, EMC, Intel, McAfee, PTC, Sun, and Sybase. Lionbridge, SDL, and Moravia have also contributed their own translated website and marketing materials (as far as I can tell), and ProMT has contributed training material from its machine translation engine. Materials from the United Nations and the European Union are also included. While this list is impressive, it is only a small part of what you will eventually find in this database.
The next improvement is that the engine now includes some subsegmenting capabilities so that it is able to identify the matching term in the target segment and highlight it. It will also list likely translated terms with a probability rating at the top of the search window. Pretty cool.
There is one thing that I find maybe even cooler, though: you can download a “widget,” a little Java-based application that lets you do all these searches right from your desktop. If you close the widget it will remember your last settings so you don’t have to modify them again when you reopen it, and the search is blazingly fast.
I was really struck this past week when I realized that I had done most of my translations with the aid of resources like the TAUS Search rather than specialized dictionaries. It was not that my projects last week were so generic, but both of these large-scale tools provide enough intelligent information and data to make them highly usable even for very specialized searches. Why is this relevant? Because I think that we have entered a new era of data availability that has already changed the way we work and will continue to do so.
TRANSLATOR TOOLKIT, REBORN
Recently, Google released a new version of its Translator Toolkit with 37 source languages (all the ones you would expect, plus possibly less widespread languages like Croatian and Yiddish) and more than 400 target languages and regional variations (here are some “K” entries: Kabyle, Kachin, Kalaallisut, Kalmyk, Kannada, and Kanuri). Also, the interface is now translated into 36 languages.
This is what the official Google blog posting says about this: “At Google, we’re focusing on how Translator Toolkit can help preserve and revitalize small and minority languages. Minority languages, also called regional, indigenous, heritage or threatened languages, are languages spoken by the minority people in one locale in a sovereign state or country. Were these endangered languages to become extinct, it would mean an immeasurable loss of knowledge, culture and way of life to minority people worldwide. For this project we worked with Dr. Te Taka Keegan, a Māori language activist and senior lecturer in computer science at the University of Waikato who spent much of his career on how technology can assist in minority language revitalization.” This is undoubtedly a worthwhile project. Google is also open about its goal of using our translations to better its own machine translation—and that is something that you need to be aware of when you use this tool. JS