Business

THE TRANSLORIAL TOOL KIT

The Tool Kit is an online newsletter that comes to its subscribers’ mailboxes twice a month. In Translorial, we offer a quarterly digest of Jost’s most helpful tips from the past season.
BY JOST ZETZSCHE © 2013 INTERNATIONAL WRITERS’ GROUP, COMPILED BY YVES AVÉROUS


The keyboard layout jungle
Multilingual computing has always faced the challenge of different input methods and keyboard layouts for different languages. This will continue to be a challenge, with new solutions cropping up here and there. For Windows users, Microsoft has been remarkably good at offering built-in keyboards for more than 120 languages and the ability to extend or compose your own language keyboard with the Microsoft Keyboard Layout Creator.

With much computing now happening on mobile platforms, there is an increasing need for other input methods that are not bound to a single operating system. Google has recently multiplied its offering of language keyboards by adding no fewer than three additional Chinese input methods, among others. Some of these can be downloaded and integrated into various operating systems (including Windows); others are available only within different Google tools, in particular Google Translate, which adds an input icon in the source language field that provides access to various keyboards or input methods.

Some of you use the US-International keyboard for Windows when dealing with multiple Western languages, which gives you access to a number of special characters that are not part of the English alphabet. Moshe Devere just pointed me to another keyboard that offers most, if not all, of the additional US-International keyboard characters plus characters with breve (˘), inverted breve ( ̑ ), dot below and above (·), and the macron (¯) as diacritical marks. While interviewing folks for Found in Translation, we encountered individuals who struggle with those diacritics, in particular when it comes to languages like Māori or Yorùbá.

You can download versions of the Alt-Latin keyboard right here. These will easily integrate into your system (both Windows and Macintosh), and you can install them with the normal keyboard installation routine (this explanatory article is helpful though slightly outdated; updated information is available in my Tool Box ebook). To enter characters with diacritics such as å, ā, ą, ạ, ȧ, or ă, press the right Alt key together with the key that corresponds to the grey diacritic marked below, release both keys, and then press the key for the primary character.
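The two-step entry described above (diacritic first, then the primary character) can be mimicked in a few lines of Python. This is only an illustrative sketch, not how the Alt-Latin keyboard is implemented: the `COMBINING` table and the `compose` helper are hypothetical names, and the sketch relies on standard Unicode combining marks and NFC normalization from Python's `unicodedata` module.

```python
import unicodedata

# Standard Unicode combining marks for the diacritics named in the text.
# The table itself is an illustrative assumption, not the keyboard's layout.
COMBINING = {
    "ring above": "\u030A",
    "macron": "\u0304",
    "ogonek": "\u0328",
    "dot below": "\u0323",
    "dot above": "\u0307",
    "breve": "\u0306",
}


def compose(diacritic: str, base: str) -> str:
    """Mimic a dead key: choose the diacritic first, then the base letter."""
    # NFC normalization merges base + combining mark into one precomposed
    # character whenever Unicode defines one.
    return unicodedata.normalize("NFC", base + COMBINING[diacritic])


print([compose(name, "a") for name in COMBINING])
# prints ['å', 'ā', 'ą', 'ạ', 'ȧ', 'ă']
```

The same helper covers letters from the languages mentioned earlier, for example `compose("macron", "a")` for the ā in Māori.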

OmegaT ready for prime time
I’ve been waiting to write about OmegaT for some time now, but recently the open-source tool just blew me away.

Before looking at the current version and spending some time on the phone with OmegaT’s development manager Didier Briel, I watched a video that I had made three or four years ago as an intro to the then current version of OmegaT. While at very first glance there are some similarities between the current and previous versions, the differences are mind-boggling.

Let’s first talk a bit about what OmegaT is, who uses it, and who develops it.

There are only a handful of open-source translation tools geared toward the professional freelance translator, and OmegaT is without a doubt the king of the hill. Last month, for instance, OmegaT was downloaded more than 8,000 times (for a total of over 400,000 downloads), and even though this includes updates by existing users (typically between 8 and 10 updates per year), these impressive numbers indicate that OmegaT is not just used by a few isolated software geeks.

Here are some other interesting stats: Spain has the highest number of downloads by far, followed by Japan, France, Germany, Poland, Italy, Russia, and, finally, the US. The statistics for operating system-specific downloads are also revealing: OmegaT can be run on Windows, Mac, or Linux because it’s built on Java, which rides on top of other operating systems, so to speak (if you feel insecure about Java, here are instructions on how to keep Java on your computer but disable it in browsers, which were the source of recent security breaches). Seventy-six percent of all downloads were for Windows computers, with the remaining 23% almost evenly split between Mac and Linux, a sign that OmegaT is much more mainstream than one might expect. (All these numbers can be found on OmegaT’s Sourceforge site.)

According to Didier, what differentiates OmegaT from other open-source products is that the project is not run by one geeky developer, but by a team. Didier Briel is the development manager, Alex Buloichik is the lead developer, Marc Prior is the project coordinator and website manager, Vincent Bidaux is the (new) documentation manager, and Jean-Christophe Helary manages localization and the (very supportive) user group. Among these folks, Didier and Alex are the only true developers. Naturally, there are a whole bunch of other contributors working on the software as well; changes are submitted to Didier, who decides whether to integrate them.

There are also commercially sponsored developments. A significant sponsor right now is Welocalize, a language service provider that many of you will be familiar with. Welocalize is looking at OmegaT for two reasons. First, they themselves manage an open-source translation management system called GlobalSight. While GlobalSight comes with a bare-bones online editor for translation, their translators often prefer to translate offline on their desktops. This can presently be done only via XLIFF or various SDL products (TagEditor, Trados Workbench, WorldServer). Since most of these SDL products are increasingly outdated, and since it cannot be in the interest of Welocalize to continue supporting SDL, they need a new offline editor. Their initial investment in the open-source OpenTM2 has really not gone anywhere, so they have now chosen OmegaT and have invested a “significant amount,” according to Didier, in the development of some features (including better tag protection and better matching of TMX and XLIFF results).

The second reason Welocalize (together with the Centre for Next Generation Localisation) is investing in OmegaT is a project called iOmegaT. This is a version of OmegaT used to measure MT post-editing productivity in comparison to TM-based translation. These features may or may not make it into OmegaT. You can find more information about that project right here.

OmegaT’s upgrade
The interface is super easy and user-friendly: the actual translation is done in a non-tabular, horizontal layout. If you have to deal with inline tags, they are clearly set apart from the translatables. Any panes with access to terminology, translation memory, machine translation, or comments can be arranged however you want and even dragged to a second monitor. And while I wish there were more right-click menus, the existing menus are well organized and efficient.

Two of my previous major complaints have been fixed. You can now enter terminology into the glossary as you translate with a great GUI, and, more importantly, you no longer have to convert MS Office files into OpenOffice files in order to translate: you can now translate files directly from MS Office 2007 and above. The range of directly supported file formats is impressive and includes TXT, PROPERTIES, PO, INI, SRT (subtitle), Open Document Formats, (X)HTML, XLIFF, RESX, LaTeX, Wordfast TXML, and Visio files. There are other file formats that are supported indirectly through the open-source Rainbow application. You can find instructions on how to prepare files specifically for OmegaT with Rainbow right here. The file types requiring you to use this route include most XML formats, FrameMaker MIF, bilingual DOC, and Trados TTX files.

OmegaT includes a very cool project concept: you can have numerous files in different formats within a project that automatically open one after the other as you translate, and any search-and-replace action can be done simultaneously in all files. Another special feature is that the TM consists of an unconverted TMX file rather than a database system. This avoids the need for TM conversions or imports/exports of TM data (particularly because OmegaT automatically produces three versions of TMX files in different flavors of TMX). There is currently a limit to the size of the TMX file, but the only actual example Didier could cite of a file it couldn’t handle was the EU’s obscenely large Acquis Communautaire. Yet another nice feature is the possible inclusion of LanguageTool for inline grammar and style checking for a number of languages.
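To see why a plain TMX file can serve as a translation memory without any database import, here is a minimal sketch in Python. It is not OmegaT’s own code (OmegaT is written in Java); the sample content, the `SAMPLE_TMX` string, and the `load_pairs` helper are assumptions made for illustration, and the sketch handles only the basic `tu`/`tuv`/`seg` structure of TMX.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny hand-written TMX document; the content is invented for illustration.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Open the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Ouvrez le fichier.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Save the project.</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrez le projet.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_pairs(tmx_text: str, src: str, tgt: str) -> dict:
    """Return {source segment: target segment} for one language pair."""
    root = ET.fromstring(tmx_text)
    pairs = {}
    for tu in root.iter("tu"):  # one translation unit per <tu>
        segs = {}
        for tuv in tu.iter("tuv"):  # one language variant per <tuv>
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                segs[tuv.get(XML_LANG, "").lower()] = "".join(seg.itertext())
        if src in segs and tgt in segs:
            pairs[segs[src]] = segs[tgt]
    return pairs


memory = load_pairs(SAMPLE_TMX, "en", "fr")
print(memory["Open the file."])
# prints Ouvrez le fichier.
```

Because the memory is just an XML file, it can be read, edited, or exchanged with other tools directly, which is the practical benefit the paragraph above describes.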

And there is no worry about missing out on instruction for OmegaT: the manual is very comprehensive (Vito Smolej, the former documentation manager, wanted to create the “holy grail” of documentation, according to Didier). The new doc manager will approach things a little differently, including making French (rather than English) the authoring language. Because this is a volunteer effort, the 20+ different language versions of the manual may cover anywhere between version 1.4.4 and the current version 2.6.3. Plus, be aware that the English documentation itself has various versions: the help system within OmegaT is slightly outdated, whereas the HTML version (found by a link in the installation folder) is current.

There are a good number of other resources for OmegaT. There is plenty of information listed on Omegat.org, one really great blog I’ve already mentioned (another one is right here), and then there’s Qabirias’s full-fledged manual (in Italian). The OmegaT team is very excited right now about this new grassroots ecosystem that has sprung up in the last few months, which is partly due to some of the independent resources listed above, and partly due to the new server version of OmegaT.

I was excited to get re-acquainted with this tool. I always admired OmegaT, but I felt that it was limited due to missing support for Office files and poor terminology management. With those problems solved and the addition of many other powerful features, this is a tool that has no reason to hide behind its commercial competitors. JZ