THE TOOL KIT – HELP WITH PDFS
File Types and Tools
To use PDF files as efficiently as possible, it’s important to know that, from a practical point of view, there are three different types of PDF files:
Text-based files, where text is “real” text and you can copy and paste text from the file (unless restricted by the file’s security settings) and search for text in the file. Converting these types of files to a fully editable (and translatable) format, such as a Word file, is less problematic than with image-based files, though not necessarily simple.
Image-based files, where copying or searching is not possible because what appears to be text is actually part of an image, as in a scanned or faxed document that has been saved as a PDF file. To convert these types of files to an editable file format you need to use an OCR (optical character recognition) program, and the quality of the result depends on the clarity of the image.
Searchable image-based files — a kind of hybrid between the two other types, where you can search text even though it’s an image. A searchable image-based file can be created from an image-based file using the “Recognize Text Using OCR” function in Adobe Acrobat (not available in the Reader version).
Of all PDF tools, Adobe Reader is the one most likely to be installed on almost everyone’s computer already. It allows you to view and search PDF files and also to comment on files that have been enabled for commenting. In addition to the free Reader, the Adobe Acrobat product family includes several paid versions (among them Adobe Acrobat Standard and Adobe Acrobat Pro). See the comparison chart to review which features the free and paid versions offer. From the translator’s perspective, the main differences between the paid Standard and Pro editions are the helpful comparison of different PDFs and the ability to enable commenting for the free Reader; both features are available in the Pro version only.
In addition to the Adobe Acrobat products, there are many more-or-less comparable and often less expensive programs that allow you to do many of the same things, including Solid PDF Tools, DocuCom PDF Gold, Pdf995Suite, and many, many others.
Conversion to Editable Text
The question of converting PDF files is a full article in itself, or a workshop, like the one recently offered by the NCTA. Here are a few tips that were developed in the Premium version of my newsletter and can also be found in my next Tool Kit electronic book.
Most translation environment tools can’t process PDF files, and even those that can (such as Trados Studio, Wordfast Pro, memoQ, Alchemy Publisher, and Fluency) often don’t do it well enough. A good conversion program converts a PDF file to a Word file with flowing text and preserves formatting (bold, italics, paragraphs, tables, etc.) without creating text boxes. If the PDF file is an image-based file (such as a scanned or faxed document), the program also needs to be able to convert the image to text accurately.
Adobe Reader offers only two possible conversion methods: text can be copied and pasted using the clipboard, or the file can be saved as a text file (File > Save as Text). With both methods, each line ends with a hard return (paragraph mark), so they are practical only for a small amount of text.
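If you do end up with text that has a hard return at the end of every line, a small script can rejoin the lines. What follows is only a minimal sketch in Python, not a feature of any of the tools mentioned here. The function name unwrap_hard_returns is made up for illustration, and the sketch assumes that real paragraph breaks show up as blank lines in the pasted text, which will not be true of every PDF. It also does not repair words that were hyphenated at the end of a line.

```python
import re


def unwrap_hard_returns(text: str) -> str:
    """Rejoin lines broken by hard returns (hypothetical helper).

    Assumption: paragraphs are separated by at least one blank line.
    Single line breaks inside a paragraph are replaced by a space.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split into paragraphs wherever a blank line occurs.
    paragraphs = re.split(r"\n\s*\n", text)

    cleaned = []
    for paragraph in paragraphs:
        # Drop empty lines and stray whitespace, then join with spaces.
        lines = [line.strip() for line in paragraph.split("\n") if line.strip()]
        if lines:
            cleaned.append(" ".join(lines))

    # Keep one blank line between the rebuilt paragraphs.
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    pasted = (
        "This sentence was\n"
        "broken across\n"
        "three lines.\n"
        "\n"
        "Second paragraph\n"
        "here."
    )
    print(unwrap_hard_returns(pasted))
```

For anything longer than a few pages, a dedicated conversion program is still the better route, since a script like this cannot restore formatting or tables.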
The Standard and Pro versions of Adobe Acrobat offer some additional conversion methods. You can select File > Export or File > Save As, either of which lets you save the file directly in various file formats (such as Word, Excel, HTML, and XML).
I have covered the lighter versions of ABBYY’s and Nuance’s conversion programs (PDF Transformer by ABBYY and PDF Converter by Nuance) numerous times in my newsletter, and these can be an affordable and suitable option for converting PDFs. However, if your bread and butter consists of translating PDF files and you’re looking for more flexibility, you might want to have a look at their (much) more expensive bigger siblings, i.e., FineReader by ABBYY and OmniPage by Nuance. JZ