File size observations on the IATE TBX Termbase

It has been known for a while now that since June 2014 IATE, the EU terminology database, is available as a downloadable database dump and not just through a web search form. The ZIP file is ~116 MB, the unpacked database 2.2 GB (!). Since it contains all EU languages, I split this file into four subfiles and extracted four trilingual DE/FR/EN files using an XSLT stylesheet. xsltproc.exe (from the libxslt package) couldn't cope with the complete file, but the four 550 MB files passed through in about 10 minutes each and dropped to about half their original size.
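The language filtering itself is simple to sketch. The following is not the XSLT sheet mentioned above but a minimal Python illustration of the same idea, assuming the usual TBX layout in which each termEntry holds one langSet per language, marked with an xml:lang attribute. The sample data and the function name keep_languages are made up for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def keep_languages(body, keep=("de", "fr", "en")):
    """Drop every langSet whose xml:lang is not in `keep`,
    then drop term entries that have no langSet left."""
    for entry in list(body.findall("termEntry")):
        for lang_set in list(entry.findall("langSet")):
            if lang_set.get(XML_LANG) not in keep:
                entry.remove(lang_set)
        if entry.find("langSet") is None:
            body.remove(entry)
    return body


# Hypothetical miniature termbase (assumed structure, not real IATE data).
TBX_SAMPLE = """<body>
  <termEntry id="e1">
    <langSet xml:lang="de"><tig><term>Haus</term></tig></langSet>
    <langSet xml:lang="en"><tig><term>house</term></tig></langSet>
    <langSet xml:lang="pl"><tig><term>dom</term></tig></langSet>
  </termEntry>
  <termEntry id="e2">
    <langSet xml:lang="pl"><tig><term>kot</term></tig></langSet>
  </termEntry>
</body>"""

body = keep_languages(ET.fromstring(TBX_SAMPLE))
remaining = [
    (entry.get("id"), [ls.get(XML_LANG) for ls in entry.findall("langSet")])
    for entry in body.findall("termEntry")
]
print(remaining)
```

This version loads the whole document into memory, so a multi-gigabyte dump would still have to be split first or processed with a streaming parser.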


Preparing half-translated bilingual XML for Trados Studio – with XSLT

More and more translation clients, especially in the web industry but also in application I18N/L10N, use the versatile XML standard for translation purposes. The market-leading Computer-Aided Translation (CAT) tool, SDL’s Trados Studio, lets you translate XML with an “Any XML” input filter, which includes an assistant for choosing which XML tags and attributes will be visible in the editor as “translatables”. Unfortunately, this means that the source strings will be overwritten with the translation, which is a bad idea if the source file is already bilingual XML that contains source and target language strings in matched tags.

If the target strings are empty, you can easily copy the content over and translate right away. But if the file is already partly translated, things get a bit trickier, since you don’t want to overwrite existing translations. Worse, if the client happily announces that the source of some of the translated strings has changed, things get more than just a bit tricky. Let’s have a look at how to prepare those files with XSLT!
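Before turning to XSLT, the basic rule (copy the source only where the target is still empty) can be sketched in a few lines of Python. The entry/source/target element names and the sample strings are assumptions for illustration; a real client file will use its own tag names.

```python
import xml.etree.ElementTree as ET


def fill_empty_targets(root):
    """Copy source text into targets that are still empty.
    Existing translations are left untouched.
    Returns the number of targets that were filled."""
    filled = 0
    for entry in root.iter("entry"):
        source = entry.find("source")
        target = entry.find("target")
        if source is None or target is None:
            continue
        # Only plain text is copied; inline child elements are ignored here.
        if not (target.text or "").strip():
            target.text = source.text
            filled += 1
    return filled


# Hypothetical bilingual file: one untranslated and one translated string.
BILINGUAL_SAMPLE = """<strings>
  <entry id="1"><source>Open file</source><target></target></entry>
  <entry id="2"><source>Save file</source><target>Datei speichern</target></entry>
</strings>"""

root = ET.fromstring(BILINGUAL_SAMPLE)
count = fill_empty_targets(root)
targets = [entry.findtext("target") for entry in root.iter("entry")]
print(count, targets)
```

The sketch deliberately ignores the harder case of changed source strings, since detecting those needs an earlier version of the file to compare against.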