File size observations on the IATE TBX Termbase

It has been known for a while now that a database dump of IATE, the EU terminology database, has been available as a download since June 2014, rather than only through the web search form. The ZIP file is ~116 MB; the unpacked database is 2.2 GB (!). Since it contains all EU languages, I split the file into four subfiles and extracted four trilingual DE/FR/EN files using an XSLT stylesheet. xsltproc.exe (the command-line XSLT processor from libxslt) couldn’t cope with the complete file, but the four 550 MB files each passed through in about 10 minutes and dropped to about half their original size.
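For readers who would rather not split the file first, the same DE/FR/EN extraction can be sketched as a streaming filter in Python. This is not the XSLT stylesheet used above, only an illustration of the idea. It assumes the dump follows the usual TBX layout (`termEntry` elements containing `langSet` children with an `xml:lang` attribute, no default namespace); the function name `filter_entries` and the sample data are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
KEEP = {"de", "fr", "en"}


def filter_entries(source, keep=KEEP):
    """Yield each termEntry as a string, with unwanted langSets removed.

    Assumes TBX-style element names (termEntry, langSet); adjust them
    if the actual dump differs.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "termEntry":
            continue
        # Drop every language section that is not in the keep set.
        for lang_set in list(elem.findall("langSet")):
            if lang_set.get(XML_LANG) not in keep:
                elem.remove(lang_set)
        # Skip entries that have no term left in the wanted languages.
        if elem.find("langSet") is not None:
            yield ET.tostring(elem, encoding="unicode")
        # Free the entry's children so memory use stays roughly flat.
        elem.clear()


# Tiny made-up sample standing in for the real 2.2 GB file.
SAMPLE = b"""<martif><text><body>
<termEntry id="1">
<langSet xml:lang="de"><tig><term>Haus</term></tig></langSet>
<langSet xml:lang="pl"><tig><term>dom</term></tig></langSet>
<langSet xml:lang="en"><tig><term>house</term></tig></langSet>
</termEntry>
<termEntry id="2">
<langSet xml:lang="pl"><tig><term>kot</term></tig></langSet>
</termEntry>
</body></text></martif>"""

if __name__ == "__main__":
    for entry in filter_entries(io.BytesIO(SAMPLE)):
        print(entry.strip())
```

For the real file, a path or an open binary file handle would be passed instead of the `io.BytesIO` sample, and the yielded entries written into a new TBX wrapper. Because entries are handled one at a time, the whole document never has to fit in memory, which is presumably what tripped up the single-pass XSLT run.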
