As I posted on my job website on Yahoo, the EU translation memory of
1 million sentences in TMX format is available for free download. It
consists of 12 zipped files, about 1.2 GB in total, plus an extraction
tool. On a fast machine it takes about 30 minutes to extract, for
example, the English-Czech TMX file, which is over 700 MB. Importing
the TMX into Trados (I am using 6.5) is extremely slow: one of my
machines has been running for 2 days now and has imported only 470,000
sentences so far. SDLX is faster; it takes about half a day to import
one language combination. I am preparing E>CZ, CZ>E, E>SK, SK>E,
CZ>SK and SK>CZ files for both Trados and SDLX.
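For anyone who wants to look at an extracted TMX file outside Trados or
SDLX, here is a minimal sketch in Python that streams through a large
TMX file and writes one language pair as tab-separated text. It is only
an illustration, not the extraction tool mentioned above: the function
names iter_pairs and write_tsv are my own, the language codes "en" and
"cs" are assumptions about how the file labels its segments, and the
tab-separated output is a generic intermediate format, not a Trados or
SDLX import format.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree expands the xml:lang attribute to this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _clean(text):
    """Collapse tabs, newlines and repeated spaces into single spaces."""
    return " ".join(text.split())


def iter_pairs(source, src_lang, tgt_lang):
    """Yield (source_text, target_text) for each <tu> containing both languages.

    source may be a file name or a binary file object. Language codes are
    compared on the part before the first hyphen, so "EN-GB" matches "en".
    """
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        segs = {}
        for tuv in elem.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang.split("-")[0]] = _clean("".join(seg.itertext()))
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]
        # Drop the unit's children so memory use stays modest on huge files.
        elem.clear()


def write_tsv(pairs, out):
    """Write pairs as 'source<TAB>target' lines; return the number written."""
    count = 0
    for src, tgt in pairs:
        out.write(src + "\t" + tgt + "\n")
        count += 1
    return count
```

A typical call would be something like
write_tsv(iter_pairs("en-cs.tmx", "en", "cs"), open("en-cs.txt", "w", encoding="utf-8")),
where the file names are placeholders. Because the file is read as a
stream, the whole 700 MB never has to be loaded at once.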
Those files will be huge. For example, the SDLX CZtoE mdb file is
1.16 GB (280 MB zipped), and the txt file exported from SDLX for
Trados is 312 MB (59 MB zipped).
The other files will be similar in size.
It will take me several more days to process the other language
combinations.
If any of you are willing to help with this project (advice,
processing), please contact me off the list, and you will be able to
share the results for free.
If you become interested in the processed files later, let me know,
and we can exchange the TM for free as well.
I am not doing this for money; I want to collect as big an E<>CZ<>SK
TM as possible as a foundation for future cooperation among us.
As big TMs will represent more and more power in translation in the
future, I would appreciate your thoughts and input about how to
handle this fairly and also profitably for all participants.
P.S. Also if any of you have experience in extracting terminology from
those huge databases, please contact me off the list, as I would like
to do that as well.
Sending huge files for free is not a problem; it only takes time.
I am using www.sendthisfile.com, where you can send a file of any size
for free (one at a time). It takes me about a day to upload a 700 MB
file (they throttle the speed for free accounts), and then I can
forward the file to as many people as I want in no time. The file is
available for download for about 4-5 days, and downloading is much
faster.