Message 1 of 10, Sep 1, 2005

Hi all,
In light of the amazing progress that is being made with OmegaT, I feel very confident in
expanding its use to other translators in my agency. To this end, I have a question for you all:
What tools do you use to analyze new texts against existing memories? Up until now, I
have been importing my new OmegaT memories into Trados just for text analysis
purposes, but I would like to find a better (non-Trados) solution if possible. I also have
WordFast, but am not happy with the analysis speed, and especially the handling of tagged
files and large numbers of files. Trados really is fast at analyzing most of the time...
The ideal solution would really be a non-commercial component, so that the entire
OmegaT workflow from project receipt and quoting to translation could be completed with
open source tools. And of course so that I could show any of our translators (internal or
external) how to work with tools X, Y and Z, all of which are freely available.
So how do all of you deal with the analysis issue? It is no problem for me when translating
for other agencies, as they supply their wordcounts, but when working for our own clients
and leveraging against existing translations, we need to be able to do our own analyses.

Message 2 of 10, Sep 1, 2005

Hello Eric,
> What tools do you use to analyze new texts against
> existing memories?

With OmT: none. I am not aware of any tool that does
what you want. OmT has a very rudimentary statistical
function which is a simple text file located in the
omegat folder of your project. This file contains the
word count for the whole project and for each file
individually, as well as the number of untranslated
words. By comparing "Word count in unique segments"
against "Total word count" you get an idea how many
repetition there is in your project.
To my knowledge, a tool analysing the number of exact
matches, repetitions and lower-percentage matches does
not exist, at least not as open source or freeware.
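
(By way of illustration, a minimal Python sketch of the comparison described above; the figures are hypothetical, and the exact labels in the statistics file may vary between OmegaT versions:)

```python
# Hypothetical figures read off OmegaT's statistics file by hand:
total_words = 12500   # "Total word count"
unique_words = 9800   # "Word count in unique segments"

repeated_words = total_words - unique_words
print(f"Repeated words:  {repeated_words}")
print(f"Repetition rate: {repeated_words / total_words:.1%}")
```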

Message 3 of 10, Sep 1, 2005

S. Tomaskovic wrote at 9:58 AM on 01/09/2005:
> > What tools do you use to analyze new texts against
> > existing memories?
> With OmT: none. I am not aware of any tool that does
> what you want.

Yet another reason why it could be useful if OmT reverted to the older work
method whereby one could create a project TM containing all the segments in
the project even though none of them are translated. The strings from that TM
and the other TM can then be extracted to two simple files, which can be
compared in a diff fashion... I'm sure it can be done.
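
(A minimal sketch of that extraction step in Python, assuming standard TMX <tu>/<tuv>/<seg> markup; the file names are hypothetical, and the sorted output files can then be compared with any diff tool:)

```python
# Sketch: write the source-language segments of a TMX file to a sorted
# text file, one segment per line, so two such files can be compared
# with a diff tool. Assumes standard TMX <tu>/<tuv>/<seg> markup;
# file names are hypothetical.
import xml.etree.ElementTree as ET

def dump_segments(tmx_path, out_path):
    segments = set()
    for tu in ET.parse(tmx_path).iter("tu"):
        tuv = tu.find("tuv")                    # first <tuv> = source language
        seg = tuv.find("seg") if tuv is not None else None
        if seg is not None:
            # itertext() flattens any inline tags inside <seg>
            segments.add("".join(seg.itertext()).strip())
    with open(out_path, "w", encoding="utf-8") as f:
        f.writelines(s + "\n" for s in sorted(segments))

dump_segments("project.tmx", "project_segments.txt")
dump_segments("legacy.tmx", "legacy_segments.txt")
# then, for example:  diff project_segments.txt legacy_segments.txt
```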

Message 4 of 10, Sep 1, 2005

Hello Eric,
Sonja has already mentioned OmegaT's own statistical function. This is very
basic and probably not particularly reliable, either. That is one reason why
I had it removed from the GUI several versions ago and buried away in a
sub-directory where it could do less damage. :-)
The other reason is that I believe the whole concept of "fuzzy matches" to be
wrong-headed: the similarity of two segments cannot be quantified by a
mathematical algorithm. That isn't to say that the function isn't useful. If
a number of parameters are known quantities, fuzzy match data can provide a
useful guide to similarity. For instance, if the translator of the old text
and the new one is the same person, is familiar with the tools, and
has a good idea of how a certain percentage match translates into time
savings, then the figure might provide a rough, subjective guide to the
potential savings in time and effort.
The emphasis is very much on the "subjective", though. Quantification of
semantic similarity is not a science, much less an exact one, and the
subjective impression of a skilled professional cannot be transferred
reliably to a business workflow level. The fact that many translation buyers
have insisted on doing so has only resulted in a lot of very disgruntled
translators who are realizing that they are simply being paid a lot less
money for a little less work.
With the "subjective" aspect firmly in mind, another open-source tool
available for this purpose OpenOffice.org: you can compare two documents for
similarity using the Compare function similiar to that in MS Word (Edit >
Compare Document). "How different" then equals "how much blue", but I'd be
sceptical about trying to be more accurate than that.

Message 5 of 10, Sep 1, 2005

Marc Prior wrote at 10:26 AM on 01/09/2005:
> The other reason is that I believe the whole concept of "fuzzy matches"
> to be wrong-headed: the similarity of two segments cannot be quantified
> by a mathematical algorithm.

I think it can... but it has to be quantified on a per-language basis. In
some languages, a small change (percentagewise) in the text results in a
small semantic change, but in other languages the changes are bigger.
The best such software would be software that can determine whether a "change"
is a major or a minor one. I'll give an example: in Afrikaans to English,
changing the definite article in the source text to an indefinite article will
result in an equal amount of change in the target text. In the same text,
changing a singular to a plural in the source text would result in *twice* the
amount of change in the target text. But such clever programs don't exist, do
they? Until such time, one has to be satisfied with rough estimates.
I think leveraging is important to translation companies... it is not all bad and it doesn't necessarily lead to underpaid translators.

Message 6 of 10, Sep 1, 2005

I think that both Marc and Samuel have a point.
However, the reality is that clients will not care if
you are ready to grant fuzzy match discounts in one
case and not in another, for obvious reasons.
If you granted them once, you'll virtually be bound to
granting them every time.
In fact, the idea of charging on a per-word basis is
not right. Translators should charge for the actual
work they need to perform, and not by a (usually
wrong) word count that does not reflect the efforts
involved in creating a good translation.
As such, Trados and other CAT tools haven't brought
much relief to our industry; if anything, they really
make things worse.
I'd prefer not to see a statistical function in OmT,
for the reasons mentioned above. One feature I find
very helpful in other tools such as Heartsome's editor
is a word count for untranslated or unapproved
words/segments, which gives a rough overview of how
much is still left to translate. I don't know how
reliable OmT's word count is, but I think we could
make this information more visible, e.g. by creating
a menu entry or such.
Sorry, this is not exactly what Eric wanted to hear. I
imagine that it should be fairly easy to create a
small Python or Tcl script that compares segments from
a legacy TM with segments from OmT's TMX file (project_save.tmx).
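
(A minimal sketch of such a script in Python, assuming both files use
standard TMX <tu>/<tuv>/<seg> markup and that the first <tuv> of each
<tu> holds the source text; the legacy file name is hypothetical:)

```python
# Sketch: count how many of an OmegaT project's source segments already
# have an exact match in a legacy TM. Assumes standard TMX markup;
# the legacy file name is hypothetical.
import xml.etree.ElementTree as ET

def source_segments(tmx_path):
    """Return the set of source-language segments in a TMX file."""
    segments = set()
    for tu in ET.parse(tmx_path).iter("tu"):
        tuv = tu.find("tuv")                    # first <tuv> = source language
        seg = tuv.find("seg") if tuv is not None else None
        if seg is not None:
            segments.add("".join(seg.itertext()).strip())
    return segments

project = source_segments("omegat/project_save.tmx")
legacy = source_segments("legacy.tmx")
exact = project & legacy

print(f"Project segments:    {len(project)}")
print(f"Exact matches in TM: {len(exact)} "
      f"({len(exact) / max(len(project), 1):.0%})")
```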

Message 7 of 10, Sep 1, 2005

Samuel,
I think you are doing quite a good job of making my case for me. ;-)
> I think it can... but it has to be quantified on a per-language basis. In
> some languages, a small change (percentagewise) in the text results in a
> small semantic change, but in other languages the changes are bigger.
The "small change" (the effect of, say, a change to one word or even one
letter) varies so much that the effect can only be considered at the level of
the individual change, or perhaps the level of the motivation behind it, i.e.
the level of the individual document. I think it is dangerous to make
generalizations regarding a particular language.
I'm sure many of us have translated a document, then received a version
containing amendments with the request to adapt our original translation,
only to find that our translation was still a perfectly adequate translation
of the amended document. And we have often suspected (and sometimes known)
that the amendments to the source text were made after the author had read
our translation, and that the "new" source text is in fact even more like our
translation than the "old" one. :-)
In these and similar cases, there is zero work to do in the way of changes,
but still a whole bunch of work to do in the way of reviewing each
individual segment.
The work involved in reviewing someone else's translation is a similar issue.
This work depends upon two factors: firstly, the difficulty of the original
translation task, and secondly, how well the original translator accomplished
it. The translation might be perfect, but nevertheless a lot of work to
check. Or it might be diabolical, but in the case of an "easy" text, not all
that much work to correct, or for that matter retranslate. Assumptions are
dangerous.
I agree with Sonja that quantifying translation effort by volume is per se -
to use her words - "not right". It's convenient, admittedly, and I don't
think it's going to change in the foreseeable future. But I already have
different rates for different customers and different types of text for the
same customer, to reflect different levels of difficulty. Quantification of
translation by volume is a concession to pricing transparency, and I oppose
regarding it as a given and taking it to its logical conclusion that
similarities can be quantified in a similar way.
As regards fuzzy matches, in fact, I would rather take the number of
*identical* matches in a text as an indicator of the overall level of "fuzzy
matching", than "fuzzy match percentages" that give the impression of having
some scientific validity because they have figures after the decimal point.
> I think leveraging is important to translation companies...

Of course it is - some of them are incapable of selling services on any basis
other than price. :-)
> it is not all bad and it doesn't necessarily lead to underpaid translators.

I am in favour of leveraging, otherwise I would never have promoted OmegaT.
But it is translators who should be defining the savings in effort, not
customers or CAT tool vendors.

Message 8 of 10, Sep 1, 2005

Actually Marc, I am 100% with you. We are a fairly large agency, at least in Austria,
and have a few client agencies that work with standard Trados-type matrices, and I
really do not like that at all. In terms of minor changes, time to review segments and
everything else, I can assure you that you are preaching to the choir here :-).
When we first started working with IBM Translation Manager about nine years ago, we
first kept it secret for a year or two (that was great), but then market awareness grew
so much that it was no longer "ethical" to do so. In any case, I was enthralled by the
support that the program provided for consistency and terminology.
For a time, we kind of followed the market and used Trados-type pricing matrices, but
I became more and more dissatisfied with the whole fuzzy match thing, and also with
sentence-level segmentation. And crashing, and regular pricey upgrades...
In response, we began offering a discount only for 100% matches, nothing more
(unless clients really demand a fuzzy match matrix and it is justifiable). But we also
work less with software localization and strictly technical texts, and more with
advertising, business, financial, websites, corporate communications, etc. etc.
Then, I discovered OmegaT and the insanely logical concept of paragraph-level
segmentation (though it kind of threw me at first). I have followed many discussions
about sentence vs. paragraph, but I only work at sentence level when I absolutely have
to, and have never looked back.
I really think that OmegaT is spot on in terms of what translators really need, also in
terms of ensuring good quality. Even at the stage that it is in now. And conversely, I
have come to resent Trados on some levels, and the way that this kind of approach
has distorted our market. But I suppose every generation gets the feeling sooner or
later that the world is going down the tubes to some degree or another :-).
All I can say there is thanks for a truly great tool!
Oh, and as for my original question, I know of the wordcounts file, but read a while
back that it was not very reliable. If it is fairly accurate, it will fulfill my needs
perfectly, as I am really only interested in 100% matches. I guess I just need to do
some comparisons with Trados counts to see.

Message 9 of 10, Sep 1, 2005

On Sep 1, 2005, at 10:16 PM, OmegaT@yahoogroups.com wrote:
> What tools do you use to analyze new texts against existing
> memories? Up until now, I have been importing my new OmegaT memories
> into Trados just for text analysis purposes, but I would like to find
> a better (non-Trados) solution if possible. I also have WordFast, but
> am not happy with the analysis speed, and especially the handling of
> tagged files and large numbers of files. Trados really is fast at
> analyzing most of the time...

Horses for courses ;-)
I do analyze new texts, but not really against existing memories.
As well as a word count I like to extract a word list.
If the latter is run against a word frequency list for the source
language, two sorts of vocabulary can be extracted.
They can be pre-translated in an MT program or against a specialist
glossary.
Similar lists at the end of the process can be used to check the
validity of the translation.
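
(A sketch of that word-list step in Python; the file names, and the
assumption that the frequency list holds one word per line, are mine:)

```python
# Sketch: extract a word list from a source text and split it into
# general and specialist vocabulary using a frequency list of the
# source language (assumed: one word per line). File names are
# hypothetical.
import re
from collections import Counter

with open("source.txt", encoding="utf-8") as f:
    words = re.findall(r"[^\W\d_]+", f.read().lower())
counts = Counter(words)

with open("frequency_list.txt", encoding="utf-8") as f:
    common = {line.strip().lower() for line in f if line.strip()}

# Words not on the frequency list are candidates for the specialist list.
specialist = sorted(w for w in counts if w not in common)

print(f"Tokens: {len(words)}, distinct words: {len(counts)}")
print(f"Specialist (off-list) words: {len(specialist)}")
for word in specialist[:20]:
    print(word, counts[word])
```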
About seven years ago I came across IBM's comparative length files
for a number of languages.
I haven't seen them since, but they made interesting reading.
Note that a lot of menu and display work requires texts to be of a
certain maximum length.
Currently experimenting with TextWrangler on a Mac mini for comparing
files and manipulating texts.
I want to try WordSmith on a PC to get a concordance up and working
for a load of reference material.
Trouble is that I translate into English from several languages. The
tools for analysis I know best are for the target language.
The solution is to treat the source as badly written English that needs a
lot of heavy editing. In fact it sometimes is.
I have been using Heartsome for my translations for some time now
and working file by file, rather than building up a repertory.
It seems much like working with OmegaT as I remember it. Not much
help from fuzzy matching of segments.
Even so I'm glad of any repeated word that gets thrown up, as I'm a
I've even run English spell checkers on French texts to strip the
accents. It may work in reverse.
Actually I'm using CAT tools as glorified split screens.
None of the above is free software, as such; it's all tied. But there
are two other 'tools' on the Mac that cost me no extra.
WordPerfect 3.5 is a free download. It has good text-to-speech in
English, for proofing.
And AppleWorks can manage Word files, so I keep parallel
applications for each source language.
These copies are used to glean my statistics and do some pre-translation.

Message 10 of 10, Sep 1, 2005

On 2005/09/02, at 0:03, Marc Prior wrote:
> > it is not all bad and it doesn't necessarily lead to underpaid
> > translators.
> I am in favour of leveraging, otherwise I would never have promoted
> OmegaT.
> But it is translators who should be defining the savings in effort,
> not customers or CAT tool vendors.

Put that in a factory context and we're back in 1917 :) Not that I
don't like the idea though :)

JC