Re: [Czechlist] Tag problem
OCRed docs are a pain when it comes to formatting. If I were you, I
would check whether multiple fonts are used in the file and, if so,
whether they really differ and whether the client would notice if you
removed them. If that is OK, simply select all the content and convert
it to a single font.
Alternatively, you can try David Turner's Codezapper macro, which is
designed to clean up rogue codes. It's free to download, though you may
need to log in or register with the Dejavu-l group. If you run into
trouble, let me know off-list and I will send it to you.
The last-resort option is to redo the formatting manually, paragraph by
paragraph, always applying the appropriate font. Or you can try to
batch-replace every instance of the hacek font (check the font name so
you can enter it in the Search box) with the standard font using Search
and Replace.
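To illustrate why the batch replace can help, here is a simplified model, assuming (my assumption, not something Word or any CAT tool documents this way) that the tags a CAT tool shows correspond to boundaries between "runs" of text with different formatting. The `Run` class, the font name `HacekFont`, and both helper functions are hypothetical; this is not Word's API or what Codezapper actually does, just a sketch of the idea: first give the odd runs the standard font and size, then join neighbouring runs whose formatting is now identical, so the boundaries (and tags) around each C with a hacek disappear.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Run:
    """Hypothetical model: a stretch of text with uniform formatting."""
    text: str
    font: str
    size: float


def normalize_font(runs: List[Run], bad_font: str,
                   good_font: str, good_size: float) -> List[Run]:
    """Give every run set in bad_font the standard font and size."""
    return [
        replace(r, font=good_font, size=good_size) if r.font == bad_font else r
        for r in runs
    ]


def merge_runs(runs: List[Run]) -> List[Run]:
    """Join adjacent runs whose formatting is identical into one run."""
    merged: List[Run] = []
    for r in runs:
        if merged and merged[-1].font == r.font and merged[-1].size == r.size:
            merged[-1] = replace(merged[-1], text=merged[-1].text + r.text)
        else:
            merged.append(r)
    return merged


if __name__ == "__main__":
    # The OCR output: the "Č" sits in its own run, in another font, one size bigger.
    runs = [
        Run("Dobrý den, ", "Times New Roman", 12.0),
        Run("Č", "HacekFont", 13.0),
        Run("eská republika", "Times New Roman", 12.0),
    ]
    fixed = merge_runs(normalize_font(runs, "HacekFont", "Times New Roman", 12.0))
    print(len(runs), "->", len(fixed))
    print(fixed[0].text)
```

The two steps are kept separate on purpose: normalizing the font alone only makes the text look right, while the run boundaries (the likely source of the tags) stay until the identically formatted pieces are actually joined.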
On 22.4.2010 12:05, James Kirchner wrote:
> I've got a Word file from a PDF that I performed OCR on in OmniPage.
> Everything is quite good, except that for unknown reasons, this time
> OmniPage made every C with a hacek the wrong font and one size bigger
> than the surrounding text.
> I have fixed this problem, at least to the naked eye in Word, but when I
> load the file into my CAT tool, I find that every C with a hacek has
> formatting tags on both sides of it.
> Does anyone know how I can go back to Word (or something else) and get
> those tags out without catastrophic conversion of the whole file to TXT?
> Thanks for any ideas.