Re: [Czechlist] Tag problem

  • Martin Janda
    Message 1 of 2, Apr 22, 2010
      OCRed docs are a pain when it comes to formatting. If I were you, I
      would check whether multiple fonts are used in the file and, if so,
      whether they really differ enough that the client would notice if
      you removed them. If not, simply select all the content and convert
      it to a single font.

      Alternatively, you can try David Turner's CodeZapper macro, which is
      designed to clean out rogue codes. It's free and can be downloaded
      from
      http://tech.groups.yahoo.com/group/dejavu-l/files/CodeZapper/ .

      You may need to register and log in to the Dejavu-l group. If you
      run into trouble, let me know offlist and I will send it to you.

      The last-resort option is to fix the formatting manually, paragraph
      by paragraph, applying the appropriate font each time. Or you may try
      to batch-replace every instance of the hacek font with the standard
      font using Search + Replace (check the font's name first so that you
      can enter it in the Search box).
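
      To illustrate why the batch replace can make the tags disappear,
      here is a minimal Python sketch. It assumes the CAT tool puts a tag
      wherever the formatting changes between two neighbouring runs of
      text. The Run class, the font names and the sizes are all
      hypothetical stand-ins, not Word's real object model, and a real
      file carries more attributes (language, style and so on) that can
      also split runs.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for a text run: text plus its formatting."""
    text: str
    font: str
    size: float


def normalize_font(runs: List[Run], bad_font: str,
                   good_font: str, good_size: float) -> List[Run]:
    """Give every run set in bad_font the standard font and size."""
    return [
        replace(r, font=good_font, size=good_size) if r.font == bad_font else r
        for r in runs
    ]


def merge_adjacent(runs: List[Run]) -> List[Run]:
    """Join neighbouring runs whose formatting is now identical."""
    merged: List[Run] = []
    for r in runs:
        if merged and (merged[-1].font, merged[-1].size) == (r.font, r.size):
            merged[-1] = replace(merged[-1], text=merged[-1].text + r.text)
        else:
            merged.append(r)
    return merged


# Assumed example: the OCR put the "č" in another font, one size bigger.
runs = [
    Run("ru", "Times New Roman", 12),
    Run("č", "Arial Unicode MS", 13),
    Run("ník", "Times New Roman", 12),
]

cleaned = merge_adjacent(
    normalize_font(runs, "Arial Unicode MS", "Times New Roman", 12)
)
print(len(runs), "runs before,", len(cleaned), "after:", cleaned[0].text)
```

      In this model the three runs collapse into one, so there is no
      formatting boundary left around the "č" for a tag to mark. If tags
      survive in the real file, some attribute other than font and size
      probably still differs.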

      Good luck!
      Martin




      On 22.4.2010 12:05, James Kirchner wrote:
      >
      >
      > I've got a Word file from a PDF that I performed OCR on in OmniPage.
      >
      > Everything is quite good, except that for unknown reasons, this time
      > OmniPage made every C with a hacek the wrong font and one size bigger
      > than the surrounding text.
      >
      > I have fixed this problem, at least to the naked eye in Word, but when I
      > load the file into my CAT tool, I find that every C with a hacek has
      > formatting tags on both sides of it.
      >
      > Does anyone know how I can go back to Word (or something else) and get
      > those tags out without catastrophic conversion of the whole file to TXT?
      >
      > Thanks for any ideas.
      >
      > Jamie
      >
      >