Re: tag forests in Japanese files

  • Maynard Hogg
    Message 1 of 4, Jun 1, 2009
      2009/6/1 Maynard Hogg <maynard.hogg@...>:
      >> YMMV Warning: The above didn't work 100% of the time. For some
      >> inexplicable reason, some "code shifting" sequences remained.

      >> Example: <f0>11</f0><f1>月</f1> for November.

      > 営業<f0> </f0>利益
      > = Operating profit

      I noticed that a lot of these "code shifting" sequences involved
      spaces between Japanese characters.

      The following regexp eliminates spaces before hiragana, katakana, and
      Unified Han characters.

      s/ +([\u3041-\u30FF\u4E00-\u9FFF]+)/$1/g
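
For anyone who wants to try the substitution outside an editor, here is a minimal Python sketch of the same regexp. The function name `strip_spaces_before_japanese` is made up for illustration and is not part of OmegaT or OOo; the character ranges are copied from the pattern above.

```python
import re

# Same ranges as the regexp above: hiragana and katakana (U+3041-U+30FF)
# plus CJK Unified Ideographs (U+4E00-U+9FFF).
JAPANESE = "\u3041-\u30FF\u4E00-\u9FFF"

# One or more ASCII spaces followed by a run of Japanese characters.
SPACES_BEFORE_JAPANESE = re.compile(f" +([{JAPANESE}]+)")


def strip_spaces_before_japanese(text: str) -> str:
    """Drop ASCII spaces that directly precede Japanese characters."""
    return SPACES_BEFORE_JAPANESE.sub(r"\1", text)


if __name__ == "__main__":
    print(strip_spaces_before_japanese("営業 利益"))  # 営業利益
```

Only the ASCII space (U+0020) is matched; a full-width space (U+3000) is left alone by this sketch.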

      WARNING: Do NOT click the "Replace all" button. All too often, the
      author has willingly used spaces for layout instead of tabs or
      whatever.
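
One way to follow this warning in a script is to list the candidate matches for review instead of replacing them blindly. The sketch below is only an illustration of that idea: `find_candidates` is a hypothetical helper, and reporting line and column numbers is an assumption about what a reviewer would find useful.

```python
import re

# Spaces followed by hiragana, katakana, or Unified Han characters,
# as in the regexp quoted in this message.
PATTERN = re.compile(" +([\u3041-\u30FF\u4E00-\u9FFF]+)")


def find_candidates(text):
    """Return (line_number, column, matched_text) per hit; the text is not modified."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            # Columns are 1-based and point at the first space of the match.
            hits.append((line_no, match.start() + 1, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = "営業 利益\n品名    数量\nplain text"
    for hit in find_candidates(sample):
        print(hit)
```

In the sample, the second line uses four spaces as a column separator, which is exactly the kind of match a human should look at before replacing.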

      WARNING #2: Relying on the "distinguish character width" option produces grief.
    • Maynard Hogg
      Message 2 of 4, Jun 1, 2009
        2009/6/1 Maynard Hogg <maynard.hogg@...>:
        > The following regexp eliminates spaces before hiragana, katakana, and
        > Unified Han characters.
        > s/ +([\u3041-\u30FF\u4E00-\u9FFF]+)/$1/g

        That's the JavaScript version. OOo uses ANSI C's \x.

        s/ +([\x3041-\x30FF\x4E00-\x9FFF]+)/$1/g
        s/([\x3041-\x30FF\x4E00-\x9FFF]+) +/$1/g

        Or even

        s/ *([\x3041-\x30FF\x4E00-\x9FFF]+) */$1/g
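
To check that the combined pattern behaves like the two separate passes, here is a small Python sketch. Python writes code points as \uXXXX, so that notation is used here whatever the target application expects; the names `two_pass` and `one_pass` are invented for this example.

```python
import re

# A run of hiragana, katakana, or Unified Han characters.
JA = "[\u3041-\u30FF\u4E00-\u9FFF]+"

BEFORE = re.compile(f" +({JA})")   # spaces before a Japanese run
AFTER = re.compile(f"({JA}) +")    # spaces after a Japanese run
BOTH = re.compile(f" *({JA}) *")   # optional spaces on either side


def two_pass(text: str) -> str:
    """Apply the 'before' pattern, then the 'after' pattern."""
    return AFTER.sub(r"\1", BEFORE.sub(r"\1", text))


def one_pass(text: str) -> str:
    """Apply the single combined pattern."""
    return BOTH.sub(r"\1", text)


if __name__ == "__main__":
    sample = "売上 11 月 分"
    print(two_pass(sample))  # 売上11月分
    print(one_pass(sample))  # 売上11月分
```

Note that both variants also remove spaces that separate Latin text or digits from Japanese, as in "Windows 対応 PC" becoming "Windows対応PC".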

        > WARNING: Do NOT click the "Replace all" button. All too often, the
        > author has willingly used spaces for layout instead of tabs or
        > whatever.

        This warning still applies, but OmegaT allows you to insert tabs and
        CRs in the translation.