Re: Replace commands in clips incorporating Chinese characters

  • simon.drimer
    Message 1 of 11 , Jan 18, 2010
      Many thanks all. I see now that simply converting the Chinese characters to that different (readable) format is the way forward ...

      --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
      >
      > That is actually pretty straightforward. Create a clipbook, with one entry for each different field you need to manage,
      > and use a clip like this for each one:
      >
      > ^!Replace "&\#22995;&\#21517;" >> "Name" ARSTW
      >
      > ^!Replace "&\#36164;&\#26684;&\#35777;&\#20070;&\#21495;&\#30721;" >> "Certificate Number" ARSTW
      >
      >
      >
      > Note the added backslashes for the pound signs - NoteTab is not reliable without them. Open a document, run the clipbook
      > (CH2ENG) for example, and just sit back and let it go.
      >
      >
      >
      > Start with a small file with all the headings you need to capture and let Google give you the translation. Write the
      > clips, and you'll be good to go.
      >
      >
      >
      > Regards,
      >
      > John
      >
      >
      >
      > From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of simon.drimer
      > Sent: Monday, January 18, 2010 4:50 PM
      > To: ntb-clips@yahoogroups.com
      > Subject: Re: [Clip] Replace commands in clips incorporating Chinese characters
      >
      >
      >
      >
      >
      > Hey ... thanks everyone. Yes, Google Translate would be perfect, but the files are too big for it: it tends to fall
      > over 5% of the way into the document and then fails to translate the Chinese text into English. I'll have a look at
      > finding some other way to use the Google Translate engine, maybe with uploads to the web, but that could be a lot of
      > trouble (I'm also looking at other machine translators like Systran). I would love to get NT working!
      > Sheri - thanks - here's a sample partial record (and each file contains let's say 10,000 of these)
      > 姓名 刘丽梅
      > 性别 女
      > 资格证书号码 00200812210000008520
      > 资格证书状态 有效
      > 有效期截止日期 2011-12-10
      > And Google Translate will turn that into:
      > Name Li-Mei Liu
      > Gender Female
      > Certificate Number 00200812210000008520
      > Certificate Status Effective
      > Valid cut-off date 2011-12-10
      > So what I am trying to do is write a clip that replaces "姓名" with "Name",
      > "资格证书号码" with "Certificate Number" and so on - let's say there will be 50
      > different replacements...
      >
      > --- In ntb-clips@yahoogroups.com, Sheri <silvermoonwoman@> wrote:
      > >
      > > On 1/17/2010 10:57 PM, simon.drimer wrote:
      > > > Hi - so I have this problem ... I regularly need to process about 60 large (~30 MB each) txt files that contain
      > > > a mix of simplified Chinese characters and English numerals. 99% of the Chinese characters are words and phrases
      > > > that appear repeatedly in the txt files (basically thousands of database records, with field names like
      > > > "Registration number", "Start date" etc. - although all in Chinese) and the remaining 1% I can live without. I
      > > > need to get out of Chinese and into English, so it seems the most efficient process would be to write a clip
      > > > with a couple of hundred individual Replace commands to convert the Chinese field names into English. Now the
      > > > problem is, while I can open the txt files with the Chinese characters intact, using one of the Unicode settings
      > > > when opening the document, I am unable to write a clip which will hold Chinese characters. Any Chinese
      > > > characters I write or paste into a clip just get converted to "????".
      > > > Anyone - is there a way to write clips which keep the integrity of Unicode character sets like Chinese? Or is
      > > > there some other way I can convert Chinese characters into English in a repeatable way?
      > > >
      > >
      > > Yes, I think it can be done in a NoteTab clip, using UTF-8 and regex.
      > >
      > > If you'd care to upload a small sample to show what you have vs what you
      > > need, I'd be happy to take a look.
      > >
      > > Regards,
      > > Sheri
      > >
      >
    • Axel Berger
      Message 2 of 11 , Jan 19, 2010
        "simon.drimer" wrote:
        > here's a sample partial record
        > 姓名 刘丽梅
        > 性别 女

        Ah, that's different, why didn't you say so? These are not Chinese
        characters as such, be it UTF or any other encoding, but rather HTML
        entities. From NoteTab's point of view all that is pure 7-bit US-ASCII.
        Nothing simpler than finding and replacing that. (Simple in concept that
        is, still a lot of work, but, as I said before, it needs only be done
        once.)

        Axel
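
The approach discussed in this thread (one ^!Replace line per field name, with each Chinese character written as a decimal HTML entity and the pound sign escaped as \#) gets tedious to type by hand for 50 or more fields. Below is a minimal Python sketch for generating those lines. It assumes the source files really do store the characters as decimal entities, as Axel describes. The helper names (to_entities, replace_line, build_clip) and the FIELDS table are hypothetical, and the ARSTW options are simply copied from John's example rather than checked against the NoteTab documentation.

```python
def to_entities(text: str) -> str:
    """Write every non-ASCII character as a decimal HTML entity, e.g. &#22995;."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def replace_line(chinese: str, english: str, options: str = "ARSTW") -> str:
    """Build one ^!Replace line in the form shown in John's message."""
    # Escape the pound signs with a backslash, as John recommends.
    search = to_entities(chinese).replace("#", r"\#")
    return f'^!Replace "{search}" >> "{english}" {options}'


def build_clip(fields: dict) -> str:
    """Return one ^!Replace line per field name, longest Chinese name first."""
    ordered = sorted(fields.items(), key=lambda item: len(item[0]), reverse=True)
    return "\n".join(replace_line(zh, en) for zh, en in ordered)


# Hypothetical mapping; extend it with the translations obtained from a small sample file.
FIELDS = {
    "姓名": "Name",
    "资格证书号码": "Certificate Number",
}

if __name__ == "__main__":
    print(build_clip(FIELDS))
```

The printed lines can be pasted into a clip. Emitting the longest field names first is only a precaution, so that a short name cannot overwrite part of a longer name that happens to contain it.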