
Re: Unicode conversion (Previously on Basic list)

  • Sheri
    Message 1 of 11 , Jul 6, 2009
      Your sample was not zipped, so I didn't see it in the same format as you
      posted. But try the following clip.

      H="Convert Unicode Fractions and Document to Ansi"
      ^!SetWizardWidth 200
      ^!Set %sourcedoc%="^?{(T=O)Browse to unicode text which has fractions that need converting=^%sourcedoc%}"
      ^!Set %proposed%="^$GetDataPath$ANSI_^$GetFileName(^%sourcedoc%)$"
      ^!SetWizardWidth 200
      ^!Set %SaveName%="^?{(T=S)Enter File Name for converted document=^%proposed%}"
      ^!Open "^%sourcedoc%" /C=65001
      ^!Replace "(*UTF8)(?<!^|\s)[\x{2153}-\x{215E}\xBC-\xBE]" >> "\x20$0" RAWS0
      ^!Replace "(*UTF8)\x{2153}" >> "1/3" RAWS0
      ^!Replace "(*UTF8)\x{2154}" >> "2/3" RAWS0
      ^!Replace "(*UTF8)\x{2155}" >> "1/5" RAWS0
      ^!Replace "(*UTF8)\x{2156}" >> "2/5" RAWS0
      ^!Replace "(*UTF8)\x{2157}" >> "3/5" RAWS0
      ^!Replace "(*UTF8)\x{2158}" >> "4/5" RAWS0
      ^!Replace "(*UTF8)\x{2159}" >> "1/6" RAWS0
      ^!Replace "(*UTF8)\x{215A}" >> "5/6" RAWS0
      ^!Replace "(*UTF8)\x{215B}" >> "1/8" RAWS0
      ^!Replace "(*UTF8)\x{215C}" >> "3/8" RAWS0
      ^!Replace "(*UTF8)\x{215D}" >> "5/8" RAWS0
      ^!Replace "(*UTF8)\x{215E}" >> "7/8" RAWS0
      ^!Replace "(*UTF8)\x{00BC}" >> "1/4" RAWS0
      ^!Replace "(*UTF8)\x{00BD}" >> "1/2" RAWS0
      ^!Replace "(*UTF8)\x{00BE}" >> "3/4" RAWS0
      ^!Export "^%SaveName%" ANSI All
      ^!Close Discard
      ^!Open "^%SaveName%"
      ^!Prompt Single character fractions have been replaced, and the document has been converted to ansi. The converted document is now open.
      ;end of clip
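For readers who want to see the clip's logic outside NoteTab, here is a minimal Python sketch of the same two steps: insert a space before any fraction glued to a preceding non-space character, then map each single-character fraction to its three-character form before saving as ANSI. It is not the clip itself. The names `convert_fractions` and `to_ansi_bytes` are hypothetical, and using cp1252 (Western ANSI) as the target code page is an assumption.

```python
import re

# Single-character fractions and their three-character replacements,
# mirroring the ^!Replace lines in the clip above.
FRACTIONS = {
    "\u2153": "1/3", "\u2154": "2/3",
    "\u2155": "1/5", "\u2156": "2/5", "\u2157": "3/5", "\u2158": "4/5",
    "\u2159": "1/6", "\u215A": "5/6",
    "\u215B": "1/8", "\u215C": "3/8", "\u215D": "5/8", "\u215E": "7/8",
    "\u00BC": "1/4", "\u00BD": "1/2", "\u00BE": "3/4",
}

# A fraction preceded by a non-space character (e.g. "1" + one-third)
# gets a space first, so it becomes "1 1/3" rather than "11/3".
GLUED = re.compile(r"(?<=\S)([\u2153-\u215E\u00BC-\u00BE])")
ANY_FRACTION = re.compile(r"[\u2153-\u215E\u00BC-\u00BE]")


def convert_fractions(text: str) -> str:
    """Replace single-character fractions with ASCII equivalents."""
    text = GLUED.sub(r" \1", text)
    return ANY_FRACTION.sub(lambda m: FRACTIONS[m.group(0)], text)


def to_ansi_bytes(text: str) -> bytes:
    """Convert fractions, then encode as cp1252 (assumed ANSI code page)."""
    return convert_fractions(text).encode("cp1252")
```

For example, `convert_fractions("1\u2153 cups")` returns `"1 1/3 cups"`. Because every fraction is gone before encoding, the final step cannot produce question marks for them.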

      John Shotsky wrote:
      > I uploaded a very small sample file with the high order fractions in it, into a John Shotsky's Stuff folder in the
      > files section.
      >
      > The file can be successfully opened in Word, Wordpad, Open Office Writer, and another text editor, EditPad Lite.
      > However, NoteTab substitutes question marks for the fractions. These fractions are present in Courier New font, which
      > I'm using, but NTP doesn't seem to match them.
      >
      > My goal is simply to be able to search and replace the single character fractions with three character fractions, such
      > as ⅓ to 1/3. (Unicode character = U+2153). I don't know if I can search on the Unicode character to replace them
      > when a question mark is in place.
      >
      > Thanks,
      > John
      >
      > From: notetab@yahoogroups.com [mailto:notetab@yahoogroups.com] On Behalf Of Sheri
      > Sent: Saturday, July 04, 2009 12:43
      > To: notetab@yahoogroups.com
      > Subject: [NTB] Re: Unicode conversion
      >
      > --- In notetab@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
      >
      >> I have received a text file in Unicode 16 format. It uses single
      >> character fractions for 1/3, 2/3. Although NTP appears to be able
      >> to successfully open this file and display it correctly, those
      >> characters that are not directly supported in most Windows
      >> formats are converted to question marks. Is there a way around
      >> this? I could always convert the fractions afterwards to three
      >> character fractions, if I could display them upon opening the
      >> file. I can provide a sample, if needed.
      >>
      >
      > Go ahead and zip up a sample to the files area, I'll have a look.
      >
      > Regards,
      > Sheri
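One likely explanation for the question marks John describes: when Unicode text is converted to an ANSI code page, any character the code page cannot hold is replaced. The Python sketch below assumes the Western ANSI code page (cp1252) and uses a made-up sample line. It illustrates the general mechanism only and is not a description of NoteTab's internals.

```python
# Hypothetical sample line containing U+2153 (one third) and U+00BD (one half).
sample = "Add 1\u2153 cups flour and \u00bd tsp salt"

# Encoding to cp1252 with errors="replace" substitutes "?" for any
# character that has no slot in that code page.
ansi = sample.encode("cp1252", errors="replace")
print(ansi)  # b'Add 1? cups flour and \xbd tsp salt'
```

One half survives as byte 0xBD because it is among the 256 cp1252 characters, while one third has no cp1252 equivalent and becomes `?`.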
    • John Shotsky
      Message 2 of 11 , Jul 6, 2009
        This didn’t work for me. It asks for a file name, but no browse is available. I’ve fixed the broken lines, so that’s not
        an issue…

        I uploaded a zipped copy of the sample to the John Shotsky's Stuff folder on THIS list. If you open it with Word, etc.,
        you should see the fractions.
        Thanks!
        John

      • Sheri
        Message 3 of 11 , Jul 6, 2009
          --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
          >
          > This didn't work for me. It asks for a file name, but no browse is available. I've fixed the broken lines, so that's not
          > an issue…

          I posted it here: <http://tech.groups.yahoo.com/group/ntb-clips/files/John Shotsky's Stuff/Cliptext.txt>

          so you can see it without Yahoo broken lines.

          First, you browse to the Unicode text file. You have to click the [...] button on the clip wizard to browse.

          Then you can accept or change the name of the ANSI file that will be saved. If you accept the proposed name, it is formed by prepending "ANSI_" to the original file name, and the file is put into your NoteTab data directory. The second clip wizard also lets you browse (if you click the [...] button), but you shouldn't choose the original file name and location.
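The proposed output name can be sketched in Python. `proposed_ansi_name` and `same_file_name` are hypothetical helpers. The data directory is passed in as an argument, since NoteTab's ^$GetDataPath$ has no Python counterpart, and `ntpath` is used so Windows-style paths behave the same on any operating system. The second helper is one possible guard against the mistake warned about above: choosing the original file as the output.

```python
import ntpath  # Windows path rules, importable on any OS


def proposed_ansi_name(source_path: str, data_dir: str) -> str:
    """Prefix the source file's name with "ANSI_" and place it in data_dir."""
    return ntpath.join(data_dir, "ANSI_" + ntpath.basename(source_path))


def same_file_name(a: str, b: str) -> bool:
    """Compare two Windows paths, ignoring case and slash direction."""
    return ntpath.normcase(ntpath.normpath(a)) == ntpath.normcase(ntpath.normpath(b))
```

For instance, `proposed_ansi_name("C:\\Docs\\recipes.txt", "C:\\NoteTab\\Data")` yields `C:\NoteTab\Data\ANSI_recipes.txt`, which `same_file_name` reports as different from the source.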

          Once that is done, the clip proceeds. The output looks fine to me.

          Regards,
          Sheri
        • John Shotsky
          Message 4 of 11 , Jul 6, 2009
            This gave the same result. Screenshot below. No [...] present.


            John

          • Sheri
            Message 5 of 11 , Jul 6, 2009
              --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
              >
              > This gave the same result. Screenshot below. No [...] present.
              >
              >
              > John
              >

              The group supports text only; you would have to post your screenshot in the files area.

              BTW, the clip is likely compatible only with version 6.12, the latest slipstream release of NoteTab. The Help-About screen says 6.12, and the file version shown in Explorer for NotePro.exe properties says 6.1.2.4. The file date is 6/16/09.

              Regards,
              Sheri
            • Sheri
              Message 6 of 11 , Jul 6, 2009
                --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
                >
                > This gave the same result. Screenshot below. No [...] present.

                If you can't see the [...] button, you may need to remove or comment out both occurrences of this command:

                ^!SetWizardWidth 200

                That command makes the wizard dialogs double width, so a long file name can be seen in full. Perhaps your screen resolution doesn't support it.

                Regards,
                Sheri
              • John Shotsky
                Message 7 of 11 , Jul 6, 2009
                  Yep, updating did the trick. I was on 6.12, but there was one newer one. I updated and now it works fine.
                  Thanks!
                  John

                • John Shotsky
                  Message 8 of 11 , Jul 7, 2009
                    I found a new problem with these files: there are non-printing spaces in them. I see printing spaces and CRs, but these
                    are apparently a high-order space which simply looks like a space but isn't an ASCII space. If I copy and search for it,
                    it finds them, but the search window shows only a blank, not the code that it represents. Is there any way to determine
                    the code for a given character, so I could build a clip to replace it with an ASCII version? If not, I can make the
                    replace work with a copied version, but that's not very obvious when reviewing it later.
                    Thanks,
                    John

                  • Sheri
                    Message 9 of 11 , Jul 7, 2009
                      --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
                      >
                      > I found a new problem with these files - there are non-printing
                      > spaces in it. I see printing spaces and CR's, but these are
                      > apparently a high-order space which simply looks like a space but
                      > isn't an ASCII space. If I copy and search for it, it finds them,
                      > but the search window shows only a blank, not the code that it
                      > represents. Is there any way to determine the code for a given
                      > character, so I could build a clip to replace it with an ASCII
                      > version? If not, I can make the replace work with a copied
                      > version, but that's not very obvious when reviewing it later.

                      You probably meant non-breaking spaces, which are hex A0. Unless they show as question marks when the file is opened in the normal way, they are within the 256 ANSI-supported characters.

                      You can see what it is (in hex) by highlighting it and running this clip:

                      ^!Info \x^$IntToHex(^$CharToDec(^$StrCopyLeft("^$GetSelection$";1)$)$)$
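The same lookup can be sketched in Python. This version returns the escape in the form the thread's regexes use: `\xA0` for codes up to FF, and the braced `\x{2153}` form above that. `escape_first_char` is a hypothetical name. Python reports the Unicode code point. For characters in the range A0 to FF this matches the cp1252 byte, but for cp1252 bytes 80 to 9F (the euro sign, for example) the two differ.

```python
def escape_first_char(selection: str) -> str:
    """Return a regex escape for the first character of `selection`."""
    if not selection:
        raise ValueError("nothing selected")
    cp = ord(selection[0])
    # Two hex digits for codes up to FF, braced form for anything above.
    if cp <= 0xFF:
        return f"\\x{cp:02X}"
    return f"\\x{{{cp:04X}}}"
```

Applied to a non-breaking space this gives `\xA0`. Applied to the one-third character it gives `\x{2153}`, matching the escape used in the fraction clip.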

                      You can replace (in the current document) all non-breaking spaces with normal spaces using:

                      ^!Replace "\xA0" >> "\x20" RAWS0
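This replacement amounts to swapping U+00A0 for U+0020. Below is a minimal Python equivalent; the function name is hypothetical. The final line checks Sheri's point that the non-breaking space is one of the 256 characters the Western ANSI code page holds, with cp1252 assumed as that code page.

```python
NBSP = "\u00a0"  # non-breaking space, hex A0


def replace_nbsp(text: str) -> str:
    """Replace every non-breaking space with an ordinary ASCII space."""
    return text.replace(NBSP, " ")


# U+00A0 maps to the single byte 0xA0 in cp1252, so it survives
# conversion to ANSI rather than turning into "?".
print(NBSP.encode("cp1252"))  # b'\xa0'
```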

                      Regards,
                      Sheri
                    • Sheri
                      Message 10 of 11 , Jul 8, 2009
                        --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
                        >
                        > I found a new problem with these files - there are non-printing spaces in it. I see printing spaces and CR's, but these
                        > are apparently a high-order space which simply looks like a space but isn't an ASCII space. If I copy and search for it,
                        > it finds them, but the search window shows only a blank, not the code that it represents. Is there any way to determine
                        > the code for a given character, so I could build a clip to replace it with an ASCII version? If not, I can make the
                        > replace work with a copied version, but that's not very obvious when reviewing it later.

                        I posted an updated version of the clip here:
                        <http://tech.groups.yahoo.com/group/ntb-clips/files/John%20Shotsky's%20Stuff/Cliptext.txt>

                        It now replaces non-breaking spaces.

                        Also, if the source document cannot be processed using regex in UTF-8 mode (which probably means the source document was not Unicode or UTF-8), the clip opens the document as ANSI and replaces the single-character fractions (half, quarter, and three quarters) and non-breaking spaces. Those characters are all available in the Western ANSI code page.
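This fallback can be sketched in Python. The sketch tries to read the file as Unicode first. If that fails, it reads the file as ANSI and cleans up only the characters that exist in the Western ANSI code page. It is not Sheri's updated clip, and it rests on several assumptions: the helper names are hypothetical, UTF-16 is detected only by its byte-order mark, and cp1252 stands in for the Western ANSI code page. A plain-ASCII file is also valid UTF-8, so such a file takes the Unicode path.

```python
import codecs
from typing import Tuple

# The three fractions and the non-breaking space, all present in cp1252.
ANSI_REPLACEMENTS = {
    "\u00bc": "1/4",
    "\u00bd": "1/2",
    "\u00be": "3/4",
    "\u00a0": " ",
}


def decode_source(raw: bytes) -> Tuple[str, bool]:
    """Return (text, is_unicode) for the raw file contents."""
    # UTF-16 with a byte-order mark; the "utf-16" codec consumes the BOM.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16"), True
    try:
        # "utf-8-sig" also strips a UTF-8 byte-order mark if present.
        return raw.decode("utf-8-sig"), True
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to treating it as Western ANSI.
        return raw.decode("cp1252", errors="replace"), False


def ansi_cleanup(text: str) -> str:
    """Replace the ANSI-range fractions and non-breaking spaces."""
    for ch, replacement in ANSI_REPLACEMENTS.items():
        text = text.replace(ch, replacement)
    return text
```

A caller would run the full fraction conversion when `is_unicode` is true and only `ansi_cleanup` otherwise.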

                        Regards,
                        Sheri
                      • John Shotsky
                        Message 11 of 11 , Jul 8, 2009
                          Sheri,

                          Thanks, it all works as expected. Some of your code was way above my understanding, but I'll study it and learn some more from you, as usual!

                          John