Re: issue with copy - missing characters

  • Eb
    Message 1 of 8 , Dec 8, 2010
      Don,

      The character you used as example _is_ preserved, but NoteTab and/or the font you're using cannot display it properly. In my case, it displays fine in the document window (Courier New), but not in pop-up windows, like from ^!Info.

      To assure yourself that the character is still present, select the square or whatever shows on your screen, then run a clip with ^$CharToDec(^$GetSelection$)$ to tell you what its character code is.
      Or, in this case, you could just replace the long dash with a short hyphen, or two.


      Cheers,


      Eb




      --- In ntb-clips@yahoogroups.com, Don <don@...> wrote:
      >
      > I am copying web material and missing characters.
      >
      > For example if I copy this line:
      > Drink Up—Water, That Is!
      >
      > I get a square block between Up and Water.
      >
      > Same thing with curly apostrophes, smart quotes, etc.
      >
      > Is there some way to preserve them on copying?
      >
    • Don Daugherty
      Message 2 of 8 , Dec 10, 2010
        I think it isn't so much that they aren't being copied, but rather that
        they don't display "properly." The characters apparently either have
        character codes that aren't defined in the font of the document you are
        pasting into, or they are Unicode characters (which I know little about).
        You can find out what character code NoteTab sees them as having by
        selecting them one at a time and running this simple clip:
        ^!Info Code=^$CharToDec(^$GetSelection$)$
        Then you can look at the fonts available in NoteTab (it varies between
        Pro and Std/Lite) to see whether there is one that contains the
        character in question.
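
For checking more than one character at a time, the same idea can be sketched outside NoteTab. This is only an illustration in Python, not a clip: the function name `report_non_ascii` is made up for the example, and it assumes the text has already been read into a Unicode string. It lists every non-ASCII character with its decimal code and Unicode name.

```python
import unicodedata


def report_non_ascii(text):
    """Return (character, decimal code, Unicode name) for each non-ASCII character."""
    report = []
    for ch in text:
        if ord(ch) > 127:
            # unicodedata.name() falls back to "UNKNOWN" for unnamed code points
            report.append((ch, ord(ch), unicodedata.name(ch, "UNKNOWN")))
    return report


if __name__ == "__main__":
    # "\u2014" is the long dash from the example line in this thread
    sample = "Drink Up\u2014Water, That Is!"
    for ch, code, name in report_non_ascii(sample):
        print(code, name)
```

Note that the decimal code reported here is the Unicode code point, which can differ from the single-byte code a non-Unicode editor reports for the same character.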
      • John Shotsky
        Message 3 of 8 , Dec 10, 2010
          Unicode characters often take two or more bytes each, depending on the encoding (UTF-8 uses one to four, UTF-16 two or four). ASCII uses one. You can always use View Source on a web
          page to see what the character codes are.

          If you paste Unicode into NoteTab, you may experience unpredictable results. I have a clip that runs on a file to
          convert such characters, and THEN it is opened in NoteTab. You have to save the file without ever pasting it into
          NoteTab, then browse to it using the clip, convert it, and finally open it.
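
The clip itself is not shown in this thread, so the following is not John's clip, just a rough Python sketch of the same save-convert-open idea. It assumes the saved file is UTF-8 (the browser may have chosen something else), and the names `PUNCTUATION_MAP`, `to_plain_punctuation` and `convert_file` are invented for the example. It swaps dashes, curly quotes and curly apostrophes for plain ASCII equivalents before the file is ever opened in the editor.

```python
# Typographic characters mentioned in this thread, mapped to ASCII stand-ins.
PUNCTUATION_MAP = str.maketrans({
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (curly apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
})


def to_plain_punctuation(text):
    """Replace the mapped typographic characters with ASCII equivalents."""
    return text.translate(PUNCTUATION_MAP)


def convert_file(src_path, dst_path, src_encoding="utf-8"):
    """Read src_path, convert its punctuation, and write an ASCII copy to dst_path."""
    with open(src_path, encoding=src_encoding) as src:
        converted = to_plain_punctuation(src.read())
    # Anything still outside ASCII after the mapping is written as "?"
    with open(dst_path, "w", encoding="ascii", errors="replace") as dst:
        dst.write(converted)
```

For example, `to_plain_punctuation("Drink Up\u2014Water, That Is!")` gives the line with two hyphens in place of the long dash.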

          Regards,
          John
          http://recipetools.gotdns.com


          ------------------------------------

          Fookes Software: http://www.fookes.com/
          NoteTab website: http://www.notetab.com/
          NoteTab Discussion Lists: http://www.notetab.com/groups.php

        • Don
          Message 4 of 8 , Dec 10, 2010
            I believe I am working with Unicode. I am in NoteTab Pro, so is there
            a font I can use? Can I get a copy of your clip, John?

            The characters are for the most part em or en dashes, smart quotes and
            smart apostrophes (meaning curled instead of straight).

            I really need the power of regex to work on these files.

          • Don
            Message 5 of 8 , Dec 10, 2010
              I run the clip:
              ^!Info Code=^$CharToDec(^$GetSelection$)$
              I get 151, for example, in the current copy, and it appears to be a dash.
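
A code of 151 fits the long dash if the text is being handled in the Windows-1252 (ANSI) code page, where byte 151 (0x97) is the em dash. That NoteTab is reporting a Windows-1252 value here is an assumption, not something confirmed in this thread. A small Python sketch, with the made-up helper name `describe_ansi_code`, shows how such a code maps to a Unicode character:

```python
import unicodedata


def describe_ansi_code(code, encoding="cp1252"):
    """Decode a single-byte code and return (character, 'U+XXXX', Unicode name)."""
    ch = bytes([code]).decode(encoding)
    return ch, f"U+{ord(ch):04X}", unicodedata.name(ch)


if __name__ == "__main__":
    # 150/151 are the en and em dash; 145-148 are the curly quotes in Windows-1252
    for code in (145, 146, 147, 148, 150, 151):
        print(code, *describe_ansi_code(code)[1:])
```

If the codes reported for the curly quotes and apostrophes fall in the 145 to 148 range, that would point the same way.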

            • Paul
              Message 6 of 8 , Dec 10, 2010
                Don,
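                There is a command-line utility called iconv that runs from the DOS prompt (it comes from GNU libiconv rather than with Windows itself, so it may need to be installed separately).
                For help, at a DOS prompt type iconv --help.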

                This utility will do the job the same way John proposed - it reads a saved file, converts the characters and writes another file.

                Naturally, a short clip could automate this, though I've not tried it, especially if John already has one to do the job.

                The problem, though, is that most utilities ignore 'unconvertible' characters or display them as a single hex code, 0xFF, for example.
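
To make the difference concrete, here is a small sketch in Python rather than iconv. It is only an analogy: Python's `ignore`, `replace` and `backslashreplace` error handlers stand in for the behaviours described above, and the function name `encode_variants` is invented for the example.

```python
def encode_variants(text, target="ascii"):
    """Encode text to the target encoding with three different error policies."""
    return {
        # silently drops characters the target encoding cannot represent
        "ignore": text.encode(target, errors="ignore"),
        # substitutes a question mark for each such character
        "replace": text.encode(target, errors="replace"),
        # keeps a visible escape such as \u2014 that can be searched for later
        "backslashreplace": text.encode(target, errors="backslashreplace"),
    }


if __name__ == "__main__":
    for policy, result in encode_variants("Drink Up\u2014Water").items():
        print(policy, result)
```

Only the last policy leaves enough information in the output to restore or replace the character afterwards.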

                I don't have a UTF-8 doc to play with at the moment so I can't test this. Apologies for the verbiage to follow but Unicode can be a real pain and I haven't met anyone yet that says it's a piece of cake. Anyway...

                This example may be really useful for regex search replace.

                iconv -f KOI8-R --byte-subst="<0x%x>"
                                --unicode-subst="<U+%04X>"

                "converts input from the old Russian encoding KOI8-R to the locale encoding, substituting an angle bracket notation with hexadecimal numbers for invalid bytes and for valid but unconvertible characters."
                ref: http://www.gnu.org/software/libiconv/documentation/libiconv/iconv.1.html

                Naturally, use "-f UTF-8" when converting from a UTF-8 source. Be warned, though, that depending on the browser and other factors, the saved file may be ANSI or something other than UTF-8. Just to make life interesting!

                If you end up with hex numbers in angle brackets then by inspection you can pick apart your document and replace bracketed hex numbers with appropriate normal characters. :) A clip would do nicely.
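
As a rough illustration of that last step, written in Python rather than as a clip: the sketch below assumes the markers look like <U+2014>, as produced by the --unicode-subst format shown above. The replacement table and the names `REPLACEMENTS`, `MARKER` and `replace_markers` are made up for the example.

```python
import re

# Code points seen in this thread, mapped to plain ASCII stand-ins.
REPLACEMENTS = {
    0x2013: "-",   # en dash
    0x2014: "--",  # em dash
    0x2018: "'",   # left single quotation mark
    0x2019: "'",   # right single quotation mark
    0x201C: '"',   # left double quotation mark
    0x201D: '"',   # right double quotation mark
}

# Matches markers such as <U+2014>
MARKER = re.compile(r"<U\+([0-9A-Fa-f]{4,6})>")


def replace_markers(text, replacements=REPLACEMENTS):
    """Replace known <U+XXXX> markers; leave unknown ones in place for inspection."""
    def substitute(match):
        code_point = int(match.group(1), 16)
        return replacements.get(code_point, match.group(0))
    return MARKER.sub(substitute, text)
```

Markers that are not in the table are left untouched, so they can still be found and dealt with by hand.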

                Interesting And As-Yet Untested Info:
                //TRANSLIT is supposed to cause iconv to select an appropriate substitute for the destination encoding, but it may or may not work in your case.

                "When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several characters that look similar to the original character."
                ref: http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html

                You won't find //TRANSLIT in the prompt help file so for more examples you may like to google it. If it works, it could save a truckload of work.

                Hope that helps. With an example of a problematic web-page I'd have a crack at it. :)
                Paul


