Re: [Clip] issue with copy - missing characters

  • Don Daugherty
    Message 1 of 8 , Dec 10, 2010
      On 12/7/2010 7:42 PM, Don wrote:
      > I am copying web material and missing characters.
      >
      > For example if I copy this line:
      > Drink Up—Water, That Is!
      >
      > I get a square block between Up and Water.
      >
      > Same thing with curly apostrophes, smart quotes, etc.
      >
      > Is there some way to preserve them on copying?
      >
      I think it isn't so much that they aren't being copied, but rather that
      they don't display "properly." The characters apparently either have
      ASCII codes that aren't defined in the font of the document you are
      pasting into or they are Unicode characters (which I know little about).
      You can find out what character code NoteTab sees them as having by
      selecting them one at a time and running this simple clip:
      ^!Info Code=^$CharToDec(^$GetSelection$)$
      Then you can look at the fonts available in NoteTab (it varies between
      Pro and Std/Lite) to see whether there is one that contains the
      character in question.
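
For readers who want to check the codes outside NoteTab, here is a minimal Python sketch of the same idea: list each non-ASCII character in a piece of text with its decimal and hexadecimal code. It assumes Python 3 is available; the function name `report_non_ascii` is made up for illustration and is not part of NoteTab or of any clip in this thread.

```python
def report_non_ascii(text):
    """Return (character, decimal code, U+hex code) for each non-ASCII character."""
    return [(ch, ord(ch), "U+%04X" % ord(ch)) for ch in text if ord(ch) > 127]


# The sample line from the original question, with the dash written as an escape.
sample = "Drink Up\u2014Water, That Is!"
for ch, dec, hexcode in report_non_ascii(sample):
    print(repr(ch), dec, hexcode)
```

Note that this reports Unicode code points, which may differ from the single-byte value NoteTab shows for the same character.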
    • John Shotsky
      Message 2 of 8 , Dec 10, 2010
        Unicode characters use two bytes to describe each character. ASCII uses one. You can always use View Source on a web
        page to see what the character codes are.

        If you paste Unicode into NoteTab, you may experience unpredictable results. I have a clip that runs on a file to
        convert such characters, and THEN it is opened in NoteTab. You have to save the file without ever pasting it into
        NoteTab, then browse to it using the clip, convert it, and finally open it.

        Regards,
        John
        http://recipetools.gotdns.com
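
John's clip is not shown in the thread, so the following is only a rough Python sketch of the same workflow (read a saved file, convert the troublesome characters, write a new file to open in NoteTab). The mapping table, the function names, and the assumption that the saved file is UTF-8 are all illustrative choices, not a description of what the clip does.

```python
# Hypothetical mapping from common "smart" punctuation to plain ASCII.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / curly apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
})


def asciify(text):
    """Replace the characters in SMART_TO_ASCII with ASCII look-alikes."""
    return text.translate(SMART_TO_ASCII)


def convert_file(src_path, dst_path, encoding="utf-8"):
    """Read src_path (assumed UTF-8 by default) and write a converted copy."""
    with open(src_path, encoding=encoding) as src:
        text = src.read()
    # Anything still outside ASCII after the mapping is written as '?'.
    with open(dst_path, "w", encoding="ascii", errors="replace") as dst:
        dst.write(asciify(text))
```

A call such as `convert_file("saved_page.txt", "saved_page_ascii.txt")` (both file names are placeholders) would produce a copy that can be opened without the square blocks.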


        ------------------------------------

        Fookes Software: http://www.fookes.com/
        NoteTab website: http://www.notetab.com/
        NoteTab Discussion Lists: http://www.notetab.com/groups.php

        ***
        Yahoo! Groups Links
      • Don
        Message 3 of 8 , Dec 10, 2010
          I believe I am working with Unicode. I am in NoteTab Pro -- so is there
          a font I can use? Can I get a copy of your clip, John?

          The characters are for the most part em or en dashes, smart quotes and
          smart apostrophes (meaning curled instead of straight).

          I really need the power of regex to work on these files.

        • Don
          Message 4 of 8 , Dec 10, 2010
            I run the clip:
            ^!Info Code=^$CharToDec(^$GetSelection$)$
            I get 151 for example in the current copy and it appears to be a dash.
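
A code of 151 is consistent with the Windows-1252 ("ANSI") code page, where byte 151 (0x97) is the em dash. The Python sketch below decodes a single byte value that way and names the resulting character. It assumes NoteTab is reporting a Windows-1252 byte value, which the thread does not confirm; `describe_ansi_code` is a hypothetical helper name.

```python
import unicodedata


def describe_ansi_code(code, encoding="cp1252"):
    """Decode one byte value and return (character, U+hex code, Unicode name)."""
    ch = bytes([code]).decode(encoding)
    return ch, "U+%04X" % ord(ch), unicodedata.name(ch)


# Codes in the 145-151 range cover the curly quotes and dashes in Windows-1252.
for code in (145, 146, 147, 148, 150, 151):
    print(code, describe_ansi_code(code))
```

A few byte values (129, for example) are undefined in Windows-1252, and Python raises `UnicodeDecodeError` for them.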

          • Paul
            Message 5 of 8 , Dec 10, 2010
              Don,
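              There is a command-line utility called iconv (part of GNU libiconv; it is not bundled with Windows, so a Windows build has to be installed separately) that runs from the DOS prompt.
              For help, at a DOS prompt type iconv --help.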

              This utility will do the job the same way John proposed - it reads a saved file, converts the characters and writes another file.

              Naturally a short clip could automate this though I've not tried it, esp. if John has one to do the job.

              The problem, though, is that most utilities ignore 'unconvertible' characters or display them as a single hex code, 0xFF, for example.

              I don't have a UTF-8 doc to play with at the moment so I can't test this. Apologies for the verbiage to follow but Unicode can be a real pain and I haven't met anyone yet that says it's a piece of cake. Anyway...

              This example may be really useful for regex search replace.

              iconv -f KOI8-R --byte-subst="<0x%x>" --unicode-subst="<U+%04X>"

              "converts input from the old Russian encoding KOI8-R to the locale encoding, substituting an angle bracket notation with hexadecimal numbers for invalid bytes and for valid but unconvertible characters."
              ref: http://www.gnu.org/software/libiconv/documentation/libiconv/iconv.1.html
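
The same substitution idea can be sketched in Python for anyone without iconv at hand. This is not iconv itself: it registers a custom error handler (the name `anglesubst` is invented here) so that any character the target encoding cannot represent is written as `<U+XXXX>`, much like the `--unicode-subst` option quoted above.

```python
import codecs


def angle_bracket_subst(err):
    """Error handler: replace unencodable characters with <U+XXXX> markers."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    bad = err.object[err.start:err.end]
    replacement = "".join("<U+%04X>" % ord(ch) for ch in bad)
    # Return the replacement text and the position at which to resume encoding.
    return replacement, err.end


codecs.register_error("anglesubst", angle_bracket_subst)


def to_ascii_with_markers(text):
    """Encode text to ASCII, marking every non-ASCII character."""
    return text.encode("ascii", errors="anglesubst").decode("ascii")


print(to_ascii_with_markers("Drink Up\u2014Water, That Is!"))
```

The marked-up text is plain ASCII, so it can be pasted into NoteTab and searched with a regex.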

              Naturally, use "-f UTF-8" when converting from a Unicode source. Be warned, though, that once you save the file, depending on the browser and other factors, the saved file may be ANSI or something other than UTF-8! Just to make life interesting!!
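
One way to cope with not knowing how the file was saved is to try UTF-8 first and fall back to Windows-1252 if the bytes are not valid UTF-8. The Python sketch below does that; it is a heuristic under the assumption that the only two candidates are UTF-8 and Windows-1252, and the function names are illustrative.

```python
def decode_guessing(raw):
    """Return (text, encoding name) for raw bytes, trying UTF-8 before cp1252."""
    try:
        # 'utf-8-sig' also accepts plain UTF-8 and drops a leading BOM if present.
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # Bytes undefined in Windows-1252 become U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace"), "cp1252"


def read_text_guessing_encoding(path):
    """Read a file as bytes and decode it with decode_guessing."""
    with open(path, "rb") as f:
        return decode_guessing(f.read())
```

A pure ASCII file is valid UTF-8, so it is reported as "utf-8"; that is harmless because the decoded text is the same either way.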

              If you end up with hex numbers in angle brackets then by inspection you can pick apart your document and replace bracketed hex numbers with appropriate normal characters. :) A clip would do nicely.
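
As an illustration of that clean-up step (in Python rather than a clip), the sketch below finds `<U+XXXX>` markers with a regex and swaps in a plain character where one is listed. The replacement table is a made-up starting point; markers with no entry are left untouched so they can be reviewed by hand.

```python
import re

# Hypothetical table: code point -> plain ASCII stand-in.
REPLACEMENTS = {
    0x2018: "'",
    0x2019: "'",
    0x201C: '"',
    0x201D: '"',
    0x2013: "-",
    0x2014: "--",
}

MARKER = re.compile(r"<U\+([0-9A-Fa-f]{4,6})>")


def replace_markers(text):
    """Replace known <U+XXXX> markers; leave unknown ones as they are."""
    def substitute(match):
        code_point = int(match.group(1), 16)
        return REPLACEMENTS.get(code_point, match.group(0))
    return MARKER.sub(substitute, text)


print(replace_markers("Drink Up<U+2014>Water, That Is!"))
```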

              Interesting And As-Yet Untested Info:
              //TRANSLIT is supposed to cause iconv to select an appropriate substitute for the destination encoding but may or may not work in your case.

              "When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several characters that look similar to the original character."
              ref: http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html

              You won't find //TRANSLIT in the prompt help file so for more examples you may like to google it. If it works, it could save a truckload of work.

              Hope that helps. With an example of a problematic web-page I'd have a crack at it. :)
              Paul


