
issue with copy - missing characters

  • Don
    Message 1 of 8 , Dec 7, 2010
      I am copying web material and missing characters.

      For example if I copy this line:
      Drink Up—Water, That Is!

      I get a square block between Up and Water.

      Same thing with curly apostrophes, smart quotes, etc.

      Is there some way to preserve them on copying?
    • Axel Berger
      Message 2 of 8 , Dec 7, 2010
        Don wrote:
        > I get a square block between Up and Water.
        > Is there some way to preserve them on copying?

        This is an encoding problem. Take a look at the source code to see what
        is actually written there. As this only happens with a limited number
        of characters, I'd run a set of replaces over the file.

        I have a Pascal program for this that takes a 256-byte file in which
        the n-th position holds the character that character n is to be
        replaced with.
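
The table-driven replacement described above can be sketched in Python. This is not the Pascal program mentioned in the message, only an illustration of the same idea. The byte values chosen here (0x91 to 0x97) assume the text is in Windows-1252, and the function name `apply_table` is made up for this example.

```python
# Build a 256-byte table: position n holds the byte that byte n is replaced with.
table = bytearray(range(256))   # start with the identity mapping

# Assumed Windows-1252 byte values for typographic characters.
table[0x91] = ord("'")          # left single quote
table[0x92] = ord("'")          # right single quote / curly apostrophe
table[0x93] = ord('"')          # left double quote
table[0x94] = ord('"')          # right double quote
table[0x96] = ord("-")          # en dash
table[0x97] = ord("-")          # em dash


def apply_table(data: bytes, table: bytes) -> bytes:
    """Replace every byte of data with the byte stored at that position in table."""
    if len(table) != 256:
        raise ValueError("translation table must be exactly 256 bytes long")
    return data.translate(bytes(table))


if __name__ == "__main__":
    sample = b"Drink Up\x97Water, That Is!"
    print(apply_table(sample, table))
```

A table like this can only map one byte to one byte, so an em dash cannot become two hyphens this way.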

        Axel
      • Eb
        Message 3 of 8 , Dec 8, 2010
          Don,

          The character you used as example _is_ preserved, but NoteTab and/or the font you're using cannot display it properly. In my case, it displays fine in the document window (Courier New), but not in pop-up windows, like from ^!Info.

          To assure yourself that the character is still present, select the square or whatever shows on your screen, then run a clip with ^$CharToDec(^$GetSelection$)$ to tell you what its character code is.
          Or, in this case, you could just replace the long dash with a short hyphen, or two.


          Cheers,


          Eb




        • Don Daugherty
          Message 4 of 8 , Dec 10, 2010
            I think it isn't so much that they aren't being copied, but rather that
            they don't display "properly." The characters apparently either have
            character codes that aren't defined in the font of the document you are
            pasting into, or they are Unicode characters (which I know little about).
            You can find out what character code NoteTab sees them as having by
            selecting them one at a time and running this simple clip:
            ^!Info Code=^$CharToDec(^$GetSelection$)$
            Then you can look at the fonts available in NoteTab (it varies between
            Pro and Std/Lite) to see whether there is one that contains the
            character in question.
          • John Shotsky
            Message 5 of 8 , Dec 10, 2010
              Unicode characters generally take more than one byte each (two in UTF-16 for most characters, one to four in
              UTF-8). ASCII uses one. You can always use View Source on a web page to see what the character codes are.

              If you paste Unicode into NoteTab, you may experience unpredictable results. I have a clip that runs on a file to
              convert such characters, and THEN it is opened in NoteTab. You have to save the file without ever pasting it into
              NoteTab, then browse to it using the clip, convert it, and finally open it.
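
The save, convert, then open workflow can be sketched in Python. This is not John's clip, which is not shown in the thread. It assumes the saved file is UTF-8 and that the editor expects Windows-1252 (ANSI), which does contain em dashes and curly quotes. The function name `convert_utf8_to_cp1252` and the file names are made up for this example.

```python
import tempfile
from pathlib import Path


def convert_utf8_to_cp1252(src: Path, dst: Path) -> None:
    """Re-encode a UTF-8 text file as Windows-1252 without opening it in an editor."""
    # "utf-8-sig" also accepts files that start with a byte order mark.
    text = src.read_bytes().decode("utf-8-sig")
    # Characters with no Windows-1252 equivalent are written as "?".
    dst.write_bytes(text.encode("cp1252", errors="replace"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "saved_page.txt"      # hypothetical saved web text
        dst = Path(tmp) / "converted.txt"       # file to open in the editor
        src.write_bytes("Drink Up\u2014Water, That Is!".encode("utf-8"))
        convert_utf8_to_cp1252(src, dst)
        print(dst.read_bytes())
```

If the saved file turns out not to be UTF-8, the decode step raises an error instead of silently producing garbage, which is a useful early check.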

              Regards,
              John
              http://recipetools.gotdns.com


              ------------------------------------

              Fookes Software: http://www.fookes.com/
              NoteTab website: http://www.notetab.com/
              NoteTab Discussion Lists: http://www.notetab.com/groups.php

            • Don
              Message 6 of 8 , Dec 10, 2010
                I believe I am working with Unicode. I am in NoteTab Pro, so is there
                a font I can use? Can I get a copy of your clip, John?

                The characters are for the most part em or en dashes, smart quotes and
                smart apostrophes (meaning curled instead of straight).

                I really need the power of regex to work on these files.

              • Don
                Message 7 of 8 , Dec 10, 2010
                  I run the clip:
                  ^!Info Code=^$CharToDec(^$GetSelection$)$
                  I get 151 for example in the current copy and it appears to be a dash.
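
A code of 151 is consistent with the text being held as Windows-1252 (ANSI), where byte 151 (0x97) is the em dash. A small Python sketch can look up what such a code stands for. The function name `describe` is made up for this example, and it assumes the code is a Windows-1252 byte value.

```python
import unicodedata


def describe(code: int) -> str:
    """Return the Unicode code point and name for a Windows-1252 byte value."""
    char = bytes([code]).decode("cp1252")
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


if __name__ == "__main__":
    # 145-148 and 150 are the neighbouring quote and dash codes in Windows-1252.
    for code in (145, 146, 147, 148, 150, 151):
        print(code, describe(code))
```

For 151 this reports U+2014 EM DASH, so the character was copied intact and only its display is at issue.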

                • Paul
                  Message 8 of 8 , Dec 10, 2010
                    Don,
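                    There is a command-line utility called iconv that runs from the DOS prompt. It comes from GNU libiconv rather than with Windows itself, so it may need to be installed first.
                    For help, type iconv --help at a DOS prompt.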

                    This utility will do the job the same way John proposed - it reads a saved file, converts the characters and writes another file.

                    Naturally a short clip could automate this though I've not tried it, esp. if John has one to do the job.

                    The problem, though, is that most utilities ignore 'unconvertible' characters or display them as a single hex code, 0xFF, for example.

                    I don't have a UTF-8 doc to play with at the moment so I can't test this. Apologies for the verbiage to follow but Unicode can be a real pain and I haven't met anyone yet that says it's a piece of cake. Anyway...

                    This example may be really useful for regex search replace.

                    iconv -f KOI8-R --byte-subst="<0x%x>" --unicode-subst="<U+%04X>"

                    "converts input from the old Russian encoding KOI8-R to the locale encoding, substituting an angle bracket notation with hexadecimal numbers for invalid bytes and for valid but unconvertible characters."
                    ref: http://www.gnu.org/software/libiconv/documentation/libiconv/iconv.1.html

                    Naturally, use "-f UTF-8" for converting from a Unicode source. Be warned, though, that once you save the file, depending on browser and other factors, the saved file may be ANSI or something other than UTF-8! Just to make life interesting!!

                    If you end up with hex numbers in angle brackets then by inspection you can pick apart your document and replace bracketed hex numbers with appropriate normal characters. :) A clip would do nicely.
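
The bracketed-hex approach can be sketched in Python rather than as a clip. This does not call iconv. `mark_unconvertible` only imitates the `<U+%04X>` notation for characters that do not fit the target encoding, and `replace_markers` then swaps known markers for plain characters with a regex. Both function names and the `SUBSTITUTES` table are made up for this example.

```python
import re

# Hypothetical choices of plain replacements, keyed by Unicode code point.
SUBSTITUTES = {
    0x2013: "-",    # en dash
    0x2014: "--",   # em dash
    0x2018: "'",    # left single quote
    0x2019: "'",    # right single quote / curly apostrophe
    0x201C: '"',    # left double quote
    0x201D: '"',    # right double quote
}

MARKER = re.compile(r"<U\+([0-9A-Fa-f]{4,6})>")


def mark_unconvertible(text: str, encoding: str = "ascii") -> str:
    """Write characters that the target encoding cannot hold as <U+XXXX>."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(f"<U+{ord(ch):04X}>")
    return "".join(out)


def replace_markers(text: str) -> str:
    """Replace known <U+XXXX> markers and leave unknown ones visible."""
    def repl(match: re.Match) -> str:
        return SUBSTITUTES.get(int(match.group(1), 16), match.group(0))
    return MARKER.sub(repl, text)


if __name__ == "__main__":
    marked = mark_unconvertible("Drink Up\u2014Water, That Is!")
    print(marked)
    print(replace_markers(marked))
```

Markers with no entry in the table stay in the text, so they can be found by inspection and added later.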

                    Interesting And As-Yet Untested Info:
                    //TRANSLIT is supposed to cause iconv to select an appropriate substitute for the destination encoding, but it may or may not work in your case.

                    "When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several characters that look similar to the original character."
                    ref: http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html

                    You won't find //TRANSLIT in the prompt help file so for more examples you may like to google it. If it works, it could save a truckload of work.
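
For comparison, a rough approximation of transliteration can be written in Python. This is not how iconv implements //TRANSLIT and will not give identical results. It maps a few typographic characters by hand and then strips accents through Unicode decomposition. The `PUNCTUATION` table and the function name `transliterate` are made up for this example.

```python
import unicodedata

# Hand-picked look-alikes for common typographic characters (an assumption).
PUNCTUATION = str.maketrans({
    "\u2013": "-",     # en dash
    "\u2014": "--",    # em dash
    "\u2018": "'",     # left single quote
    "\u2019": "'",     # right single quote / curly apostrophe
    "\u201C": '"',     # left double quote
    "\u201D": '"',     # right double quote
})


def transliterate(text: str) -> str:
    """Approximate typographic and accented characters with plain ones."""
    text = text.translate(PUNCTUATION)
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(transliterate("Drink Up\u2014Water, That\u2019s caf\u00e9"))
```

Characters with no decomposition and no table entry pass through unchanged, so the result is not guaranteed to be pure ASCII.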

                    Hope that helps. With an example of a problematic web-page I'd have a crack at it. :)
                    Paul


