Re: [Clip] issue with copy - missing characters

  • Don
    Message 1 of 8 , Dec 10, 2010
      I run the clip:
      ^!Info Code=^$CharToDec(^$GetSelection$)$
      I get 151 for example in the current copy and it appears to be a dash.
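
A code of 151 is consistent with the Windows-1252 (ANSI) code page, where byte 151 (0x97) is the em dash. The following is only a minimal Python sketch, assuming the values reported by CharToDec are Windows-1252 byte values (that is an assumption, not something confirmed in this thread). The helper name `describe` and the list of codes are made up for illustration.

```python
import unicodedata

# Assumption: the codes reported by the clip are Windows-1252 byte values.
# These six are the curly quotes and dashes in that code page.
CODES = [145, 146, 147, 148, 150, 151]


def describe(code: int) -> str:
    """Return 'code -> U+XXXX NAME' for one Windows-1252 byte value."""
    ch = bytes([code]).decode("cp1252")
    return f"{code} -> U+{ord(ch):04X} {unicodedata.name(ch)}"


for code in CODES:
    print(describe(code))
```

If the codes seen in NoteTab fall in the 145 to 151 range, the characters are probably being stored as single ANSI bytes rather than being lost on copy.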

      On 12/10/2010 12:07 PM, Don wrote:
      > I believe I am working with Unicode. I am in NoteTab Pro -- so is there
      > a font I can use? Can I get a copy of your clip, John?
      >
      > The characters are for the most part em or en dashes, smart quotes and
      > smart apostrophes (meaning curled instead of straight).
      >
      > I really need the power of regex to work on these files.
      >
    • Paul
      Message 2 of 8 , Dec 10, 2010
        Don,
        There is a utility called iconv that runs from the DOS prompt on Windows (it comes from GNU libiconv and may need to be installed separately).
        For help, type iconv --help at a DOS prompt.

        This utility will do the job the same way John proposed - it reads a saved file, converts the characters and writes another file.

        Naturally, a short clip could automate this, though I've not tried it, especially if John already has one that does the job.

        The problem, though, is that most utilities ignore 'unconvertible' characters or display them as a single hex code, 0xFF, for example.

        I don't have a UTF-8 doc to play with at the moment so I can't test this. Apologies for the verbiage to follow but Unicode can be a real pain and I haven't met anyone yet that says it's a piece of cake. Anyway...

        This example may be really useful for regex search and replace.

        iconv -f KOI8-R --byte-subst="<0x%x>" --unicode-subst="<U+%04X>"

        "converts input from the old Russian encoding KOI8-R to the locale encoding, substituting an angle bracket notation with hexadecimal numbers for invalid bytes and for valid but unconvertible characters."
        ref: http://www.gnu.org/software/libiconv/documentation/libiconv/iconv.1.html

        Naturally, use "-f UTF-8" when converting from a Unicode (UTF-8) source. Be warned, though: once you save the file, depending on the browser and other factors, the saved file may be ANSI or something other than UTF-8! Just to make life interesting!!

        If you end up with hex numbers in angle brackets then by inspection you can pick apart your document and replace bracketed hex numbers with appropriate normal characters. :) A clip would do nicely.
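
The replacement step described above could be sketched in Python as follows. This is only an illustration: it assumes the file already contains tokens in the `<U+%04X>` form produced by the `--unicode-subst` option, and the mapping table, the `TOKEN` pattern, and the `replace_tokens` name are hypothetical choices, not part of iconv or NoteTab.

```python
import re

# Hypothetical mapping from Unicode code points to plain ASCII stand-ins.
ASCII_SUBSTITUTES = {
    0x2013: "-",    # en dash
    0x2014: "--",   # em dash
    0x2018: "'",    # left single quote
    0x2019: "'",    # right single quote / apostrophe
    0x201C: '"',    # left double quote
    0x201D: '"',    # right double quote
}

# Matches tokens such as <U+2014> (four to six hex digits).
TOKEN = re.compile(r"<U\+([0-9A-Fa-f]{4,6})>")


def replace_tokens(text: str) -> str:
    """Replace known <U+XXXX> tokens; leave unknown ones untouched."""
    def repl(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        return ASCII_SUBSTITUTES.get(code_point, match.group(0))
    return TOKEN.sub(repl, text)


print(replace_tokens("Drink Up<U+2014>Water, That Is!"))
# prints: Drink Up--Water, That Is!
```

Tokens that are not in the table are left in place, so they can still be found by inspection afterwards.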

        Interesting And As-Yet Untested Info:
        //TRANSLIT is supposed to cause iconv to select an appropriate substitute for the destination encoding, but it may or may not work in your case.

        "When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several characters that look similar to the original character."
        ref: http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html
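
The idea of transliteration can be approximated by hand with a small lookup table. The Python sketch below is not how iconv implements //TRANSLIT; it only shows the same kind of result for the characters mentioned in this thread. The table contents and the `to_ascii` name are assumptions chosen for illustration.

```python
# Hand-written transliteration table (an assumption, not iconv's own table).
TRANSLIT = str.maketrans({
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2026": "...",  # ellipsis
})


def to_ascii(text: str) -> str:
    """Approximate known characters, then mark anything else with '?'."""
    approximated = text.translate(TRANSLIT)
    return approximated.encode("ascii", errors="replace").decode("ascii")


print(to_ascii("\u201cDrink Up\u2014Water, That Is!\u201d"))
# prints: "Drink Up--Water, That Is!"
```

Characters with no entry in the table come out as a question mark, so the table would need extending for each new character found in the source files.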

        You won't find //TRANSLIT in the prompt help file so for more examples you may like to google it. If it works, it could save a truckload of work.

        Hope that helps. With an example of a problematic web-page I'd have a crack at it. :)
        Paul



        --- In ntb-clips@yahoogroups.com, Don <don@...> wrote:
        >
        > I believe I am working with Unicode. I am in NoteTab Pro -- so is there
        > a font I can use? Can I get a copy of your clip, John?
        >
        > The characters are for the most part em or en dashes, smart quotes and
        > smart apostrophes (meaning curled instead of straight).
        >
        > I really need the power of regex to work on these files.
        >
        > On 12/10/2010 11:02 AM, John Shotsky wrote:
        > > Unicode characters use two bytes to describe each character. ASCII uses one. You can always use View Source on a web
        > > page to see what the character codes are.
        > >
        > > If you paste Unicode into NoteTab, you may experience unpredictable results. I have a clip that runs on a file to
        > > convert such characters, and THEN it is opened in NoteTab. You have to save the file without ever pasting it into
        > > Notetab, then browse to it using the clip, convert it, and finally open it.
        > >
        > > Regards,
        > > John
        > > http://recipetools.gotdns.com
        > >
        > >
        > > -----Original Message-----
        > > From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of Don Daugherty
        > > Sent: Friday, December 10, 2010 7:46 AM
        > > To: ntb-clips@yahoogroups.com
        > > Subject: Re: [Clip] issue with copy - missing characters
        > >
        > > On 12/7/2010 7:42 PM, Don wrote:
        > >> I am copying web material and missing characters.
        > >>
        > >> For example if I copy this line:
        > >> Drink Up-Water, That Is!
        > >>
        > >> I get a square block between Up and Water.
        > >>
        > >> Same thing with curly apostrophes, smart quotes, etc.
        > >>
        > >> Is there some way to preserve them on copying?
        > >>
        > > I think it isn't so much that they aren't being copied, but rather that
        > > they don't display "properly." The characters apparently either have
        > > ASCII codes that aren't defined in the font of the document you are
        > > pasting into or they are Unicode characters (which I know little about).
        > > You can find out what character code NoteTab sees them as having by
        > > selecting them one at a time and running this simple clip:
        > > ^!Info Code=^$CharToDec(^$GetSelection$)$
        > > Then you can look at the fonts available in NoteTab (it varies between
        > > Pro and Std/Lite) to see whether there is one that contains the
        > > character in question.
        > >
        > >
        > >
        > > ------------------------------------
        > >
        > > Fookes Software: http://www.fookes.com/
        > > NoteTab website: http://www.notetab.com/
        > > NoteTab Discussion Lists: http://www.notetab.com/groups.php
        > >
        > > ***
        > > Yahoo! Groups Links
        > >
        > >
        > >
        > >
        > >
        > >
        > >
        >