
RE: [Clip] How to fix non-Ascii characters using NoteTab

  • John Shotsky
    Message 1 of 33 , Aug 31, 2013
      Thanks, Art.
      I found a solution for now, but I'm not sure how long it will be good, since I'll have to see many more files to determine whether
      it is always valid, or just sometimes. It amounts to opening the file in NoteTab in UTF-8 (no conversion) mode. That way, it doesn't
      throw away any characters. Then the multi-byte characters read as gibberish, but it seems to be consistent gibberish, so when I
      replace these fractions:
      ;Â½
      ^!Replace "Â½" >> "1/2" ARSW
      ^!IfError Next Else Skip_-1
      ;â…“
      ^!Replace "â…“" >> "1/3" ARSW
      ^!IfError Next Else Skip_-1
      ;â…”
      ^!Replace "â…”" >> "2/3" ARSW
      ^!IfError Next Else Skip_-1
      ;Â¼
      ^!Replace "Â¼" >> "1/4" ARSW
      ^!IfError Next Else Skip_-1
      ;Â¾
      ^!Replace "Â¾" >> "3/4" ARSW
      ^!IfError Next Else Skip_-1
      ;â…›
      ^!Replace "â…›" >> "1/8" ARSW
      ^!IfError Next Else Skip_-1
      ;â…œ
      ^!Replace "â…œ" >> "3/8" ARSW
      ^!IfError Next Else Skip_-1

      I think the gibberish will always be the same, so it should work on other files. There's no need for my users to understand any of
      this, the whole thing runs without them lifting a finger. When saved, it will be Ansi, so none of the characters will be lost. The
      only problem for me is that I don't know what the gibberish will look like until I see it, and determine what character it was
      supposed to be, then plug in the replace statement. I'm almost through a 10K line file and have found most of the characters that
      appear in that file, but obviously there will be more in the future that aren't in this file - like the missing fractions that
      surely will appear sometime. I haven't found a way to fake this - anything I paste into the html file reads fine.
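
The replacement strings do not have to be discovered by eye. If the gibberish is simply the character's UTF-8 bytes displayed as ANSI (an assumption here, taken to mean Windows code page 1252), it can be computed ahead of time, including for fractions that have not turned up in a file yet. A minimal Python sketch, not part of the original clip; `FRACTIONS`, `mojibake` and `build_table` are made-up names for illustration:

```python
# Vulgar fractions mentioned in the thread, mapped to their ASCII spellings.
FRACTIONS = {
    "\u00bd": "1/2",
    "\u2153": "1/3",
    "\u2154": "2/3",
    "\u00bc": "1/4",
    "\u00be": "3/4",
    "\u215b": "1/8",
    "\u215c": "3/8",
}


def mojibake(ch, wrong_codec="cp1252"):
    """Return how a character looks when its UTF-8 bytes are read as ANSI.

    Assumption: the ANSI code page is cp1252. Python's cp1252 codec rejects
    a few undefined bytes (0x9D, for example), so some characters outside
    this table may raise UnicodeDecodeError here.
    """
    return ch.encode("utf-8").decode(wrong_codec)


def build_table(mapping=FRACTIONS):
    """Map each garbled sequence to the plain-text fraction it stands for."""
    return {mojibake(ch): plain for ch, plain in mapping.items()}


if __name__ == "__main__":
    # Print lines in the same shape as the clip's ^!Replace commands.
    for garbled, plain in build_table().items():
        print(f'^!Replace "{garbled}" >> "{plain}" ARSW')
```

Adding a new character then means adding one entry to `FRACTIONS` rather than waiting to see what it looks like in a real file.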

      Regards,
      John
      RecipeTools Web Site: http://recipetools.gotdns.com/
      John's Mags Yahoo Group: http://groups.yahoo.com/group/johnsmags/

      From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of Art Kocsis
      Sent: Saturday, August 31, 2013 15:57
      To: NoteTab-Clips
      Subject: RE: [Clip] How to fix non-Ascii characters using NoteTab


      At 8/31/2013 10:18 AM, John wrote:
      >I use EditPad Pro on an expired trial for working with Unicode files.
      >When I open the html file with EditPad I can see these characters just fine.
      >If I simply copy one line of text containing the vulgar fraction for 1/3 from EditPad Pro (without having saved, etc) and paste it
      >into a brand new blank document in NoteTab Pro, the character is converted to a question mark. If it wasn't, I would not have asked
      >my question about how to fix this problem.
      >If NoteTab could process a disk file without opening it somehow, or by some surreptitious kind of an open that would not screw up
      >these characters, then I could fix them. I COULD fix them in EditPad Pro, but again, it would be one-by-one, visually scanning each
      >line in a 10,000 line document to see what characters were messed up. No program I know of will convert vulgar fractions to three
      >character fractions automatically, so they have to be dealt with at the basic form - the character itself. Other characters in the
      >attached document include smart apostrophes, again in the open, not encoded. NoteTab can't handle those either.
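
For what it is worth, the conversion from a vulgar fraction to a three-character fraction can be automated outside NoteTab, because Unicode itself records each fraction character as numerator, fraction slash (U+2044), denominator. A hedged Python sketch using the standard `unicodedata` module; `expand_fractions` is a hypothetical helper name, and it assumes the text has already been decoded correctly from UTF-8:

```python
import unicodedata

FRACTION_SLASH = "\u2044"


def expand_fractions(text):
    """Replace vulgar fraction characters with ASCII forms such as 1/3."""
    out = []
    for ch in text:
        # Fraction characters carry a "<fraction>" compatibility decomposition.
        if unicodedata.decomposition(ch).startswith("<fraction>"):
            plain = unicodedata.normalize("NFKD", ch).replace(FRACTION_SLASH, "/")
            # Keep "1½" from turning into "11/2": separate the whole number.
            if out and out[-1][-1].isdigit():
                out.append(" ")
            out.append(plain)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(expand_fractions("\u2153 cup flour"))
```

Smart apostrophes and other punctuation are not touched by this; they would need their own mapping.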

      Heh. Although Axel's workaround might work, if people on this list cannot understand your quite clear description of your problem,
      how can you expect your multitudes of (unwashed?) users to perform a complex multi-step process?
      Good luck with that.

      Trying for a solution that will work with all extant versions of NTB, all levels of user expertise and staying strictly within NTB
      seems like an impossibility. However, there are a couple of relatively simple external tools you could use and completely control
      with your clips. All are freeware.

      FAR (Find And Replace) [http://www.f2ko.de/programs.php?lang=en&pid=far]
      FAR is a tiny (14K for X32, 18K for X64), command line app that you could use to preprocess your files. It does an on-disk search and
      replaces one text string with another text string. Either or both strings may be defined as numeric character codes. Or you could use
      it in just its search mode to identify, locate all positions and count all occurrences of your problem characters. Unfortunately,
      since it is not a RegEx search, it means an invocation for each possible problem character in each possible problem file. Doable but
      messy.

      FindStr [Included with Windows]
      FindStr is another command line app but it is already on everybody's system, supports RegEx and supports file spec wild cards.
      Although it does not replace any strings found it can output the exact location of any found pattern. And, again, easily called from
      a NTB clip.
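
The search-only idea (report where the problem characters are, without replacing anything) can also be sketched in a few lines of Python, for anyone who has it installed. This is an illustration rather than a FAR or FindStr equivalent; `find_non_ascii` is a made-up name, and the file is assumed to be valid UTF-8:

```python
import sys
import unicodedata


def find_non_ascii(lines):
    """Yield (line number, column, character, Unicode name) for each hit."""
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_non_ascii.py recipes.html
    with open(sys.argv[1], encoding="utf-8") as fh:
        for line_no, col, ch, name in find_non_ascii(fh):
            print(f"{line_no}:{col}: U+{ord(ch):04X} {name}")
```

The output names every character, so a 10,000 line file does not have to be scanned visually to learn which characters need a replacement rule.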

      GREP (CLI) [http://www.gnu.org/software/grep/grep.html]
      WinGREP (GUI) [http://stefanstools.sourceforge.net/grepWin.html]
      Like FindStr, the command line version of GREP searches for but does not replace normal or RegEX search patterns. WinGREP can
      replace search strings, operates on multiple files, supports saved search expressions and provides feedback on results. However, it
      is a separate Windows app not easily controlled by a NTB clip.

      AWK/GAWK [http://www.gnu.org/software/gawk/gawk.html]
      AWK/GAWK is a small-medium (350K), command line tool and the most capable. However, it is also the most difficult to learn and use.
      It is not just a search and replace tool but is a text processing language. If you love RegEX you will love AWK. Although powerful,
      it is the typical 'Nix app with esoteric commands and options. You write AWK programs not pattern searches. If you invest the time
      and effort into learning the language you would be able to handle anything thrown your way.

      PERL http://www.perl.org/
      PERL scripts are supported directly by NTB. It is another text processing language that perhaps could be used to accomplish your
      tasks.

      HTH, Art

      GREP [
      Set up a batch file to



      [Non-text portions of this message have been removed]
    • Roopakshi Pathania
      Message 33 of 33 , Sep 7, 2013
        Hi Axel,

        I've not been following this thread, but will throw out a couple of suggestions based on what I've read.
        If you wish to use those fraction characters both for entering/ back converting into NTP or converting them into HTML, why not try MathML or LaTeX?
        MathML may be a bit tedious, but it is appropriate for HTML form, and is readable as well as replaceable in any text editor.
        LaTeX can be entered and converted into HTML using TeX4HT. It is also replaceable.

        Again, since I didn't read most mails, I'm not sure if my suggestions would help.

        Sent from my Lenovo ThinkPad

        --------------------------------------------
        On Sun, 9/1/13, Axel Berger <Axel-Berger@...> wrote:

        Subject: Re: [Clip] How to fix non-Ascii characters using NoteTab
        To: ntb-clips@yahoogroups.com
        Date: Sunday, September 1, 2013, 12:31 AM

        John Shotsky wrote:
        > I use EditPad Pro on an expired trial for working with Unicode files.
        > When I open the html file with EditPad I can see these characters
        > just fine.

        That may well be the problem. That and some shenanigans Windows itself
        engages in with copying and pasting.

        > I have taken the liberty of cc'ing your personal email address,
        > and have attached the html.

        I have opened the html in firefox and a UTF UNaware simple editor. In
        the first I see all characters and copying and pasting translates them
        from UTF to ANSI or an ASCII equivalent thus:

        ¼ cup flour
        ¾ cup milk
        1/3 cup flour

        The editor shows me the individual bytes the characters are made of and
        I can copy them to NT unchanged:

        >¾</strong> cup milk</div>
        >¼ cup flour</strong></div>
        >â…“ cup flour</strong></div>

        Running my own UTF script over them yields:

        >¾</strong> cup milk</div>
        >¼ cup flour</strong></div>
        >⅓ cup flour</strong></div>

        (Converting everything possible to cp-1252 = ANSI is on purpose.
        Omitting those parts it would be even easier to make everything an
        entity.)

        There may be OS issues here too. Parts of eXPerimental are UTF-aware and
        might interfere. I'm using Win98SE, but I doubt that's the difference.
        (To try I'd need to install stuff first.)

        Axel
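
Axel's script itself is not shown in the thread. As a rough illustration of the same idea (keep whatever fits in cp-1252, turn everything else into an entity), here is a short Python sketch. It relies on the standard `xmlcharrefreplace` error handler, which emits numeric entities such as `&#8531;`; the function name is hypothetical and the input is assumed to be valid UTF-8:

```python
def utf8_to_ansi_with_entities(raw: bytes) -> bytes:
    """Convert UTF-8 bytes to cp1252, using numeric entities where needed."""
    text = raw.decode("utf-8")
    # Characters that cp1252 cannot represent become &#NNNN; references.
    return text.encode("cp1252", errors="xmlcharrefreplace")


if __name__ == "__main__":
    sample = "\u00be cup milk, \u2153 cup flour".encode("utf-8")
    print(utf8_to_ansi_with_entities(sample))
```

With this approach ¾ and ¼ survive as single ANSI characters, while ⅓, which has no cp-1252 code, becomes an entity that any browser will render.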