
RE: [Clip] How to fix non-Ascii characters using NoteTab

  • John Shotsky
    Message 1 of 33 , Aug 31, 2013
      Search the html code with what, and for what? The point is that NoteTab substitutes question marks for those codes, when you open an
      html page with it. That is the whole problem.

      Regards,
      John
      RecipeTools Web Site: http://recipetools.gotdns.com/
      John's Mags Yahoo Group: http://groups.yahoo.com/group/johnsmags/

      From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of John Wallace
      Sent: Saturday, August 31, 2013 09:01
      To: ntb-clips@yahoogroups.com
      Subject: RE: [Clip] How to fix non-Ascii characters using NoteTab


      Why wouldn't you be searching the html code of the page instead of the output that's on the screen?

      John

      -----Original Message-----
      From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of John Shotsky
      Sent: Saturday, August 31, 2013 11:38 AM
      To: ntb-clips@yahoogroups.com
      Subject: RE: [Clip] How to fix non-Ascii characters using NoteTab

      Ok, then.
      Well, your job is to write the regex to accomplish this change, as that is exactly what is wanted. Bear in mind that NoteTab cannot
      display those characters, so each of them will be a question mark in NoteTab. :)

      Here is a list of the actual entity codes that should be used in html.
      http://demosthenes.info/blog/566/Writing-Fractions-On-Web-Pages-Correctly-With-Entities
      However, even that is not fully correct for xml - the xml rules state that NUMERICAL character codes must be used in place of
      'named' character codes, as shown in the above reference.
      The numerical codes work in both xml and html, but the named ones only work in html.
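Editor's sketch of the numeric-code idea above, in Python rather than NoteTab. It assumes the file has already been decoded to a Unicode string. The function name `to_numeric_entities` is made up for illustration. The standard `xmlcharrefreplace` error handler writes every non-ASCII character as a decimal numeric reference, which is the form described above as working in both xml and html.

```python
def to_numeric_entities(text):
    """Replace every non-ASCII character with a numeric reference like &#8531;."""
    # Encoding to ASCII fails for characters such as U+2153 (one third);
    # the "xmlcharrefreplace" handler substitutes &#NNNN; instead of raising.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_numeric_entities("chopped (about \u2153 cup)"))
```

Plain ASCII text passes through unchanged, so this can be run over a whole html file without touching the markup.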

      I had HOPED there was a way to open such a document in NoteTab, cause the regex to use the numerical codes (instead of the actual
      characters) to locate the bogus characters, and then replace them with the equivalents listed below.
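Editor's sketch of "locate by code, then replace", again in Python and not a NoteTab clip. The characters are written as `\uXXXX` escapes, so the script itself stays pure ASCII and never has to display them. The table `ASCII_FRACTIONS` and the function `fractions_to_ascii` are hypothetical names. The replacement values here are the plain ASCII spellings asked for in the original question; they could just as well be entity codes.

```python
import re

# Vulgar fraction code points mapped to ASCII spellings (illustrative table).
ASCII_FRACTIONS = {
    "\u00bc": "1/4",
    "\u00bd": "1/2",
    "\u00be": "3/4",
    "\u2153": "1/3",
    "\u2154": "2/3",
    "\u215b": "1/8",
    "\u215c": "3/8",
    "\u215d": "5/8",
    "\u215e": "7/8",
}

# One character class that matches any key of the table.
FRACTION_RE = re.compile("[" + "".join(ASCII_FRACTIONS) + "]")


def fractions_to_ascii(text):
    """Replace each vulgar fraction character with its ASCII spelling."""
    return FRACTION_RE.sub(lambda m: ASCII_FRACTIONS[m.group(0)], text)


if __name__ == "__main__":
    print(fractions_to_ascii("2 shallots, chopped (about \u2153 cup)"))
```

Characters that are not in the table are left alone, so the rest of the non-ASCII text survives for a later pass.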

      Regards,
      John
      RecipeTools Web Site: http://recipetools.gotdns.com/
      John's Mags Yahoo Group: http://groups.yahoo.com/group/johnsmags/

      From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of John Wallace
      Sent: Saturday, August 31, 2013 08:08
      To: ntb-clips@yahoogroups.com
      Subject: RE: [Clip] How to fix non-Ascii characters using NoteTab


      Html codes to make some 'vulgar' fractions:

      ? - 1/3
      ? - 2/3

      ? - 1/8
      ? - 3/8
      ? - 5/8
      ? - 7/8

      ? - :)

      John Wallace

      -----Original Message-----
      From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of John Shotsky
      Sent: Saturday, August 31, 2013 9:59 AM
      To: ntb-clips@yahoogroups.com
      Subject: RE: [Clip] How to fix non-Ascii characters using NoteTab

      They aren't fractions. They are characters that don't exist in ASCII.

      Regards,
      John
      RecipeTools Web Site: http://recipetools.gotdns.com/
      John's Mags Yahoo Group: http://groups.yahoo.com/group/johnsmags/

      From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of Dave
      Sent: Saturday, August 31, 2013 06:30
      To: ntb-clips@yahoogroups.com
      Subject: Re: [Clip] How to fix non-Ascii characters using NoteTab

      Hi
      Would converting the fractions to decimals and then back help at all?
      THANKYOU DAVE-211

      ----- Original Message -----
      From: "John Shotsky" <jshotsky@...>
      To: <ntb-clips@yahoogroups.com>
      Sent: Saturday, August 31, 2013 12:51 PM
      Subject: [Clip] How to fix non-Ascii characters using NoteTab

      > When converting Ebooks to other formats, one of the tasks is to
      > convert the ebook to html. Generally, that means converting characters
      > to UTF-8, but because of a lack of understanding on the part of many
      > of those creating ebooks, many of the characters that should be coded as
      > entities are 'in the open'. That is, characters that browsers know how
      > to display even when they are not encoded are displayed correctly, but
      > some of these characters don't exist in ASCII, at all. Here is an
      > example:
      > <strong><span class="sgc-3">2 shallots, chopped (about ? cup) or ? cup
      > chopped scallion or onion</span></strong>
      > Those 1/3 fraction symbols are called 'Vulgar Fractions', but US ASCII
      > only supports three of them - halves and fourths.
      > Using NoteTab, there is no way to search and replace these characters,
      > because you can't write the character into your find expression - it
      > doesn't exist in the character set.
      >
      > So, my question is this: Is there a way to use NoteTab to open these
      > html files, FIND these unencoded characters, and replace them with the
      > equivalent US ASCII characters, which in this case would be the three
      > character sequence 1/3?
      >
      > There are a whole host of other characters that are not properly
      > encoded for html/utf-8 as well, but if there is a way to make this one
      > work, I can work out the rest.
      >
      > Thanks,
      > John
      >
      >
      >
      > [Non-text portions of this message have been removed]
      >
      >
      >
      > ------------------------------------
      >
      > Fookes Software: http://www.fookes.com/ NoteTab website:
      > http://www.notetab.com/ NoteTab Discussion Lists:
      > http://www.notetab.com/groups.php
      >
      > ***
      > Yahoo! Groups Links
      >
      >
      >

    • Roopakshi Pathania
      Message 33 of 33 , Sep 7 9:34 AM
        Hi Axel,

        I've not been following this thread, but will throw out a couple of suggestions based on what I've read.
        If you wish to use those fraction characters both for entering and back-converting in NTP and for converting them into HTML, why not try MathML or LaTeX?
        MathML may be a bit tedious, but it is appropriate for HTML, and it is readable as well as replaceable in any text editor.
        LaTeX can be entered and converted into HTML using TeX4ht. It is also replaceable.

        Again, since I didn't read most mails, I'm not sure if my suggestions would help.

        Sent from my Lenovo ThinkPad

        --------------------------------------------
        On Sun, 9/1/13, Axel Berger <Axel-Berger@...> wrote:

        Subject: Re: [Clip] How to fix non-Ascii characters using NoteTab
        To: ntb-clips@yahoogroups.com
        Date: Sunday, September 1, 2013, 12:31 AM

        John Shotsky wrote:
        > I use EditPad Pro on an expired trial for working with Unicode files.
        > When I open the html file with EditPad I can see these characters
        > just fine.

        That may well be the problem. That and some shenanigans Windows itself
        engages in with copying and pasting.

        > I have taken the liberty of cc'ing your personal email address,
        > and have attached the html.

        I have opened the html in Firefox and a UTF UNaware simple editor. In
        the first I see all characters and copying and pasting translates them
        from UTF to ANSI or an ASCII equivalent thus:

        ¼ cup flour
        ¾ cup milk
        1/3 cup flour

        The editor shows me the individual bytes the characters are made of and
        I can copy them to NT unchanged:

        >¾</strong> cup milk</div>
        >¼ cup flour</strong></div>
        >â…“ cup flour</strong></div>

        Running my own UTF script over them yields:

        >¾</strong> cup milk</div>
        >¼ cup flour</strong></div>
        >⅓ cup flour</strong></div>

        (Converting everything possible to cp-1252 = ANSI is on purpose.
        Omitting those parts it would be even easier to make everything an
        entity.)
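Editor's sketch of what the byte view above amounts to, in Python. This is not Axel's script, which is not shown in the thread. The three characters "â…“" are what the UTF-8 bytes E2 85 93 look like when read as cp-1252. `repair_mojibake` and `to_cp1252_with_entities` are made-up names. The second function follows the stated policy of keeping whatever cp-1252 can represent and turning the rest into numeric entities.

```python
def repair_mojibake(text):
    """Undo UTF-8 bytes that were read as cp-1252.

    Raises UnicodeError if the text was not damaged in exactly this way,
    or if it used one of the few byte values that cp-1252 leaves undefined.
    """
    return text.encode("cp1252").decode("utf-8")


def to_cp1252_with_entities(text):
    """Keep characters cp-1252 can represent; write the rest as &#NNNN;."""
    return text.encode("cp1252", "xmlcharrefreplace").decode("cp1252")


if __name__ == "__main__":
    damaged = "\u00e2\u2026\u201c cup flour"  # displays as "â…“ cup flour"
    fixed = repair_mojibake(damaged)          # one third sign, U+2153
    print(to_cp1252_with_entities(fixed))
```

With this policy the quarter and three-quarter signs stay as single cp-1252 characters, while the one-third sign, which cp-1252 lacks, becomes a numeric entity.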



        There may be OS issues here too. Parts of eXPerimental are UTF-aware and
        might interfere. I'm using Win98SE, but I doubt that's the difference.
        (To try I'd need to install stuff first.)

        Axel