How to fix non-Ascii characters using NoteTab
- When converting ebooks to other formats, one of the tasks is to convert the ebook to HTML. Generally, that means converting
characters to UTF-8, but because many of those creating ebooks don't understand encodings, many of the characters
that should be coded as entities are 'in the open'. That is, browsers display these characters correctly even when they are not
encoded, but some of them don't exist in ASCII at all. Here is an example:
<strong><span class="sgc-3">2 shallots, chopped (about ⅓ cup) or ⅓ cup chopped scallion or onion</span></strong>
Those 1/3 fraction symbols are called 'Vulgar Fractions', but the 8-bit ANSI character set (Windows-1252) only supports three of
them: ¼, ½ and ¾. Strict 7-bit US ASCII has none of them.
Using NoteTab, there is no way to search and replace these characters, because you can't write the character into your find
expression - it doesn't exist in the character set.
So, my question is this: Is there a way to use NoteTab to open these html files, FIND these unencoded characters, and replace them
with the equivalent US ASCII characters, which in this case would be the three character sequence 1/3?
There is a whole host of other characters that are not properly encoded for HTML/UTF-8 as well, but if there is a way to make this
one work, I can work out the rest.
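For comparison, the replacement asked about above can be sketched outside NoteTab in a few lines of Python. This is only an illustration under assumptions: the HTML file is taken to be valid UTF-8, and the names FRACTIONS, asciify_fractions and convert_file are hypothetical, not part of NoteTab or of any tool mentioned in this thread.

```python
# Hypothetical mapping from vulgar fraction characters to ASCII text.
# Extend it with whatever other characters turn up in the files.
FRACTIONS = {
    "\u00bc": "1/4",  # VULGAR FRACTION ONE QUARTER
    "\u00bd": "1/2",  # VULGAR FRACTION ONE HALF
    "\u00be": "3/4",  # VULGAR FRACTION THREE QUARTERS
    "\u2153": "1/3",  # VULGAR FRACTION ONE THIRD
    "\u2154": "2/3",  # VULGAR FRACTION TWO THIRDS
}


def asciify_fractions(text: str) -> str:
    """Replace each mapped fraction character with its ASCII spelling."""
    for char, replacement in FRACTIONS.items():
        text = text.replace(char, replacement)
    return text


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a UTF-8 file, replace the fractions, write the result as UTF-8."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(asciify_fractions(text))
```

Note that a mixed number written without a space, such as 1⅓, would come out as 11/3, so such cases may need a separate rule that inserts a space first.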
- Hi Axel,
I've not been following this thread, but will throw out a couple of suggestions based on what I've read.
If you wish to use those fraction characters both for entering or back-converting in NTP and for converting into HTML, why not try MathML or LaTeX?
MathML may be a bit tedious, but it is appropriate for HTML, and it is readable as well as replaceable in any text editor.
LaTeX can be entered and converted into HTML using TeX4ht. It is also replaceable.
Again, since I didn't read most of the mails, I'm not sure whether my suggestions will help.
On Sun, 9/1/13, Axel Berger <Axel-Berger@...> wrote:
Subject: Re: [Clip] How to fix non-Ascii characters using NoteTab
Date: Sunday, September 1, 2013, 12:31 AM
John Shotsky wrote:
> I use EditPad Pro on an expired trial for working with
> When I open the html file with EditPad I can see these
> just fine.
That may well be the problem. That and some shenanigans
engages in with copying and pasting.
> I have taken the liberty of cc'ing your personal
> and have attached the html.
I have opened the html in firefox and a UTF UNaware simple
the first I see all characters and copying and pasting
from UTF to ANSI or an ASCII equivalent thus:
¼ cup flour
¾ cup milk
1/3 cup flour
The editor shows me the individual bytes the characters are made of and
I can copy them to NT unchanged:
>Â¾</strong> cup milk</div>
>Â¼ cup flour</strong></div>
>â…“ cup flour</strong></div>
Running my own UTF script over them yields:
>¾</strong> cup milk</div>
>¼ cup flour</strong></div>
>⅓ cup flour</strong></div>
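
The byte view shown above is what UTF-8 text looks like when it is read as cp-1252, and the effect can be reproduced in a short Python sketch. This is not the UTF script mentioned in the message, which is not shown in the thread; the function names are hypothetical and the code only assumes Python's standard "utf-8" and "cp1252" codecs.

```python
def view_utf8_as_cp1252(text: str) -> str:
    """Show how UTF-8 bytes appear to a program that assumes cp-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair_mojibake(garbled: str) -> str:
    """Undo the misreading: back to the raw bytes, then decode as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    # Three quarters, one quarter, one third.
    for char in ("\u00be", "\u00bc", "\u2153"):
        garbled = view_utf8_as_cp1252(char)
        # ascii() keeps the output printable on any console.
        print(ascii(char), "->", ascii(garbled), "->", ascii(repair_mojibake(garbled)))
```

The quarter fractions are two bytes each in UTF-8 and the one-third character is three, which is why they show up as two and three cp-1252 characters respectively. A few byte values are undefined in Python's cp1252 codec, so this round trip can raise an error for some other characters.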
(Converting everything possible to cp-1252 = ANSI is on
Omitting those parts it would be even easier to make
There may be OS issues here too. Parts of eXPerimental are
might interfere. I'm using Win98SE, but I doubt
that's the difference.
(To try I'd need to install stuff first.)