RE: [Clip] Code page/character issues
- EditPad Pro is a Unicode editor, so yes, it displays Unicode and UTF-8 and many other code pages correctly. But that
file is not Unicode, it is 8-bit UTF. When one of these files is moved, NoteTab not only displays it correctly, but it
also saves it correctly, that is, without the accents. So, that is the workaround for now. What is not acceptable is the
file as first opened, which does not result in a question mark or any valid character in any code page. It is just
garbage. Previously, NoteTab displayed a question mark for any character out of its map. Now, it doesn't.
But that's not actually the point anyway. The file is UTF-8 when it is written, and after it is copied. Nothing is
different about the file except that there is a copy in another location. The copy displays correctly in NoteTab, but
the original doesn't. The copy works with my clip library, the original doesn't. If I export the original in NoteTab to
UTF-8 it displays correctly, but of course just copying it works, as does renaming it, so I can't say the export
actually does anything. However, if I export it to ASCII, question marks show up for those characters, as expected. The
clip library can't work with a bunch of question marks either, of course, as there is no way to guess what the missing
character is except through a very, very complex word map which replaces question marks with characters if the word is
otherwise recognized. So, for the words you correctly detected below, I would simply substitute the unaccented
characters for accented ones and that would be fine. But I can't do that with the original, because it displays EXTRA
characters, as indicated in my 'Hex/Ascii' view below.
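For readers following along, here is a minimal Python sketch of how such extra characters can arise. It assumes the sample title is the Latvian spelling with three accented letters (a guess based on the three garbled positions in the Hex/Ascii view, not something confirmed in the thread) and that the bytes are being read as Windows-1252. It also shows the ASCII export case, where each accented letter becomes a single question mark.

```python
# Assumption: the title is "Speka Piragi" with k-cedilla, i-macron and a-macron.
text = "Spe\u0137a P\u012br\u0101gi"

# The file on disk would hold these UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# Reading the UTF-8 bytes as Windows-1252 turns every byte into one character,
# so each accented letter (two bytes) shows up as two characters.
# Byte 0x81 has no CP1252 mapping in Python's codec, hence errors="replace".
misread = utf8_bytes.decode("cp1252", errors="replace")

# An ASCII export replaces each unmappable letter with one question mark.
ascii_export = text.encode("ascii", errors="replace").decode("ascii")

print(len(text), len(misread))  # character counts before and after misreading
print(misread)
print(ascii_export)
```

The misread string is three characters longer than the original, which matches the "extra characters" symptom, while the ASCII export keeps the original length.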
So, for now, my instructions will include moving the Firefox-exported file to a work folder, and we'll go with that as
long as it continues to work. As to the problem, I will leave it in the category of unresolvable.
RecipeTools Web Site: http://recipetools.gotdns.com/
From: email@example.com [mailto:firstname.lastname@example.org] On Behalf Of Axel Berger
Sent: Saturday, January 12, 2013 07:23
Subject: Re: [Clip] Code page/character issues
John Shotsky wrote:
> Text View: Speka Piragi
> Hex/Ascii View: Spe��a P��r��gi
> NoteTab correctly detects it as utf-8. But when I force it to
> Windows 1252, it displays as in NoteTab - incorrectly.
It has to, those characters are not in CP1252. Converting your sample
and assuming mail transfer has not broken anything I get:
These are from the "extended block A"
NoteTab will never be able to deal with them satisfactorily. What I
don't get at all is how Win7 interferes with them, but then I have so
far refrained from using eXPerimental and stick to Win98. Even that
tries to interfere and impose its preferences over mine, but there I can
more or less control it. Your identical byte count might result from
using UTF-16, don't newer Windoses do that? If so the byte count should
be twice the letter count.
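A rough Python sketch of the two checks described above, assuming the same hypothetical Latvian spelling of the sample title. It tests whether each non-ASCII letter lies in the Latin Extended-A block (U+0100 to U+017F) and whether CP1252 can represent it, then compares byte counts under UTF-8 and UTF-16.

```python
# Assumption: the sample is "Speka Piragi" with k-cedilla, i-macron and a-macron.
text = "Spe\u0137a P\u012br\u0101gi"

def in_latin_extended_a(ch: str) -> bool:
    """True if the character lies in the Latin Extended-A block."""
    return 0x0100 <= ord(ch) <= 0x017F

def fits_cp1252(ch: str) -> bool:
    """True if Windows-1252 has a code for this character."""
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False

# One row per non-ASCII letter: code point, block membership, CP1252 support.
report = [
    (f"U+{ord(ch):04X}", in_latin_extended_a(ch), fits_cp1252(ch))
    for ch in text
    if ord(ch) > 0x7F
]

utf8_len = len(text.encode("utf-8"))
utf16_len = len(text.encode("utf-16-le"))  # little-endian, no byte order mark

for row in report:
    print(row)
print(len(text), utf8_len, utf16_len)
```

For text like this, the UTF-16 byte count is exactly twice the letter count, while the UTF-8 count is the letter count plus one extra byte per accented letter.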
> But, since EditPad Pro detects it correctly, I
> don't think it's Windows.
If EditPad is true UTF, as you say, then it need not detect anything.
NoteTab is strictly 8-bit and strictly codepage based, all it can do is
read letters from inside that single chosen codepage when encoded as
UTF-8. Letters from more than one codepage inside the same document will
- John Shotsky wrote:
> But that file is not Unicode, it is 8-bit UTF.
To my understanding UTF-8 as a specific encoding is a subset, or rather
one of several possible versions, of Unicode.
> When one of these files is moved, NoteTab not only displays it
> correctly, but it also saves it correctly, that is, without the
> accents.
Sorry, but if those letters do have accents, then anything without is
INcorrect. It may be an acceptable workaround, like Muller or Mueller
instead of Müller, but never correct.
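If the unaccented workaround is what is wanted, one way to get it is sketched below in Python. This is only an illustration, not anything NoteTab or the clip library does: the helper name `strip_accents` is hypothetical, and the Latvian sample spelling is again an assumption. The idea is to decompose each letter into base letter plus combining mark and then drop the marks.

```python
import unicodedata

def strip_accents(s: str) -> str:
    """Return s with combining accents removed (hypothetical helper)."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("M\u00fcller"))                  # the Muller case from the message
print(strip_accents("Spe\u0137a P\u012br\u0101gi"))  # assumed Latvian sample title
```

This gives the "Muller" style of fallback only. The "Mueller" style needs a language-specific replacement table, which this sketch does not attempt.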
> So, that is the workaround for now.
Right
> But that's not actually the point anyway.
Agreed. Win7 does something strange here and I'm very happy I need not
concern myself with that.
> As to the problem, I will leave it in the category of unresolvable.
Probably best.