Re: Filename encodings under Win32
Oct 12, 2003

On Mon, Oct 13, 2003 at 05:21:04AM +0200, Tony Mechelynck wrote:
> I understood you as meaning that the program-default setting should be
> Unicode. I beg to differ, however. Or maybe I misunderstood what you were
> saying. And whatever the program-default settings, Vim should (IMHO) work in
> as constant a manner as possible across all platforms.

I believe that the *internal* encoding ("encoding") can, if the various
bugs are fixed, reasonably be UTF-8, unless there's an outcry about memory
usage. I agree that it's very important that keyboard input, file
reading and writing, and so on operate in the ACP by default.
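A quick sketch of that split (Python, not Vim code; cp932 standing in for a Japanese system's ACP, and the function names are just illustrative):

```python
# Toy model of "UTF-8 inside, ACP at the boundaries".
ACP = "cp932"  # stand-in for the system ANSI codepage

def from_keyboard(raw: bytes) -> str:
    # Bytes arrive in the ACP; convert once, at the boundary.
    return raw.decode(ACP)

def to_file(text: str, fileencoding: str = ACP) -> bytes:
    # Write back out in the ACP by default.
    return text.encode(fileencoding)

internal = from_keyboard("桜".encode(ACP))    # held internally as Unicode
assert internal.encode("utf-8") == b"\xe6\xa1\x9c"   # UTF-8 inside
assert to_file(internal) == b"\x8d\xf7"              # ACP at the edge
```

The point is that conversion happens exactly twice, at input and output; everything in between works on one known encoding.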
> Now let's say I change 'encoding' to "utf-8". With 'termencoding' left empty
> (the default), gvim now suddenly expects the keyboard to be sending UTF-8
> byte sequences (because an empty 'termencoding' means it takes the same
> value as whatever is the current value of 'encoding'). Windows, however, is

Right: I believe this is poor behavior for Windows. Windows input is
always in the ACP, and if it's not, it should always be possible to find
out what it is. (That is, I don't know exactly what Windows does if you
have multiple keyboard mappings and change languages, but it shouldn't
require special changing of 'tenc'.)
For example, Vim always expects data from the IME in the encoding it
sends (Unicode); 'termencoding' is not used. If I set tenc=cp1242, I
can still enter Japanese kanji with the IME--Vim knows that data is
always in the same format, and handles it correctly, even though it's
not CP1242. Keyboard input is the same: the encoding should always
be the ACP, or Unicode in NT if you use the correct Windows messages,
though I don't recall which of those work in 9x (probably none).

(I don't know if anyone is using tenc in Windows to do weird things;
I can't think of any practical use for intentionally setting tenc to
a value that doesn't match the ACP.)
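Here's a contrived illustration (Python) of why a tenc that doesn't match the ACP garbles input--the same byte simply means a different character in each codepage:

```python
# Keyboard delivers ACP bytes (cp1252 here), but suppose tenc claims
# cp1250: the byte gets decoded with the wrong table.
raw = "ø".encode("cp1252")          # 0xF8 from a Western-European keyboard
assert raw == b"\xf8"
assert raw.decode("cp1250") == "ř"  # what a cp1250 'tenc' would show
```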
> It may be useful in itself; but until and unless it is indeed (as you
> suggest) incorporated in mainline Vim source (a possibility towards which
> I'm not averse as long as it doesn't break something else), it "doesn't
> exist" from where I sit.

That's nice, but not relevant. :) Again, I wasn't suggesting anyone
use the Vim script I supplied; I only used it to demonstrate what the
internal defaults could be.
> Indeed. The difference is virtually nil for English; it is small but nonzero
> for other Latin-alphabet languages, it approaches 1 to 2 for other-alphabet
> languages like Greek or Russian (a little less than that because of spaces,
> commas, full stops, etc.); I don't know the ratio for languages like Hindi
> (with Nagari script) or Chinese (hanzi).

The penalty is about 50% for CJK languages (two-byte encodings become
three-byte sequences).
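The 50% figure is easy to check (Python; cp932 as the two-byte example):

```python
# CJK text that is 2 bytes/char in the ACP becomes 3 bytes/char in UTF-8.
text = "日本語のテキスト"                  # 8 characters
assert len(text.encode("cp932")) == 16    # 2 bytes each
assert len(text.encode("utf-8")) == 24    # 3 bytes each, i.e. 50% more
```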
> By the way: what do you mean by ACP? The currently "active code page" maybe?

ANSI codepage. It's the system codepage, set in the "regional settings"
control panel (or whatever; MS changes the control panels weekly). It's
the codepage that the "*A" (ANSI) functions expect (which are the ones Vim
uses, for the most part). Essentially, the ACP is to Windows 9x as
'encoding' is to Vim. In NT, everything is UCS-2 internally--or
is it UTF-16?--and the "*A" functions convert to and from the ACP.
In a sense, MS did with NT what I wish Vim would do--standardize on Unicode
internally, to make the internals simpler, in a way that is transparent
to existing (ANSI) applications.
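What an "*A" call does on NT can be modeled like this (Python, not real Win32 code; the comments name the actual API pieces involved):

```python
# ANSI bytes in, converted to the wide (UTF-16) form the kernel uses.
acp_bytes = "café".encode("cp1252")   # what SetWindowTextA would receive
wide = acp_bytes.decode("cp1252")     # roughly MultiByteToWideChar(CP_ACP, ...)
assert wide.encode("utf-16-le") == b"c\x00a\x00f\x00\xe9\x00"
```

The application keeps passing ACP bytes around; the conversion to and from the wide form happens inside the "*A" wrapper, which is why it's transparent.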
> Hm. Your "kanji in filenames" issue makes me think: could that be related to
> the fact that my Netscape 7 cannot properly handle Cyrillic letters between
> <title></title> HTML tags (what sits there displays on the title bar, and
> anything out-of-the-way is accepted but doesn't display properly, IIRC not
> even with a <meta> tag specifying that the page is in UTF-8) but can show
> them with no problems in body text, for instance between <H1></H1> (where
> the title could appear again, this time to be displayed on top of the text
> inside the browser window)? But this paragraph may be drifting off-topic.

It's related, but not exactly the same.

Vim's problem with titlebars is that it's not converting titlebar
strings to the ACP. ("桜.txt" shows up as <8d><f7>.txt, and 8df7
is the CP932 (Shift-JIS) encoding of 桜; I'm not entirely sure how that's
happening and haven't looked at the code.) Fixing this will allow
displaying characters in the ANSI codepage: a system set to Japanese
will be able to display kanji, but not Arabic.
For displaying full Unicode, it needs to test if Unicode is available,
create a Unicode window (instead of an ANSI window), and set the title
with the corresponding wide function. This isn't too hard, but it does
take more work and a great deal more testing (to make sure it doesn't
break anything in 9x). This would be nice, but it's above and beyond
"don't break anything in UTF-8 that works in the normal ANSI codepage".
Whoops. I just tried saving "桜.txt", and ended up with "(garbage)÷.txt".
That explains the "<8d><f7>.txt". Looks like file saving isn't working
right when enc=utf-8. This is a much more serious bug, but not one I'm
up to fixing right now, as, like you, I rarely edit files with non-ASCII
characters in the filename. (I'm still using 6.1, though, so this might
well be fixed.)
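Both artifacts check out as CP932 bytes being shown raw (Python):

```python
# 桜 in CP932 (Shift-JIS) is exactly the two bytes seen in the titlebar.
sjis = "桜".encode("cp932")
assert sjis == b"\x8d\xf7"          # the <8d><f7> in <8d><f7>.txt
# Shown byte-by-byte as Western characters, 0x8D is an unprintable C1
# control (the "(garbage)") and 0xF7 renders as the division sign:
assert sjis.decode("latin-1")[1] == "÷"
```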