1006 Re: Filename encodings under Win32
Oct 13, 2003

Camillo wrote:
> > Vim could use Unicode functions for accessing files, but this will be a
> > huge change.
> Why so? The code earlier in this thread probably did much of what is
> needed. It also involved numerous other changes, which I ignored. I'm not
> being nosy, I'm just curious why this would be a "huge change". It's not
> the file contents we are getting at, it's the filenames (and the GUI).

Because every fopen(), stat() etc. will have to be changed.

> Also note that when using the native code page as the encoding (read:
> latin1), using the ANSI functions does work as expected. So the fixes would
> only need to concern the UTF-8 encoding, if you get picky. :)

This only means extra work, since an "if (encoding == ...)" has to be
added to select between the traditional file access method and the [...]

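As a rough illustration of that kind of "if (encoding == ...)" switch, here is a minimal Python sketch. It is not Vim code: the wrapper name `open_by_name` is hypothetical, and Python's text (str) versus bytes file names merely stand in for the Unicode and traditional file access methods.

```python
import os
import tempfile

def open_by_name(name: bytes, encoding: str, mode: str = "rb"):
    """Hypothetical wrapper: choose the file access path from 'encoding'."""
    if encoding.lower() in ("utf-8", "utf8"):
        # Unicode path: the name is converted to text before the call.
        return open(name.decode("utf-8"), mode)
    # Traditional path: the bytes are handed to the OS untouched.
    return open(name, mode)

# Small demo: the same file is reachable through both paths.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "w") as f:
        f.write("hello")
    raw_name = os.fsencode(path)
    with open_by_name(raw_name, "utf-8") as f:
        via_unicode = f.read()
    with open_by_name(raw_name, "latin1") as f:
        via_bytes = f.read()
```

Every call site that opens or stats a file would have to go through such a wrapper, which is where the extra work comes from.
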
> > Requires lots of testing.
> That's unicode for you. However, deriving a decent test set using
> available unicode test files should be a fairly straight-forward thing.

No, it's actually impossible to test this automatically. It involves
creating various Win32 environments with code page settings, network
filesystems and installed libraries. Only end-user tests can discover
the real problems.

> > Main problem is when 'encoding' is not a Unicode encoding, then conversions
> > need to be done, which may fail.
> But what I assume you are doing now is even worse, isn't it? Essentially
> you are feeding some user-selected encoding to functions that require
> ANSI characters. How's that for "a lot of testing"?

The currently used functions work fine for accessing existing files.
It's only when typing a new name or when displaying the name that
problems may occur.

> Conversions from almost any encoding to unicode should work. I would not
> expect major trouble there. And note that if the conversion from the
> encoding to unicode fails, I expect that the current usage would fail even
> more severely. And there haven't been reports of that, have there?

Main problem is that sometimes we don't know what the encoding is. In
that situation you can treat the filename as a sequence of bytes in most
places, but conversion is impossible. This happens more often than you
would expect. Put a floppy disk or CD into your computer...

There is also the situation that Vim uses the active codepage, but the
file is actually in another encoding that could not be detected. Then
doing "gf" on a filename will work if you don't do conversion, but it
will fail if you try converting with the wrong encoding in mind.

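To make the byte-sequence point concrete, here is a minimal Python sketch (not Vim code; the sample name and the encodings are assumptions picked for the demo). A name stored as latin1 bytes survives when passed through untouched, but converting it under a wrong assumption either fails outright or silently yields a different name.

```python
# Bytes of a filename as they might appear in a latin1-encoded text file.
raw_name = "caf\u00e9.txt".encode("latin-1")   # b'caf\xe9.txt'

# Treating the name as an opaque byte sequence always round-trips.
passthrough = bytes(raw_name)

# Wrongly assuming UTF-8 makes conversion impossible: 0xE9 starts a
# multi-byte sequence that is never completed.
try:
    raw_name.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Wrongly assuming cp437 "succeeds", but the result is a different name,
# so a lookup with it would miss the real file.
wrong_name = raw_name.decode("cp437")
right_name = raw_name.decode("latin-1")
```
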
> > Thus sticking with the active codepage functions isn't too bad.
> If it worked that way, but it doesn't. Setting "encoding=utf-8" changes
> that behavior - only us-ascii is usable in filenames.

I don't see why. You can use a file selector to open any file and write
it back under the same name. Vim doesn't need to know the encoding of
the filename that way.

If you type a file name in utf-8 it won't work properly, thus you have
to use another method to obtain the file name. It's clumsy, I know.

> > But then Vim needs to convert from 'encoding' to the active codepage!
> That would help most users. Including me. But it would not be the
> "ultimate" solution to unicode on win32, as it would still cause trouble
> with characters outside the codepage. As I see it, the easiest fix is
> actually using the unicode-api, as there are fewer (or no) conversion
> failures that way.

As said above, this only works if we are 100% sure of what encoding the
text (filename) is in, and we don't always know that.

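The "characters outside the codepage" case from the quoted text can be shown with a small Python sketch. It assumes cp1252 as the active codepage, and the helper `to_codepage` is hypothetical. It also assumes the name's encoding is known, which is exactly the condition that does not always hold.

```python
def to_codepage(name: str, codepage: str = "cp1252"):
    """Return the name encoded for the codepage, or None if it cannot be."""
    try:
        return name.encode(codepage)
    except UnicodeEncodeError:
        return None

# A Western European name fits in cp1252.
inside = to_codepage("caf\u00e9.txt")

# A Japanese name has no cp1252 representation, so conversion fails.
japanese = "\u65e5\u672c\u8a9e.txt"
outside = to_codepage(japanese)

# UTF-16 (the form wide-character APIs take) can represent both names.
wide = japanese.encode("utf-16-le")
```
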
> > Why would 'termencoding' be "utf-8"? This won't work, unless you are
> > using an xterm on MS-Windows.
> Yeah, but that's what you get if you just blindly do "set encoding=utf-8".
> Took me a while to figure that one out. I need to do "set
> termencoding=cp1252" first, or the "let &termencoding = &encoding". Not
> exactly transparent to non-experts.

Setting 'encoding' is full of side effects. There is a clear warning in
the docs about this.

> > The default 'termencoding' is empty, which means 'encoding' is used.
> > There is no better default.
> On Windows, I'd say "detect active code page" is the right choice.

I remember this was proposed before, but I can't remember why we didn't do
it this way. Windows is different here, since we can find out what the
active codepage is. On Unix it's not that clear (e.g., depends on what
options the xterm was started with). Consistency between systems is [...]

> >>- Also, my vim (6.2) defaults to "latin1", not my current codepage. That
> >>would indicate that the ACP detection does not work.
> > Where does it use "latin1"? Not in 'encoding', I suppose.
> Yes. Without a _vimrc, I get:
> Thus changing the encoding only has funny effects.

Your active codepage must be latin1 then. Vim gets the default from the [...]

hundred-and-one symptoms of being an internet addict:
192. Your boss asks you to "go fer" coffee and you come up with 235 FTP sites.
/// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
/// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
\\\ Project leader for A-A-P -- http://www.A-A-P.org ///
\\\ Help AIDS victims, buy here: http://ICCF-Holland.org/click1.html ///