Re: Filename encodings under Win32
- Oct 14, 2003
> While that may sound attractive at first, I would strongly dissuade from
> that solution. I consider it to be a myth that using multilingual
> filenames on Windows is hard. Under NT, it should be a breeze for any
It's not at all a myth if you want code that is 1: portable and 2: works
on 9x, too. (If you can deal with nonportable code, you can use Windows's
TCHAR mechanism, and if you don't care about anything but NT, you can write
a UTF-16-only app. Neither of these is the case here, though.)
It's not "hard", it's just "incredibly annoying".
On Tue, Oct 14, 2003 at 02:20:27PM +0200, Bram Moolenaar wrote:
> This is still complicated, but probably requires less changes than using
> Unicode functions for all file access. I only foresee trouble when
> 'encoding' is set to a non-Unicode codepage different from the active
> codepage and using a filename that contains non-ASCII characters.
> Perhaps this situation is too weird to take into account?
If 'encoding' is not the active codepage (ACP), then the main problem is that
the user can enter characters that Vim simply can't put into a filename
(and in 9x, that the system can't, either).
I'd just do a conversion, and if the conversion fails, warn appropriately.
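To make "convert, and warn if it fails" concrete, here is a rough sketch. It assumes 'encoding' is UTF-8 and the target is UTF-16; utf8_to_utf16() is a made-up helper for illustration, not anything in Vim. The point is only that the converter reports failure instead of silently substituting "?".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not Vim code): convert a NUL-terminated UTF-8 name
 * to UTF-16.  Returns the number of UTF-16 units written, not counting the
 * terminator, or -1 if the input is malformed or does not fit in dst. */
int utf8_to_utf16(const char *src, uint16_t *dst, size_t dstlen)
{
    static const uint32_t min_cp[4] = {0, 0x80, 0x800, 0x10000};
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    if (dstlen == 0)
        return -1;                    /* no room even for the terminator */

    while (*s != 0) {
        uint32_t cp;
        int extra, i;

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return -1;               /* stray continuation or bad lead byte */
        s++;

        for (i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)
                return -1;            /* truncated or broken sequence */
            cp = (cp << 6) | (*s & 0x3F);
        }

        /* Reject overlong forms, surrogates and out-of-range values. */
        if (cp < min_cp[extra] || cp > 0x10FFFF
                || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        if (cp >= 0x10000) {          /* needs a surrogate pair */
            if (n + 2 >= dstlen)
                return -1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= dstlen)
                return -1;
            dst[n++] = (uint16_t)cp;
        }
    }
    dst[n] = 0;
    return (int)n;
}
```

A caller would check for -1 and give the user a message along the lines of "filename cannot be converted" rather than go on with a mangled name. A non-Unicode 'encoding' would need a different converter, but the failure path would look the same.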
> Eh, what happens when I use fopen() or stat()? There is no ANSI or wide
> version of these functions. And certainly not one that also works on
> non-Win32 systems. And when using the wide version conversion needs to
> be done from 'encoding' to Unicode, thus the conversion has to be there
> as well. That's going to be a lot of work (many #ifdefs) and will
> probably introduce new bugs.
It's not that much work. Windows has _wfopen and _wstat. Vim already
has those abstracted (mch_fopen, mch_stat), so conversions would only
happen in one place (and in a place that's intended to be platform-
specific, mch_*). I believe the code I linked earlier did exactly this.
The only thing needed is sane error recovery.
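Roughly what I have in mind, as a sketch only: sketch_fopen() is a hypothetical stand-in for a wrapper in the style of mch_fopen, and it assumes the name is UTF-8. It covers just the NT case; 9x would still need a fallback to the ANSI call, which is left out here.

```c
#include <stdio.h>
#include <errno.h>
#ifdef _WIN32
# include <windows.h>
# include <wchar.h>
#endif

/* Hypothetical wrapper in the style of mch_fopen(); the name and the
 * UTF-8 assumption are illustrative, not taken from Vim's source. */
FILE *sketch_fopen(const char *name, const char *mode)
{
#ifdef _WIN32
    wchar_t wname[MAX_PATH];
    wchar_t wmode[16];

    /* MB_ERR_INVALID_CHARS makes the conversion fail on malformed input
     * instead of quietly substituting a replacement character. */
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            name, -1, wname, MAX_PATH) == 0
            || MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                   mode, -1, wmode, 16) == 0) {
        errno = EILSEQ;               /* let the caller warn the user */
        return NULL;
    }
    return _wfopen(wname, wmode);
#else
    /* Other systems take the bytes as they are. */
    return fopen(name, mode);
#endif
}
```

All the #ifdefs stay inside the wrapper, so callers keep using the same signature on every platform and only need to handle a NULL return.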
> Yep, using conversions means failure is possible. And failure mostly
> means the text is in a different encoding than expected. It would take
> some time to figure out how to do this in a way that the user isn't
Well, bear in mind the non-ACP case that already exists. If I create
"foo ♡.txt", and try to edit it with Vim, it edits "foo ?.txt" (which
it can't write, either, since "?" is an invalid character in Windows
filenames). I'd suggest that editing a file with an invalid character
(e.g. an invalid SJIS sequence) behave identically to editing a file with
a valid character that can't be referenced (e.g. "foo ♡.txt").