2620 Re: Failed to drag&drop-open a file with wide-chars in its filename
Jun 23, 2009
2009/6/23 John (Eljay) Love-Jensen <eljay@...>:
> Hi Björn,
>> As far as I can tell (from searching around) HFS+ always uses
>> normalization form D (NFD) for filenames.
> HFS+ uses a variant of NFD for filenames. (The HFS+ variant predates
> standardization of NFD.) This requirement is enforced by the OS.
> Windows uses NFC for filenames. I'm not sure if the Linux world settled on
> NFC or NFK.
When I worked on AbiWord a few years ago Linux left filename encoding
up to the filesystem and the user. This may have changed since...
Linux supports many filesystems including Windows and Mac filesystems.
For filesystems which mandate a specific encoding Linux should follow
those rules. For older filesystems the encoding would generally be the
encoding of the OS but... Linux, like Unix, is a multiuser OS and may have
various users using various languages in various encodings. Each user
gets to decide their language and encoding through environment
variables such as LANG, LC_ALL, LC_COLLATE etc. These vary by vintage
of the OS and may well vary for other Unixes too such as FreeBSD.
I think Linux generally uses extN filesystems as default. When I was
last working with it that was ext2 but ext3 has now been in use for
some time and ext4 is the current iteration which may or may not be in
general release. The ext3 or ext4 filesystems may mandate an encoding
that ext2 did not.
The general solution for the Unix/Linux world may be to honour the
user's locale settings and assume that the filesystem software will
convert to any specifically mandated encoding it requires when you
call the standard open() etc APIs.
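
To make "honour the user's locale settings" a bit more concrete, here is a
minimal sketch in POSIX C. It only asks the C library which character
encoding the user's environment selects. The function name user_codeset is
made up for illustration, and whether the filesystem layer really converts
filenames for you is still the open question above.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: returns the name of the character encoding chosen
 * by the user's locale environment (LC_ALL, LC_CTYPE, LANG), for example
 * "UTF-8" or "ISO-8859-1". The exact names vary between C libraries. */
const char *user_codeset(void)
{
    /* An empty locale string means "take the setting from the environment". */
    if (setlocale(LC_CTYPE, "") == NULL) {
        /* The environment names a locale this system does not have;
         * fall back to the portable "C" locale. */
        setlocale(LC_CTYPE, "C");
    }
    return nl_langinfo(CODESET);
}
```

A program would call user_codeset() once at startup and use the result to
decide how to interpret the bytes of a filename before passing it to open().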
But further research is definitely recommended!
> Amiga OS (at least the one I used) is ECMA 94 Latin 1 based (precursor to
> ISO 8859-1).
>> So as a workaround for the issue the OP had I now normalize filenames
>> to compatibility form C (NFKC) before passing the filename on to Vim
>> and this takes care of the OP's problem.
> NFC or NFKC? Those are different normalizations.
> Windows NTFS file system uses NFC. But it isn't enforced by the OS, yet.
>> However, as I see it this really is a legitimate issue in Vim itself
>> in that it does not handle NFD properly (the example above should
>> always render as one glyph, not three as it does now if NFD is used).
>> Either Vim should ensure that all buffers are normalized to composed
>> form NFC/NFKC or it needs to be made "NFD aware".
> I agree with your assessment.
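
For anyone following along, a small C sketch of why the two forms do not
match byte for byte. It uses "é" as the example, which is my own choice and
not the OP's filename. The names E_ACUTE_NFC, E_ACUTE_NFD, utf8_code_points
and same_bytes are made up for illustration, and the counter assumes valid
UTF-8 input.

```c
#include <stddef.h>
#include <string.h>

/* "é" in NFC: the single precomposed code point U+00E9. */
static const char E_ACUTE_NFC[] = "\xC3\xA9";
/* "é" in NFD: U+0065 (e) followed by U+0301 (combining acute accent). */
static const char E_ACUTE_NFD[] = "e\xCC\x81";

/* Counts code points in a valid UTF-8 string by counting every byte that
 * is not a continuation byte (continuation bytes look like 10xxxxxx). */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

/* Byte-wise equality, which is all that code without normalization
 * support can check. */
int same_bytes(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

The NFC string is one code point in two bytes, the NFD string is two code
points in three bytes, and same_bytes() reports them as different even
though both should display as the same character.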
>> Does anybody on the vim_multibyte list (this mail goes to vim_mac as
>> well) have any comments on this?
> The relevant Mac OS X routine APIs are:
> CFURLRef url =
> char bufferUTF8[32768*4]; // Worst case scenario.
> // As per Apple documentation, paths can be "up to 30,000 UTF-16
> // encoding units long", with each component being up to 255 UTF-16
> // encoding units long. Too bad there isn't an API to specify the
> // exact buffer size /a priori/.
> Boolean success =
> sizeof bufferUTF8);
You received this message from the "vim_multibyte" maillist.
For more information, visit http://www.vim.org/maillist.php