Re: Failed to drag&drop-open a file with wide-chars in its filename

  • Nico Weber
    Message 1 of 12 , Jun 23, 2009
      >> Windows uses NFC for filenames. I'm not sure if the Linux world
      >> settled on NFC or NFKC.
      >
      > When I worked on AbiWord a few years ago Linux left filename encoding
      > up to the filesystem and the user. This may have changed since...


      I'm pretty sure it hasn't. As far as I know, for Linux a filename is
      just a bunch of bytes, and you only need to know the encoding for
      lesser tasks such as file name display anyway ;-) In that case, the
      recommended way is to get the encoding from an env var.
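To make the "just a bunch of bytes" point concrete, here is a minimal C sketch, assuming UTF-8 filenames. The NFC and NFD spellings of the same visible name "café.txt" are different byte sequences, so a filesystem that compares names byte for byte treats them as two different names. The helper `same_filename_bytes` and the sample names are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* "é" in NFC: the precomposed code point U+00E9, UTF-8 bytes C3 A9. */
static const char name_nfc[] = "caf\xC3\xA9.txt";
/* "é" in NFD: 'e' followed by U+0301 COMBINING ACUTE ACCENT, bytes 65 CC 81. */
static const char name_nfd[] = "cafe\xCC\x81.txt";

/* Hypothetical helper: models a byte-oriented filesystem, which compares
 * names byte for byte and applies no Unicode normalization. */
static bool same_filename_bytes(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

On a typical Linux filesystem both names could therefore exist side by side as separate files, whereas HFS+ normalizes names to its decomposed form, so both spellings would refer to the same file there.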

      Nico

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • björn
      Message 2 of 12 , Jun 24, 2009
        Hi Eljay,

        2009/6/23 John (Eljay) Love-Jensen:
        >
        >> As far as I can tell (from searching around) HFS+ always uses
        >> normalization form D (NFD) for filenames.
        >
        > HFS+ uses a variant of NFD for filenames.  (The HFS+ variant predates
        > standardization of NFD.)  This requirement is enforced by the OS.
        >
        > http://developer.apple.com/technotes/tn/tn1150.html
        > http://developer.apple.com/technotes/tn/tn1150table.html
        > http://developer.apple.com/qa/qa2001/qa1235.html
        > http://www.unicode.org/reports/tr15/

        Thanks for clarifying that (and for the links!).

        > Windows uses NFC for filenames.  I'm not sure if the Linux world settled on
        > NFC or NFKC.

        I read that Windows uses NFKC. Have you got a reference for the claim
        that NFC is used?

        >> So as a workaround for the issue the OP had I now normalize filenames
        >> to compatibility form C (NFKC) before passing the filename on to Vim
        >> and this takes care of the OP's problem.
        >
        > NFC or NFKC?  Those are different normalizations.
        >
        > Windows NTFS file system uses NFC.  But it isn't enforced by the OS, yet.

        I did mean the compatibility form NFKC since I read somewhere that
        NTFS uses NFKC, but I did not research that very carefully.


        >> However, as I see it this really is a legitimate issue in Vim itself
        >> in that it does not handle NFD properly (the example above should
        >> always render as one glyph, not three as it does now if NFD is used).
        >> Either Vim should ensure that all buffers are normalized to composed
        >> form NFC/NFKC or it needs to be made "NFD aware".
        >
        > I agree with your assessment.
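As a toy illustration of what "NFD aware" handling could involve, here is a minimal C sketch that composes a base letter followed by U+0301 COMBINING ACUTE ACCENT into its precomposed code point, turning a three-byte, two-code-point sequence into a single character. It assumes UTF-8 input and uses a hypothetical five-entry table; real NFC composition needs the full Unicode data described in UAX #15, so this is a demonstration, not a normalizer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the canonical composition data:
 * base letter + U+0301 -> precomposed code point (all in U+0080..U+07FF). */
static const struct { char base; unsigned int composed; } acute_table[] = {
    {'a', 0xE1}, {'e', 0xE9}, {'i', 0xED}, {'o', 0xF3}, {'u', 0xFA},
};

/* Copies UTF-8 `in` to `out`, replacing "<letter> U+0301" pairs found in
 * the table with the precomposed character. Composition never lengthens
 * the string, so `out` needs at most strlen(in) + 1 bytes. */
static void compose_acute(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        unsigned int cp = 0;
        /* U+0301 is encoded in UTF-8 as CC 81. */
        if ((unsigned char)in[1] == 0xCC && (unsigned char)in[2] == 0x81) {
            for (size_t i = 0; i < sizeof acute_table / sizeof acute_table[0]; i++) {
                if (acute_table[i].base == in[0]) {
                    cp = acute_table[i].composed;
                    break;
                }
            }
        }
        if (cp != 0) {
            /* Two-byte UTF-8 encoding of the precomposed code point. */
            *out++ = (char)(0xC0 | (cp >> 6));
            *out++ = (char)(0x80 | (cp & 0x3F));
            in += 3; /* skip the base letter and the CC 81 pair */
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

After composition "é" is one code point instead of two, which is the form that would be displayed as a single glyph.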
        >
        >> Does anybody on the vim_multibyte list (this mail goes to vim_mac as
        >> well) have any comments on this?
        >
        > The relevant Mac OS X routine APIs are:
        >
        > #include <CoreFoundation/CoreFoundation.h>
        >
        > CFURLRef url =
        > CFURLCreateWithFileSystemPath(
        >  kCFAllocatorDefault,
        >  cfstringFullPath,
        >  kCFURLPOSIXPathStyle,
        >  false);
        >
        > UInt8 bufferUTF8[32768*4]; // Worst case scenario.
        > // As per Apple documentation, paths can be "up to 30,000 UTF-16
        > // encoding units long", with each component being up to 255 UTF-16
        > // encoding units long.  Too bad there isn't an API to specify the
        > // exact buffer size /a priori/.
        >
        > Boolean success =
        > CFURLGetFileSystemRepresentation(
        >  url,
        >  true,
        >  bufferUTF8,
        >  sizeof bufferUTF8);
        > CFRelease(url); // Created above, so we own it (Create Rule).
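The "worst case" buffer in the quoted snippet can be sanity-checked with a little arithmetic. A UTF-16 code unit in the Basic Multilingual Plane encodes to at most 3 UTF-8 bytes, and a surrogate pair (2 units) encodes to exactly 4 bytes, i.e. 2 bytes per unit, so 3 bytes per unit plus a terminating NUL is a safe upper bound. The C sketch below uses hypothetical function names and computes both that bound and the exact UTF-8 length of a UTF-16 string; for 32,767 units the bound is 98,302 bytes, well under 32768*4 = 131,072.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Upper bound on the UTF-8 size (including the NUL terminator) of a
 * string of `units` UTF-16 code units: at most 3 bytes per unit. */
static size_t utf8_worst_case_bytes(size_t units)
{
    return units * 3 + 1;
}

/* Exact UTF-8 length (excluding the NUL) of `n` UTF-16 code units.
 * An unpaired surrogate is counted as 3 bytes, as if replaced by U+FFFD. */
static size_t utf8_length_of_utf16(const uint16_t *s, size_t n)
{
    size_t bytes = 0;
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint16_t u = s[i];
        if (u < 0x80) {
            bytes += 1;
        } else if (u < 0x800) {
            bytes += 2;
        } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < n
                   && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bytes += 4; /* surrogate pair -> one supplementary code point */
            i++;
        } else {
            bytes += 3; /* rest of the BMP, or a lone surrogate */
        }
    }
    return bytes;
}
```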

        Thanks. NSString has a method called fileSystemRepresentation which
        I'm guessing does the same thing(?). I used the NSString method
        precomposedStringWithCompatibilityMapping to convert to NFKC.

        Björn

      • John (Eljay) Love-Jensen
        Message 3 of 12 , Jun 24, 2009
          Hi Björn,

          > I read that Windows uses NFKC. Have you got a reference for the claim
          > that NFC is used?

          Drat, I cannot find the MSDN reference. Maybe my memory has failed me.

          NFKC is lossy. NFC is non-lossy.

          Perhaps you are remembering the security information:
          http://msdn.microsoft.com/en-us/library/dd374047(VS.85).aspx#SC_Unicode

          File Names, Paths, and Namespaces information is here:
          http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx

          Note that modern UNC paths are different from older "short" UNC paths.
          Modern ones start with "\\?\" (for paths) or "\\.\" (for volumes),
          such as "\\?\C:\Dir\Sub\File.ext", and can be up to 32,767 UTF-16
          encoding units (Vista) or UCS-2 characters (XP) long, using a 16-bit
          encoding of Unicode. Older "short" ones have the DOS-era limit of 260
          8-bit characters, dependent on the OS code page setting.
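A minimal C sketch of the prefix rule described above, for illustration only: it classifies a wide-character path by its "\\?\" or "\\.\" prefix and returns the corresponding length limit quoted in this message. The helper names are hypothetical and this is not a Windows API; the real Win32 path-handling rules have more cases than this.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: nonzero if the path starts with "\\?\" (paths)
 * or "\\.\" (volumes), the prefixes of the "modern" form. */
static int has_extended_prefix(const wchar_t *path)
{
    assert(path != NULL);
    return wcsncmp(path, L"\\\\?\\", 4) == 0
        || wcsncmp(path, L"\\\\.\\", 4) == 0;
}

/* Hypothetical helper: length limit per the figures quoted above,
 * 32,767 for prefixed paths and 260 for old DOS-style paths. */
static size_t max_path_length(const wchar_t *path)
{
    return has_extended_prefix(path) ? 32767 : 260;
}
```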

          The NFC is mentioned here in a MSDN blog:
          http://blogs.msdn.com/michkap/archive/2006/12/07/1232365.aspx

          But I don't consider that canonical, since it was in a blog feedback
          comment.

          I asked for clarification on the MSDN "File Names, Paths, and Namespaces"
          page, in the comments section.

          NOTE: "short" UNC and "old" DOS-style paths have to abide by the OS code
          page setting, even when using the FooW routines and wchar_t (16-bit) paths.

          > Thanks. NSString has a method called fileSystemRepresentation which
          > I'm guessing does the same thing(?). I used the NSString method
          > precomposedStringWithCompatibilityMapping to convert to NFKC.

          I presume so. My Cocoa experience is not as extensive as my Carbon
          experience.

          Sincerely,
          --Eljay


        • Tony Mechelynck
          Message 4 of 12 , Jun 24, 2009
            On 24/06/09 14:00, björn wrote:
            > Thanks. NSString has a method called fileSystemRepresentation which
            > I'm guessing does the same thing(?). I used the NSString method
            > precomposedStringWithCompatibilityMapping to convert to NFKC.
            >
            > Björn

            Hm, NFKC and NFKD sometimes fuse slightly different glyphs into a single
            "normalized" form. For instance, NFKC(²) = 2, though both are
            (different) Latin1 characters (0xB2 and 0x32). IIRC, DOS would have kept
            them distinct.
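Tony's point that NFKC is lossy can be shown with a toy C sketch. It applies a hypothetical compatibility mapping that covers only the Latin-1 superscript digits (U+00B9, U+00B2, U+00B3) to UTF-8 names; real NFKC uses the full Unicode decomposition data. Two names that are distinct byte strings collide after the mapping, while NFC would leave "²" unchanged and keep them distinct.

```c
#include <assert.h>
#include <string.h>

/* Toy compatibility mapping: only the Latin-1 superscript digits.
 * UTF-8 encodings: U+00B9 = C2 B9, U+00B2 = C2 B2, U+00B3 = C2 B3.
 * The output is never longer than the input. */
static void toy_nfkc_superscripts(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        unsigned char b0 = (unsigned char)in[0];
        unsigned char b1 = (unsigned char)in[1];
        if (b0 == 0xC2 && (b1 == 0xB9 || b1 == 0xB2 || b1 == 0xB3)) {
            /* Replace the two-byte superscript with the plain ASCII digit. */
            *out++ = (b1 == 0xB9) ? '1' : (b1 == 0xB2) ? '2' : '3';
            in += 2;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```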

            Best regards,
            Tony.
            --
            hundred-and-one symptoms of being an internet addict:
            56. You leave the modem speaker on after connecting because you think it
            sounds like the ocean wind...the perfect soundtrack for "surfing
            the net".
