Re: Failed to drag&drop-open a file with wide-chars in its filename

  • John (Eljay) Love-Jensen
    Message 1 of 12 , Jun 24, 2009
      Hi Björn,

      > I read that Windows uses NFKC. Have you got a reference for the claim
      > that NFC is used?

      Drat, I cannot find the MSDN reference. Maybe my memory has failed me.

      NFKC is lossy. NFC is non-lossy.
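
A minimal sketch of that distinction, using Python's standard `unicodedata` module rather than anything specific to Windows or Mac OS X. The sample strings are arbitrary illustrations, not taken from the original report. NFC and NFD are canonically equivalent, so these inputs convert back and forth unchanged, while NFKC folds a compatibility character (here the "fi" ligature) into plain letters, and the original cannot be recovered from the result.

```python
import unicodedata

# Sample strings are arbitrary illustrations.
composed = "Bj\u00f6rn"     # "ö" as one precomposed code point (NFC)
decomposed = "Bjo\u0308rn"  # "o" followed by U+0308 COMBINING DIAERESIS (NFD)

nfd = unicodedata.normalize("NFD", composed)
nfc = unicodedata.normalize("NFC", decomposed)

# For these inputs the canonical forms round-trip exactly.
nfc_round_trip = (nfc == composed) and (nfd == decomposed)

# U+FB01 LATIN SMALL LIGATURE FI has only a compatibility decomposition.
ligature = "\ufb01le"
nfc_keeps = unicodedata.normalize("NFC", ligature)    # ligature is preserved
nfkc_folds = unicodedata.normalize("NFKC", ligature)  # ligature becomes "f" + "i"

print(nfc_round_trip, nfc_keeps == ligature, nfkc_folds)
```
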

      Perhaps you are remembering the security information:
      http://msdn.microsoft.com/en-us/library/dd374047(VS.85).aspx#SC_Unicode

      File Names, Paths, and Namespaces information is here:
      http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx

      Note that modern UNC paths (starting with "\\?\" for paths or "\\.\" for
      volumes, such as "\\?\C:\Dir\Sub\File.ext", and up to 32,767 UTF-16
      encoding units on Vista, or UCS-2 characters on XP, using 16-bit encoding
      of Unicode) are different from older "short" UNC paths (DOS-era limit of
      260 8-bit characters, dependent on the OS code page setting).
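
Because the limit is counted in UTF-16 encoding units rather than characters, a path's length for this purpose can differ from its character count. A rough sketch in Python follows; `utf16_units` and `MAX_UNITS` are hypothetical names introduced here, and whether the prefix or a terminating NUL counts toward the limit is not addressed.

```python
MAX_UNITS = 32767  # the limit quoted above for "\\?\" paths

def utf16_units(path: str) -> int:
    """Return the number of 16-bit UTF-16 code units needed to encode path."""
    return len(path.encode("utf-16-le")) // 2

# Every character here is in the Basic Multilingual Plane: one unit each.
bmp_path = r"\\?\C:\Dir\Sub\File.ext"

# U+1F600 lies outside the BMP, so it needs a surrogate pair (two units).
astral_path = r"\\?\C:\Dir\Sub" + "\\\U0001F600.txt"

for p in (bmp_path, astral_path):
    print(len(p), utf16_units(p), utf16_units(p) <= MAX_UNITS)
```
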

      The NFC is mentioned here in a MSDN blog:
      http://blogs.msdn.com/michkap/archive/2006/12/07/1232365.aspx

      But I don't consider that canonical, since it was in a blog feedback
      comment.

      I asked for clarification on the MSDN "File Names, Paths, and Namespaces"
      page, in the comments section.

      NOTE: "short" UNC and "old" DOS style paths have to abide by the OS code
      page setting, even when using the FooW routines and wchar_t (16-bit) paths.

      > Thanks. NSString has a method called fileSystemRepresentation which
      > I'm guessing does the same thing(?). I used the NSString method
      > precomposedStringWithCompatibilityMapping to convert to NFKC.

      I presume so. My Cocoa experience is not as extensive as my Carbon
      experience.

      Sincerely,
      --Eljay


      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • Tony Mechelynck
      Message 2 of 12 , Jun 24, 2009
        On 24/06/09 14:00, björn wrote:
        >
        > Hi Eljay,
        >
        > 2009/6/23 John (Eljay) Love-Jensen:
        >>
        >>> As far as I can tell (from searching around) HFS+ always uses
        >>> normalization form D (NFD) for filenames.
        >>
        >> HFS+ uses a variant of NFD for filenames. (The HFS+ variant predates
        >> standardization of NFD.) This requirement is enforced by the OS.
        >>
        >> http://developer.apple.com/technotes/tn/tn1150.html
        >> http://developer.apple.com/technotes/tn/tn1150table.html
        >> http://developer.apple.com/qa/qa2001/qa1235.html
        >> http://www.unicode.org/reports/tr15/
        >
        > Thanks for clarifying that (and for the links!).
        >
        >> Windows uses NFC for filenames. I'm not sure if the Linux world settled on
        >> NFC or NFK.
        >
        > I read that Windows uses NFKC. Have you got a reference for the claim
        > that NFC is used?
        >
        >>> So as a workaround for the issue the OP had I now normalize filenames
        >>> to compatibility form C (NFKC) before passing the filename on to Vim
        >>> and this takes care of the OP's problem.
        >>
        >> NFC or NFKC? Those are different normalizations.
        >>
        >> Windows NTFS file system uses NFC. But it isn't enforced by the OS, yet.
        >
        > I did mean the compatibility form NFKC since I read somewhere that
        > NTFS uses NFKC, but I did not research that very carefully.
        >
        >
        >>> However, as I see it this really is a legitimate issue in Vim itself
        >>> in that it does not handle NFD properly (the example above should
        >>> always render as one glyph, not three as it does now if NFD is used).
        >>> Either Vim should ensure that all buffers are normalized to composed
        >>> form NFC/NFKC or it needs to be made "NFD aware".
        >>
        >> I agree with your assessment.
        >>
        >>> Does anybody on the vim_multibyte list (this mail goes to vim_mac as
        >>> well) have any comments on this?
        >>
        >> The relevant Mac OS X routine APIs are:
        >>
        >> CFURLRef url =
        >> CFURLCreateWithFileSystemPath(
        >> kCFAllocatorDefault,
        >> cfstringFullPath,
        >> kCFURLPOSIXPathStyle,
        >> false);
        >>
        >> char bufferUTF8[32768*4]; // Worst case scenario.
        >> // As per Apple documentation, paths can be "up to 30,000 UTF-16
        >> // encoding units long", with each component being up to 255 UTF-16
        >> // encoding units long. Too bad there isn't an API to specify the
        >> // exact buffer size /a priori/.
        >>
        >> Boolean success =
        >> CFURLGetFileSystemRepresentation(
        >> url,
        >> true,
        >> (UInt8 *)bufferUTF8, // The API takes a UInt8 * buffer.
        >> sizeof bufferUTF8);
        >
        > Thanks. NSString has a method called fileSystemRepresentation which
        > I'm guessing does the same thing(?). I used the NSString method
        > precomposedStringWithCompatibilityMapping to convert to NFKC.
        >
        > Björn

        Hm, NFKC and NFKD sometimes fuse slightly different glyphs into a single
        "normalized" form. For instance, NFKC(²) = 2, though both are
        (different) Latin1 characters (0xB2 and 0x32). IIRC, DOS would have kept
        them distinct.
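
The ² example can be checked with Python's standard `unicodedata` module. This is only a sketch of the normalization behaviour and says nothing about what DOS or any particular file system does.

```python
import unicodedata

superscript_two = "\u00b2"  # SUPERSCRIPT TWO
digit_two = "2"

# The two characters are distinct single bytes in Latin-1 (0xB2 and 0x32).
latin1_bytes = (superscript_two.encode("latin-1"), digit_two.encode("latin-1"))

# The compatibility forms fuse them; the canonical forms keep them apart.
nfkc = unicodedata.normalize("NFKC", superscript_two)
nfkd = unicodedata.normalize("NFKD", superscript_two)
nfc = unicodedata.normalize("NFC", superscript_two)
nfd = unicodedata.normalize("NFD", superscript_two)

print(latin1_bytes, nfkc == digit_two, nfc == superscript_two)
```
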

        Best regards,
        Tony.
        --
        hundred-and-one symptoms of being an internet addict:
        56. You leave the modem speaker on after connecting because you think it
        sounds like the ocean wind...the perfect soundtrack for "surfing
        the net".
