Re: Failed to drag&drop-open a file with wide-chars in its filename

  • björn
    Message 1 of 12 , Jun 24 5:00 AM
      Hi Eljay,

      2009/6/23 John (Eljay) Love-Jensen:
      >
      >> As far as I can tell (from searching around) HFS+ always uses
      >> normalization form D (NFD) for filenames.
      >
      > HFS+ uses a variant of NFD for filenames.  (The HFS+ variant predates
      > standardization of NFD.)  This requirement is enforced by the OS.
      >
      > http://developer.apple.com/technotes/tn/tn1150.html
      > http://developer.apple.com/technotes/tn/tn1150table.html
      > http://developer.apple.com/qa/qa2001/qa1235.html
      > http://www.unicode.org/reports/tr15/

      Thanks for clarifying that (and for the links!).

      > Windows uses NFC for filenames.  I'm not sure if the Linux world settled on
      > NFC or NFK.

      I read that Windows uses NFKC. Have you got a reference for the claim
      that NFC is used?

      >> So as a workaround for the issue the OP had I now normalize filenames
      >> to compatibility form C (NFKC) before passing the filename on to Vim
      >> and this takes care of the OP's problem.
      >
      > NFC or NFKC?  Those are different normalizations.
      >
      > Windows NTFS file system uses NFC.  But it isn't enforced by the OS, yet.

      I did mean the compatibility form NFKC since I read somewhere that
      NTFS uses NFKC, but I did not research that very carefully.


      >> However, as I see it this really is a legitimate issue in Vim itself
      >> in that it does not handle NFD properly (the example above should
      >> always render as one glyph, not three as it does now if NFD is used).
      >> Either Vim should ensure that all buffers are normalized to composed
      >> form NFC/NFKC or it needs to be made "NFD aware".
      >
      > I agree with your assessment.
      >
      >> Does anybody on the vim_multibyte list (this mail goes to vim_mac as
      >> well) have any comments on this?
      >
      > The relevant Mac OS X routine APIs are:
      >
      > CFURLRef url =
      > CFURLCreateWithFileSystemPath(
      >  kCFAllocatorDefault,
      >  cfstringFullPath,
      >  kCFURLPOSIXPathStyle,
      >  false);
      >
      > char bufferUTF8[32768*4]; // Worst case scenario.
      > // As per Apple documentation, paths can be "up to 30,000 UTF-16
      > // encoding units long", with each component being up to 255 UTF-16
      > // encoding units long.  Too bad there isn't an API to specify the
      > // exact buffer size /a priori/.
      >
      > Boolean success =
      > CFURLGetFileSystemRepresentation(
      >  url,
      >  true,
      >  &bufferUTF8[0],
      >  sizeof bufferUTF8);

      Thanks. NSString has a method called fileSystemRepresentation which
      I'm guessing does the same thing(?). I used the NSString method
      precomposedStringWithCompatibilityMapping to convert to NFKC.
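
A minimal sketch of the workaround described above, written in Python rather than Cocoa. The standard-library call `unicodedata.normalize` stands in for `precomposedStringWithCompatibilityMapping`; that is an analogy, not a claim that the two agree on every input. The helper name `precompose_filename` and the sample filename are made up for illustration.

```python
import unicodedata


def precompose_filename(name: str, compatibility: bool = True) -> str:
    """Return `name` in NFKC, or in NFC when `compatibility` is False."""
    form = "NFKC" if compatibility else "NFC"
    return unicodedata.normalize(form, name)


# A decomposed name: base letter "a" followed by U+030A COMBINING RING ABOVE.
decomposed = "a\u030a.txt"
composed = precompose_filename(decomposed)

# The combining sequence collapses to the single code point U+00E5.
print(len(decomposed), len(composed))  # prints: 6 5
```

Passing `compatibility=False` gives NFC instead, which leaves compatibility characters alone.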

      Björn

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • John (Eljay) Love-Jensen
      Message 2 of 12 , Jun 24 6:09 AM
        Hi Björn,

        > I read that Windows uses NFKC. Have you got a reference for the claim
        > that NFC is used?

        Drat, I cannot find the MSDN reference. Maybe my memory has failed me.

        NFKC is lossy. NFC is non-lossy.
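
A small sketch of that distinction, assuming Python's standard `unicodedata` module as the normalizer. "Non-lossy" is read here as "compatibility characters are left alone"; NFC can still replace a sequence with a canonically equivalent one. The helper name `survives` and the sample filename are invented for the example.

```python
import unicodedata


def survives(form: str, name: str) -> bool:
    """True if normalizing `name` to `form` leaves it unchanged."""
    return unicodedata.normalize(form, name) == name


# U+FB01 LATIN SMALL LIGATURE FI followed by "le.txt".
ligature_name = "\ufb01le.txt"

print(survives("NFC", ligature_name))    # True: NFC keeps the ligature
print(survives("NFKC", ligature_name))   # False: NFKC folds it to "fi"
print(unicodedata.normalize("NFKC", ligature_name))  # file.txt
```

Once the ligature has been folded there is no way to tell from the result whether the original name contained U+FB01 or the plain letters.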

        Perhaps you are remembering the security information:
        http://msdn.microsoft.com/en-us/library/dd374047(VS.85).aspx#SC_Unicode

        File Names, Paths, and Namespaces information is here:
        http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx

        Note that modern UNC is different from older "short" UNC. The modern
        form starts with "\\?\" (for paths) or with "\\.\" (for volumes), such
        as "\\?\C:\Dir\Sub\File.ext", and allows up to 32,767 UTF-16 encoding
        units (Vista) or UCS-2 characters (XP), using 16-bit encoding of
        Unicode. The older form has a DOS-era limit of 260 8-bit characters,
        dependent on the OS code page setting.
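
To make the units concrete, here is a rough Python sketch that measures a path in UTF-16 encoding units and in UTF-8 bytes. The 32,767 figure is taken from the message above and is not independently verified here. The function names are hypothetical.

```python
def utf16_units(path: str) -> int:
    """Number of 16-bit encoding units the path needs in UTF-16."""
    return len(path.encode("utf-16-le")) // 2


def utf8_bytes(path: str) -> int:
    """Number of bytes the path needs in UTF-8."""
    return len(path.encode("utf-8"))


def fits_extended_path(path: str, limit: int = 32767) -> bool:
    """Check the path against the limit quoted in the message (an assumption)."""
    return utf16_units(path) <= limit


sample = r"\\?\C:\Dir\Sub\File.ext"
print(utf16_units(sample), utf8_bytes(sample))

# A character outside the Basic Multilingual Plane takes two UTF-16 units
# (a surrogate pair) and four UTF-8 bytes.
print(utf16_units("\U0001F600"), utf8_bytes("\U0001F600"))  # prints: 2 4
```

A character inside the Basic Multilingual Plane takes one UTF-16 unit and at most three UTF-8 bytes, so the two counts can differ considerably for non-ASCII names.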

        NFC is mentioned here in an MSDN blog:
        http://blogs.msdn.com/michkap/archive/2006/12/07/1232365.aspx

        But I don't consider that canonical, since it was in a blog feedback
        comment.

        I asked for clarification on the MSDN "File Names, Paths, and Namespaces"
        page, in the comments section.

        NOTE: "short" UNC and "old" DOS style paths have to abide by the OS code
        page setting, even when using the FooW routines and wchar_t (16-bit) paths.

        > Thanks. NSString has a method called fileSystemRepresentation which
        > I'm guessing does the same thing(?). I used the NSString method
        > precomposedStringWithCompatibilityMapping to convert to NFKC.

        I presume so. My Cocoa experience is not as extensive as my Carbon
        experience.

        Sincerely,
        --Eljay


      • Tony Mechelynck
        Message 3 of 12 , Jun 24 6:33 AM
          On 24/06/09 14:00, björn wrote:
          >
          > Hi Eljay,
          >
          > 2009/6/23 John (Eljay) Love-Jensen:
          >>
          >>> As far as I can tell (from searching around) HFS+ always uses
          >>> normalization form D (NFD) for filenames.
          >>
          >> HFS+ uses a variant of NFD for filenames. (The HFS+ variant predates
          >> standardization of NFD.) This requirement is enforced by the OS.
          >>
          >> http://developer.apple.com/technotes/tn/tn1150.html
          >> http://developer.apple.com/technotes/tn/tn1150table.html
          >> http://developer.apple.com/qa/qa2001/qa1235.html
          >> http://www.unicode.org/reports/tr15/
          >
          > Thanks for clarifying that (and for the links!).
          >
          >> Windows uses NFC for filenames. I'm not sure if the Linux world settled on
          >> NFC or NFK.
          >
          > I read that Windows uses NFKC. Have you got a reference for the claim
          > that NFC is used?
          >
          >>> So as a workaround for the issue the OP had I now normalize filenames
          >>> to compatibility form C (NFKC) before passing the filename on to Vim
          >>> and this takes care of the OP's problem.
          >>
          >> NFC or NFKC? Those are different normalizations.
          >>
          >> Windows NTFS file system uses NFC. But it isn't enforced by the OS, yet.
          >
          > I did mean the compatibility form NFKC since I read somewhere that
          > NTFS uses NFKC, but I did not research that very carefully.
          >
          >
          >>> However, as I see it this really is a legitimate issue in Vim itself
          >>> in that it does not handle NFD properly (the example above should
          >>> always render as one glyph, not three as it does now if NFD is used).
          >>> Either Vim should ensure that all buffers are normalized to composed
          >>> form NFC/NFKC or it needs to be made "NFD aware".
          >>
          >> I agree with your assessment.
          >>
          >>> Does anybody on the vim_multibyte list (this mail goes to vim_mac as
          >>> well) have any comments on this?
          >>
          >> The relevant Mac OS X routine APIs are:
          >>
          >> CFURLRef url =
          >> CFURLCreateWithFileSystemPath(
          >> kCFAllocatorDefault,
          >> cfstringFullPath,
          >> kCFURLPOSIXPathStyle,
          >> false);
          >>
          >> char bufferUTF8[32768*4]; // Worst case scenario.
          >> // As per Apple documentation, paths can be "up to 30,000 UTF-16
          >> // encoding units long", with each component being up to 255 UTF-16
          >> // encoding units long. Too bad there isn't an API to specify the
          >> // exact buffer size /a priori/.
          >>
          >> Boolean success =
          >> CFURLGetFileSystemRepresentation(
          >> url,
          >> true,
          >> &bufferUTF8[0],
          >> sizeof bufferUTF8);
          >
          > Thanks. NSString has a method called fileSystemRepresentation which
          > I'm guessing does the same thing(?). I used the NSString method
          > precomposedStringWithCompatibilityMapping to convert to NFKC.
          >
          > Björn

          Hm, NFKC and NFKD sometimes fuse slightly different glyphs into a single
          "normalized" form. For instance, NFKC(²) = 2, though both are
          (different) Latin1 characters (0xB2 and 0x32). IIRC, DOS would have kept
          them distinct.
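
The fusing effect can be checked with a short Python sketch that groups filenames by their NFKC form and reports any group with more than one member. It uses the standard `unicodedata` module. The function name `nfkc_collisions` and the sample names are illustrative only.

```python
import unicodedata
from collections import defaultdict


def nfkc_collisions(names):
    """Map each NFKC form to the distinct names that normalize to it,
    keeping only forms shared by more than one name."""
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize("NFKC", name)].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


# "x²" (U+00B2 SUPERSCRIPT TWO) and "x2" (U+0032 DIGIT TWO) are distinct
# strings, but NFKC maps both to "x2.txt".
names = ["x\u00b2.txt", "x2.txt", "notes.txt"]
print(nfkc_collisions(names))
```

Under NFC the same two names stay distinct, since NFC does not apply compatibility mappings.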

          Best regards,
          Tony.
          --
          hundred-and-one symptoms of being an internet addict:
          56. You leave the modem speaker on after connecting because you think it
          sounds like the ocean wind...the perfect soundtrack for "surfing
          the net".
