Re: Failed to drag&drop-open a file with wide-chars in its filename

  • Tony Mechelynck
    Message 1 of 12, Jun 20, 2009
      On Jun 19, 11:22 pm, björn <bjorn.winck...@...> wrote:
      [...]
      > I'm afraid I know way too little about text rendering to fix this, so
      > until somebody else fixes it in core Vim the problem will remain.  I
      > would highly suggest that you take up the rendering problem on the
      > vim_dev mailing list (just send a text file with 한 as the content and
      > ask why it renders as three glyphs).
      >
      > Sorry,
      > Björn

      The OP's problem was about a file with 한 as (part of) the filename,
      not as the content.

      I'm on Linux, so what I see may be different from what you see on
      MacVim; but in gvim I see 한 (when in the content of a file) as one
      glyph (U+D55C, corresponding to three bytes, hex ED 95 9C). Are you
      sure you have 'encoding' correctly set? (I use utf-8).
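
The correspondence between U+D55C and the bytes ED 95 9C is ordinary UTF-8 encoding. As an illustration only (the function name `utf8_encode` is hypothetical and is not taken from Vim's source), a minimal C sketch of the encoding step could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8.
 * Writes up to 4 bytes into out and returns the number of bytes written,
 * or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0; /* surrogates are not scalar values */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling `utf8_encode(0xD55C, buf)` fills `buf` with ED 95 9C and returns 3. Each of the three jamo U+1112, U+1161, U+11AB also takes three bytes, so the decomposed spelling of the same syllable occupies nine bytes.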


      Best regards,
      Tony.
      --
      Swipple's Rule of Order:
      He who shouts the loudest has the floor.

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • björn
      Message 2 of 12, Jun 20, 2009
        2009/6/20 Tony Mechelynck:
        >
        > On Jun 19, 11:22 pm, björn <bjorn.winck...@...> wrote:
        > [...]
        >> I'm afraid I know way too little about text rendering to fix this, so
        >> until somebody else fixes it in core Vim the problem will remain.  I
        >> would highly suggest that you take up the rendering problem on the
        >> vim_dev mailing list (just send a text file with 한 as the content and
        >> ask why it renders as three glyphs).
        >>
        >> Sorry,
        >> Björn
        >
        > The OP's problem was about a file with 한 as (part of) the filename,
        > not as the content.

        Tony,

        I appreciate your replying to my post, but there really is no need to
        state the painfully obvious. The problem appears when the filename
        is displayed in the command line (as a result of opening the file), so
        you run into the same problem if that character is part of the
        contents of a file.

        > I'm on Linux, so what I see may be different from what you see on
        > MacVim; but in gvim I see 한 (when in the content of a file) as one
        > glyph (U+D55C, corresponding to three bytes, hex ED 95 9C). Are you
        > sure you have 'encoding' correctly set? (I use utf-8).

        I tried it myself on Linux, saw the same problem, and realized that
        it depends on how 한 is represented. Encoded as the single code point
        U+D55C, as you suggest, it works (on both Linux and MacVim). Encoded
        as the sequence U+1112, U+1161, U+11AB, Vim renders it as three
        glyphs, whereas the Cocoa text system combines the sequence into one
        glyph, and that mismatch is where the problem in MacVim appears.
        (By the way: MacVim defaults to utf-8 for 'encoding'.)

        So in a way the problem is related to having 한 in a filename, since Mac
        OS X apparently represents it as U+1112, U+1161, U+11AB instead of as
        U+D55C. Still, entering those three characters separately in a buffer
        triggers the same problem. As far as I can see, this means that the
        Cocoa text system is not suitable for this purpose, so we will have
        to migrate to the ATSUI or CoreText renderers sometime in the future.

        To conclude: it seems that this is a problem with the Cocoa text
        system and not that something in Vim has to be "fixed" as I stated in
        my previous post (unless Vim should do the same as Cocoa and
        automatically render ᄒ,ᅡ,ᆫ as 한).
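
For reference, the Unicode Standard defines an arithmetic rule for combining conjoining jamo into a precomposed Hangul syllable. The sketch below shows that rule in C; it is an illustration only, the name `hangul_compose` is hypothetical, and it is an assumption (not something confirmed in this thread) that Cocoa's glyph combining matches this rule exactly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Constants of the Hangul syllable composition algorithm. */
#define S_BASE  0xAC00u  /* first precomposed syllable */
#define L_BASE  0x1100u  /* first leading consonant */
#define V_BASE  0x1161u  /* first vowel */
#define T_BASE  0x11A7u  /* one before the first trailing consonant */
#define L_COUNT 19u
#define V_COUNT 21u
#define T_COUNT 28u      /* 27 trailing consonants plus "none" */

/* Compose a leading consonant l, a vowel v, and an optional trailing
 * consonant t (pass 0 for none) into one precomposed syllable.
 * Stores the result in *out and returns true, or returns false if any
 * argument is outside the conjoining jamo ranges. */
bool hangul_compose(uint32_t l, uint32_t v, uint32_t t, uint32_t *out)
{
    assert(out != NULL);
    if (l < L_BASE || l >= L_BASE + L_COUNT)
        return false;
    if (v < V_BASE || v >= V_BASE + V_COUNT)
        return false;

    uint32_t t_index = 0;
    if (t != 0) {
        if (t <= T_BASE || t >= T_BASE + T_COUNT)
            return false;
        t_index = t - T_BASE;
    }

    *out = S_BASE + ((l - L_BASE) * V_COUNT + (v - V_BASE)) * T_COUNT + t_index;
    return true;
}
```

With the three code points from this thread, `hangul_compose(0x1112, 0x1161, 0x11AB, &s)` stores 0xD55C in `s`, which is the precomposed 한.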

        Björn

      • Tony Mechelynck
        Message 3 of 12, Jun 20, 2009
          On 20/06/09 19:58, björn wrote:
          >
          > 2009/6/20 Tony Mechelynck:
          >>
          >> On Jun 19, 11:22 pm, björn<bjorn.winck...@...> wrote:
          >> [...]
          >>> I'm afraid I know way too little about text rendering to fix this, so
          >>> until somebody else fixes it in core Vim the problem will remain. I
          >>> would highly suggest that you take up the rendering problem on the
          >>> vim_dev mailing list (just send a text file with 한 as the content and
          >>> ask why it renders as three glyphs).
          >>>
          >>> Sorry,
          >>> Björn
          >>
          >> The OP's problem was about a file with 한 as (part of) the filename,
          >> not as the content.
          >
          > Tony,
          >
          > I appreciate that you reply to my post, but there really is no need in
          > stating the painfully obvious. The problem appears when the filename
          > is displayed in the command line (as a result of opening the file) and
          > as such you run into the same problem if you have that character as
          > part of the contents of a file.
          >
          >> I'm on Linux, so what I see may be different from what you see on
          >> MacVim; but in gvim I see 한 (when in the content of a file) as one
          >> glyph (U+D55C, corresponding to three bytes, hex ED 95 9C). Are you
          >> sure you have 'encoding' correctly set? (I use utf-8).
          >
          > I tried it myself on Linux and had the same problem and realized that
          > the problem has to do with how you represent 한. If done as you
          > suggest with U+D55C it works (both Linux and MacVim), but if
          > represented by U+1112, U+1161, U+11AB then Vim will render it as three
          > glyphs but here the Cocoa text system combines these into one glyph
          > and that is where the problem in MacVim appears. (By the way: MacVim
          > defaults to use utf-8 for 'encoding'.)

          Ah, I see. I entered it in Vim by copy-paste from your previous post in
          the vim_mac Google Group page in my browser.

          Vim is obviously unaware of Hangul jamo decomposition / recomposition
          and IIUC will render each jamo as a separate glyph. I'm not sure how to
          have them treated as "one spacing + (in this case) 2 composing characters",
          though IIUC that would be "the right way" to do it.

          >
          > So in a way the problem is related to having 한 in a filename since Mac
          > OS X apparently represents it as U+1112, U+1161, U+11AB instead of as
          > U+D55C. Still, if one were to enter those three characters separately
          > in a buffer the same problem would arise. As far as I see it this
          > only means that the Cocoa text system is not suitable for this purpose
          > which only means that we will have to migrate to the ATSUI or CoreText
          > renderers sometime in the future.
          >
          > To conclude: it seems that this is a problem with the Cocoa text
          > system and not that something in Vim has to be "fixed" as I stated in
          > my previous post (unless Vim should do the same as Cocoa and
          > automatically render ᄒ,ᅡ,ᆫ as 한).
          >
          > Björn

          Well, sorry I can't help you.


          Best regards,
          Tony.
          --
          Really heard in court in the U.S.A.:
          Q.: Doctor, before you started the autopsy, did you check the pulse?
          A.: No, I didn't.
          Q.: Did you test the blood pressure?
          A.: No, I didn't.
          Q.: Did you check the breathing?
          A.: No, I didn't.
          Q.: Then there is a possibility that you autopsied a living person?
          A.: No, there isn't.
          Q.: How can you be so sure, Doctor?
          A.: Because his brain was in a jar on my desk.
          Q.: I see. But couldn't the patient be still alive nevertheless?
          A.: Hm, yes, he could still be alive, practicing as a lawyer.

        • Andrew Dunbar
          Message 4 of 12, Jun 20, 2009
            2009/6/20 Tony Mechelynck <antoine.mechelynck@...>:
            >
            > On 20/06/09 19:58, björn wrote:
            >>
            >> 2009/6/20 Tony Mechelynck:
            >>>
            >>> On Jun 19, 11:22 pm, björn<bjorn.winck...@...>  wrote:
            >>> [...]
            >>>> I'm afraid I know way too little about text rendering to fix this, so
            >>>> until somebody else fixes it in core Vim the problem will remain.  I
            >>>> would highly suggest that you take up the rendering problem on the
            >>>> vim_dev mailing list (just send a text file with 한 as the content and
            >>>> ask why it renders as three glyphs).
            >>>>
            >>>> Sorry,
            >>>> Björn
            >>>
            >>> The OP's problem was about a file with 한 as (part of) the filename,
            >>> not as the content.
            >>
            >> Tony,
            >>
            >> I appreciate that you reply to my post, but there really is no need in
            >> stating the painfully obvious.  The problem appears when the filename
            >> is displayed in the command line (as a result of opening the file) and
            >> as such you run into the same problem if you have that character as
            >> part of the contents of a file.
            >>
            >>> I'm on Linux, so what I see may be different from what you see on
            >>> MacVim; but in gvim I see 한 (when in the content of a file) as one
            >>> glyph (U+D55C, corresponding to three bytes, hex ED 95 9C). Are you
            >>> sure you have 'encoding' correctly set? (I use utf-8).
            >>
            >> I tried it myself on Linux and had the same problem and realized that
            >> the problem has to do with how you represent 한.  If done as you
            >> suggest with U+D55C it works (both Linux and MacVim), but if
            >> represented by U+1112, U+1161, U+11AB then Vim will render it as three
            >> glyphs but here the Cocoa text system combines these into one glyph
            >> and that is where the problem in MacVim appears.  (By the way: MacVim
            >> defaults to use utf-8 for 'encoding'.)
            >
            > Ah, I see. I entered it in Vim by copy-paste from your previous post in
            > the vim_mac Google Group page in my browser.
            >
            > Vim is obviously unaware of hangul jamo decomposition / recomposition
            > and IIUC will render each of them as one glyph. I'm not sure how to have
            > them be treated as "one spacing + (in this case) 2 composing characters"
            > though IIUC it would be "the right way" to do it.
            >
            >>
            >> So in a way the problem is related to having 한 in a filename since Mac
            >> OS X apparently represents it as U+1112, U+1161, U+11AB instead of as
            >> U+D55C.  Still, if one were to enter those three characters separately
            >> in a buffer the same problem would arise.  As far as I see it this
            >> only means that the Cocoa text system is not suitable for this purpose
            >> which only means that we will have to migrate to the ATSUI or CoreText
            >> renderers sometime in the future.
            >>
            >> To conclude: it seems that this is a problem with the Cocoa text
            >> system and not that something in Vim has to be "fixed" as I stated in
            >> my previous post (unless Vim should do the same as Cocoa and
            >> automatically render ᄒ,ᅡ,ᆫ as 한).
            >>
            >> Björn

            Hangul jamo (de)composition is part of Unicode normalization. Do we know
            if OS X applies Unicode normalization to all characters or just to Korean?
            I suspect it is done for all characters, to prevent two identical-looking
            filenames that differ only in Unicode normalization. A good language to
            test this with would be Vietnamese, which uses Latin script with up to
            three "accents" per character.

            Unicode normalization might be a feature of the HFS+ filesystem, since
            reconciling encodings with filesystems is a general problem in computing.

            Andrew Dunbar (hippietrail)

            --
            http://wiktionarydev.leuksman.com http://linguaphile.sf.net

          • björn
            Message 5 of 12, Jun 23, 2009
              2009/6/21 Andrew Dunbar:
              > 2009/6/20 Tony Mechelynck:
              >> On 20/06/09 19:58, björn wrote:
              >>>
              >>> I tried it myself on Linux and had the same problem and realized that
              >>> the problem has to do with how you represent 한.  If done as you
              >>> suggest with U+D55C it works (both Linux and MacVim), but if
              >>> represented by U+1112, U+1161, U+11AB then Vim will render it as three
              >>> glyphs but here the Cocoa text system combines these into one glyph
              >>> and that is where the problem in MacVim appears.  (By the way: MacVim
              >>> defaults to use utf-8 for 'encoding'.)
              >>
              >> Ah, I see. I entered it in Vim by copy-paste from your previous post in
              >> the vim_mac Google Group page in my browser.
              >>
              >> Vim is obviously unaware of hangul jamo decomposition / recomposition
              >> and IIUC will render each of them as one glyph. I'm not sure how to have
              >> them be treated as "one spacing + (in this case) 2 composing characters"
              >> though IIUC it would be "the right way" to do it.
              >
              > Hangul jamo (de)composition is part of Unicode normalization. Do we know
              > if OS X does Unicode for all characters or just for Korean? I suspect it is
              > done for all characters to prevent two identical looking filenames which differ
              > only in Unicode normalization. A good language to test this with would be
              > Vietnamese which uses Latin script with up to three "accents" per character.
              >
              > Unicode normalization might be a feature of the HFS+ filesystem as there is
              > a general problem in computing of encodings vs. filesystems.

              Hi Andrew,

              As far as I can tell (from searching around) HFS+ always uses
              normalization form D (NFD) for filenames. So as a workaround for the
              issue the OP had I now normalize filenames to compatibility form C
              (NFKC) before passing the filename on to Vim and this takes care of
              the OP's problem.
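
To show why a filename containing 한 arrives from HFS+ as three code points, here is a minimal C sketch of the canonical decomposition of a precomposed Hangul syllable as defined by the Unicode Standard. The name `hangul_decompose` is hypothetical, and the sketch assumes that the NFD variant used by HFS+ decomposes Hangul in this standard way, as the behaviour observed in this thread suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants of the Hangul syllable decomposition algorithm. */
#define S_BASE  0xAC00u  /* first precomposed syllable */
#define L_BASE  0x1100u  /* first leading consonant */
#define V_BASE  0x1161u  /* first vowel */
#define T_BASE  0x11A7u  /* one before the first trailing consonant */
#define T_COUNT 28u      /* 27 trailing consonants plus "none" */
#define N_COUNT 588u     /* V_COUNT (21) * T_COUNT (28) */
#define S_COUNT 11172u   /* L_COUNT (19) * N_COUNT */

/* Decompose a precomposed Hangul syllable s into conjoining jamo.
 * Writes 2 or 3 code points into out and returns how many were written,
 * or returns 0 if s is not a precomposed Hangul syllable. */
size_t hangul_decompose(uint32_t s, uint32_t out[3])
{
    assert(out != NULL);
    if (s < S_BASE || s >= S_BASE + S_COUNT)
        return 0;

    uint32_t s_index = s - S_BASE;
    out[0] = L_BASE + s_index / N_COUNT;             /* leading consonant */
    out[1] = V_BASE + (s_index % N_COUNT) / T_COUNT; /* vowel */

    uint32_t t_index = s_index % T_COUNT;
    if (t_index == 0)
        return 2;                                    /* no trailing consonant */
    out[2] = T_BASE + t_index;                       /* trailing consonant */
    return 3;
}
```

For 0xD55C this yields U+1112, U+1161, U+11AB and returns 3. Normalizing to NFC or NFKC applies the inverse mapping and restores the single code point.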

              However, as I see it this really is a legitimate issue in Vim itself
              in that it does not handle NFD properly (the example above should
              always render as one glyph, not three as it does now if NFD is used).
              Either Vim should ensure that all buffers are normalized to composed
              form NFC/NFKC or it needs to be made "NFD aware". Does anybody on the
              vim_multibyte list (this mail goes to vim_mac as well) have any
              comments on this?

              Björn

            • John (Eljay) Love-Jensen
              Message 6 of 12, Jun 23, 2009
                Hi Björn,

                > As far as I can tell (from searching around) HFS+ always uses
                > normalization form D (NFD) for filenames.

                HFS+ uses a variant of NFD for filenames. (The HFS+ variant predates
                standardization of NFD.) This requirement is enforced by the OS.

                http://developer.apple.com/technotes/tn/tn1150.html
                http://developer.apple.com/technotes/tn/tn1150table.html
                http://developer.apple.com/qa/qa2001/qa1235.html
                http://www.unicode.org/reports/tr15/

                Windows uses NFC for filenames. I'm not sure if the Linux world settled on
                NFC or NFK.

                Amiga OS (at least the one I used) is ECMA 94 Latin 1 based (precursor to
                ISO 8859-1).

                > So as a workaround for the issue the OP had I now normalize filenames
                > to compatibility form C (NFKC) before passing the filename on to Vim
                > and this takes care of the OP's problem.

                NFC or NFKC? Those are different normalizations.

                Windows NTFS file system uses NFC. But it isn't enforced by the OS, yet.

                > However, as I see it this really is a legitimate issue in Vim itself
                > in that it does not handle NFD properly (the example above should
                > always render as one glyph, not three as it does now if NFD is used).
                > Either Vim should ensure that all buffers are normalized to composed
                > form NFC/NFKC or it needs to be made "NFD aware".

                I agree with your assessment.

                > Does anybody on the vim_multibyte list (this mail goes to vim_mac as
                > well) have any comments on this?

                The relevant Mac OS X APIs are:

                // cfstringFullPath is a CFStringRef holding the full POSIX path.
                CFURLRef url =
                    CFURLCreateWithFileSystemPath(
                        kCFAllocatorDefault,
                        cfstringFullPath,
                        kCFURLPOSIXPathStyle,
                        false);

                char bufferUTF8[32768*4]; // Worst case scenario.
                // As per Apple documentation, paths can be "up to 30,000 UTF-16
                // encoding units long", with each component being up to 255 UTF-16
                // encoding units long. Too bad there isn't an API to specify the
                // exact buffer size /a priori/.

                // The buffer parameter is declared as UInt8 *, hence the cast.
                Boolean success =
                    CFURLGetFileSystemRepresentation(
                        url,
                        true,
                        (UInt8 *)bufferUTF8,
                        sizeof bufferUTF8);

                Sincerely,
                --Eljay


              • John (Eljay) Love-Jensen
                Message 7 of 12, Jun 23, 2009
                  > Windows uses NFC for filenames. I'm not sure if the Linux world settled on
                  > NFC or NFK.

                  I meant: ... NFC or NFD.

                  Fat fingers.

                  --Eljay


                • Andrew Dunbar
                  Message 8 of 12, Jun 23, 2009
                    2009/6/23 John (Eljay) Love-Jensen <eljay@...>:
                    >
                    > Hi Björn,
                    >
                    >> As far as I can tell (from searching around) HFS+ always uses
                    >> normalization form D (NFD) for filenames.
                    >
                    > HFS+ uses a variant of NFD for filenames.  (The HFS+ variant predates
                    > standardizatoin of NFD.)  This requirement is enforced by the OS.
                    >
                    > http://developer.apple.com/technotes/tn/tn1150.html
                    > http://developer.apple.com/technotes/tn/tn1150table.html
                    > http://developer.apple.com/qa/qa2001/qa1235.html
                    > http://www.unicode.org/reports/tr15/
                    >
                    > Windows uses NFC for filenames.  I'm not sure if the Linux world settled on
                    > NFC or NFK.

                    When I worked on AbiWord a few years ago Linux left filename encoding
                    up to the filesystem and the user. This may have changed since...

                    Linux supports many filesystems, including Windows and Mac filesystems.
                    For filesystems which mandate a specific encoding, Linux should follow
                    those rules. For older filesystems the encoding would generally be the
                    encoding of the OS, but Linux, like Unix, is a multiuser OS and may have
                    various users using various languages in various encodings. Each user
                    gets to decide their language and encoding through environment
                    variables such as LANG, LC_ALL, LC_COLLATE etc. These vary by vintage
                    of the OS and may well vary for other Unixes too, such as FreeBSD.

                    I think Linux generally uses extN filesystems as default. When I was
                    last working with it that was ext2, but ext3 has now been in use for
                    some time and ext4 is the current iteration, which may or may not be in
                    general release. The ext3 or ext4 filesystems may mandate an encoding
                    that ext2 did not.

                    The general solution for the Unix/Linux world may be to honour the
                    user's locale settings and assume that the filesystem software will
                    convert to any specifically mandated encoding it requires when you
                    call the standard open() etc. APIs.

                    But further research is definitely recommended!

                    Andrew Dunbar.

                    --
                    http://wiktionarydev.leuksman.com http://linguaphile.sf.net

                  • Nico Weber
                    Message 9 of 12, Jun 23, 2009
                      >> Windows uses NFC for filenames. I'm not sure if the Linux world
                      >> settled on
                      >> NFC or NFK.
                      >
                      > When I worked on AbiWord a few years ago Linux left filename encoding
                      > up to the filesystem and the user. This may have changed since...


                      I'm pretty sure it hasn't. As far as I know, for Linux a filename is
                      just a bunch of bytes, and you only need to know the encoding for
                      lesser tasks such as file name display anyway ;-) In that case, the
                      recommended way is to get the encoding from an environment variable.
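
On a POSIX system, one common way to obtain that encoding is to adopt the locale named by the environment (LC_ALL, LC_CTYPE, LANG) and then ask the C library for its codeset. The following is a minimal sketch under that assumption; the function name `locale_codeset` is hypothetical.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Return the name of the character encoding implied by the user's
 * locale environment variables. The returned string is owned by the
 * C library and may be overwritten by later locale calls. */
const char *locale_codeset(void)
{
    /* An empty locale name means "take it from the environment".
     * If the environment names an unknown locale, fall back to "C". */
    if (setlocale(LC_CTYPE, "") == NULL)
        setlocale(LC_CTYPE, "C");

    const char *codeset = nl_langinfo(CODESET);
    assert(codeset != NULL);
    return codeset;
}
```

The result depends on the environment: a UTF-8 locale typically reports "UTF-8", while the plain "C" locale reports an ASCII codeset whose exact name varies by platform. The filename bytes themselves are passed to open() unchanged; the codeset only matters when displaying them.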

                      Nico

                    • björn
                      Message 10 of 12, Jun 24, 2009
                        Hi Eljay,

                        2009/6/23 John (Eljay) Love-Jensen:
                        >
                        >> As far as I can tell (from searching around) HFS+ always uses
                        >> normalization form D (NFD) for filenames.
                        >
                        > HFS+ uses a variant of NFD for filenames.  (The HFS+ variant predates
                        > standardizatoin of NFD.)  This requirement is enforced by the OS.
                        >
                        > http://developer.apple.com/technotes/tn/tn1150.html
                        > http://developer.apple.com/technotes/tn/tn1150table.html
                        > http://developer.apple.com/qa/qa2001/qa1235.html
                        > http://www.unicode.org/reports/tr15/

                        Thanks for clarifying that (and for the links!).

                        > Windows uses NFC for filenames.  I'm not sure if the Linux world settled on
                        > NFC or NFK.

                        I read that Windows uses NFKC. Have you got a reference for the claim
                        that NFC is used?

                        >> So as a workaround for the issue the OP had I now normalize filenames
                        >> to compatibility form C (NFKC) before passing the filename on to Vim
                        >> and this takes care of the OP's problem.
                        >
                        > NFC or NFKC?  Those are different normalizations.
                        >
                        > Windows NTFS file system uses NFC.  But it isn't enforced by the OS, yet.

                        I did mean the compatibility form NFKC since I read somewhere that
                        NTFS uses NFKC, but I did not research that very carefully.


                        >> However, as I see it this really is a legitimate issue in Vim itself
                        >> in that it does not handle NFD properly (the example above should
                        >> always render as one glyph, not three as it does now if NFD is used).
                        >> Either Vim should ensure that all buffers are normalized to composed
                        >> form NFC/NFKC or it needs to be made "NFD aware".
                        >
                        > I agree with your assessment.
                        >
                        >> Does anybody on the vim_multibyte list (this mail goes to vim_mac as
                        >> well) have any comments on this?
                        >
                        > The relevant Mac OS X routine APIs are:
                        >
                        > CFURLRef url =
                        > CFURLCreateWithFileSystemPath(
                        >  kCFAllocatorDefault,
                        >  cfstringFullPath,
                        >  kCFURLPOSIXPathStyle,
                        >  false);
                        >
                        > char bufferUTF8[32768*4]; // Worst case scenario.
                        > // As per Apple documentation, paths can be "up to 30,000 UTF-16
                        > // encoding units long", with each component being up to 255 UTF-16
                        > // encoding units long.  Too bad there isn't an API to specify the
                        > // exact buffer size /a priori/.
                        >
                        > Boolean success =
                        > CFURLGetFileSystemRepresentation(
                        >  url,
                        >  true,
                        >  &bufferUTF8[0],
                        >  sizeof bufferUTF8);

                        Thanks. NSString has a method called fileSystemRepresentation which
                        I'm guessing does the same thing(?). I used the NSString method
                        precomposedStringWithCompatibilityMapping to convert to NFKC.
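
For readers following along, the difference between the decomposed and precomposed forms discussed here can be sketched in Python. This is an illustration only, not the MacVim code. It assumes the standard `unicodedata` module, whose NFD is the Unicode-standard form rather than the HFS+ variant mentioned above, and the variable names are made up for the example.

```python
import unicodedata

# U+D55C, the Hangul syllable from the original report.
precomposed = "\ud55c"

# Standard NFD splits the syllable into its conjoining jamo.
decomposed = unicodedata.normalize("NFD", precomposed)
code_points = [f"U+{ord(ch):04X}" for ch in decomposed]

# Both composed forms fold the jamo back into a single code point.
recomposed_nfc = unicodedata.normalize("NFC", decomposed)
recomposed_nfkc = unicodedata.normalize("NFKC", decomposed)

# UTF-8 bytes of the precomposed syllable, as upper-case hex.
utf8_hex = " ".join(f"{b:02X}" for b in precomposed.encode("utf-8"))

print(len(decomposed), code_points)
print(recomposed_nfc == precomposed, recomposed_nfkc == precomposed)
print(utf8_hex)
```

A renderer that draws each code point of the decomposed string separately would show three glyphs, which matches the symptom reported in this thread. For this particular syllable NFC and NFKC give the same result, so either would work as a workaround here.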

                        Björn

                      • John (Eljay) Love-Jensen
                        Message 11 of 12 , Jun 24, 2009
                          Hi Björn,

                          > I read that Windows uses NFKC. Have you got a reference for the claim
                          > that NFC is used?

                          Drat, I cannot find the MSDN reference. Maybe my memory has failed me.

                          NFKC is lossy. NFC is non-lossy.
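
One way to read "lossy" here is: after normalizing, can the canonical (NFD) form of the original text still be recovered? The Python sketch below tests that under this assumed reading, using the standard `unicodedata` module. The helper name `preserves_canonical_form` and the sample characters are chosen for illustration and are not from the thread.

```python
import unicodedata


def preserves_canonical_form(text: str, form: str) -> bool:
    """Return True if normalizing `text` to `form` keeps it canonically equivalent.

    Two strings are canonically equivalent when their NFD forms are equal.
    """
    normalized = unicodedata.normalize(form, text)
    return unicodedata.normalize("NFD", normalized) == unicodedata.normalize("NFD", text)


hangul = "\ud55c"    # HANGUL SYLLABLE HAN
ligature = "\ufb01"  # LATIN SMALL LIGATURE FI, a compatibility character

results = {
    (name, form): preserves_canonical_form(text, form)
    for name, text in (("hangul", hangul), ("ligature", ligature))
    for form in ("NFC", "NFKC")
}

for key, value in results.items():
    print(key, value)
```

Under this reading, NFC keeps both samples recoverable, while NFKC replaces the ligature with the two letters "fi" and the original character cannot be restored from the result.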

                          Perhaps you are remembering the security information:
                          http://msdn.microsoft.com/en-us/library/dd374047(VS.85).aspx#SC_Unicode

                          File Names, Paths, and Namespaces information is here:
                          http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx

                          Note that the modern UNC form differs from the older "short" UNC form.
                          The modern form starts with "\\?\" (for paths) or with "\\.\" (for
                          volumes), such as "\\?\C:\Dir\Sub\File.ext", and allows up to 32,767
                          UTF-16 encoding units (Vista) or UCS-2 characters (XP), using a 16-bit
                          encoding of Unicode. The older form has the DOS-era limit of 260 8-bit
                          characters, which depend on the OS code page setting.

                          NFC is mentioned here in an MSDN blog:
                          http://blogs.msdn.com/michkap/archive/2006/12/07/1232365.aspx

                          But I don't consider that canonical, since it was in a blog feedback
                          comment.

                          I asked for clarification on the MSDN "File Names, Paths, and Namespaces"
                          page, in the comments section.

                          NOTE: "short" UNC and "old" DOS-style paths have to abide by the OS code
                          page setting, even when using the FooW routines and wchar_t (16-bit) paths.

                          > Thanks. NSString has a method called fileSystemRepresentation which
                          > I'm guessing does the same thing(?). I used the NSString method
                          > precomposedStringWithCompatibilityMapping to convert to NFKC.

                          I presume so. My Cocoa experience is not as extensive as my Carbon
                          experience.

                          Sincerely,
                          --Eljay


                        • Tony Mechelynck
                          Message 12 of 12 , Jun 24, 2009
                            On 24/06/09 14:00, björn wrote:
                            >
                            > Hi Eljay,
                            >
                            > 2009/6/23 John (Eljay) Love-Jensen:
                            >>
                            >>> As far as I can tell (from searching around) HFS+ always uses
                            >>> normalization form D (NFD) for filenames.
                            >>
                            >> HFS+ uses a variant of NFD for filenames. (The HFS+ variant predates
                            >> standardization of NFD.) This requirement is enforced by the OS.
                            >>
                            >> http://developer.apple.com/technotes/tn/tn1150.html
                            >> http://developer.apple.com/technotes/tn/tn1150table.html
                            >> http://developer.apple.com/qa/qa2001/qa1235.html
                            >> http://www.unicode.org/reports/tr15/
                            >
                            > Thanks for clarifying that (and for the links!).
                            >
                            >> Windows uses NFC for filenames. I'm not sure if the Linux world settled on
                            >> NFC or NFK.
                            >
                            > I read that Windows uses NFKC. Have you got a reference for the claim
                            > that NFC is used?
                            >
                            >>> So as a workaround for the issue the OP had I now normalize filenames
                            >>> to compatibility form C (NFKC) before passing the filename on to Vim
                            >>> and this takes care of the OP's problem.
                            >>
                            >> NFC or NFKC? Those are different normalizations.
                            >>
                            >> Windows NTFS file system uses NFC. But it isn't enforced by the OS, yet.
                            >
                            > I did mean the compatibility form NFKC since I read somewhere that
                            > NTFS uses NFKC, but I did not research that very carefully.
                            >
                            >
                            >>> However, as I see it this really is a legitimate issue in Vim itself
                            >>> in that it does not handle NFD properly (the example above should
                            >>> always render as one glyph, not three as it does now if NFD is used).
                            >>> Either Vim should ensure that all buffers are normalized to composed
                            >>> form NFC/NFKC or it needs to be made "NFD aware".
                            >>
                            >> I agree with your assessment.
                            >>
                            >>> Does anybody on the vim_multibyte list (this mail goes to vim_mac as
                            >>> well) have any comments on this?
                            >>
                            >> The relevant Mac OS X routine APIs are:
                            >>
                            >> CFURLRef url =
                            >> CFURLCreateWithFileSystemPath(
                            >> kCFAllocatorDefault,
                            >> cfstringFullPath,
                            >> kCFURLPOSIXPathStyle,
                            >> false);
                            >>
                            >> char bufferUTF8[32768*4]; // Worst case scenario.
                            >> // As per Apple documentation, paths can be "up to 30,000 UTF-16
                            >> // encoding units long", with each component being up to 255 UTF-16
                            >> // encoding units long. Too bad there isn't an API to specify the
                            >> // exact buffer size /a priori/.
                            >>
                            >> Boolean success =
                            >> CFURLGetFileSystemRepresentation(
                            >> url,
                            >> true,
                            >> &bufferUTF8[0],
                            >> sizeof bufferUTF8);
                            >
                            > Thanks. NSString has a method called fileSystemRepresentation which
                            > I'm guessing does the same thing(?). I used the NSString method
                            > precomposedStringWithCompatibilityMapping to convert to NFKC.
                            >
                            > Björn

                            Hm, NFKC and NFKD sometimes fuse slightly different glyphs into a single
                            "normalized" form. For instance, NFKC(²) = 2, though both are
                            (different) Latin1 characters (0xB2 and 0x32). IIRC, DOS would have kept
                            them distinct.
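
The superscript example can be checked with a few lines of Python. This is a sketch using the standard `unicodedata` module. The two file names are hypothetical and only show how distinct names could collapse into one under NFKC.

```python
import unicodedata

superscript_two = "\u00b2"  # SUPERSCRIPT TWO, Latin-1 byte 0xB2
digit_two = "2"             # DIGIT TWO, Latin-1 byte 0x32

# NFC leaves the superscript alone; NFKC maps it to the plain digit.
nfc_result = unicodedata.normalize("NFC", superscript_two)
nfkc_result = unicodedata.normalize("NFKC", superscript_two)

# Both characters exist in Latin-1 as different single bytes.
latin1_bytes = (superscript_two.encode("latin-1"), digit_two.encode("latin-1"))

# Hypothetical file names that differ only in that one character.
name_a = "x\u00b2.txt"
name_b = "x2.txt"
collide_under_nfkc = (
    unicodedata.normalize("NFKC", name_a) == unicodedata.normalize("NFKC", name_b)
)
collide_under_nfc = (
    unicodedata.normalize("NFC", name_a) == unicodedata.normalize("NFC", name_b)
)

print(nfc_result == superscript_two, nfkc_result == digit_two)
print(latin1_bytes, collide_under_nfkc, collide_under_nfc)
```

If file names are normalized to NFKC before being opened, two names like these would refer to the same normalized string, which is the kind of fusion described above. NFC keeps them apart.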

                            Best regards,
                            Tony.
                            --
                            hundred-and-one symptoms of being an internet addict:
                            56. You leave the modem speaker on after connecting because you think it
                            sounds like the ocean wind...the perfect soundtrack for "surfing
                            the net".
