Re: Woe with MBCS File Names in UTF-8 Mode on Windows

  • adah@netstd.com
    Message 1 of 13 , Jul 1, 2005
      > > I did a trace into Vim, and I found that it was because the `9c' of
      > > e7829c (炜) had been lost before mch_open is called. Could this
      > > give you a clue? Or give me a guidance where I should investigate
      > > further?
      >
      > I would guess that somewhere in the code the DBCS codepage is used to
      > locate the character, instead of using it as UTF-8. Since I don't
      > have a DBCS system, I can't try this.
      >
      > If you are able to see what happens in a debugger then you should be
      > able to follow the route from typing the command to the mch_open()
      > call.

      Since I was tracing mch_open out (not from outside in), I soon lost my
      way. And I was not familiar with the Vim code organization. That is
      the reason why I asked for guidance. I need a starting point to trace
      (where `:w file.txt' is really executed).

      And it is not difficult to change one's system into a DBCS one, as long
      as one has a Windows 2000/XP box with installation files/CD. Just
      install the Far East support and set the default code page in the
      Regional Setting.

      Best regards,

      Yongwei
    • Bram Moolenaar
      Message 2 of 13 , Jul 1, 2005
        Yongwei wrote:

        > > > I did a trace into Vim, and I found that it was because the `9c' of
        > > > e7829c (炜) had been lost before mch_open is called. Could this
        > > > give you a clue? Or give me a guidance where I should investigate
        > > > further?
        > >
        > > I would guess that somewhere in the code the DBCS codepage is used to
        > > locate the character, instead of using it as UTF-8. Since I don't
        > > have a DBCS system, I can't try this.
        > >
        > > If you are able to see what happens in a debugger then you should be
        > > able to follow the route from typing the command to the mch_open()
        > > call.
        >
        > Since I was tracing mch_open out (not from outside in), I soon lost my
        > way. And I was not familiar with the Vim code organization. That is
        > the reason why I asked for guidance. I need a starting point to trace
        > (where `:w file.txt' is really executed).

        You can step out of mch_open() to see what happened in the calling
        function.

        If you need to step through the code that leads to opening the file you
        might want to put a breakpoint in open_buffer(). Check that
        curbuf->b_ffname is right. The file reading is done in readfile().

        --
        hundred-and-one symptoms of being an internet addict:
        202. You're amazed to find out Spam is a food.

        /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
        /// Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
        \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
        \\\ Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html ///
      • adah@netstd.com
        Message 3 of 13 , Jul 1, 2005
          Bram wrote:
          >
          > Yongwei wrote:
          >
          > > > > I did a trace into Vim, and I found that it was because the `9c'
          > > > > of e7829c (炜) had been lost before mch_open is called. Could
          > > > > this give you a clue? Or give me a guidance where I should
          > > > > investigate further?
          > > >
          > > > I would guess that somewhere in the code the DBCS codepage is used
          > > > to locate the character, instead of using it as UTF-8. Since I
          > > > don't have a DBCS system, I can't try this.
          > > >
          > > > If you are able to see what happens in a debugger then you should
          > > > be able to follow the route from typing the command to the
          > > > mch_open() call.
          > >
          > > Since I was tracing mch_open out (not from outside in), I soon lost
          > > my way. And I was not familiar with the Vim code organization.
          > > That is the reason why I asked for guidance. I need a starting
          > > point to trace (where `:w file.txt' is really executed).
          >
          > You can step out of mch_open() to see what happened in the calling
          > function.
          >
          > If you need to step through the code that leads to opening the file
          > you might want to put a breakpoint in open_buffer(). Check that
          > curbuf->b_ffname is right. The file reading is done in readfile().

          I have finally found out the reason. The cause is the _fullpath (which
          finally calls GetFullPathNameA) in mch_FullName. It is quite normal
          that the non-Unicode Win32 API requires that file names should be
          provided in native encoding.

          Non-DBCS-system users generally will not feel the problem since valid
          UTF-8 code points are generally valid SBCS (say, Latin1) code points,
          and 炜.txt will be regarded as code points |e7 82 9c 2e 74 78 74|. On
          DBCS systems, |9c2e| is invalid and will become `?' (|3f|).
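
A minimal sketch of the byte-level effect described above, not a reimplementation of GetFullPathNameA. It assumes the GBK (code page 936) ranges of 0x81-0xFE for lead bytes and 0x40-0xFE (except 0x7F) for trail bytes, and it assumes that an invalid lead/trail pair collapses to a single `?', as observed in this thread. The names is_lead, is_trail and dbcs_view are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed GBK (code page 936) byte ranges; hypothetical helpers. */
static int is_lead(unsigned char b)  { return b >= 0x81 && b <= 0xFE; }
static int is_trail(unsigned char b) { return b >= 0x40 && b <= 0xFE && b != 0x7F; }

/* Copy n bytes from in to out the way a DBCS-aware API might interpret
 * them: single bytes and valid lead+trail pairs are kept, while an
 * invalid pair is replaced by one '?' (0x3f).
 * out must have room for at least n bytes. Returns the bytes written. */
size_t dbcs_view(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < n) {
        if (!is_lead(in[i])) {
            out[o++] = in[i++];            /* single-byte character */
        } else if (i + 1 < n && is_trail(in[i + 1])) {
            out[o++] = in[i++];            /* valid double-byte character */
            out[o++] = in[i++];
        } else {
            out[o++] = '?';                /* invalid pair is lost */
            i += (i + 1 < n) ? 2 : 1;
        }
    }
    return o;
}
```

Fed the seven bytes |e7 82 9c 2e 74 78 74|, this keeps |e7 82| as one double-byte character and then treats |9c| as a lead byte whose trail byte |2e| is out of range, so the `9c' and the dot are both replaced by |3f|. A single-byte code page has no lead bytes, which is why the same bytes pass through unchanged there.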

          To solve this problem, maybe Vim needs to provide its own version of
          fullpath? Bram, what is your opinion?

          Best regards,

          Yongwei
        • Bram Moolenaar
          Message 4 of 13 , Jul 2, 2005
            Yongwei wrote:

            > I have finally found out the reason. The cause is the _fullpath (which
            > finally calls GetFullPathNameA) in mch_FullName. It is quite normal
            > that the non-Unicode Win32 API requires that file names should be
            > provided in native encoding.
            >
            > Non-DBCS-system users generally will not feel the problem since valid
            > UTF-8 code points are generally valid SBCS (say, Latin1) code points,
            > and 炜.txt will be regarded as code points |e7 82 9c 2e 74 78 74|. On
            > DBCS systems, |9c2e| is invalid and will become `?' (|3f|).
            >
            > To solve this problem, maybe Vim needs to provide its own version of
            > fullpath? Bram, what is your opinion?

            I'm glad you were able to isolate the problem.

            Vim 7 already included a fix for this. This has been tried out for a
            while now, thus I think it's safe to include in Vim 6.3. Please try out
            this patch. If it works OK for you then I'll release it.

            *** os_mswin.c~ Sun Dec 5 16:39:37 2004
            --- os_mswin.c Sat Jul 2 13:07:35 2005
            ***************
            *** 367,385 ****
            nResult = mch_dirname(buf, len);
            else
            #endif
            - if (_fullpath(buf, fname, len - 1) == NULL)
            {
            ! STRNCPY(buf, fname, len); /* failed, use the relative path name */
            ! buf[len - 1] = NUL;
            ! #ifndef USE_FNAME_CASE
            ! slash_adjust(buf);
            #endif
            }
            - else
            - nResult = OK;

            #ifdef USE_FNAME_CASE
            fname_case(buf, len);
            #endif

            return nResult;
            --- 367,421 ----
            nResult = mch_dirname(buf, len);
            else
            #endif
            {
            ! #ifdef FEAT_MBYTE
            ! if (enc_codepage >= 0 && (int)GetACP() != enc_codepage
            ! # ifdef __BORLANDC__
            ! /* Wide functions of Borland C 5.5 do not work on Windows 98. */
            ! && g_PlatformId == VER_PLATFORM_WIN32_NT
            ! # endif
            ! )
            ! {
            ! WCHAR *wname;
            ! WCHAR wbuf[MAX_PATH];
            ! char_u *cname = NULL;
            !
            ! /* Use the wide function:
            ! * - convert the fname from 'encoding' to UCS2.
            ! * - invoke _wfullpath()
            ! * - convert the result from UCS2 to 'encoding'.
            ! */
            ! wname = enc_to_ucs2(fname, NULL);
            ! if (wname != NULL && _wfullpath(wbuf, wname, MAX_PATH - 1) != NULL)
            ! {
            ! cname = ucs2_to_enc((short_u *)wbuf, NULL);
            ! if (cname != NULL)
            ! {
            ! STRNCPY(buf, cname, len);
            ! buf[len - 1] = NUL;
            ! nResult = OK;
            ! }
            ! }
            ! vim_free(wname);
            ! vim_free(cname);
            ! }
            ! if (nResult == FAIL) /* fall back to non-wide function */
            #endif
            + {
            + if (_fullpath(buf, fname, len - 1) == NULL)
            + {
            + STRNCPY(buf, fname, len); /* failed, use relative path name */
            + buf[len - 1] = NUL;
            + }
            + else
            + nResult = OK;
            + }
            }

            #ifdef USE_FNAME_CASE
            fname_case(buf, len);
            + #else
            + slash_adjust(buf);
            #endif

            return nResult;
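
The comment in the patch lists three steps: convert the name from 'encoding' to UCS2, call the wide function, convert the result back. The two conversion steps can be sketched in portable C as below, assuming 'encoding' is UTF-8. This is an illustration only, not Vim's enc_to_ucs2()/ucs2_to_enc(): the names utf8_to_ucs2 and ucs2_to_utf8 are hypothetical, the buffers are caller-supplied, and the _wfullpath() call that would sit between the two is left out because it is Windows-only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode NUL-terminated UTF-8 into NUL-terminated UCS-2 (BMP only).
 * Returns the number of units written, excluding the terminator, or -1
 * on malformed input, a character outside the BMP, or lack of space.
 * Simplification: overlong forms and surrogates are not rejected. */
int utf8_to_ucs2(const unsigned char *s, uint16_t *w, size_t cap)
{
    size_t o = 0;

    assert(s != NULL && w != NULL);
    if (cap == 0)
        return -1;
    while (*s != 0) {
        uint16_t c;

        if (s[0] < 0x80) {
            c = s[0];
            s += 1;
        } else if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
            c = (uint16_t)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
            s += 2;
        } else if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
                   && (s[2] & 0xC0) == 0x80) {
            c = (uint16_t)(((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6)
                           | (s[2] & 0x3F));
            s += 3;
        } else {
            return -1;                 /* malformed, or needs more than UCS-2 */
        }
        if (o + 1 >= cap)
            return -1;                 /* keep room for the terminator */
        w[o++] = c;
    }
    w[o] = 0;
    return (int)o;
}

/* Encode NUL-terminated UCS-2 as NUL-terminated UTF-8.
 * Returns the number of bytes written, excluding the terminator,
 * or -1 when the output buffer is too small. */
int ucs2_to_utf8(const uint16_t *w, unsigned char *s, size_t cap)
{
    size_t o = 0;

    assert(w != NULL && s != NULL);
    if (cap == 0)
        return -1;
    for (; *w != 0; ++w) {
        unsigned c = *w;
        size_t need = c < 0x80 ? 1 : (c < 0x800 ? 2 : 3);

        if (o + need >= cap)
            return -1;                 /* keep room for the terminator */
        if (need == 1) {
            s[o++] = (unsigned char)c;
        } else if (need == 2) {
            s[o++] = (unsigned char)(0xC0 | (c >> 6));
            s[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            s[o++] = (unsigned char)(0xE0 | (c >> 12));
            s[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            s[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    s[o] = 0;
    return (int)o;
}
```

For the file name in this thread, the bytes |e7 82 9c| decode to the single unit 0x709c, so the wide API never sees a lone `9c' byte that the active code page could misread as a lead byte.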

            --
            hundred-and-one symptoms of being an internet addict:
            210. When you get a divorce, you don't care about who gets the children,
            but discuss endlessly who can use the email address.

          • adah@netstd.com
            Message 5 of 13 , Jul 3, 2005
              Bram wrote:
              >
              > Yongwei wrote:
              >
              > > I have finally found out the reason. The cause is the _fullpath
              > > (which finally calls GetFullPathNameA) in mch_FullName. It is quite
              > > normal that the non-Unicode Win32 API requires that file names
              > > should be provided in native encoding.
              > >
              > > Non-DBCS-system users generally will not feel the problem since
              > > valid UTF-8 code points are generally valid SBCS (say, Latin1) code
              > > points, and 炜.txt will be regarded as code points |e7 82 9c 2e 74
              > > 78 74|. On DBCS systems, |9c2e| is invalid and will become `?'
              > > (|3f|).
              > >
              > > To solve this problem, maybe Vim needs to provide its own version of
              > > fullpath? Bram, what is your opinion?
              >
              > I'm glad you were able to isolate the problem.
              >
              > Vim 7 already included a fix for this. This has been tried out for a
              > while now, thus I think it's safe to include in Vim 6.3. Please try
              > out this patch. If it works OK for you then I'll release it.

              Yes, your patch works like a charm. Thanks, Bram!

              Best regards,

              Yongwei