
Re: Woe with MBCS File Names in UTF-8 Mode on Windows

  • adah@netstd.com
    Message 1 of 13 , Jul 1, 2005
      Bram wrote:
      >
      > Yongwei wrote:
      >
      > > > > I did a trace into Vim, and I found that it was because the `9c'
      > > > > of e7829c (炜) had been lost before mch_open is called. Could
      > > > > this give you a clue? Or give me a guidance where I should
      > > > > investigate further?
      > > >
      > > > I would guess that somewhere in the code the DBCS codepage is used
      > > > to locate the character, instead of using it as UTF-8. Since I
      > > > don't have a DBCS system, I can't try this.
      > > >
      > > > If you are able to see what happens in a debugger then you should
      > > > be able to follow the route from typing the command to the
      > > > mch_open() call.
      > >
      > > Since I was tracing outward from mch_open (not from the outside in),
      > > I soon lost my way. I was also not familiar with how the Vim code is
      > > organized.
      > > That is the reason why I asked for guidance. I need a starting
      > > point to trace (where `:w file.txt' is really executed).
      >
      > You can step out of mch_open() to see what happened in the calling
      > function.
      >
      > If you need to step through the code that leads to opening the file
      > you might want to put a breakpoint in open_buffer(). Check that
      > curbuf->b_ffname is right. The file reading is done in readfile().

      I have finally found out the reason. The cause is the call to _fullpath
      (which eventually calls GetFullPathNameA) in mch_FullName. This is
      expected: the non-Unicode (ANSI) Win32 API requires file names to be
      given in the native encoding.

      Users of non-DBCS systems will usually not notice the problem, since
      the bytes of valid UTF-8 are generally also valid SBCS (say, Latin-1)
      characters: 炜.txt is simply read as the bytes |e7 82 9c 2e 74 78 74|.
      On DBCS systems, however, |9c 2e| is not a valid double-byte character
      and will become `?' (|3f|).

      To solve this problem, maybe Vim needs to provide its own version of
      _fullpath? Bram, what is your opinion?

      Best regards,

      Yongwei
    • Bram Moolenaar
      Message 2 of 13 , Jul 2, 2005
        Yongwei wrote:

        > I have finally found out the reason. The cause is the call to _fullpath
        > (which eventually calls GetFullPathNameA) in mch_FullName. This is
        > expected: the non-Unicode (ANSI) Win32 API requires file names to be
        > given in the native encoding.
        >
        > Users of non-DBCS systems will usually not notice the problem, since
        > the bytes of valid UTF-8 are generally also valid SBCS (say, Latin-1)
        > characters: 炜.txt is simply read as the bytes |e7 82 9c 2e 74 78 74|.
        > On DBCS systems, however, |9c 2e| is not a valid double-byte character
        > and will become `?' (|3f|).
        >
        > To solve this problem, maybe Vim needs to provide its own version of
        > _fullpath? Bram, what is your opinion?

        I'm glad you were able to isolate the problem.

        Vim 7 already included a fix for this. This has been tried out for a
        while now, thus I think it's safe to include in Vim 6.3. Please try out
        this patch. If it works OK for you then I'll release it.

        *** os_mswin.c~ Sun Dec 5 16:39:37 2004
        --- os_mswin.c Sat Jul 2 13:07:35 2005
        ***************
        *** 367,385 ****
        nResult = mch_dirname(buf, len);
        else
        #endif
        - if (_fullpath(buf, fname, len - 1) == NULL)
        {
        ! STRNCPY(buf, fname, len); /* failed, use the relative path name */
        ! buf[len - 1] = NUL;
        ! #ifndef USE_FNAME_CASE
        ! slash_adjust(buf);
        #endif
        }
        - else
        - nResult = OK;

        #ifdef USE_FNAME_CASE
        fname_case(buf, len);
        #endif

        return nResult;
        --- 367,421 ----
        nResult = mch_dirname(buf, len);
        else
        #endif
        {
        ! #ifdef FEAT_MBYTE
        ! if (enc_codepage >= 0 && (int)GetACP() != enc_codepage
        ! # ifdef __BORLANDC__
        ! /* Wide functions of Borland C 5.5 do not work on Windows 98. */
        ! && g_PlatformId == VER_PLATFORM_WIN32_NT
        ! # endif
        ! )
        ! {
        ! WCHAR *wname;
        ! WCHAR wbuf[MAX_PATH];
        ! char_u *cname = NULL;
        !
        ! /* Use the wide function:
        ! * - convert the fname from 'encoding' to UCS2.
        ! * - invoke _wfullpath()
        ! * - convert the result from UCS2 to 'encoding'.
        ! */
        ! wname = enc_to_ucs2(fname, NULL);
        ! if (wname != NULL && _wfullpath(wbuf, wname, MAX_PATH - 1) != NULL)
        ! {
        ! cname = ucs2_to_enc((short_u *)wbuf, NULL);
        ! if (cname != NULL)
        ! {
        ! STRNCPY(buf, cname, len);
        ! buf[len - 1] = NUL;
        ! nResult = OK;
        ! }
        ! }
        ! vim_free(wname);
        ! vim_free(cname);
        ! }
        ! if (nResult == FAIL) /* fall back to non-wide function */
        #endif
        + {
        + if (_fullpath(buf, fname, len - 1) == NULL)
        + {
        + STRNCPY(buf, fname, len); /* failed, use relative path name */
        + buf[len - 1] = NUL;
        + }
        + else
        + nResult = OK;
        + }
        }

        #ifdef USE_FNAME_CASE
        fname_case(buf, len);
        + #else
        + slash_adjust(buf);
        #endif

        return nResult;

        --
        hundred-and-one symptoms of being an internet addict:
        210. When you get a divorce, you don't care about who gets the children,
        but discuss endlessly who can use the email address.

        /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
        /// Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
        \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
        \\\ Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html ///
      • adah@netstd.com
        Message 3 of 13 , Jul 3, 2005
          Bram wrote:
          >
          > Yongwei wrote:
          >
          > > I have finally found out the reason. The cause is the call to
          > > _fullpath (which eventually calls GetFullPathNameA) in
          > > mch_FullName. This is expected: the non-Unicode (ANSI) Win32 API
          > > requires file names to be given in the native encoding.
          > >
          > > Users of non-DBCS systems will usually not notice the problem,
          > > since the bytes of valid UTF-8 are generally also valid SBCS (say,
          > > Latin-1) characters: 炜.txt is simply read as the bytes |e7 82 9c
          > > 2e 74 78 74|. On DBCS systems, however, |9c 2e| is not a valid
          > > double-byte character and will become `?' (|3f|).
          > >
          > > To solve this problem, maybe Vim needs to provide its own version
          > > of _fullpath? Bram, what is your opinion?
          >
          > I'm glad you were able to isolate the problem.
          >
          > Vim 7 already included a fix for this. This has been tried out for a
          > while now, thus I think it's safe to include in Vim 6.3. Please try
          > out this patch. If it works OK for you then I'll release it.

          Yes, your patch works like a charm. Thanks, Bram!

          Best regards,

          Yongwei