
Re: Vim multibyte support for non-utf8 encodings

  • Bram Moolenaar
    Message 1 of 3, Jul 16, 2005
      Hye-Shik Chang wrote:

      > The current version of Vim doesn't handle non-UTF-8 multibyte encodings
      > such as EUC and/or GBK on FreeBSD. The cursor lands in odd places
      > inside a character, and the last character on a line sometimes
      > disappears.
      >
      > This problem is due to Vim's dependence on undefined behavior of
      > mblen(3). Looking at Vim's source code, mbyte.c:653, the routine assumes
      > that mblen(3) isn't stateful. In glibc or Solaris libc, mblen(3)
      > does not change the internal state when EILSEQ or EINVAL occurs,
      > but FreeBSD libc changes the internal state even when it hits an
      > error. The mblen(3) behavior in this case is undefined in POSIX [1],
      > so none of these libc implementations is wrong. So I think the
      > multibyte state needs to be reset before the mblen(3) call to make
      > the routine independent of the implementation.
      >
      > My patch is attached.
      >
      > [1] http://www.opengroup.org/onlinepubs/009695399/functions/mblen.html

      > --- mbyte.c.orig Fri Apr 23 17:44:36 2004
      > +++ mbyte.c Thu May 12 08:48:35 2005
      > @@ -650,6 +650,7 @@
      >       * where mblen() returns 0 for invalid character.
      >       * Therefore, following condition includes 0.
      >       */
      > +    (void)mblen(NULL, 0);
      >      if (mblen(buf, (size_t)1) <= 0)
      >          n = 2;
      >      else

      The behavior of mblen() on various systems has always been a bit unclear
      to me. Your remark makes a lot of sense, but I wonder why nobody had
      this problem before.

      I'll include this now in Vim 7 and await further comments. Hopefully
      there is no mblen() implementation that crashes when invoked with a NULL
      pointer.
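
On the NULL-pointer concern: in ISO C, a call to mblen() with a null first argument reads no character at all. It puts the function back into its initial shift state and returns nonzero only if the current encoding is state-dependent. A minimal sketch, where the helper name reset_mblen_state is hypothetical and not something in Vim:

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not part of Vim): reset mblen()'s internal shift
 * state and report whether the current locale's encoding is stateful.
 * Returns nonzero for state-dependent encodings, 0 otherwise.
 */
static int reset_mblen_state(void)
{
    /* A null first argument reads no character; it only resets the state. */
    return mblen(NULL, 0);
}
```

The patch discards this return value with a (void) cast because only the reset matters there.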

      --
      CUSTOMER: Well, can you hang around a couple of minutes? He won't be
      long.
      MORTICIAN: Naaah, I got to go on to Robinson's -- they've lost nine today.
      CUSTOMER: Well, when is your next round?
      MORTICIAN: Thursday.
      DEAD PERSON: I think I'll go for a walk.
      The Quest for the Holy Grail (Monty Python)

      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
      /// Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
      \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
      \\\ Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html ///
    • Hye-Shik Chang
      Message 2 of 3, Jul 16, 2005
        On Sat, Jul 16, 2005 at 12:44:34PM +0200, Bram Moolenaar wrote:
        >
        > Hye-Shik Chang wrote:
        >
        > > The current version of Vim doesn't handle non-UTF-8 multibyte encodings
        > > such as EUC and/or GBK on FreeBSD. The cursor lands in odd places
        > > inside a character, and the last character on a line sometimes
        > > disappears.
        [snip]
        >
        > The behavior of mblen() on various systems has always been a bit unclear
        > to me. Your remark makes a lot of sense, but I wonder why nobody had
        > this problem before.
        >

        In fact, many Japanese FreeBSD users seem to have suffered
        from the problem:

        http://www.queen.ne.jp/iMA/showmdir.pl?ports-jp=Current&num=14694&link=20040430015955%2eGA52106%25st%40be%2eto
        (Even if you can't read Japanese, you can still make out some
        alphabetic text on the page. :)

        I wasn't aware of the problem because I'm using a UTF-8 locale, but
        a few friends of mine asked me for help.

        > I'll include this now in Vim 7 and await further comments. Hopefully
        > there is no mblen() implementation that crashes when invoked with a NULL
        > pointer.

        Thanks for applying the fix! I don't think it will harm any
        platform. POSIX clearly defines mblen(NULL, 0) as the way to
        reset the conversion state.
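
For readers without the Vim source at hand, the patched check can be sketched as a standalone function. This is only an illustration under assumptions: the name dbcs_lead_len and the special case for the NUL byte are mine, not Vim's code. It resets the state first, then treats a byte that mblen() rejects on its own as the lead byte of a double-byte character.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not Vim's actual code): guess how many bytes a
 * character starting with `lead` occupies in a double-byte encoding.
 * Returns 2 if mblen() rejects the single byte, else 1.
 */
static int dbcs_lead_len(unsigned char lead)
{
    char buf[2];

    if (lead == 0)
        return 1;           /* assumption: treat NUL as a one-byte character */

    buf[0] = (char)lead;
    buf[1] = '\0';

    /* Reset first, so a previous failed call cannot leave stale state. */
    (void)mblen(NULL, 0);

    /* Some platforms return 0 rather than -1 for an invalid byte. */
    if (mblen(buf, (size_t)1) <= 0)
        return 2;
    return 1;
}
```

Calling it for every byte value from 0 to 255 after setlocale() would yield a per-locale length table. Without the reset, a libc that keeps state after an error could misreport the bytes that follow the first rejected one.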


        Thanks,
        Hye-Shik