Re: Vim multibyte support for non-utf8 encodings
- Hye-Shik Chang wrote:
> The current version of vim doesn't handle non-utf8 multibyte encodings
> such as EUC and/or GBK on FreeBSD. The cursor moves to weird places
> inside a character and the last character on each line disappears
> sometimes.
> This problem is due to vim's dependency on undefined behavior of
> mblen(3). Looking at vim's source code, mbyte.c:653, the routine assumes
> that mblen(3) isn't stateful. On glibc or Solaris libc, mblen(3)
> does not change the internal state when EILSEQ or EINVAL occurs.
> But FreeBSD libc changes the internal state even when it meets an
> error. The mblen(3) behavior is undefined in POSIX [1] and none
> of the libc implementations is wrong. So I think it's required
> to reset the multibyte state before a mblen(3) call to make the routine
> independent of the implementation.
> My patch is attached.
> [1] http://www.opengroup.org/onlinepubs/009695399/functions/mblen.html
> --- mbyte.c.orig Fri Apr 23 17:44:36 2004
> +++ mbyte.c Thu May 12 08:48:35 2005
> @@ -650,6 +650,7 @@
> * where mblen() returns 0 for invalid character.
> * Therefore, following condition includes 0.
> + (void)mblen(NULL, 0);
> if (mblen(buf, (size_t)1) <= 0)
> n = 2;
The behavior of mblen() on various systems has always been a bit unclear
to me. Your remark makes a lot of sense, but I wonder why nobody had
this problem before.
I'll include this now in Vim 7 and await further comments. Hopefully
there is no mblen() implementation that crashes when invoked with a NULL
pointer.
/// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
/// Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ Project leader for A-A-P -- http://www.A-A-P.org ///
\\\ Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html ///
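For readers following the thread, here is a minimal sketch of the idea behind the patch: reset mblen()'s internal state with mblen(NULL, 0) before probing whether a single byte is a complete character. The helper name probe_byte_len is hypothetical and this is not Vim's actual code; it only mirrors the "<= 0 means 2 bytes" test from the diff, and it assumes the caller has already selected a locale with setlocale().

```c
#include <stddef.h>   /* size_t, NULL */
#include <stdlib.h>   /* mblen */

/*
 * Hypothetical helper (not taken from Vim): classify one byte as either
 * a complete single-byte character (returns 1) or the lead byte of a
 * longer character (returns 2), using the same test as the patch.
 */
static int probe_byte_len(unsigned char b)
{
    char buf[2];

    /* The NUL byte is always a one-byte character; mblen() would
     * report 0 for it, which must not be mistaken for "invalid". */
    if (b == 0)
        return 1;

    buf[0] = (char)b;
    buf[1] = '\0';

    /* Reset the internal conversion state first, so that an error on
     * a previously probed byte cannot influence this call. Without
     * this, a libc that updates its state on error may misreport. */
    (void)mblen(NULL, 0);

    /* -1 signals an invalid or incomplete sequence; some platforms
     * return 0 instead, so both are treated as "needs more bytes". */
    if (mblen(buf, (size_t)1) <= 0)
        return 2;
    return 1;
}
```

Calling probe_byte_len() for every value from 0 to 255 after setlocale(LC_CTYPE, "") would fill a byte-length table one entry at a time, with each probe starting from the initial state regardless of what the previous byte did.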
- On Sat, Jul 16, 2005 at 12:44:34PM +0200, Bram Moolenaar wrote:
> Hye-Shik Chang wrote:
> > The current version of vim doesn't handle non-utf8 multibyte encodings
> > such as EUC and/or GBK on FreeBSD. The cursor moves to weird places
> > inside a character and the last character on each line disappears
> > sometimes.
> The behavior of mblen() on various systems has always been a bit unclear
> to me. Your remark makes a lot of sense, but I wonder why nobody had
> this problem before.
In fact, many Japanese FreeBSD users seem to have suffered
from the problem:
(Even if you can't read Japanese, you can still pick out some
alphabetic text on the page. :)
I wasn't aware of the problem because I'm using a UTF-8 locale, but
a few friends of mine asked me for help.
> I'll include this now in Vim 7 and await further comments. Hopefully
> there is no mblen() implementation that crashes when invoked with a NULL
> pointer.
Thanks for applying the fix! I think the fix will not harm any
platform. mblen(NULL, 0); is clearly defined in POSIX as a reset