99985 Re: possible to make iskeyword supports multibyte charactor?

  • Tony Mechelynck
    Jan 3, 2009
      On 04/01/09 04:07, pansz wrote:
      > Tony Mechelynck wrote:
      >> For the meaning of its settings, ":help 'iskeyword'" refers to ":help
      >> 'isfname'", where it says:
      >>> Multi-byte characters 256 and above are always included, only the
      >>> characters up to 255 are specified with this option.
      >>> For UTF-8 the characters 0xa0 to 0xff are included as well.
      >> IOW it is not possible to treat some hanzi as 'iskeyword' characters and
      >> others not. I think the above means that even the "ideographic
      >> full-width space" U+3000 is treated as a keyword character, OTOH I
      >> wouldn't affirm this without an experiment (maybe Vim with +multi_byte
      >> knows about the main divisions of the Unicode codepoint range).
      > This seems to hint that vim is not using the standard iswalpha(),
      > iswpunct() series of wide-character classification functions in
      > <wctype.h>.
      > As far as I know, iswalpha() returns true only for true hanzi
      > characters and will not return true for characters such as the
      > "ideographic full-width space".
      > I guess this is a choice for efficiency if vim uses utf-8 internally,
      > since utf-8 must be converted to ucs in order to use <wctype.h>.
      > If that is the case, making iskeyword support multibyte characters
      > isn't hard (I have done similar things for the Lua scripting
      > language), but it will sacrifice performance.
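
The distinction pansz describes can be checked outside Vim. The following is a minimal sketch in Python, not C: it uses `str.isalpha()` and `unicodedata.category()` as rough stand-ins for `iswalpha()`, on the assumption that both follow the Unicode character database. The helper name `describe` is made up for this illustration, and nothing here reflects how Vim itself classifies characters.

```python
import unicodedata

def describe(ch):
    """Return (code point, is-alphabetic flag, Unicode general category)."""
    return (f"U+{ord(ch):04X}", ch.isalpha(), unicodedata.category(ch))

# A hanzi, the ideographic full-width space, the ideographic comma,
# and the ideographic full stop.
samples = ["道", "\u3000", "\u3001", "\u3002"]

for ch in samples:
    print(describe(ch))
```

In this sketch the hanzi is reported as alphabetic (category `Lo`), while the ideographic space (`Zs`) and the ideographic comma and full stop (`Po`) are not.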

      If you want to be sure, try some Chinese text with both hanzi and
      wide-punctuation and see where the yiw (yank inner word) or viw (visual
      inner word) stops. Here's a sample for you: 道可道、非常道。名可名、非常
      名。 ;-)

      In my Huge gvim 7.2.077 with +multi_byte, viw includes neither
      ideographic comma nor ideographic full stop; but AFAIK there's no way to
      tell vim that 不 "not", 故 "thus", 之 "'s" etc. are non-keyword
      characters, since for multibyte characters this kind of status is hardcoded.
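
To illustrate what a class-based "inner word" looks like on the sample above, here is a rough sketch in Python. It assumes a simple classification by Unicode general category (letters and digits versus punctuation versus spaces); this is a stand-in chosen for the example, not Vim's actual hardcoded table, and the names `char_class`, `runs` and `inner_word` are hypothetical.

```python
import unicodedata
from itertools import groupby

def char_class(ch):
    """Coarse class from the Unicode general category (an assumption,
    not Vim's own classification)."""
    cat = unicodedata.category(ch)
    if cat[0] in ("L", "N"):
        return "word"
    if cat[0] == "P":
        return "punct"
    if cat[0] == "Z":
        return "space"
    return "other"

def runs(text):
    """Split text into maximal runs of characters sharing one class."""
    return [(cls, "".join(group)) for cls, group in groupby(text, key=char_class)]

def inner_word(text, index):
    """Return the run containing text[index], loosely like viw at that position."""
    pos = 0
    for _cls, run in runs(text):
        if pos <= index < pos + len(run):
            return run
        pos += len(run)
    raise IndexError(index)

sample = "道可道、非常道。名可名、非常名。"
print(runs(sample))
print(inner_word(sample, 0))
```

With this classification the selection starting on the first character stops before the ideographic comma. It also shows the limitation mentioned above: 不, 故 and 之 all fall in the same letter category as any other hanzi, so a category-based rule cannot set them apart either.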

      Best regards,
      TV is chewing gum for the eyes.
      -- Frank Lloyd Wright

      You received this message from the "vim_use" maillist.
      For more information, visit http://www.vim.org/maillist.php