
100009 Re: possible to make iskeyword supports multibyte charactor?

  • Tony Mechelynck
    Jan 4, 2009
      On 04/01/09 07:53, pansz wrote:
      > Tony Mechelynck wrote:
      >> I'm not sure. I suppose that option was defined before Unicode became
      >> well-known, maybe even before it existed, when most charsets were of the
      >> 8-bit kind except for East-Asian scripts, which required "special" MBCS
      >> versions of the OSes anyway (such as MS-DOS 2.25).
      >> Once the Unicode standard was published, it included not only mappings
      >> of codepoints to glyphs but also quite a lot of metadata about these
      >> codepoints (such as wide vs. narrow vs. ambiguous, LTR vs. RTL vs.
      >> ambiguous, lower/ upper/ titlecase, punctuation, number systems, etc.).
      >> However, Vim versions with -multi_byte must still be supported, and they
      >> don't have access to that wealth of meta-information. Also, IIUC it's in
      >> the ASCII range that there is most variation between programming
      >> languages, operating systems, human languages, etc. concerning which
      >> characters may be used in which circumstances.
      > CJK languages are not in the ASCII range at all, and I bet CJK speakers
      > make up more than 30% of the world population. Vim is for programmers,
      > but is it _only_ for programmers?

      No, but each hanzi (not fullwidth punct) is supposed to be a "word" or
      "word part" of some kind, with punctuation, whitespace and diacritics
      all totally outside the "word" range. "Not" is a word in English,
      regardless of whether it's used alone or in "cannot" or
      "notwithstanding". These two uses sound almost Chinese-like to me, even
      though I don't really know more than a handful of Chinese words. I suppose that
      if English, like Japanese, used Han-script, "notwithstanding" might be
      written not-against-stay-now with four glyphs? But I'm daydreaming.

      > The difficulty may be that 'iskeyword' is a whitelist, not a
      > blacklist: we cannot easily blacklist a single Unicode character in
      > 'iskeyword' without knowing *all* the Unicode characters which match
      > iswalpha().

      A more important difficulty is that 'iskeyword' applies only to Unicode
      codepoints U+0000 to U+007F when 'encoding' is UTF-8 (or any Unicode
      value aliased to UTF-8 for internal memory), and to characters 0x00 to
      0xFF when it isn't. Otherwise we might perhaps use ":setlocal isk-=不
      isk-=之" or some such. This would also mean several arrays of 2 gigabits
      rather than 256 bits to remember the settings (Vim treats the Unicode
      range as 0 to 7FFFFFFF. Even if it limited itself to the current
      official maximum of 10FFFD it would still mean a big increase.)
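
      To put numbers on the size argument above, here is a rough sketch in
      Python. It assumes one bit per character in each table, as the "256
      bits" figure implies; the helper name bitmap_size is made up for this
      illustration and is not taken from the Vim source.

```python
# Size of a one-bit-per-character lookup table for the ranges named above.
# bitmap_size is a hypothetical helper, not something from Vim's C code.

def bitmap_size(num_chars):
    """Return (bits, bytes) for a table holding one bit per character."""
    bits = num_chars
    return bits, (bits + 7) // 8  # round up to whole bytes


RANGES = {
    "8-bit charset (0x00..0xFF)": 0xFF + 1,
    "Unicode up to 10FFFD": 0x10FFFD + 1,
    "Vim's internal range (0..7FFFFFFF)": 0x7FFFFFFF + 1,
}

if __name__ == "__main__":
    for label, count in RANGES.items():
        bits, nbytes = bitmap_size(count)
        print(f"{label}: {bits} bits = {nbytes} bytes")
```

      Under that assumption the full 0 to 7FFFFFFF range needs 2^31 bits,
      i.e. 256 MiB per table, against 32 bytes for an 8-bit charset.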

      > Perhaps the simplest approach is to add an option 'isnkeyword' which
      > supports any Unicode character, so that we can blacklist some Unicode
      > characters while still keeping the 'iskeyword' option functioning.

      Hm. Don't know if Bram would accept that, but you can always try to
      publish (and maintain) an unofficial patch to the C source. Don't know
      how easy (and foolproof) it would be. For a single option, a has()
      feature might be useful but it's less needed than for a whole batch of
      them: we would always be able to test ":if exists('+isnkeyword')".
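
      For what it's worth, the semantics of the proposed 'isnkeyword' could be
      modelled roughly as below. This is only a sketch in Python, not Vim's
      actual code: the function names are invented, str.isalpha() stands in
      for iswalpha(), and the ASCII whitelist is an assumed default rather
      than any real 'iskeyword' value.

```python
# Hypothetical model of an 'isnkeyword' blacklist layered over 'iskeyword'.
# Nothing here is taken from the Vim source; all names are illustrative.
import string

# Assumed ASCII whitelist: letters, digits and underscore.
ASCII_KEYWORDS = set(string.ascii_letters + string.digits + "_")
# The two characters used in the ":setlocal isk-=" example above.
BLACKLIST = {"不", "之"}


def is_keyword_char(ch, ascii_keywords=ASCII_KEYWORDS, blacklist=BLACKLIST):
    """Decide whether ch counts as a keyword character in this model."""
    if ch in blacklist:       # the proposed blacklist wins over everything
        return False
    if ord(ch) < 0x80:        # ASCII range: explicit whitelist
        return ch in ascii_keywords
    return ch.isalpha()       # elsewhere: stand-in for iswalpha()


def split_words(text):
    """Split text into maximal runs of keyword characters."""
    words, current = [], []
    for ch in text:
        if is_keyword_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words
```

      In this model a blacklisted hanzi simply acts as a word boundary, while
      every other alphabetic non-ASCII character stays a keyword character
      without having to be listed anywhere.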

      Best regards,
      A truly wise man never plays leapfrog with a unicorn.

      You received this message from the "vim_use" maillist.
      For more information, visit http://www.vim.org/maillist.php