Re: possible to make iskeyword supports multibyte charactor?
Jan 4, 2009
On 04/01/09 07:53, pansz wrote:
> Tony Mechelynck wrote:
>> I'm not sure. I suppose that option was defined before Unicode became
>> well-known, maybe even before it existed, when most charsets were of the
>> 8-bit kind except for East-Asian scripts, which required "special" MBCS
>> versions of the OSes anyway (such as MS-DOS 2.25).
>> Once the Unicode standard was published, it included not only mappings
>> of codepoints to glyphs but also quite a lot of metadata about these
>> codepoints (such as wide vs. narrow vs. ambiguous, LTR vs. RTL vs.
>> ambiguous, lower/upper/titlecase, punctuation, number systems, etc.).
>> However, Vim versions with -multi_byte must still be supported, and they
>> don't have access to that wealth of meta-information. Also, IIUC it's in
>> the ASCII range that there is most variation between programming
>> languages, operating systems, human languages, etc. concerning which
>> characters may be used in which circumstances.
> Human languages of CJK are not in the ASCII range at all and I bet CJK
> have more than 30% of the world population. Vim is for programmers, is
> it _only_ for programmers?
No, but each hanzi (not fullwidth punct) is supposed to be a "word" or
"word part" of some kind, with punctuation, whitespace and diacritics
all totally outside the "word" range. "Not" is a word in English,
regardless of whether it's used alone or in "cannot" or
"notwithstanding". These two uses sound almost Chinese-like to me... who
don't really know more than a handful of Chinese words. I suppose that
if English, like Japanese, used Han-script, "notwithstanding" might be
written not-against-stay-now with four glyphs? But I'm daydreaming.
> The difficulties may be that 'iskeyword' is a whitelist, not a
> blacklist, we cannot easily blacklist a single Unicode character in
> 'iskeyword' without knowing *all* the Unicode characters which matches
A more important difficulty is that 'iskeyword' applies only to Unicode
codepoints U+0000 to U+007F when 'encoding' is UTF-8 (or any Unicode
value aliased to UTF-8 for internal memory), and to characters 0x00 to
0xFF when it isn't. Otherwise we might perhaps use ":setlocal isk-=不
isk-=之" or some such. This would also mean several arrays of 2 gigabits
rather than 256 bits to remember the settings (Vim treats the Unicode
range as 0 to 7FFFFFFF. Even if it limited itself to the current
official maximum of 10FFFD it would still mean a big increase.)
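
To put rough numbers on that, here is a small Python sketch. It assumes a flat table with one bit per character, which is what the "256 bits" figure suggests; it is not a description of how Vim actually stores the option, and bitmap_bytes is a name made up for the illustration.

```python
def bitmap_bytes(max_codepoint: int) -> int:
    """Bytes needed for a one-bit-per-character table covering 0..max_codepoint.

    Assumption: a flat bitmap, one bit per codepoint, rounded up to whole bytes.
    """
    bits = max_codepoint + 1
    return (bits + 7) // 8


# The three ranges mentioned above.
RANGES = {
    "single-byte, 0x00 to 0xFF": 0xFF,
    "Vim's internal range, 0 to 0x7FFFFFFF": 0x7FFFFFFF,
    "Unicode up to U+10FFFD": 0x10FFFD,
}

for label, top in RANGES.items():
    print(f"{label}: {bitmap_bytes(top)} bytes per table")
```

Under that assumption each table grows from 32 bytes to 256 MiB for the full internal range, or to about 136 KiB if limited to U+10FFFD, and there would be one such table per buffer-local value of the option.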
> Perhaps the simplest approach is to add an option 'isnkeyword' which
> supports any Unicode character and we can blacklist some Unicode
> characters while still retain the 'iskeyword' option functioning.
Hm. Don't know if Bram would accept that, but you can always try to
publish (and maintain) an unofficial patch to the C source. Don't know
how easy (and foolproof) it would be. For a single option, a has()
feature might be useful but it's less needed than for a whole batch of
them: we would always be able to test ":if exists('+isnkeyword')".
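
For what it's worth, the lookup logic behind such a blacklist could look something like the Python sketch below. This is only a guess at the semantics: 'isnkeyword' does not exist in Vim, the function and variable names are invented, and the rule that every character above 0xFF counts as a keyword character unless blacklisted is my assumption, not something the proposal spells out.

```python
import string

# Hypothetical whitelist, standing in for a typical 'iskeyword' value
# (letters, digits, underscore) in the single-byte range.
ISKEYWORD = set(string.ascii_letters + string.digits + "_")

# Hypothetical blacklist, standing in for the proposed 'isnkeyword'.
# A sparse set costs memory only for the characters actually listed.
ISNKEYWORD = {"不", "之"}


def is_keyword(ch: str, iskeyword: set, isnkeyword: set) -> bool:
    """Classify one character: the blacklist is consulted first."""
    if ch in isnkeyword:
        return False
    if ord(ch) < 0x100:
        # Single-byte range: the existing whitelist decides.
        return ch in iskeyword
    # Assumption: anything else is a keyword character by default.
    return True


for ch in "a-不中":
    print(repr(ch), is_keyword(ch, ISKEYWORD, ISNKEYWORD))
```

Note that this simple default would also treat fullwidth punctuation as keyword characters unless each one is listed, so a real patch would need something smarter for those.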
A truly wise man never plays leapfrog with a unicorn.