Re: possible to make iskeyword support multibyte characters?

  • StarWing
    Message 1 of 17 , Jan 2, 2009
      Okay, I think Vim could be extended to support a word table for ^N/^P
      completion and for fFtT, e.g.:
      chinese.vim " this is a Chinese word-table file:
      beginwordtable chinese <<EOF "this is a command
      this leisi
      an an
      apple aipo
      and ande
      other ade
      is yizi
      or wuwo
      EOF

      " The first column is the Chinese word; the second column is the word in
      " pinyin (its sound) or its input-method code.

      We could then set an option named "wordtable": set wordtable=chinese

      Then, if I input: thisisanappleandother, Vim will know that "this" is a
      word, and likewise "is", "an", "apple", etc.
      And if I want to find "an", I press "fa" or "fan" and the cursor goes to
      "an", or "and", etc. I think it is not easy to make the f operator
      support multi-character input (Vim doesn't know how many characters will
      be typed, so it can't act at once).

      Could Bram or anyone else implement it? This feature could also be
      combined with Vim's spell-check feature.
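
To make the word-table idea concrete, here is a minimal Python sketch of how a table like the one above could split an unspaced run of text into words. It is only an illustration: the greedy longest-match rule and the names `WORD_TABLE` and `segment` are assumptions made for this example, not anything Vim implements, and the proposed `wordtable` option does not exist.

```python
# Hypothetical word table copied from the example above: word -> input code.
WORD_TABLE = {
    "this": "leisi",
    "an": "an",
    "apple": "aipo",
    "and": "ande",
    "other": "ade",
    "is": "yizi",
    "or": "wuwo",
}


def segment(text, table=WORD_TABLE):
    """Split text into table words by greedy longest match.

    Characters that start no known word are emitted as one-character tokens.
    """
    longest = max(len(word) for word in table)
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(longest, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if candidate in table:
                tokens.append(candidate)
                pos += size
                break
        else:
            # No table word starts here; keep the single character.
            tokens.append(text[pos])
            pos += 1
    return tokens


print(segment("thisisanappleandother"))
# ['this', 'is', 'an', 'apple', 'and', 'other']
```

Greedy matching is the simplest choice, but it can pick the wrong split when one word is a prefix of another, which is one reason real segmenters tend to use more than a plain table lookup.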

      On Jan 2, 9:39 pm, Tony Mechelynck <antoine.mechely...@...>
      wrote:
      > On 02/01/09 11:30, anhnmncb wrote:
      >
      > > Ping!
      >
      > If you don't get a reply on this ML, the meaning usually is not that
      > nobody saw the question, but rather that nobody knows the answer. Search
      > the help first, then try to make your question clearer if the help
      > doesn't give you an answer (in this case it does, see below).
      >
      >
      >
      >
      >
      > > On 2008-12-31, anhnmncb wrote:
      > >> On 2008-12-31, anhnmncb wrote:
      > >>> Hi, list,
      >
      > >>> when I type Chinese text in Vim, I find it inconvenient to complete
      > >>> Chinese words with C-p/n, because Chinese words are not separated by spaces but
      > >>> by characters like "and", "or" and others (I use English words to represent Chinese
      > >>> characters), so a Chinese sentence looks like this:
      >
      > >>> ThisIsAChineseWordInSentence.(This is a Chinese word in sentence.)
      >
      > >>> When I have typed "ThisIsAChineseWordIn" and now want to type Sen<C-p>,
      > >>> Vim can't complete the word "Sentence" for me. So I wonder whether iskeyword could support
      > >>> adding Chinese characters to itself, for example (my client doesn't support
      > >>> Chinese, so I use "and" to represent a Chinese character):
      >
      > >>> set iskeyword+="and"
      > >> I meant set iskeyword-="and".
      > >>> then autocompletion would work without problems with Chinese. I don't know if it
      > >>> is easy to handle?
      > >> Also, it would let me navigate more quickly in a long Chinese sentence; now I
      > >> have to use /? or fFtT or some hjkls and then input a Chinese character (sometimes
      > >> inputting a Chinese character takes at least 3 English characters).
      >
      > For the meaning of its settings, ":help 'iskeyword'" refers to ":help
      > 'isfname'", which says:
      >
      > > Multi-byte characters 256 and above are always included, only the
      > > characters up to 255 are specified with this option.
      > > For UTF-8 the characters 0xa0 to 0xff are included as well.
      >
      > IOW it is not possible to treat some hanzi as 'iskeyword' characters and
      > others not. I think the above means that even the "ideographic
      > full-width space" U+3000 is treated as a keyword character, OTOH I
      > wouldn't affirm this without an experiment (maybe Vim with +multi_byte
      > knows about the main divisions of the Unicode codepoint range).
      >
      > Since I found no satisfactory way to use the IM (which _is_ installed on
      > my system), I need at least 6 keystrokes to input any hanzi: for
      > instance, for the simplest of them all, the digit one, 一 yi1 U+4E00, I
      > need (after getting into Insert mode) to press Ctrl-V u 4 e 0 0 -- or
      > else, I can use copy-paste if I can find it ready-made in some document.
      >
      > Best regards,
      > Tony.
      > --
      > Paradise is exactly like where you are right now ... only much, much
      > better.
      > -- Laurie Anderson
      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_use" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • StarWing
      Message 2 of 17 , Jan 2, 2009
        Or can anyone bring this to the developers' attention?

      • Tony Mechelynck
        Message 3 of 17 , Jan 2, 2009
          On 02/01/09 15:32, StarWing wrote:
          > or anyone can make developer know this?
          [...]

          AFAIK, most Vim developers read not only the vim_dev group but also the
          vim_use group.

          Best regards,
          Tony.
          --
          The trouble with doing something right the first time is that nobody
          appreciates how difficult it was.

        • pansz
          Message 4 of 17 , Jan 3, 2009
            Tony Mechelynck wrote:
            > For the meaning of its settings, ":help 'iskeyword'" refers to ":help
            > 'isfname'", which says:
            >
            >> Multi-byte characters 256 and above are always included, only the
            >> characters up to 255 are specified with this option.
            >> For UTF-8 the characters 0xa0 to 0xff are included as well.
            >
            > IOW it is not possible to treat some hanzi as 'iskeyword' characters and
            > others not. I think the above means that even the "ideographic
            > full-width space" U+3000 is treated as a keyword character, OTOH I
            > wouldn't affirm this without an experiment (maybe Vim with +multi_byte
            > knows about the main divisions of the Unicode codepoint range).

            This seems to hint that Vim is not using the standard iswalpha(),
            iswpunct() family of wide-character classification functions in
            <wctype.h>.

            As far as I know, iswalpha() returns true only for actual hanzi
            characters and not for characters such as the "ideographic
            full-width space".

            I guess this is a choice made for efficiency if Vim uses UTF-8
            internally, since UTF-8 must be converted to UCS in order to use the
            <wctype.h> functions.

            If that is the case, making 'iskeyword' support multibyte characters
            isn't hard (I have done similar things for the Lua scripting
            language), but it will sacrifice performance.
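
The distinction drawn above, between hanzi that count as letters and wide spaces or punctuation that do not, can be checked against the Unicode character database. The sketch below uses Python's `unicodedata` module and `str.isalpha()` as a stand-in. It is not C's `iswalpha()`, whose result depends on the C library and the active locale, and the helper name `describe` is made up for this example.

```python
import unicodedata

# A hanzi and three wide non-letter characters mentioned in this thread.
SAMPLES = {
    "hanzi 'one' (U+4E00)": "\u4e00",
    "ideographic space (U+3000)": "\u3000",
    "ideographic comma (U+3001)": "\u3001",
    "ideographic full stop (U+3002)": "\u3002",
}


def describe(ch):
    """Return (Unicode general category, whether Python treats ch as a letter)."""
    return unicodedata.category(ch), ch.isalpha()


for label, ch in SAMPLES.items():
    category, alpha = describe(ch)
    print(f"{label}: category={category}, isalpha={alpha}")
```

The hanzi falls in category `Lo` (other letter), the ideographic space in `Zs` (space separator), and the two punctuation marks in `Po`, so a classifier built on these properties can tell them apart without any per-character option setting.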


          • Tony Mechelynck
            Message 5 of 17 , Jan 3, 2009
              On 04/01/09 04:07, pansz wrote:
              > Tony Mechelynck wrote:
              >> For the meaning of its settings, ":help 'iskeyword'" refers to ":help
              >> 'isfname'", which says:
              >>
              >>> Multi-byte characters 256 and above are always included, only the
              >>> characters up to 255 are specified with this option.
              >>> For UTF-8 the characters 0xa0 to 0xff are included as well.
              >> IOW it is not possible to treat some hanzi as 'iskeyword' characters and
              >> others not. I think the above means that even the "ideographic
              >> full-width space" U+3000 is treated as a keyword character, OTOH I
              >> wouldn't affirm this without an experiment (maybe Vim with +multi_byte
              >> knows about the main divisions of the Unicode codepoint range).
              >
              > This seems to hint that Vim is not using the standard iswalpha(),
              > iswpunct() family of wide-character classification functions in
              > <wctype.h>.
              >
              > As far as I know, iswalpha() returns true only for actual hanzi
              > characters and not for characters such as the "ideographic
              > full-width space".
              >
              > I guess this is a choice made for efficiency if Vim uses UTF-8
              > internally, since UTF-8 must be converted to UCS in order to use the
              > <wctype.h> functions.
              >
              > If that is the case, making 'iskeyword' support multibyte characters
              > isn't hard (I have done similar things for the Lua scripting
              > language), but it will sacrifice performance.

              If you want to be sure, try some Chinese text with both hanzi and
              wide-punctuation and see where the yiw (yank inner word) or viw (visual
              inner word) stops. Here's a sample for you: 道可道、非常道。名可名、非常
              名。 ;-)

              In my Huge gvim 7.2.077 with +multi_byte, viw includes neither
              ideographic comma nor ideographic full stop; but AFAIK there's no way to
              tell vim that 不 "not", 故 "thus", 之 "'s" etc. are non-keyword
              characters, since for multibyte characters this kind of status is hardcoded.
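
The experiment described above can be approximated outside Vim. The Python sketch below groups the sample text into runs of same-class characters, roughly the way an "inner word" selection stops at a class boundary. The three classes and the helper names `char_class` and `runs` are assumptions for illustration; they are derived from Unicode general categories, not from Vim's own hardcoded tables.

```python
import itertools
import unicodedata


def char_class(ch):
    """Rough three-way classification based on the Unicode general category."""
    category = unicodedata.category(ch)
    if category.startswith("Z") or ch.isspace():
        return "space"
    if category.startswith(("P", "S")):
        return "punct"
    return "word"


def runs(text):
    """Group consecutive characters of the same class into (class, run) pairs."""
    return [
        (cls, "".join(group))
        for cls, group in itertools.groupby(text, key=char_class)
    ]


SAMPLE = "道可道、非常道。名可名、非常名。"
for cls, run in runs(SAMPLE):
    print(cls, run)
```

Each run of hanzi comes out as one "word" and each ideographic comma or full stop as its own "punct" run, which matches the behaviour reported above. Nothing in this scheme can mark an individual hanzi such as 不 as a separator, which is the limitation being discussed.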


              Best regards,
              Tony.
              --
              TV is chewing gum for the eyes.
              -- Frank Lloyd Wright

            • Sean
              Message 6 of 17 , Jan 3, 2009
                > Since I found no satisfactory way to use the IM (which _is_
                > installed on my system), I need at least 6 keystrokes to input any
                > hanzi: for instance, for the simplest of them all, the digit one,
                > 一 yi1 U+4E00, I need (after getting into Insert mode) to press
                > Ctrl-V u 4 e 0 0 -- or else, I can use copy-paste if I can find it
                > ready-made in some document.

                It only takes three keystrokes (yi<C-I>) to type your example, 一,
                using my newly-created IME at
                http://vim.sourceforge.net/scripts/script.php?script_id=2506

                Welcome to vim built-in IME :))

                Sean


              • pansz
                Message 7 of 17 , Jan 3, 2009
                  Tony Mechelynck wrote:
                  > If you want to be sure, try some Chinese text with both hanzi and
                  > wide-punctuation and see where the yiw (yank inner word) or viw (visual
                  > inner word) stops. Here's a sample for you: 道可道、非常道。名可名、非常
                  > 名。 ;-)

                  Interesting. I see that the wide punctuation characters are recognized, so
                  Vim is using wide characters internally, and omitting some particular
                  wide characters from 'iskeyword' shouldn't be hard.

                  Then why does 'iskeyword' support only characters 0-255?

                • Tony Mechelynck
                  Message 8 of 17 , Jan 3, 2009
                    On 04/01/09 06:30, pansz wrote:
                    > Tony Mechelynck wrote:
                    >> If you want to be sure, try some Chinese text with both hanzi and
                    >> wide-punctuation and see where the yiw (yank inner word) or viw (visual
                    >> inner word) stops. Here's a sample for you: 道可道、非常道。名可名、非常
                    >> 名。 ;-)
                    >
                    > Interesting, I see the wide punctuation characters are recognized, so
                    > vim is using wide character internally, and omitting some particular
                    > wide-character from 'iskeyword' shouldn't be hard.
                    >
                    > Then why the 'iskeyword' supports only characters from 0-255?

                    I'm not sure. I suppose that option was defined before Unicode became
                    well-known, maybe even before it existed, when most charsets were of the
                    8-bit kind except for East-Asian scripts, which required "special" MBCS
                    versions of the OSes anyway (such as MS-DOS 2.25).

                    Once the Unicode standard was published, it included not only mappings
                    of codepoints to glyphs but also quite a lot of metadata about these
                    codepoints (such as wide vs. narrow vs. ambiguous, LTR vs. RTL vs.
                    ambiguous, lower/ upper/ titlecase, punctuation, number systems, etc.).
                    However, Vim versions with -multi_byte must still be supported, and they
                    don't have access to that wealth of meta-information. Also, IIUC it's in
                    the ASCII range that there is most variation between programming
                    languages, operating systems, human languages, etc. concerning which
                    characters may be used in which circumstances.


                    Best regards,
                    Tony.
                    --
                    If there are epigrams, there must be meta-epigrams.

                  • pansz
                    Message 9 of 17 , Jan 3, 2009
                      Tony Mechelynck wrote:
                      > I'm not sure. I suppose that option was defined before Unicode became
                      > well-known, maybe even before it existed, when most charsets were of the
                      > 8-bit kind except for East-Asian scripts, which required "special" MBCS
                      > versions of the OSes anyway (such as MS-DOS 2.25).
                      >
                      > Once the Unicode standard was published, it included not only mappings
                      > of codepoints to glyphs but also quite a lot of metadata about these
                      > codepoints (such as wide vs. narrow vs. ambiguous, LTR vs. RTL vs.
                      > ambiguous, lower/ upper/ titlecase, punctuation, number systems, etc.).
                      > However, Vim versions with -multi_byte must still be supported, and they
                      > don't have access to that wealth of meta-information. Also, IIUC it's in
                      > the ASCII range that there is most variation between programming
                      > languages, operating systems, human languages, etc. concerning which
                      > characters may be used in which circumstances.

                      CJK languages are not in the ASCII range at all, and I bet CJK
                      speakers make up more than 30% of the world's population. Vim is for
                      programmers, but is it _only_ for programmers?

                      The difficulty may be that 'iskeyword' is a whitelist, not a
                      blacklist: we cannot easily blacklist a single Unicode character in
                      'iskeyword' without knowing *all* the Unicode characters that match
                      iswalpha().

                      Perhaps the simplest approach is to add an option 'isnkeyword' that
                      accepts any Unicode character, so that we can blacklist some Unicode
                      characters while the 'iskeyword' option keeps functioning.
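
A rough Python sketch of the blacklist idea follows. The option name 'isnkeyword' comes from the proposal above and does not exist in Vim; the functions `is_keyword` and `keywords`, and the default rule (alphanumeric characters plus underscore), are assumptions made for illustration only.

```python
def is_keyword(ch, isnkeyword=frozenset()):
    """Default rule: letters, digits and '_' are keyword characters.

    Any character listed in the hypothetical isnkeyword set is excluded,
    whatever the default rule says about it.
    """
    if ch in isnkeyword:
        return False
    return ch.isalnum() or ch == "_"


def keywords(text, isnkeyword=frozenset()):
    """Return the maximal runs of keyword characters found in text."""
    words = []
    current = []
    for ch in text:
        if is_keyword(ch, isnkeyword):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


print(keywords("道可道非常道"))                    # one long run
print(keywords("道可道非常道", isnkeyword={"非"}))  # split at the blacklisted hanzi
```

With an empty blacklist the whole string is a single keyword; adding 非 to the blacklist splits it into 道可道 and 常道. The default rule stays untouched, which is the point of keeping the blacklist as a separate option.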



                    • bill lam
                      Message 10 of 17 , Jan 3, 2009
                        On Sun, 04 Jan 2009, pansz wrote:
                        > Interesting, I see the wide punctuation characters are recognized, so
                        > vim is using wide character internally, and omitting some particular
                        > wide-character from 'iskeyword' shouldn't be hard.
                        >
                        > Then why the 'iskeyword' supports only characters from 0-255?

                        Just a wild guess, since I've never looked into Vim's source code. I
                        think that iskeyword, or spellcheck for that matter, uses an FSM to
                        implement the parser. It's OK to have a table of 256 characters, but
                        not so easy to work with a table of millions of Unicode characters.
                        A quick and dirty workaround is to treat all non-8-bit characters as
                        white space.
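
A per-character table is not the only option for a large code space. One common alternative is a short sorted list of code point ranges searched by bisection, sketched below in Python. The ranges and class names are illustrative assumptions, far from complete, and are not taken from Vim's source.

```python
import bisect

# (first, last, class): a few illustrative ranges, sorted by first code point.
RANGES = [
    (0x3000, 0x3000, "space"),  # ideographic space
    (0x3001, 0x3003, "punct"),  # ideographic comma, full stop, ditto mark
    (0x4E00, 0x9FFF, "hanzi"),  # CJK unified ideographs
    (0xFF01, 0xFF0F, "punct"),  # fullwidth forms of ASCII punctuation and symbols
]
STARTS = [first for first, _, _ in RANGES]


def classify(ch, default="other"):
    """Find the range containing ch by binary search over the range starts."""
    codepoint = ord(ch)
    index = bisect.bisect_right(STARTS, codepoint) - 1
    if index >= 0:
        first, last, cls = RANGES[index]
        if first <= codepoint <= last:
            return cls
    return default


for ch in "一\u3000。a":
    print(f"U+{ord(ch):04X}", classify(ch))
```

Lookup cost grows with the logarithm of the number of ranges rather than with the size of the code space, so a few hundred ranges can cover all of Unicode. The price is that single characters cannot be toggled cheaply, since each exception splits a range.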

                        --
                        regards,
                        ====================================================
                        GPG key 1024D/4434BAB3 2008-08-24
                        gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
                        唐詩202 盧綸 晚次鄂州
                        雲開遠見漢陽城 猶是孤帆一日程 估客晝眠知浪靜 舟人夜語覺潮生
                        三湘愁鬢逢秋色 萬里歸心對月明 舊業已隨征戰盡 更堪江上鼓鼙聲

                      • Tony Mechelynck
                        Message 11 of 17 , Jan 4, 2009
                          On 04/01/09 07:53, pansz wrote:
                          > Tony Mechelynck wrote:
                          >> I'm not sure. I suppose that option was defined before Unicode became
                          >> well-known, maybe even before it existed, when most charsets were of the
                          >> 8-bit kind except for East-Asian scripts, which required "special" MBCS
                          >> versions of the OSes anyway (such as MS-DOS 2.25).
                          >>
                          >> Once the Unicode standard was published, it included not only mappings
                          >> of codepoints to glyphs but also quite a lot of metadata about these
                          >> codepoints (such as wide vs. narrow vs. ambiguous, LTR vs. RTL vs.
                          >> ambiguous, lower/ upper/ titlecase, punctuation, number systems, etc.).
                          >> However, Vim versions with -multi_byte must still be supported, and they
                          >> don't have access to that wealth of meta-information. Also, IIUC it's in
                          >> the ASCII range that there is most variation between programming
                          >> languages, operating systems, human languages, etc. concerning which
                          >> characters may be used in which circumstances.
                          >
                          > Human languages of CJK are not in the ASCII range at all and I bet CJK
                          > have more than 30% of the world population. Vim is for programmers, is
                          > it _only_ for programmers?

                          No, but each hanzi (not fullwidth punct) is supposed to be a "word" or
                          "word part" of some kind, with punctuation, whitespace and diacritics
                          all totally outside the "word" range. "Not" is a word in English,
                          regardless of whether it's used alone or in "cannot" or
                          "notwithstanding". These two uses sound almost Chinese-like to me... who
                          don't really know more than a handful of Chinese words. I suppose that
                          if English, like Japanese, used Han-script, "notwithstanding" might be
                          written not-against-stay-now with four glyphs? But I'm daydreaming.

                          >
                          > The difficulty may be that 'iskeyword' is a whitelist, not a
                          > blacklist: we cannot easily blacklist a single Unicode character in
                          > 'iskeyword' without knowing *all* the Unicode characters which match
                          > iswalpha().

                          A more important difficulty is that 'iskeyword' applies only to Unicode
                          codepoints U+0000 to U+007F when 'encoding' is UTF-8 (or any Unicode
                          value aliased to UTF-8 for internal memory), and to characters 0x00 to
                          0xFF when it isn't. If it covered every codepoint, we might perhaps use
                          ":setlocal isk-=不 isk-=之" or some such. But that would also mean
                          several arrays of 2 gigabits each, rather than 256 bits, to remember
                          the settings: Vim treats the Unicode range as 0 to 7FFFFFFF, and even
                          if it limited itself to the current official maximum of 10FFFF it
                          would still mean a big increase.
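
To put rough numbers on the size argument above, here is a small Python
sketch (not Vim code). It assumes one bit per codepoint in a flat bitmap;
the helper name bitmap_bytes is made up for this illustration, and
U+10FFFF is taken as the top of the Unicode codespace.

```python
def bitmap_bytes(n_codepoints):
    """Bytes needed for a flat bitmap with one bit per codepoint."""
    return (n_codepoints + 7) // 8

# Number of codepoints each candidate table would have to cover.
EIGHT_BIT_RANGE = 0xFF + 1         # the 256-entry table discussed above
VIM_INTERNAL_RANGE = 0x7FFFFFFF + 1  # the 0..7FFFFFFF range mentioned above
UNICODE_RANGE = 0x10FFFF + 1       # assumption: U+10FFFF is the last codepoint

for label, n in [("8-bit", EIGHT_BIT_RANGE),
                 ("0..7FFFFFFF", VIM_INTERNAL_RANGE),
                 ("0..10FFFF", UNICODE_RANGE)]:
    print(f"{label:>12}: {n} bits = {bitmap_bytes(n)} bytes per table")
```

Under these assumptions a 256-bit table takes 32 bytes, the full
0..7FFFFFFF range takes 256 MiB per table, and even the 0..10FFFF range
takes 136 KiB per table.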

                          >
                          > Perhaps the simplest approach is to add an option 'isnkeyword' which
                          > supports any Unicode character, so that we can blacklist some Unicode
                          > characters while 'iskeyword' keeps functioning as it does now.
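
The blacklist idea quoted above could look roughly like the following
Python sketch. Everything here is an assumption for illustration: Vim has
no 'isnkeyword' option, Python's str.isalnum() merely stands in for an
iswalpha()-style test, and the names ISNKEYWORD and is_keyword are made
up.

```python
# Hypothetical value of the proposed 'isnkeyword' option: characters that
# should stop counting as keyword characters even though they are letters.
ISNKEYWORD = {"不", "之"}

def is_keyword(ch, blacklist=ISNKEYWORD):
    """Letter-like characters are keyword characters unless blacklisted."""
    # str.isalnum() is only a stand-in for an iswalpha()-like base rule.
    base_rule = ch.isalnum() or ch == "_"
    return base_rule and ch not in blacklist

print(is_keyword("a"), is_keyword("字"), is_keyword("不"), is_keyword("，"))
```

The point of the design is that the base rule never has to be enumerated:
only the exceptions are stored, so the blacklist stays small however many
characters the base rule accepts.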

                          Hm. Don't know if Bram would accept that, but you can always try to
                          publish (and maintain) an unofficial patch to the C source. Don't know
                          how easy (and foolproof) it would be. For a single option, a has()
                          feature might be useful but it's less needed than for a whole batch of
                          them: we would always be able to test ":if exists('+isnkeyword')".


                          Best regards,
                          Tony.
                          --
                          A truly wise man never plays leapfrog with a unicorn.

                          --~--~---------~--~----~------------~-------~--~----~
                          You received this message from the "vim_use" maillist.
                          For more information, visit http://www.vim.org/maillist.php
                          -~----------~----~----~----~------~----~------~--~---
                        • Tony Mechelynck
                          Message 12 of 17 , Jan 4, 2009
                            On 04/01/09 08:10, bill lam wrote:
                            > On Sun, 04 Jan 2009, pansz wrote:
                            >> Interesting, I see the wide punctuation characters are recognized, so
                            >> vim is using wide character internally, and omitting some particular
                            >> wide-character from 'iskeyword' shouldn't be hard.
                            >>
                            >> Then why the 'iskeyword' supports only characters from 0-255?
                            >
                            > Just wild guess since I've never looked into vim's source code. I
                            > think that iskeyword or spellcheck for that matter use FSM to
                            > implement the parser. It's ok to have a table of 256 characters but
                            > not so easy to work with a table of millions of unicode characters.
                            > A quick and dirty workaround is to coerce all non 8-bit characters as
                            > white space.
                            >

                            Actually Vim uses a different method (a table of ranges, I think) for
                            Unicode codepoints which require two or more UTF-8 bytes, since we've
                            established that fullwidth comma and fullwidth full stop are (properly)
                            recognized as breaking "word" selection, and that "ordinary" hanzi aren't.
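
For readers wondering what a "table of ranges" lookup might look like,
here is a minimal Python sketch. It is not Vim's actual table or code:
the three ranges, the class names and the function names are assumptions
chosen only to reproduce the behaviour described above (fullwidth
punctuation breaks a word, ordinary hanzi do not).

```python
from bisect import bisect_right

# Hypothetical, heavily abridged table of (first, last, class) entries,
# sorted by first codepoint and non-overlapping.
RANGES = [
    (0x3000, 0x3003, "punct"),  # ideographic space, comma, full stop, ditto
    (0x4E00, 0x9FFF, "word"),   # CJK Unified Ideographs block
    (0xFF01, 0xFF0F, "punct"),  # fullwidth ! through /, incl. comma and stop
]
STARTS = [first for first, _last, _cls in RANGES]

def char_class(ch):
    """Binary-search the range table; anything not listed is 'other'."""
    cp = ord(ch)
    i = bisect_right(STARTS, cp) - 1
    if i >= 0 and cp <= RANGES[i][1]:
        return RANGES[i][2]
    return "other"

def split_words(text):
    """Split text into runs of 'word' characters."""
    words, current = [], ""
    for ch in text:
        if char_class(ch) == "word":
            current += ch
        elif current:
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words

print(split_words("不之，不"))
```

A table like this costs a few entries per range rather than one bit per
codepoint, and a lookup is a binary search over the range starts.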

                            Best regards,
                            Tony.
                            --
                            Hippogriff, n.:
                            An animal (now extinct) which was half horse and half griffin.
                            The griffin was itself a compound creature, half lion and half eagle.
                            The hippogriff was actually, therefore, only one quarter eagle, which
                            is two dollars and fifty cents in gold. The study of zoology is full
                            of surprises.
                            -- Ambrose Bierce, "The Devil's Dictionary"
