Possible to make iskeyword support multibyte characters?

  • anhnmncb
    Message 1 of 17 , Dec 31, 2008
      Hi, list,

      When I type Chinese text in Vim, I find it inconvenient to complete
      Chinese words with C-p/C-n, because Chinese words are not separated by spaces
      but by certain characters such as "and", "or" and others (I use English words
      to represent Chinese characters), so a Chinese sentence looks like this:

      ThisIsAChineseWordInSentence. (This is a Chinese word in sentence.)

      When I have typed "ThisIsAChineseWordIn" and now want to type Sen<C-p>,
      Vim can't complete the word "Sentence" for me. So I wonder whether 'iskeyword'
      could support adding Chinese characters to itself, for example (my client
      doesn't support Chinese, so I use "and" to represent a Chinese character):

      set iskeyword+="and"

      Then autocompletion would work without problems in Chinese. I don't know
      whether this is easy to implement?

      --
      Regards,
      anhnmncb


      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_use" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • anhnmncb
      Message 2 of 17 , Dec 31, 2008
        On 2008-12-31, anhnmncb wrote:
        > set iskeyword+="and"
        I meant set iskeyword-="and".

        Also, it would let me navigate more quickly in a long Chinese sentence. Now I
        have to use / ? or f F t T, or several h/j/k/l presses, and then input a
        Chinese character (sometimes inputting one Chinese character requires typing
        at least 3 English characters).


        --
        Regards,
        anhnmncb


      • anhnmncb
        Message 3 of 17 , Jan 2, 2009
          Ping!


          --
          Regards,
          anhnmncb


        • Matt Wozniski
          Message 4 of 17 , Jan 2, 2009
            AFAICS, there's no way to make multibyte characters non-keyword
            characters. :help 'isk' says "See 'isfname' for a description of the
            format of this option." and :help 'isfname' says "Multi-byte
            characters 256 and above are always included", which doesn't seem
            changeable.

            ~Matt
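
The rule quoted from :help 'isfname' can be pictured with a rough Python model. This is only a sketch of the documented sentence, not Vim's actual code: `is_keyword_char` and `low_table` are hypothetical names, and the table merely stands in for an option value such as "@,48-57,_".

```python
def is_keyword_char(ch: str, low_table: set) -> bool:
    """Rough model of the documented rule, not Vim's implementation."""
    cp = ord(ch)
    if cp >= 256:
        # Per the help text: always included, the option cannot change this.
        return True
    # Only code points up to 255 are controlled by the option.
    return cp in low_table


# Hypothetical table: underscore, ASCII digits and ASCII letters.
low_table = {ord("_")} | set(range(48, 58)) | set(range(65, 91)) | set(range(97, 123))

print(is_keyword_char("a", low_table))   # ASCII letter, listed in the table
print(is_keyword_char("-", low_table))   # ASCII hyphen, not listed
print(is_keyword_char("一", low_table))  # U+4E00, above 255
```

Under this model, no value of the table can turn a hanzi into a non-keyword character, which is the limitation described above.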

          • Tony Mechelynck
            Message 5 of 17 , Jan 2, 2009
              On 02/01/09 11:30, anhnmncb wrote:
              > Ping!

              If you don't get a reply on this ML, the meaning usually is not that
              nobody saw the question, but rather that nobody knows the answer. Search
              the help first, then try to make your question clearer if the help
              doesn't give you an answer (in this case it does, see below).


              For the meaning of its settings, ":help 'iskeyword'" refers to ":help
              'isfname'", which says:

              > Multi-byte characters 256 and above are always included, only the
              > characters up to 255 are specified with this option.
              > For UTF-8 the characters 0xa0 to 0xff are included as well.

              IOW it is not possible to treat some hanzi as 'iskeyword' characters and
              others not. I think the above means that even the "ideographic
              full-width space" U+3000 is treated as a keyword character, OTOH I
              wouldn't affirm this without an experiment (maybe Vim with +multi_byte
              knows about the main divisions of the Unicode codepoint range).

              Since I found no satisfactory way to use the IM (which _is_ installed on
              my system), I need at least 6 keystrokes to input any hanzi: for
              instance, for the simplest of them all, the digit one, 一 yi1 U+4E00, I
              need (after getting into Insert mode) to press Ctrl-V u 4 e 0 0 -- or
              else, I can use copy-paste if I can find it ready-made in some document.


              Best regards,
              Tony.
              --
              Paradise is exactly like where you are right now ... only much, much
              better.
              -- Laurie Anderson
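
As a side check on the two code points mentioned in this message, the Unicode database can be queried from Python. This is a sketch outside Vim: it says how Unicode classifies the characters, not how Vim treats them, and the variable names are made up for the example.

```python
import unicodedata

hanzi_one = chr(0x4E00)          # the character entered with Ctrl-V u 4 e 0 0
ideographic_space = chr(0x3000)  # the U+3000 space discussed above

for ch in (hanzi_one, ideographic_space):
    # Print the code point, its Unicode name and its general category.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
```

The general category is "Lo" (other letter) for U+4E00 and "Zs" (space separator) for U+3000, so a classifier based on Unicode categories could tell them apart.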

            • StarWing
              Message 6 of 17 , Jan 2, 2009
                OK, I think Vim could be extended to support a word table for ^N/^P and
                f/F/t/T, e.g.:
                chinese.vim " this is a Chinese word-table file:
                beginwordtable chinese <<EOF "this is a command
                this leisi
                an an
                apple aipo
                and ande
                other ade
                is yizi
                or wuwo
                EOF

                " The first column is the Chinese word; the second column is the word
                " in pinyin (sound) or input-method form.

                We could then set an option named "wordtable": set wordtable=chinese

                Then, if I input thisisanappleandother, Vim would know that "this" is a
                word, and likewise "is", "an", "apple", etc.
                And if I want to find "an", I press "fa" or "fan" and the cursor goes to
                "an" or "and", etc. I think it's not easy to make the f operator
                support multi-character input (Vim doesn't know how many characters will
                be typed, so it can't act immediately).

                Could Bram or anyone implement this? This feature could also be combined
                with Vim's spell-check feature.
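
The word-splitting half of this proposal can be sketched in Python as a greedy longest-match pass over the text. This is only an illustration under assumptions: the "wordtable" option does not exist in Vim, `segment` and `word_table` are hypothetical names, and greedy matching is one possible strategy among several.

```python
def segment(text, words):
    """Greedy longest-match split of `text` using the set `words`."""
    longest = max(map(len, words))
    out, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in words:
                out.append(text[i:i + size])
                i += size
                break
        else:
            # No table entry starts here: emit one character as its own token.
            out.append(text[i])
            i += 1
    return out


# First column of the proposed table (English stand-ins for Chinese words).
word_table = {"this", "an", "apple", "and", "other", "is", "or"}
print(segment("thisisanappleandother", word_table))
# → ['this', 'is', 'an', 'apple', 'and', 'other']
```

Greedy matching can pick the wrong split when table entries overlap, so a real implementation would probably need something more careful.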

              • StarWing
                Message 7 of 17 , Jan 2, 2009
                  Or could anyone bring this to the developers' attention?

                • Tony Mechelynck
                  Message 8 of 17 , Jan 2, 2009
                    On 02/01/09 15:32, StarWing wrote:
                    > or anyone can make developer know this?
                    [...]

                    AFAIK, most Vim developers read not only the vim_dev group but also the
                    vim_use group.

                    Best regards,
                    Tony.
                    --
                    The trouble with doing something right the first time is that nobody
                    appreciates how difficult it was.

                  • pansz
                    Message 9 of 17 , Jan 3, 2009
                      Tony Mechelynck wrote:
                      > For the meaning of its settings, ":help 'iskeyword'" resends to ":help
                      > 'isfname'" where it is said:
                      >
                      >> Multi-byte characters 256 and above are always included, only the
                      >> characters up to 255 are specified with this option.
                      >> For UTF-8 the characters 0xa0 to 0xff are included as well.
                      >
                      > IOW it is not possible to treat some hanzi as 'iskeyword' characters and
                      > others not. I think the above means that even the "ideographic
                      > full-width space" U+3000 is treated as a keyword character, OTOH I
                      > wouldn't affirm this without an experiment (maybe Vim with +multi_byte
                      > knows about the main divisions of the Unicode codepoint range).

                      This seems to hint that Vim is not using the standard iswalpha(),
                      iswpunct() family of wide-character classification functions from
                      <wctype.h>.

                      As far as I know, iswalpha() returns true only for true hanzi
                      characters and does not return true for characters such as the
                      "ideographic full-width space".

                      I guess this is a choice made for efficiency if Vim uses UTF-8
                      internally, since UTF-8 must be converted to UCS in order to use the
                      <wctype.h> functions.

                      If that is the case, making 'iskeyword' support multibyte characters
                      isn't hard (I have done something similar for the Lua scripting
                      language), but it would sacrifice performance.
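
The conversion step mentioned above (UTF-8 bytes to a code point, then a classification call) can be sketched in Python. Two assumptions: `decode_utf8_3byte` is a hypothetical helper that handles only the 3-byte form used by most hanzi, and Python's `str.isalpha()` stands in for C's iswalpha(), which is locale-dependent and may behave differently.

```python
def decode_utf8_3byte(b: bytes) -> int:
    """Decode one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)."""
    if len(b) != 3 or b[0] >> 4 != 0b1110 or any(x >> 6 != 0b10 for x in b[1:]):
        raise ValueError("not a 3-byte UTF-8 sequence")
    # Strip the marker bits and join the 4 + 6 + 6 payload bits.
    return ((b[0] & 0x0F) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F)


raw = "一".encode("utf-8")   # b'\xe4\xb8\x80'
cp = decode_utf8_3byte(raw)
print(hex(cp))               # → 0x4e00

# Stand-in for iswalpha(): true for the hanzi, false for U+3000.
print(chr(cp).isalpha(), "\u3000".isalpha())
```

The decode itself is a few shifts and masks per character; whether that cost matters in practice is the performance question raised above.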


                    • Tony Mechelynck
                      Message 10 of 17 , Jan 3, 2009
                        On 04/01/09 04:07, pansz wrote:
                        > Tony Mechelynck wrote:
                        >> For the meaning of its settings, ":help 'iskeyword'" resends to ":help
                        >> 'isfname'" where it is said:
                        >>
                        >>> Multi-byte characters 256 and above are always included, only the
                        >>> characters up to 255 are specified with this option.
                        >>> For UTF-8 the characters 0xa0 to 0xff are included as well.
                        >> IOW it is not possible to treat some hanzi as 'iskeyword' characters and
                        >> others not. I think the above means that even the "ideographic
                        >> full-width space" U+3000 is treated as a keyword character, OTOH I
                        >> wouldn't affirm this without an experiment (maybe Vim with +multi_byte
                        >> knows about the main divisions of the Unicode codepoint range).
                        >
                        > This seems to hint vim is not using the standard iswalpha(), iswpunct()
                        > series widechar-type-check functions in<wctypes.h>.
                        >
                        > As far as I know the iswalpha() returns true only on true hanzi
                        > characters and will not return true on characters such as "ideographic
                        > full-width space".
                        >
                        > I guess this is a choice for efficiency if vim uses utf-8 internally,
                        > since utf-8 must be converted to ucs in order to use wctypes.
                        >
                        > If that is the case, making iskeyword supports multibyte character isn't
                        > hard (I had done similar things for Lua script language), but will
                        > sacrifice performance.

                        If you want to be sure, try some Chinese text with both hanzi and
                        wide-punctuation and see where the yiw (yank inner word) or viw (visual
                        inner word) stops. Here's a sample for you:
                        道可道、非常道。名可名、非常名。 ;-)

                        In my Huge gvim 7.2.077 with +multi_byte, viw includes neither
                        ideographic comma nor ideographic full stop; but AFAIK there's no way to
                        tell vim that 不 "not", 故 "thus", 之 "'s" etc. are non-keyword
                        characters, since for multibyte characters this kind of status is hardcoded.


                        Best regards,
                        Tony.
                        --
                        TV is chewing gum for the eyes.
                        -- Frank Lloyd Wright
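
For comparison with the viw experiment above, here is a small Python sketch that splits the same sample into runs of letters and non-letters using Unicode general categories. It does not reproduce Vim's own classification; `letter_runs` is a hypothetical helper, and the split is simply what the Unicode database yields.

```python
from itertools import groupby
import unicodedata


def letter_runs(text):
    """Split text into maximal runs of letters and of non-letters."""
    def is_letter(ch):
        # Categories starting with "L" are letters (hanzi are "Lo").
        return unicodedata.category(ch).startswith("L")
    return ["".join(group) for _, group in groupby(text, key=is_letter)]


sample = "道可道、非常道。名可名、非常名。"
print(letter_runs(sample))
# → ['道可道', '、', '非常道', '。', '名可名', '、', '非常名', '。']
```

As with viw in the message above, the ideographic comma and full stop end each run, while every hanzi inside a run is treated alike.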

                      • Sean
                        Message 11 of 17 , Jan 3, 2009
                          > Since I found no satisfactory way to use the IM (which _is_
                          > installed on my system), I need at least 6 keystrokes to input any
                          > hanzi: for instance, for the simplest of them all, the digit one,
                          > 一 yi1 U+4E00, I need (after getting into Insert mode) to press
                          > Ctrl-V u 4 e 0 0 -- or else, I can use copy-paste if I can find it
                          > ready-made in some document.

                          It only takes three keystrokes (yi<C-I>) to type your example, 一,
                          using my newly-created IME at
                          http://vim.sourceforge.net/scripts/script.php?script_id=2506

                          Welcome to vim built-in IME :))

                          Sean


                          On Jan 2, 5:39 am, Tony Mechelynck <antoine.mechely...@...>
                          wrote:
                          > On 02/01/09 11:30, anhnmncb wrote:
                          >
                          > > Ping!
                          >
                          > If you don't get a reply on this ML, the meaning usually is not that
                          > nobody saw the question, but rather that nobody knows the answer. Search
                          > the help first, then try to make your question clearer if the help
                          > doesn't give you an answer (in this case it does, see below).
                          >
                          >
                          >
                          >
                          >
                          > > On 2008-12-31, anhnmncb wrote:
                          > >> On 2008-12-31, anhnmncb wrote:
                          > >>> Hi, list,
                          >
                          > >>> when I type Chinese text in vim, I find it's unconvenient for completing
                          > >>> Chinese word with C-p/n, because a Chinese word is not seperated by space but
                          > >>> some charactors like "and", "or" and others(I use English to reprent a Chinese
                          > >>> charactor), so a Chinese sententce will like this:
                          >
                          > >>> ThisIsAChineseWordInSentence.(This is a Chinese word in sentence.)
                          >
                          > >>> When I have typed "ThisIsAChineseWordIn", now if I want to type Sen<C-p> then
                          > >>> vim can't complete word "Sentence" for me. So I think if iskeyword supports
                          > >>> adding Chinese charactor to itself, for example(My client doesn't support
                          > >>> Chinese, so I use "and" to represent a Chinese charactor):
                          >
                          > >>> set iskeyword+="and"
                          > >> I meant set iskeyword-="and".
                          > >>> then autocompletion will be without problem with Chinese. I don't know if it
                          > >>> is easy to handle?
                          > >> Also, it will let me can navigate quicker in a long Chinese sentence, now I
                          > >> have to use /? or fFtT or some hjkls then input a Chinese charactor(sometimes
                          > >> To input a Chinese charactor needs to type at least 3 english charactor).
                          >
                          > For the meaning of its settings, ":help 'iskeyword'" refers to ":help
                          > 'isfname'", where it says:
                          >
                          > > Multi-byte characters 256 and above are always included, only the
                          > > characters up to 255 are specified with this option.
                          > > For UTF-8 the characters 0xa0 to 0xff are included as well.
                          >
                          > IOW it is not possible to treat some hanzi as 'iskeyword' characters and
                          > others not. I think the above means that even the "ideographic
                          > full-width space" U+3000 is treated as a keyword character, OTOH I
                          > wouldn't affirm this without an experiment (maybe Vim with +multi_byte
                          > knows about the main divisions of the Unicode codepoint range).
                          >
                          > Since I found no satisfactory way to use the IM (which _is_ installed on
                          > my system), I need at least 6 keystrokes to input any hanzi: for
                          > instance, for the simplest of them all, the digit one, 一 yi1 U+4E00, I
                          > need (after getting into Insert mode) to press Ctrl-V u 4 e 0 0 -- or
                          > else, I can use copy-paste if I can find it ready-made in some document.
                          >
                          > Best regards,
                          > Tony.
                          > --
                          > Paradise is exactly like where you are right now ... only much, much
                          > better.
                          > -- Laurie Anderson
                        • pansz
                          Message 12 of 17 , Jan 3, 2009
                            Tony Mechelynck wrote:
                            > If you want to be sure, try some Chinese text with both hanzi and
                            > wide-punctuation and see where the yiw (yank inner word) or viw (visual
                            > inner word) stops. Here's a sample for you: 道可道、非常道。名可名、非常
                            > 名。 ;-)

                            Interesting, I see that the wide punctuation characters are recognized, so
                            vim is using wide characters internally, and omitting particular
                            wide characters from 'iskeyword' shouldn't be hard.

                            Then why does 'iskeyword' support only characters 0-255?

                          • Tony Mechelynck
                            Message 13 of 17 , Jan 3, 2009
                              On 04/01/09 06:30, pansz wrote:
                              > Tony Mechelynck wrote:
                              >> If you want to be sure, try some Chinese text with both hanzi and
                              >> wide-punctuation and see where the yiw (yank inner word) or viw (visual
                              >> inner word) stops. Here's a sample for you: 道可道、非常道。名可名、非常
                              >> 名。 ;-)
                              >
                              > Interesting, I see that the wide punctuation characters are recognized, so
                              > vim is using wide characters internally, and omitting particular
                              > wide characters from 'iskeyword' shouldn't be hard.
                              >
                              > Then why does 'iskeyword' support only characters 0-255?

                              I'm not sure. I suppose that option was defined before Unicode became
                              well-known, maybe even before it existed, when most charsets were of the
                              8-bit kind except for East-Asian scripts, which required "special" MBCS
                              versions of the OSes anyway (such as MS-DOS 2.25).

                              Once the Unicode standard was published, it included not only mappings
                              of codepoints to glyphs but also quite a lot of metadata about these
                              codepoints (such as wide vs. narrow vs. ambiguous, LTR vs. RTL vs.
                              ambiguous, lower/ upper/ titlecase, punctuation, number systems, etc.).
                              However, Vim versions with -multi_byte must still be supported, and they
                              don't have access to that wealth of meta-information. Also, IIUC it's in
                              the ASCII range that there is most variation between programming
                              languages, operating systems, human languages, etc. concerning which
                              characters may be used in which circumstances.
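
To make the point about Unicode metadata concrete, here is a small sketch in Python (not Vim's own code) that queries the standard `unicodedata` module for a few characters from the sample sentence quoted earlier in the thread. The helper name `describe` is made up for this illustration. It only shows that the general category separates hanzi (letters) from wide punctuation and the ideographic space; whether and how Vim consults such data is not something this sketch establishes.

```python
import unicodedata

# Sample taken from the thread: hanzi mixed with wide punctuation.
SAMPLE = "道可道、非常道。"


def describe(ch):
    """Return (codepoint, general category, East Asian width) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),          # e.g. "Lo" = other letter, "Po" = other punctuation
        unicodedata.east_asian_width(ch),  # e.g. "W" = wide, "F" = fullwidth
    )


if __name__ == "__main__":
    # U+3000 is the ideographic space mentioned earlier in the thread.
    for ch in SAMPLE + "\u3000":
        print(repr(ch), *describe(ch))
```

A program with access to this kind of table can tell a hanzi from a wide comma without any per-character option setting.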


                              Best regards,
                              Tony.
                              --
                              If there are epigrams, there must be meta-epigrams.

                            • pansz
                              Message 14 of 17 , Jan 3, 2009
                                Tony Mechelynck wrote:
                                > I'm not sure. I suppose that option was defined before Unicode became
                                > well-known, maybe even before it existed, when most charsets were of the
                                > 8-bit kind except for East-Asian scripts, which required "special" MBCS
                                > versions of the OSes anyway (such as MS-DOS 2.25).
                                >
                                > Once the Unicode standard was published, it included not only mappings
                                > of codepoints to glyphs but also quite a lot of metadata about these
                                > codepoints (such as wide vs. narrow vs. ambiguous, LTR vs. RTL vs.
                                > ambiguous, lower/ upper/ titlecase, punctuation, number systems, etc.).
                                > However, Vim versions with -multi_byte must still be supported, and they
                                > don't have access to that wealth of meta-information. Also, IIUC it's in
                                > the ASCII range that there is most variation between programming
                                > languages, operating systems, human languages, etc. concerning which
                                > characters may be used in which circumstances.

                                CJK languages are not in the ASCII range at all, and I bet CJK speakers
                                make up more than 30% of the world population. Vim is for programmers, but is
                                it _only_ for programmers?

                                The difficulty may be that 'iskeyword' is a whitelist, not a
                                blacklist: we cannot easily blacklist a single Unicode character in
                                'iskeyword' without knowing *all* the Unicode characters that match
                                iswalpha().

                                Perhaps the simplest approach is to add an option 'isnkeyword' that
                                accepts any Unicode character, so that we can blacklist some Unicode
                                characters while keeping the 'iskeyword' option working.
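
The whitelist-plus-blacklist idea can be sketched in a few lines of Python. This is only an illustration of the proposal, not a Vim patch: `split_words` is a hypothetical helper, `str.isalnum()` stands in for an iswalpha()-style whitelist, and the `blacklist` argument plays the role of the proposed 'isnkeyword' option (which does not exist in Vim).

```python
def split_words(text, blacklist=frozenset()):
    """Split text into runs of 'keyword' characters.

    A character counts as a keyword character when str.isalnum() accepts it
    (standing in for the whitelist) and it is not in `blacklist`
    (standing in for the hypothetical 'isnkeyword' option).
    """
    words, current = [], []
    for ch in text:
        if ch.isalnum() and ch not in blacklist:
            current.append(ch)
        elif current:
            # A non-keyword character ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    sample = "道可道、非常道。"
    print(split_words(sample))                   # only wide punctuation breaks words
    print(split_words(sample, blacklist={"非"}))  # one blacklisted hanzi also breaks words
```

With an empty blacklist only the wide punctuation separates words; adding a single hanzi to the blacklist makes that character act as a separator too, which is the behaviour asked for at the start of the thread.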



                              • bill lam
                                Message 15 of 17 , Jan 3, 2009
                                  On Sun, 04 Jan 2009, pansz wrote:
                                  > Interesting, I see that the wide punctuation characters are recognized, so
                                  > vim is using wide characters internally, and omitting particular
                                  > wide characters from 'iskeyword' shouldn't be hard.
                                  >
                                  > Then why does 'iskeyword' support only characters 0-255?

                                  Just a wild guess, since I've never looked into vim's source code. I
                                  think that iskeyword, or spellcheck for that matter, uses an FSM to
                                  implement the parser. It's OK to have a table of 256 characters but
                                  not so easy to work with a table of millions of Unicode characters.
                                  A quick and dirty workaround is to treat all non-8-bit characters as
                                  white space.

                                  --
                                  regards,
                                  ====================================================
                                  GPG key 1024D/4434BAB3 2008-08-24
                                  gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
                                  唐詩202 盧綸 晚次鄂州
                                  雲開遠見漢陽城 猶是孤帆一日程 估客晝眠知浪靜 舟人夜語覺潮生
                                  三湘愁鬢逢秋色 萬里歸心對月明 舊業已隨征戰盡 更堪江上鼓鼙聲

                                • Tony Mechelynck
                                  Message 16 of 17 , Jan 4, 2009
                                    On 04/01/09 07:53, pansz wrote:
                                    > Tony Mechelynck wrote:
                                    >> I'm not sure. I suppose that option was defined before Unicode became
                                    >> well-known, maybe even before it existed, when most charsets were of the
                                    >> 8-bit kind except for East-Asian scripts, which required "special" MBCS
                                    >> versions of the OSes anyway (such as MS-DOS 2.25).
                                    >>
                                    >> Once the Unicode standard was published, it included not only mappings
                                    >> of codepoints to glyphs but also quite a lot of metadata about these
                                    >> codepoints (such as wide vs. narrow vs. ambiguous, LTR vs. RTL vs.
                                    >> ambiguous, lower/ upper/ titlecase, punctuation, number systems, etc.).
                                    >> However, Vim versions with -multi_byte must still be supported, and they
                                    >> don't have access to that wealth of meta-information. Also, IIUC it's in
                                    >> the ASCII range that there is most variation between programming
                                    >> languages, operating systems, human languages, etc. concerning which
                                    >> characters may be used in which circumstances.
                                    >
                                    > CJK languages are not in the ASCII range at all, and I bet CJK speakers
                                    > make up more than 30% of the world population. Vim is for programmers, but is
                                    > it _only_ for programmers?

                                    No, but each hanzi (not fullwidth punct) is supposed to be a "word" or
                                    "word part" of some kind, with punctuation, whitespace and diacritics
                                    all totally outside the "word" range. "Not" is a word in English,
                                    regardless of whether it's used alone or in "cannot" or
                                    "notwithstanding". These two uses sound almost Chinese-like to me... who
                                    don't really know more than a handful of Chinese words. I suppose that
                                    if English, like Japanese, used Han-script, "notwithstanding" might be
                                    written not-against-stay-now with four glyphs? But I'm daydreaming.

                                    >
                                    > The difficulty may be that 'iskeyword' is a whitelist, not a
                                    > blacklist: we cannot easily blacklist a single Unicode character in
                                    > 'iskeyword' without knowing *all* the Unicode characters that match
                                    > iswalpha().

                                    A more important difficulty is that 'iskeyword' applies only to Unicode
                                    codepoints U+0000 to U+007F when 'encoding' is UTF-8 (or any Unicode
                                    value aliased to UTF-8 for internal memory), and to characters 0x00 to
                                    0xFF when it isn't. Otherwise we might perhaps use ":setlocal isk-=不
                                    isk-=之" or some such. This would also mean several arrays of 2 gigabits
                                    rather than 256 bits to remember the settings (Vim treats the Unicode
                                    range as 0 to 7FFFFFFF. Even if it limited itself to the current
                                    official maximum of 10FFFF it would still mean a big increase.)
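
The size argument is easy to check with a little arithmetic. The Python sketch below assumes one bit per character in a flat table, which is the layout implied by "256 bits" above; the function name `bitmap_bytes` is made up for this example. The Unicode codespace is taken to end at U+10FFFF (the last assignable private-use code point is U+10FFFD).

```python
def bitmap_bytes(codepoints):
    """Bytes needed for a flat one-bit-per-character table with `codepoints` entries."""
    return (codepoints + 7) // 8


if __name__ == "__main__":
    tables = [
        ("8-bit range, 0x00 to 0xFF", 0x100),
        ("Unicode codespace, 0 to 0x10FFFF", 0x110000),
        ("31-bit range, 0 to 0x7FFFFFFF", 0x80000000),
    ]
    for label, count in tables:
        print(f"{label}: {count} bits = {bitmap_bytes(count)} bytes")
```

That is 32 bytes for the current table, about 136 KiB for the Unicode codespace and 256 MiB for the full 31-bit range, per table.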

                                    >
                                    > Perhaps the simplest approach is to add an option 'isnkeyword' that
                                    > accepts any Unicode character, so that we can blacklist some Unicode
                                    > characters while keeping the 'iskeyword' option working.

                                    Hm. Don't know if Bram would accept that, but you can always try to
                                    publish (and maintain) an unofficial patch to the C source. Don't know
                                    how easy (and foolproof) it would be. For a single option, a has()
                                    feature might be useful but it's less needed than for a whole batch of
                                    them: we would always be able to test ":if exists('+isnkeyword')".


                                    Best regards,
                                    Tony.
                                    --
                                    A truly wise man never plays leapfrog with a unicorn.

                                  • Tony Mechelynck
                                    Message 17 of 17 , Jan 4, 2009
                                      On 04/01/09 08:10, bill lam wrote:
                                      > On Sun, 04 Jan 2009, pansz wrote:
                                      >> Interesting, I see that the wide punctuation characters are recognized, so
                                      >> vim is using wide characters internally, and omitting particular
                                      >> wide characters from 'iskeyword' shouldn't be hard.
                                      >>
                                      >> Then why does 'iskeyword' support only characters 0-255?
                                      >
                                      > Just a wild guess, since I've never looked into vim's source code. I
                                      > think that iskeyword, or spellcheck for that matter, uses an FSM to
                                      > implement the parser. It's OK to have a table of 256 characters but
                                      > not so easy to work with a table of millions of Unicode characters.
                                      > A quick and dirty workaround is to treat all non-8-bit characters as
                                      > white space.
                                      >

                                      Actually Vim uses a different method (a table of ranges, I think) for
                                      Unicode codepoints which require two or more UTF-8 bytes, since we've
                                      established that fullwidth comma and fullwidth full stop are (properly)
                                      recognized as breaking "word" selection, and that "ordinary" hanzi aren't.
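
For readers wondering what a "table of ranges" could look like, here is a minimal Python sketch of the general technique: a sorted list of (first, last, class) entries searched with binary search. The table contents and class numbers are invented for illustration and deliberately tiny; this is not Vim's actual table, and `char_class` is a hypothetical name.

```python
from bisect import bisect_right

# Hypothetical table, sorted by first codepoint: (first, last, class).
# Illustrative class numbers: 0 = blank, 1 = punctuation, 2 = word character.
RANGES = [
    (0x3000, 0x3000, 0),  # ideographic space
    (0x3001, 0x3003, 1),  # ideographic comma, ideographic full stop, ditto mark
    (0x4E00, 0x9FFF, 2),  # CJK Unified Ideographs block
    (0xFF01, 0xFF0F, 1),  # fullwidth forms of "!" through "/"
]
STARTS = [first for first, _, _ in RANGES]


def char_class(ch, default=2):
    """Look up the class of `ch`; characters outside every range get `default`."""
    cp = ord(ch)
    # Index of the last range whose first codepoint is <= cp, or -1 if none.
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        _first, last, cls = RANGES[i]
        if cp <= last:
            return cls
    return default


if __name__ == "__main__":
    for ch in "道、。，\u3000":
        print(f"U+{ord(ch):04X}", char_class(ch))
```

A handful of ranges covers tens of thousands of codepoints, so a lookup costs a few comparisons and the table stays small, unlike a flat bitmap over the whole codepoint range.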

                                      Best regards,
                                      Tony.
                                      --
                                      Hippogriff, n.:
                                      An animal (now extinct) which was half horse and half griffin.
                                      The griffin was itself a compound creature, half lion and half eagle.
                                      The hippogriff was actually, therefore, only one quarter eagle, which
                                      is two dollars and fifty cents in gold. The study of zoology is full
                                      of surprises.
                                      -- Ambrose Bierce, "The Devil's Dictionary"
