
Re: possible to make iskeyword support multibyte characters?

  • Matt Wozniski
    Message 1 of 17 , Jan 2, 2009
      AFAICS, there's no way to make multibyte characters non-keyword
      characters. :help 'isk' says "See 'isfname' for a description of the
      format of this option." and :help 'isfname' says "Multi-byte
      characters 256 and above are always included", which doesn't seem
      changeable.

      ~Matt

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_use" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • Tony Mechelynck
      Message 2 of 17 , Jan 2, 2009
        On 02/01/09 11:30, anhnmncb wrote:
        > Ping!

        If you don't get a reply on this ML, the meaning usually is not that
        nobody saw the question, but rather that nobody knows the answer. Search
        the help first, then try to make your question clearer if the help
        doesn't give you an answer (in this case it does, see below).

        >
        > On 2008-12-31, anhnmncb wrote:
        >> On 2008-12-31, anhnmncb wrote:
        >>> Hi, list,
        >>>
        >>> When I type Chinese text in Vim, I find it inconvenient to complete
        >>> Chinese words with C-p/n, because Chinese words are not separated by spaces but
        >>> by characters like "and", "or" and others (I use English to represent Chinese
        >>> characters), so a Chinese sentence looks like this:
        >>>
        >>> ThisIsAChineseWordInSentence.(This is a Chinese word in sentence.)
        >>>
        >>> When I have typed "ThisIsAChineseWordIn" and now want to type Sen<C-p>,
        >>> Vim can't complete the word "Sentence" for me. So I wonder whether iskeyword could
        >>> support adding a Chinese character to it, for example (my client doesn't support
        >>> Chinese, so I use "and" to represent a Chinese character):
        >>>
        >>> set iskeyword+="and"
        >> I meant set iskeyword-="and".
        >>> Then autocompletion would work without problems for Chinese. I don't know if this
        >>> is easy to handle?
        >> Also, it would let me navigate more quickly in a long Chinese sentence; now I
        >> have to use /? or fFtT or some hjkl's and then input a Chinese character (sometimes
        >> inputting one Chinese character takes at least 3 English characters).
        >>
        >>
        >
        >

        For the meaning of its settings, ":help 'iskeyword'" refers to ":help
        'isfname'", which says:

        > Multi-byte characters 256 and above are always included, only the
        > characters up to 255 are specified with this option.
        > For UTF-8 the characters 0xa0 to 0xff are included as well.

        IOW it is not possible to treat some hanzi as 'iskeyword' characters and
        others not. I think the above means that even the "ideographic
        full-width space" U+3000 is treated as a keyword character; OTOH, I
        wouldn't assert this without an experiment (maybe Vim with +multi_byte
        knows about the main divisions of the Unicode codepoint range).

        Since I found no satisfactory way to use the IM (which _is_ installed on
        my system), I need at least 6 keystrokes to input any hanzi: for
        instance, for the simplest of them all, the digit one, 一 yi1 U+4E00, I
        need (after getting into Insert mode) to press Ctrl-V u 4 e 0 0 -- or
        else, I can use copy-paste if I can find it ready-made in some document.


        Best regards,
        Tony.
        --
        Paradise is exactly like where you are right now ... only much, much
        better.
        -- Laurie Anderson

      • StarWing
        Message 3 of 17 , Jan 2, 2009
          Okay, I think Vim could be developed to support a word table for ^n/^p and
          fFtT, e.g.:
          chinese.vim " this is a Chinese word-table file:
          beginwordtable chinese <<EOF "this is a command
          this leisi
          an an
          apple aipo
          and ande
          other ade
          is yizi
          or wuwo
          EOF

          " The first column is the Chinese word; the second column is the word in pinyin
          (sound) or input-method keys.

          We could then set an option named "wordtable": set wordtable=chinese

          So if I input: thisisanappleandother, Vim will know that "this" is a word,
          and likewise "is", "an", "apple", etc.
          And if I want to find "an", I press "fa" or "fan" and the cursor goes to
          "an", or "and", etc. I think it's not easy to make the f operator
          support multi-key input (Vim doesn't know how many keys will be typed, so
          it can't act at once).

          Could Bram or anyone implement it? This feature could also be combined
          with Vim's spell-check feature.
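
The word-table idea above amounts to dictionary-based word segmentation. Here is a minimal Python sketch of it, assuming a greedy longest-match strategy; the message does not say how overlapping words such as "an" and "and" would be resolved, so that choice is an assumption. `WORD_TABLE` and `segment` are hypothetical names, and Vim has no 'wordtable' option.

```python
# Hypothetical word table copied from the message: word -> pinyin/input-method keys.
WORD_TABLE = {
    "this": "leisi",
    "an": "an",
    "apple": "aipo",
    "and": "ande",
    "other": "ade",
    "is": "yizi",
    "or": "wuwo",
}


def segment(text, table):
    """Split text into words using greedy longest-match against the table.

    Characters that start no known word are emitted as one-character tokens.
    """
    max_len = max(len(word) for word in table)
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in table:
                tokens.append(candidate)
                pos += length
                break
        else:
            # No table entry matches here; fall back to a single character.
            tokens.append(text[pos])
            pos += 1
    return tokens


print(segment("thisisanappleandother", WORD_TABLE))
```

Greedy matching is only one possible policy and can pick the wrong split when a longer entry hides two shorter ones, so a real implementation would probably need something smarter.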

        • StarWing
          Message 4 of 17 , Jan 2, 2009
            Or could anyone let the developers know about this?

          • Tony Mechelynck
            Message 5 of 17 , Jan 2, 2009
              On 02/01/09 15:32, StarWing wrote:
              > or anyone can make developer know this?
              [...]

              AFAIK, most Vim developers read not only the vim_dev group but also the
              vim_use group.

              Best regards,
              Tony.
              --
              The trouble with doing something right the first time is that nobody
              appreciates how difficult it was.

            • pansz
              Message 6 of 17 , Jan 3, 2009
                Tony Mechelynck wrote:
                > For the meaning of its settings, ":help 'iskeyword'" resends to ":help
                > 'isfname'" where it is said:
                >
                >> Multi-byte characters 256 and above are always included, only the
                >> characters up to 255 are specified with this option.
                >> For UTF-8 the characters 0xa0 to 0xff are included as well.
                >
                > IOW it is not possible to treat some hanzi as 'iskeyword' characters and
                > others not. I think the above means that even the "ideographic
                > full-width space" U+3000 is treated as a keyword character, OTOH I
                > wouldn't affirm this without an experiment (maybe Vim with +multi_byte
                > knows about the main divisions of the Unicode codepoint range).

                This seems to hint that Vim is not using the standard iswalpha(), iswpunct()
                family of wide-character classification functions in <wctype.h>.

                As far as I know, iswalpha() returns true only for true hanzi
                characters and will not return true for characters such as the "ideographic
                full-width space".

                I guess this is a choice made for efficiency if Vim uses UTF-8 internally,
                since UTF-8 must be converted to UCS in order to use the <wctype.h> functions.

                If that is the case, making iskeyword support multibyte characters isn't
                hard (I have done similar things for the Lua scripting language), but it will
                sacrifice performance.
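
The distinction drawn above, between hanzi and characters like the ideographic space, can be checked outside Vim. This is a hedged Python sketch using the standard `unicodedata` module as a stand-in for the C `iswalpha()` family. It is not how Vim classifies characters, and `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode general category, isalpha) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), ch.isalpha())


# UTF-8 bytes have to be decoded to a code point before they can be classified.
yi = b"\xe4\xb8\x80".decode("utf-8")  # the three UTF-8 bytes of U+4E00
print(describe(yi))

# A hanzi, the ideographic space, the ideographic comma and full stop, and an ASCII letter.
for ch in ["道", "\u3000", "、", "。", "a"]:
    print(describe(ch))
```

In Python's Unicode tables the hanzi fall in category Lo (letter, other), U+3000 in Zs (space separator), and the ideographic comma and full stop in Po (punctuation, other), so a classifier built on such tables can tell them apart.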


              • Tony Mechelynck
                Message 7 of 17 , Jan 3, 2009
                  On 04/01/09 04:07, pansz wrote:
                  > Tony Mechelynck wrote:
                  >> For the meaning of its settings, ":help 'iskeyword'" resends to ":help
                  >> 'isfname'" where it is said:
                  >>
                  >>> Multi-byte characters 256 and above are always included, only the
                  >>> characters up to 255 are specified with this option.
                  >>> For UTF-8 the characters 0xa0 to 0xff are included as well.
                  >> IOW it is not possible to treat some hanzi as 'iskeyword' characters and
                  >> others not. I think the above means that even the "ideographic
                  >> full-width space" U+3000 is treated as a keyword character, OTOH I
                  >> wouldn't affirm this without an experiment (maybe Vim with +multi_byte
                  >> knows about the main divisions of the Unicode codepoint range).
                  >
                  > This seems to hint vim is not using the standard iswalpha(), iswpunct()
                  > series widechar-type-check functions in<wctypes.h>.
                  >
                  > As far as I know the iswalpha() returns true only on true hanzi
                  > characters and will not return true on characters such as "ideographic
                  > full-width space".
                  >
                  > I guess this is a choice for efficiency if vim uses utf-8 internally,
                  > since utf-8 must be converted to ucs in order to use wctypes.
                  >
                  > If that is the case, making iskeyword supports multibyte character isn't
                  > hard (I had done similar things for Lua script language), but will
                  > sacrifice performance.

                  If you want to be sure, try some Chinese text with both hanzi and
                  wide-punctuation and see where the yiw (yank inner word) or viw (visual
                  inner word) stops. Here's a sample for you: 道可道、非常道。名可名、非常
                  名。 ;-)

                  In my Huge gvim 7.2.077 with +multi_byte, viw includes neither
                  ideographic comma nor ideographic full stop; but AFAIK there's no way to
                  tell vim that 不 "not", 故 "thus", 之 "'s" etc. are non-keyword
                  characters, since for multibyte characters this kind of status is hardcoded.
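
The observed behaviour, where viw stops at the ideographic comma and full stop, can be approximated as follows. This is a rough Python sketch that groups the sample text into runs by character class using Unicode general categories. Vim's real classification is hardcoded in its own tables, so the class rules here are an assumption, and `char_class` and `runs` are hypothetical names.

```python
import unicodedata
from itertools import groupby


def char_class(ch):
    """Coarse character class: 'space', 'punct', or 'word' for everything else."""
    category = unicodedata.category(ch)
    if category.startswith("Z") or ch.isspace():
        return "space"
    if category.startswith("P"):
        return "punct"
    return "word"


def runs(text):
    """Split text into maximal runs of characters that share one class."""
    return ["".join(group) for _, group in groupby(text, key=char_class)]


sample = "道可道、非常道。名可名、非常名。"
print(runs(sample))
```

Each run of hanzi comes out as one "word", which matches the complaint in this thread: the punctuation is separated, but nothing splits 非常道 into smaller words.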


                  Best regards,
                  Tony.
                  --
                  TV is chewing gum for the eyes.
                  -- Frank Lloyd Wright

                • Sean
                  Message 8 of 17 , Jan 3, 2009
                    > Since I found no satisfactory way to use the IM (which _is_
                    > installed on my system), I need at least 6 keystrokes to input any
                    > hanzi: for instance, for the simplest of them all, the digit one,
                    > 一 yi1 U+4E00, I need (after getting into Insert mode) to press
                    > Ctrl-V u 4 e 0 0 -- or else, I can use copy-paste if I can find it
                    > ready-made in some document.

                    It only takes three keystrokes (yi<C-I>) to type your example, 一,
                    using my newly-created IME at
                    http://vim.sourceforge.net/scripts/script.php?script_id=2506

                    Welcome to vim built-in IME :))

                    Sean


                  • pansz
                    Message 9 of 17 , Jan 3, 2009
                      Tony Mechelynck wrote:
                      > If you want to be sure, try some Chinese text with both hanzi and
                      > wide-punctuation and see where the yiw (yank inner word) or viw (visual
                      > inner word) stops. Here's a sample for you: 道可道、非常道。名可名、非常
                      > 名。 ;-)

                      Interesting. I see that the wide punctuation characters are recognized, so
                      Vim is using wide characters internally, and omitting some particular
                      wide characters from 'iskeyword' shouldn't be hard.

                      Then why does 'iskeyword' support only characters 0-255?

                    • Tony Mechelynck
                      Message 10 of 17 , Jan 3, 2009
                        On 04/01/09 06:30, pansz wrote:
                        > Tony Mechelynck wrote:
                        >> If you want to be sure, try some Chinese text with both hanzi and
                        >> wide-punctuation and see where the yiw (yank inner word) or viw (visual
                        >> inner word) stops. Here's a sample for you: 道可道、非常道。名可名、非常
                        >> 名。 ;-)
                        >
                        > Interesting, I see the wide punctuation characters are recognized, so
                        > vim is using wide character internally, and omitting some particular
                        > wide-character from 'iskeyword' shouldn't be hard.
                        >
                        > Then why the 'iskeyword' supports only characters from 0-255?

                        I'm not sure. I suppose that option was defined before Unicode became
                        well-known, maybe even before it existed, when most charsets were of the
                        8-bit kind except for East-Asian scripts, which required "special" MBCS
                        versions of the OSes anyway (such as MS-DOS 2.25).

                        Once the Unicode standard was published, it included not only mappings
                        of codepoints to glyphs but also quite a lot of metadata about these
                        codepoints (such as wide vs. narrow vs. ambiguous, LTR vs. RTL vs.
                        ambiguous, lower/ upper/ titlecase, punctuation, number systems, etc.).
                        However, Vim versions with -multi_byte must still be supported, and they
                        don't have access to that wealth of meta-information. Also, IIUC it's in
                        the ASCII range that there is most variation between programming
                        languages, operating systems, human languages, etc. concerning which
                        characters may be used in which circumstances.


                        Best regards,
                        Tony.
                        --
                        If there are epigrams, there must be meta-epigrams.

                      • pansz
                        Message 11 of 17 , Jan 3, 2009
                          Tony Mechelynck wrote:
                          > I'm not sure. I suppose that option was defined before Unicode became
                          > well-known, maybe even before it existed, when most charsets were of the
                          > 8-bit kind except for East-Asian scripts, which required "special" MBCS
                          > versions of the OSes anyway (such as MS-DOS 2.25).
                          >
                          > Once the Unicode standard was published, it included not only mappings
                          > of codepoints to glyphs but also quite a lot of metadata about these
                          > codepoints (such as wide vs. narrow vs. ambiguous, LTR vs. RTL vs.
                          > ambiguous, lower/ upper/ titlecase, punctuation, number systems, etc.).
                          > However, Vim versions with -multi_byte must still be supported, and they
                          > don't have access to that wealth of meta-information. Also, IIUC it's in
                          > the ASCII range that there is most variation between programming
                          > languages, operating systems, human languages, etc. concerning which
                          > characters may be used in which circumstances.

                          CJK languages are not in the ASCII range at all, and I bet CJK speakers
                          make up more than 30% of the world's population. Vim is for programmers,
                          but is it _only_ for programmers?

                          The difficulty may be that 'iskeyword' is a whitelist, not a
                          blacklist: we cannot easily blacklist a single Unicode character in
                          'iskeyword' without knowing *all* the Unicode characters that match
                          iswalpha().

                          Perhaps the simplest approach is to add an option 'isnkeyword' that
                          accepts any Unicode character, so that we can blacklist some Unicode
                          characters while the 'iskeyword' option keeps functioning as before.
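
To make the proposal concrete, here is a minimal Python sketch of how a blacklist could sit on top of the existing whitelist. It is only an illustration: 'isnkeyword' does not exist in Vim, the function and parameter names are made up, and the rule "everything at 256 and above is a keyword character" is a simplification taken from the 'isfname' help text quoted earlier in the thread.

```python
def make_is_keyword(low_table, excluded):
    """Build a keyword test from a whitelist and a blacklist.

    low_table: set of code points below 256 that count as keyword
               characters (stands in for 'iskeyword').
    excluded:  set of characters to blacklist (stands in for the
               hypothetical 'isnkeyword' option).
    """
    def is_keyword(ch):
        # The blacklist wins over everything else.
        if ch in excluded:
            return False
        cp = ord(ch)
        # Below 256 the whitelist decides.
        if cp < 256:
            return cp in low_table
        # Simplifying assumption: characters 256 and above are
        # keyword characters unless blacklisted.
        return True
    return is_keyword


is_kw = make_is_keyword({ord(c) for c in "abc_"}, {"不", "之"})
```

With this, only the blacklisted characters have to be listed; nobody needs to enumerate every Unicode character that matches iswalpha().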



                        • bill lam
                          Message 12 of 17 , Jan 3, 2009
                            On Sun, 04 Jan 2009, pansz wrote:
                            > Interesting, I see the wide punctuation characters are recognized, so
                            > vim is using wide character internally, and omitting some particular
                            > wide-character from 'iskeyword' shouldn't be hard.
                            >
                            > Then why the 'iskeyword' supports only characters from 0-255?

                            Just a wild guess, since I've never looked into Vim's source code. I
                            think that iskeyword, or spell checking for that matter, uses an FSM
                            to implement the parser. It's OK to have a table of 256 characters but
                            not so easy to work with a table of millions of Unicode characters.
                            A quick and dirty workaround is to treat all non-8-bit characters as
                            white space.

                            --
                            regards,
                            ====================================================
                            GPG key 1024D/4434BAB3 2008-08-24
                            gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
                            唐詩202 盧綸 晚次鄂州
                            雲開遠見漢陽城 猶是孤帆一日程 估客晝眠知浪靜 舟人夜語覺潮生
                            三湘愁鬢逢秋色 萬里歸心對月明 舊業已隨征戰盡 更堪江上鼓鼙聲

                          • Tony Mechelynck
                            Message 13 of 17 , Jan 4, 2009
                              On 04/01/09 07:53, pansz wrote:
                              > Tony Mechelynck wrote:
                              >> I'm not sure. I suppose that option was defined before Unicode became
                              >> well-known, maybe even before it existed, when most charsets were of the
                              >> 8-bit kind except for East-Asian scripts, which required "special" MBCS
                              >> versions of the OSes anyway (such as MS-DOS 2.25).
                              >>
                              >> Once the Unicode standard was published, it included not only mappings
                              >> of codepoints to glyphs but also quite a lot of metadata about these
                              >> codepoints (such as wide vs. narrow vs. ambiguous, LTR vs. RTL vs.
                              >> ambiguous, lower/ upper/ titlecase, punctuation, number systems, etc.).
                              >> However, Vim versions with -multi_byte must still be supported, and they
                              >> don't have access to that wealth of meta-information. Also, IIUC it's in
                              >> the ASCII range that there is most variation between programming
                              >> languages, operating systems, human languages, etc. concerning which
                              >> characters may be used in which circumstances.
                              >
                              > Human languages of CJK are not in the ASCII range at all and I bet CJK
                              > have more than 30% of the world population. Vim is for programmers, is
                              > it _only_ for programmers?

                              No, but each hanzi (not fullwidth punct) is supposed to be a "word" or
                              "word part" of some kind, with punctuation, whitespace and diacritics
                              all totally outside the "word" range. "Not" is a word in English,
                              regardless of whether it's used alone or in "cannot" or
                              "notwithstanding". These two uses sound almost Chinese-like to me... who
                              don't really know more than a handful of Chinese words. I suppose that
                              if English, like Japanese, used Han-script, "notwithstanding" might be
                              written not-against-stay-now with four glyphs? But I'm daydreaming.

                              >
                              > The difficulties may be that 'iskeyword' is a whitelist, not a
                              > blacklist, we cannot easily blacklist a single Unicode character in
                              > 'iskeyword' without knowing *all* the Unicode characters which matches
                              > iswalpha().

                              A more important difficulty is that 'iskeyword' applies only to Unicode
                              codepoints U+0000 to U+007F when 'encoding' is UTF-8 (or any Unicode
                              value aliased to UTF-8 for internal memory), and to characters 0x00 to
                              0xFF when it isn't. Otherwise we might perhaps use ":setlocal isk-=不
                              isk-=之" or some such. This would also mean several arrays of 2 gigabits
                              rather than 256 bits to remember the settings (Vim treats the Unicode
                              range as 0 to 7FFFFFFF. Even if it limited itself to the current
                              official maximum of 10FFFD it would still mean a big increase.)
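
The size argument above can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic for one flag table with one bit per character; the function name and the dictionary are illustrative and say nothing about how Vim actually stores its tables.

```python
def bitmap_bytes(num_codepoints):
    """Bytes needed for a table with one bit per code point."""
    return (num_codepoints + 7) // 8


# Number of code points in each candidate range.
RANGES = {
    "8-bit (0x00 to 0xFF)": 0x100,
    "Vim's range (0 to 0x7FFFFFFF)": 0x80000000,
    "Unicode (0 to 0x10FFFF)": 0x110000,
}

for name, count in RANGES.items():
    print(f"{name}: {count} bits, {bitmap_bytes(count)} bytes")
```

One table comes to 32 bytes for the 8-bit case, 256 MiB for the full 31-bit range (the "2 gigabits" mentioned above), and 136 KiB if limited to the official Unicode range.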

                              >
                              > Perhaps the simplest approach is to add an option 'isnkeyword' which
                              > supports any Unicode character and we can blacklist some Unicode
                              > characters while still retain the 'iskeyword' option functioning.

                              Hm. Don't know if Bram would accept that, but you can always try to
                              publish (and maintain) an unofficial patch to the C source. Don't know
                              how easy (and foolproof) it would be. For a single option, a has()
                              feature might be useful but it's less needed than for a whole batch of
                              them: we would always be able to test ":if exists('+isnkeyword')".


                              Best regards,
                              Tony.
                              --
                              A truly wise man never plays leapfrog with a unicorn.

                            • Tony Mechelynck
                              Message 14 of 17 , Jan 4, 2009
                                On 04/01/09 08:10, bill lam wrote:
                                > On Sun, 04 Jan 2009, pansz wrote:
                                >> Interesting, I see the wide punctuation characters are recognized, so
                                >> vim is using wide character internally, and omitting some particular
                                >> wide-character from 'iskeyword' shouldn't be hard.
                                >>
                                >> Then why the 'iskeyword' supports only characters from 0-255?
                                >
                                > Just wild guess since I've never looked into vim's source code. I
                                > think that iskeyword or spellcheck for that matter use FSM to
                                > implement the parser. It's ok to have a table of 256 characters but
                                > not so easy to work with a table of millions of unicode characters.
                                > A quick and dirty workaround is to coerce all non 8-bit characters as
                                > white space.
                                >

                                Actually Vim uses a different method (a table of ranges, I think) for
                                Unicode codepoints which require two or more UTF-8 bytes, since we've
                                established that fullwidth comma and fullwidth full stop are (properly)
                                recognized as breaking "word" selection, and that "ordinary" hanzi aren't.
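
A table of ranges of the kind guessed at above could look roughly like the following Python sketch. The three ranges are a tiny hand-picked sample for illustration and are not Vim's actual table; the names RANGES, STARTS and classify are invented here, and the "word" default for unlisted characters is an assumption.

```python
from bisect import bisect_right

# Hypothetical sample table: (first, last, class), sorted by first code point.
RANGES = [
    (0x3001, 0x3002, "punct"),  # ideographic comma and full stop
    (0x4E00, 0x9FFF, "word"),   # CJK Unified Ideographs block
    (0xFF01, 0xFF0F, "punct"),  # fullwidth '!' through '/'
]
STARTS = [first for first, _, _ in RANGES]


def classify(ch, default="word"):
    """Return the class of ch by binary search over the range table."""
    cp = ord(ch)
    # Index of the last range whose first code point is <= cp.
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        first, last, cls = RANGES[i]
        if cp <= last:
            return cls
    # Not covered by any range: fall back to the assumed default.
    return default
```

A lookup costs a binary search over a handful of ranges instead of a bit table covering every code point, so classify("，") can come out as "punct" while classify("不") comes out as "word" without any per-character storage.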

                                Best regards,
                                Tony.
                                --
                                Hippogriff, n.:
                                An animal (now extinct) which was half horse and half griffin.
                                The griffin was itself a compound creature, half lion and half eagle.
                                The hippogriff was actually, therefore, only one quarter eagle, which
                                is two dollars and fifty cents in gold. The study of zoology is full
                                of surprises.
                                -- Ambrose Bierce, "The Devil's Dictionary"
