Re: Is vim really fully unicoded?

  • Matt Wozniski
    Message 1 of 13, Jan 6, 2009
      On Tue, Jan 6, 2009 at 6:10 PM, Tony Mechelynck wrote:
      >
      > On 06/01/09 12:31, anhnmncb wrote:
      >> Hi, list, as title, if so, why can't many functions
      >> still handle correctly with unicode? For example the func:
      >>
      >> getline('.')[col('.')-1]
      >>
      >> Can't return a character outside the range of ASCII.
      >>
      >
      > because string[index] returns a byte value, not a character value: see
      > ":help expr8".

      *Nod*

      > If the character at the cursor is > U+007F, you'll get
      > the first byte (in the range 0xC0-0xFD, or in practice in the range
      > 0xC0-0xF4) of its UTF-8 representation.

      No, you could get some byte of some entirely different character. I.e.,
      on a line with two 2-byte characters, getline('.')[col('.')-1] on the
      second character would return the 2nd byte of the first character.

      > The _character_ at the cursor is obtained as follows:
      > let i0 = byteidx(getline('.'), virtcol('.') - 1)
      > let i1 = byteidx(getline('.'), virtcol('.'))
      > let character = strpart(getline('.'), i0, i1 - i0)

      Using virtcol() there seems broken... what if you're in the middle of
      a tab, for example, with virtualedit=all?

      :echo join(split("áéíóú", '\zs')[1:3], '')

      is how I would do it... but, is there any real reason why indexing
      into a string *should* be byte oriented instead of character oriented,
      apart from backwards compatibility? It seems drastically less easy to
      use the thing that more people want to use more of the time; and in
      fact some of the snippets in the vim help (like the example given at
      :help expr-8) won't work on multibyte lines given the way that string
      indexing works now. It seems like a place where the cost of losing
      backwards compatibility might be outweighed by the cost of keeping
      things the way they are...
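
As a rough illustration of the split() idea above, character-oriented indexing can be layered on top of the byte-oriented strings. This is only a sketch: CharAt() is a hypothetical helper name, not something Vim provides, and it assumes 'encoding' is utf-8.

```vim
scriptencoding utf-8

" CharAt() is a hypothetical helper, not a Vim built-in.
" Returns the character at zero-based character index idx,
" or '' when idx is past the end of the string.
function! CharAt(str, idx)
  " Splitting on '\zs' yields one list item per character, not per byte.
  let chars = split(a:str, '\zs')
  return get(chars, a:idx, '')
endfunction

echo CharAt("áéíóú", 1)
echo join(split("áéíóú", '\zs')[1:3], '')
```

The list is rebuilt on every call, so this trades speed for simplicity; it is meant to show the idea rather than to be efficient on long lines.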

      ~Matt

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_dev" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • Tony Mechelynck
      Message 2 of 13, Jan 6, 2009
        On 07/01/09 00:39, Matt Wozniski wrote:
        > On Tue, Jan 6, 2009 at 6:10 PM, Tony Mechelynck wrote:
        >> On 06/01/09 12:31, anhnmncb wrote:
        >>> Hi, list, as title, if so, why can't many functions
        >>> still handle correctly with unicode? For example the func:
        >>>
        >>> getline('.')[col('.')-1]
        >>>
        >>> Can't return a character outside the range of ASCII.
        >>>
        >> because string[index] returns a byte value, not a character value: see
        >> ":help expr8".
        >
        > *Nod*
        >
        >> If the character at the cursor is > U+007F, you'll get
        >> the first byte (in the range 0xC0-0xFD, or in practice in the range
        >> 0xC0-0xF4) of its UTF-8 representation.
        >
        > No, you could get some byte of some entirely different character. I.e.,
        > on a line with two 2-byte characters, getline('.')[col('.')-1] on the
        > second character would return the 2nd byte of the first character.

        col() gives a one-based byte ordinal. [] takes a zero-based argument. I
        stand by what I said.
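
The byte arithmetic behind this point can be tried on a plain string. The sketch below assumes 'encoding' is utf-8 and hard-codes the value col('.') would have with the cursor on the second character; the variable names are made up for the illustration.

```vim
scriptencoding utf-8

" Assumes 'encoding' is utf-8, where á and é each occupy two bytes.
let line = "áé"

" strlen() counts bytes, so this is 4 rather than 2.
echo strlen(line)

" With the cursor on é, col('.') would be 3 (a one-based byte ordinal),
" so the zero-based index col('.') - 1 is 2: the first byte of é.
let byte_index = 3 - 1

" Indexing yields exactly one byte, which is only half of é.
echo strlen(line[byte_index])

" Taking both bytes of the character gives the whole é.
echo strpart(line, byte_index, 2)
```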

        >
        >> The _character_ at the cursor is obtained as follows:
        >> let i0 = byteidx(getline('.'), virtcol('.') - 1)
        >> let i1 = byteidx(getline('.'), virtcol('.'))
        >> let character = strpart(getline('.'), i0, i1 - i0)
        >
        > Using virtcol() there seems broken... what if you're in the middle of
        > a tab, for example, with virtualedit=all?
        >
        > :echo join(split("áéíóú", '\zs')[1:3], '')

        OK, I didn't think of virtual editing, nor even, it seems, of
        multi-column characters such as tabs and fullwidth CJK. However, [1:3]
        wouldn't work because the idea is that we're in a script, we don't know
        that we're in the 1st, 2nd or 3rd column, just that we want "whatever is
        at the cursor". I might do it with

        function CursorChar()
          normal yl
          return @@
        endfunction

        >
        > is how I would do it... but, is there any real reason why indexing
        > into a string *should* be byte oriented instead of character oriented,
        > apart from backwards compatibility? It seems drastically less easy to
        > use the thing that more people want to use more of the time; and in
        > fact some of the snippets in the vim help (like the example given at
        > :help expr-8) won't work on multibyte lines given the way that string
        > indexing works now. It seems like a place where the cost of losing
        > backwards compatibility might be outweighed by the cost of keeping
        > things the way they are...
        >
        > ~Matt

        Changing an existing construct from byte-oriented to
        multibyte-character-oriented would probably break a lot of existing
        scripts. I don't believe Bram would ever accept that.

        Best regards,
        Tony.
        --
        "A programmer is a person who passes as an exacting expert on the basis
        of being able to turn out, after innumerable punching, an infinite
        series of incomprehensive answers calculated with micrometric
        precisions from vague assumptions based on debatable figures taken from
        inconclusive documents and carried out on instruments of problematical
        accuracy by persons of dubious reliability and questionable mentality
        for the avowed purpose of annoying and confounding a hopelessly
        defenseless department that was unfortunate enough to ask for the
        information in the first place."
        -- IEEE Grid news magazine

      • Yue Wu
        Message 3 of 13, Jan 6, 2009
          On Wed, 07 Jan 2009 08:25:35 +0800, Tony Mechelynck wrote:

          >
          > On 07/01/09 00:39, Matt Wozniski wrote:
          >> On Tue, Jan 6, 2009 at 6:10 PM, Tony Mechelynck wrote:
          >>> On 06/01/09 12:31, anhnmncb wrote:
          >>>> Hi, list, as title, if so, why can't many functions
          >>>> still handle correctly with unicode? For example the func:
          >>>>
          >>>> getline('.')[col('.')-1]
          >>>>
          >>>> Can't return a character outside the range of ASCII.
          >>>>
          >>> because string[index] returns a byte value, not a character value: see
          >>> ":help expr8".
          >>
          >> *Nod*
          >>
          >>> If the character at the cursor is > U+007F, you'll get
          >>> the first byte (in the range 0xC0-0xFD, or in practice in the range
          >>> 0xC0-0xF4) of its UTF-8 representation.
          >>
          >> No, you could get some byte of some entirely different character. I.e.,
          >> on a line with two 2-byte characters, getline('.')[col('.')-1] on the
          >> second character would return the 2nd byte of the first character.
          >
          > col() gives a one-based byte ordinal. [] takes a zero-based argument. I
          > stand by what I said.
          >
          >>
          >>> The _character_ at the cursor is obtained as follows:
          >>> let i0 = byteidx(getline('.'), virtcol('.') - 1)
          >>> let i1 = byteidx(getline('.'), virtcol('.'))
          >>> let character = strpart(getline('.'), i0, i1 - i0)
          >>
          >> Using virtcol() there seems broken... what if you're in the middle of
          >> a tab, for example, with virtualedit=all?
          >>
          >> :echo join(split("áéíóú", '\zs')[1:3], '')
          >
          > OK, I didn't think of virtual editing, nor even, it seems, of
          > multi-column characters such as tabs and fullwidth CJK. However, [1:3]
          > wouldn't work because the idea is that we're in a script, we don't know
          > that we're in the 1st, 2nd or 3rd column, just that we want "whatever is
          > at the cursor". I might do it with
          >
          > function CursorChar()
          > normal yl
          > return @@
          > endfunction
          >
          >>
          >> is how I would do it... but, is there any real reason why indexing
          >> into a string *should* be byte oriented instead of character oriented,
          >> apart from backwards compatibility? It seems drastically less easy to
          >> use the thing that more people want to use more of the time; and in
          >> fact some of the snippets in the vim help (like the example given at
          >> :help expr-8) won't work on multibyte lines given the way that string
          >> indexing works now. It seems like a place where the cost of losing
          >> backwards compatibility might be outweighed by the cost of keeping
          >> things the way they are...
          >>
          >> ~Matt
          >
          > Changing an existing construct from byte-oriented to
          > multibyte-character-oriented would probably break a lot of existing
          > scripts. I don't believe Bram would ever accept that.
          >
          > Best regards,
          > Tony.

          Hmm, I think I got the point.

          By the way, I tested your snippet on a line containing "测试" ("test"):

          let i0 = byteidx(getline('.'), virtcol('.') - 1)
          let i1 = byteidx(getline('.'), virtcol('.'))
          let character = strpart(getline('.'), i0, i1 - i0)

          Then ":echo character" printed nothing.
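
One likely reason is a unit mismatch: byteidx() expects a character index, whereas virtcol() reports a screen column, and each of these characters is two cells wide. A minimal sketch on a plain string, assuming 'encoding' is utf-8 and using the hypothetical helper name CharAtIndex() (not a Vim built-in):

```vim
scriptencoding utf-8

" CharAtIndex() is a hypothetical helper, not a Vim built-in.
" nr is a zero-based character index, the unit byteidx() expects.
function! CharAtIndex(str, nr)
  let start = byteidx(a:str, a:nr)
  let stop = byteidx(a:str, a:nr + 1)
  " byteidx() returns -1 when the string has fewer characters than asked for.
  if start < 0 || stop < 0
    return ''
  endif
  " The third argument of strpart() is a length in bytes.
  return strpart(a:str, start, stop - start)
endfunction

echo CharAtIndex("测试", 0)
echo CharAtIndex("测试", 1)
```

This still leaves open how to turn the cursor position into a character index, which is the hard part of the original question.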

          --
          Regards,
          Van.

        • Matt Wozniski
          Message 4 of 13, Jan 6, 2009
            On 1/6/09, Tony Mechelynck wrote:
            >
            > On 07/01/09 00:39, Matt Wozniski wrote:
            > > On Tue, Jan 6, 2009 at 6:10 PM, Tony Mechelynck wrote:
            > >> On 06/01/09 12:31, anhnmncb wrote:
            > >>> Hi, list, as title, if so, why can't many functions
            > >>> still handle correctly with unicode? For example the func:
            > >>>
            > >>> getline('.')[col('.')-1]
            > >>>
            > >>> Can't return a character outside the range of ASCII.
            > >>>
            > >> because string[index] returns a byte value, not a character value: see
            > >> ":help expr8".
            > >
            > > *Nod*
            > >
            > >> If the character at the cursor is > U+007F, you'll get
            > >> the first byte (in the range 0xC0-0xFD, or in practice in the range
            > >> 0xC0-0xF4) of its UTF-8 representation.
            > >
            > > No, you could get some byte of some entirely different character. I.e.,
            > > on a line with two 2-byte characters, getline('.')[col('.')-1] on the
            > > second character would return the 2nd byte of the first character.
            >
            > col() gives a one-based byte ordinal. [] takes a zero-based argument. I
            > stand by what I said.

            Ooh, you're right - I forgot col() returned a byte index, and not the
            column as its name would imply...

            > >> The _character_ at the cursor is obtained as follows:
            > >> let i0 = byteidx(getline('.'), virtcol('.') - 1)
            > >> let i1 = byteidx(getline('.'), virtcol('.'))
            > >> let character = strpart(getline('.'), i0, i1 - i0)
            > >
            > > Using virtcol() there seems broken... what if you're in the middle of
            > > a tab, for example, with virtualedit=all?
            > >
            > > :echo join(split("áéíóú", '\zs')[1:3], '')
            >
            > OK, I didn't think of virtual editing, nor even, it seems, of
            > multi-column characters such as tabs and fullwidth CJK. However, [1:3]
            > wouldn't work because the idea is that we're in a script, we don't know
            > that we're in the 1st, 2nd or 3rd column, just that we want "whatever is
            > at the cursor". I might do it with
            >
            > function CursorChar()
            > normal yl
            > return @@
            > endfunction

            echo matchstr(getline('.'), '\%' . col('.') . 'c.')

            does the same thing without clobbering the unnamed register...
            slightly more elegant, imho.
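
Wrapped in a function, the one-liner might look like the sketch below. The name CursorCharByMatch() is made up for the illustration.

```vim
" CursorCharByMatch() is a hypothetical name, not a Vim built-in.
function! CursorCharByMatch()
  " col('.') is a one-based byte ordinal, which is the unit \%{N}c expects.
  " The trailing '.' then matches one whole character starting at that byte.
  return matchstr(getline('.'), '\%' . col('.') . 'c.')
endfunction
```

Because it only reads the line and the cursor position, it leaves registers and the cursor untouched.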

            > > is how I would do it... but, is there any real reason why indexing
            > > into a string *should* be byte oriented instead of character oriented,
            > > apart from backwards compatibility? It seems drastically less easy to
            > > use the thing that more people want to use more of the time; and in
            > > fact some of the snippets in the vim help (like the example given at
            > > :help expr-8) won't work on multibyte lines given the way that string
            > > indexing works now. It seems like a place where the cost of losing
            > > backwards compatibility might be outweighed by the cost of keeping
            > > things the way they are...
            >
            > Changing an existing construct from byte-oriented to
            > multibyte-character-oriented would probably break a lot of existing
            > scripts. I don't believe Bram would ever accept that.

            But sometimes, breaking things is required to make progress. The fact
            that we're having a conversation with both of us suggesting (fairly
            complicated) things that haven't worked is a perfect proof for the
            fact that the current system is counterintuitive and hard to use...

            ~Matt

          • Tony Mechelynck
            Message 5 of 13, Jan 6, 2009
              On 07/01/09 02:14, Matt Wozniski wrote:
              > On 1/6/09, Tony Mechelynck wrote:
              >> On 07/01/09 00:39, Matt Wozniski wrote:
              >> > On Tue, Jan 6, 2009 at 6:10 PM, Tony Mechelynck wrote:
              >> >> On 06/01/09 12:31, anhnmncb wrote:
              >> >>> Hi, list, as title, if so, why can't many functions
              >> >>> still handle correctly with unicode? For example the func:
              >> >>>
              >> >>> getline('.')[col('.')-1]
              >> >>>
              >> >>> Can't return a character outside the range of ASCII.
              >> >>>
              >> >> because string[index] returns a byte value, not a character value: see
              >> >> ":help expr8".
              >> >
              >> > *Nod*
              >> >
              >> >> If the character at the cursor is > U+007F, you'll get
              >> >> the first byte (in the range 0xC0-0xFD, or in practice in the range
              >> >> 0xC0-0xF4) of its UTF-8 representation.
              >> >
              >> > No, you could get some byte of some entirely different character. I.e.,
              >> > on a line with two 2-byte characters, getline('.')[col('.')-1] on the
              >> > second character would return the 2nd byte of the first character.
              >>
              >> col() gives a one-based byte ordinal. [] takes a zero-based argument. I
              >> stand by what I said.
              >
              > Ooh, you're right - I forgot col() returned a byte index, and not the
              > column as its name would imply...
              >
              >> >> The _character_ at the cursor is obtained as follows:
              >> >> let i0 = byteidx(getline('.'), virtcol('.') - 1)
              >> >> let i1 = byteidx(getline('.'), virtcol('.'))
              >> >> let character = strpart(getline('.'), i0, i1 - i0)
              >> >
              >> > Using virtcol() there seems broken... what if you're in the middle of
              >> > a tab, for example, with virtualedit=all?
              >> >
              >> > :echo join(split("áéíóú", '\zs')[1:3], '')
              >>
              >> OK, I didn't think of virtual editing, nor even, it seems, of
              >> multi-column characters such as tabs and fullwidth CJK. However, [1:3]
              >> wouldn't work because the idea is that we're in a script, we don't know
              >> that we're in the 1st, 2nd or 3rd column, just that we want "whatever is
              >> at the cursor". I might do it with
              >>
              >> function CursorChar()
              >> normal yl
              >> return @@
              >> endfunction
              >
              > echo matchstr(getline('.'), '\%' . col('.') . 'c.')

              Again, col('.') is a byte index, not a column. What about virtcol('.')
              instead?

              To avoid clobbering @@ I could save/restore it.

              >
              > does the same thing without clobbering the unnamed register...
              > slightly more elegant, imho.
              >
              >> > is how I would do it... but, is there any real reason why indexing
              >> > into a string *should* be byte oriented instead of character oriented,
              >> > apart from backwards compatibility? It seems drastically less easy to
              >> > use the thing that more people want to use more of the time; and in
              >> > fact some of the snippets in the vim help (like the example given at
              >> > :help expr-8) won't work on multibyte lines given the way that string
              >> > indexing works now. It seems like a place where the cost of losing
              >> > backwards compatibility might be outweighed by the cost of keeping
              >> > things the way they are...
              >>
              >> Changing an existing construct from byte-oriented to
              >> multibyte-character-oriented would probably break a lot of existing
              >> scripts. I don't believe Bram would ever accept that.
              >
              > But sometimes, breaking things is required to make progress. The fact
              > that we're having a conversation with both of us suggesting (fairly
              > complicated) things that haven't worked is a perfect proof for the
              > fact that the current system is counterintuitive and hard to use...
              >
              > ~Matt

              That's no reason for breaking what does work. I don't mind
              counterintuitive as long as it's documented.


              Best regards,
              Tony.
              --
              They told me you had proven it
              About a month before.
              The proof was valid, more or less
              But rather less than more.

              He sent them word that we would try
              To pass where they had failed
              And after we were done, to them
              The new proof would be mailed.

              My notion was to start again
              Ignoring all they'd done
              We quickly turned it into code
              To see if it would run.

              When they discovered our results
              Their hair began to curl
              Instead of understanding it
              We'd run the thing through PRL.

              Don't tell a soul about all this
              For it must ever be
              A secret, kept from all the rest
              Between yourself and me.

            • Tony Mechelynck
              Message 6 of 13, Jan 6, 2009
                On 07/01/09 02:10, Yue Wu wrote:
                > On Wed, 07 Jan 2009 08:25:35 +0800, Tony Mechelynck wrote:
                >
                >> On 07/01/09 00:39, Matt Wozniski wrote:
                >>> On Tue, Jan 6, 2009 at 6:10 PM, Tony Mechelynck wrote:
                >>>> On 06/01/09 12:31, anhnmncb wrote:
                >>>>> Hi, list, as title, if so, why can't many functions
                >>>>> still handle correctly with unicode? For example the func:
                >>>>>
                >>>>> getline('.')[col('.')-1]
                >>>>>
                >>>>> Can't return a character outside the range of ASCII.
                >>>>>
                >>>> because string[index] returns a byte value, not a character value: see
                >>>> ":help expr8".
                >>> *Nod*
                >>>
                >>>> If the character at the cursor is > U+007F, you'll get
                >>>> the first byte (in the range 0xC0-0xFD, or in practice in the range
                >>>> 0xC0-0xF4) of its UTF-8 representation.
                >>> No, you could get some byte of some entirely different character. I.e.,
                >>> on a line with two 2-byte characters, getline('.')[col('.')-1] on the
                >>> second character would return the 2nd byte of the first character.
                >> col() gives a one-based byte ordinal. [] takes a zero-based argument. I
                >> stand by what I said.
                >>
                >>>> The _character_ at the cursor is obtained as follows:
                >>>> let i0 = byteidx(getline('.'), virtcol('.') - 1)
                >>>> let i1 = byteidx(getline('.'), virtcol('.'))
                >>>> let character = strpart(getline('.'), i0, i1 - i0)
                >>> Using virtcol() there seems broken... what if you're in the middle of
                >>> a tab, for example, with virtualedit=all?
                >>>
                >>> :echo join(split("áéíóú", '\zs')[1:3], '')
                >> OK, I didn't think of virtual editing, nor even, it seems, of
                >> multi-column characters such as tabs and fullwidth CJK. However, [1:3]
                >> wouldn't work because the idea is that we're in a script, we don't know
                >> that we're in the 1st, 2nd or 3rd column, just that we want "whatever is
                >> at the cursor". I might do it with
                >>
                >> function CursorChar()
                >> normal yl
                >> return @@
                >> endfunction
                >>
                >>> is how I would do it... but, is there any real reason why indexing
                >>> into a string *should* be byte oriented instead of character oriented,
                >>> apart from backwards compatibility? It seems drastically less easy to
                >>> use the thing that more people want to use more of the time; and in
                >>> fact some of the snippets in the vim help (like the example given at
                >>> :help expr-8) won't work on multibyte lines given the way that string
                >>> indexing works now. It seems like a place where the cost of losing
                >>> backwards compatibility might be outweighed by the cost of keeping
                >>> things the way they are...
                >>>
                >>> ~Matt
                >> Changing an existing construct from byte-oriented to
                >> multibyte-character-oriented would probably break a lot of existing
                >> scripts. I don't believe Bram would ever accept that.
                >>
                >> Best regards,
                >> Tony.
                >
                > Hmm, I think I got the point.
                >
                > btw, I tested your func on a line with "测试"(test)
                >
                > let i0 = byteidx(getline('.'), virtcol('.') - 1)
                > let i1 = byteidx(getline('.'), virtcol('.'))
                > let character = strpart(getline('.'), i0, i1 - i0)
                >
                > Then echo character got nothing.
                >

                Try the function in my next post. If you don't want to clobber the
                unnamed register, here is a variant:

                function CursorChar()
                  let unnamed = @@
                  normal yl
                  let retval = @@
                  let @@ = unnamed
                  return retval
                endfunction
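
A slightly more defensive variant of the same save-and-restore idea is sketched below. The function name CursorCharKeepReg() is hypothetical; getreg(), getregtype() and setreg() are standard Vim functions. Restoring the register type as well as its contents is an addition that the function above does not make.

```vim
" CursorCharKeepReg() is a hypothetical name, not a Vim built-in.
function! CursorCharKeepReg()
  " Remember both the contents and the type (characterwise, linewise
  " or blockwise) of the unnamed register.
  let saved = getreg('"')
  let saved_type = getregtype('"')
  " The bang makes :normal ignore any user mappings for y or l.
  normal! yl
  let retval = @@
  call setreg('"', saved, saved_type)
  return retval
endfunction
```

Two caveats: the yank also overwrites register 0, which this sketch does not restore, and on an empty line the yank fails, so the function would return the old register contents.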


                Best regards,
                Tony.
                --
                If you had any brains, you'd be dangerous.



              • Yue Wu
                Message 7 of 13, Jan 6, 2009
                  On Wed, 07 Jan 2009 10:24:30 +0800, Tony Mechelynck wrote:

                  >
                  > On 07/01/09 02:10, Yue Wu wrote:
                  >> On Wed, 07 Jan 2009 08:25:35 +0800, Tony Mechelynck wrote:
                  >>
                  >>> On 07/01/09 00:39, Matt Wozniski wrote:
                  >>>> On Tue, Jan 6, 2009 at 6:10 PM, Tony Mechelynck wrote:
                  >>>>> On 06/01/09 12:31, anhnmncb wrote:
                  >>>>>> Hi, list, as title, if so, why can't many functions
                  >>>>>> still handle correctly with unicode? For example the func:
                  >>>>>>
                  >>>>>> getline('.')[col('.')-1]
                  >>>>>>
                  >>>>>> Can't return a character outside the range of ASCII.
                  >>>>>>
                  >>>>> because string[index] returns a byte value, not a character value:
                  >>>>> see
                  >>>>> ":help expr8".
                  >>>> *Nod*
                  >>>>
                  >>>>> If the character at the cursor is > U+007F, you'll get
                  >>>>> the first byte (in the range 0xC0-0xFD, or in practice in the range
                  >>>>> 0xC0-0xF4) of its UTF-8 representation.
                  >>>> No, you could get some byte of some entirely different character. I.e.,
                  >>>> on a line with two 2-byte characters, getline('.')[col('.')-1] on the
                  >>>> second character would return the 2nd byte of the first character.
                  >>> col() gives a one-based byte ordinal. [] takes a zero-based argument. I
                  >>> stand by what I said.
                  >>>
                  >>>>> The _character_ at the cursor is obtained as follows:
                  >>>>> let i0 = byteidx(getline('.'), virtcol('.') - 1)
                  >>>>> let i1 = byteidx(getline('.'), virtcol('.'))
                  >>>>> let character = strpart(getline('.'), i0, i1 - i0)
                  >>>> Using virtcol() there seems broken... what if you're in the middle of
                  >>>> a tab, for example, with virtualedit=all?
                  >>>>
                  >>>> :echo join(split("áéíóú", '\zs')[1:3], '')
                  >>> OK, I didn't think of virtual editing, nor even, it seems, of
                  >>> multi-column characters such as tabs and fullwidth CJK. However, [1:3]
                  >>> wouldn't work because the idea is that we're in a script, we don't know
                  >>> that we're in the 1st, 2nd or 3rd column, just that we want "whatever
                  >>> is
                  >>> at the cursor". I might do it with
                  >>>
                  >>> function CursorChar()
                  >>> normal yl
                  >>> return @@
                  >>> endfunction
                  >>>
                  >>>> is how I would do it... but, is there any real reason why indexing
                  >>>> into a string *should* be byte oriented instead of character oriented,
                  >>>> apart from backwards compatibility? It seems drastically less easy to
                  >>>> use the thing that more people want to use more of the time; and in
                  >>>> fact some of the snippets in the vim help (like the example given at
                  >>>> :help expr-8) won't work on multibyte lines given the way that string
                  >>>> indexing works now. It seems like a place where the cost of losing
                  >>>> backwards compatibility might be outweighed by the cost of keeping
                  >>>> things the way they are...
                  >>>>
                  >>>> ~Matt
                  >>> Changing an existing construct from byte-oriented to
                  >>> multibyte-character-oriented would probably break a lot of existing
                  >>> scripts. I don't believe Bram would ever accept that.
                  >>>
                  >>> Best regards,
                  >>> Tony.
                  >>
                  >> Hmm, I think I got the point.
                  >>
                  >> btw, I tested your func on a line with "测试"(test)
                  >>
                  >> let i0 = byteidx(getline('.'), virtcol('.') - 1)
                  >> let i1 = byteidx(getline('.'), virtcol('.'))
                  >> let character = strpart(getline('.'), i0, i1 - i0)
                  >>
                  >> Then echo character got nothing.
                  >>
                  >
                  > Try the function in my next post. If you don't want to clobber the
                  > unnamed register, here is a variant:
                  >
                  > function CursorChar()
                  > let unnamed = @@
                  > normal yl
                  > let retval = @@
                  > let @@ = unnamed
                  > return retval
                  > endfunction

                  Yes, it works, but I don't like a function that contains normal
                  operators. I always think that a normal operator is only used for
                  normal mode by keyboard; when writing a function, it's better to use
                  the function corresponding to the operator.

                  This version works fine:

                  matchstr(getline('.'), '\%' . col('.') . 'c.')

                  whereas this one doesn't:

                  matchstr(getline('.'), '\%' . virtcol('.') . 'c.')
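A minimal sketch of the working one-liner wrapped as a function, assuming 'encoding' is utf-8. The name CharUnderCursor is hypothetical and does not come from the thread; the body is the col('.') expression quoted above.

```vim
" Hypothetical helper (name not from the thread): return the character
" under the cursor without touching any register.
function! CharUnderCursor() abort
  " col('.') is a 1-based byte index, and \%Nc matches at byte index N,
  " so the two agree even on lines that contain multibyte characters.
  " The trailing '.' then matches one whole character, not one byte.
  return matchstr(getline('.'), '\%' . col('.') . 'c.')
endfunction
```

Usage: `:echo CharUnderCursor()` with the cursor on the character of interest. On an empty line the pattern cannot match and the function returns an empty string.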

                  >
                  >
                  > Best regards,
                  > Tony.



                  --
                  Regards,
                  Van.

                  --~--~---------~--~----~------------~-------~--~----~
                  You received this message from the "vim_dev" maillist.
                  For more information, visit http://www.vim.org/maillist.php
                  -~----------~----~----~----~------~----~------~--~---
                • Tony Mechelynck
                  Message 8 of 13 , Jan 6, 2009
                    On 07/01/09 03:38, Yue Wu wrote:
                    [...]
                    > I always think that a normal operator is only used for
                    > normal mode by keyboard,[...]

                    Oh? I have the opposite impression. For normal mode by keyboard, I don't use

                    :normal yl<Enter>

                    but

                    yl

                    To me, the ":normal" command is _only_ useful in scripts, in order to
                    run in Ex mode the key sequences meant for Normal mode.


                    Best regards,
                    Tony.
                    --
                    If bankers can count, how come they have eight windows and only four
                    tellers?

                  • Yue Wu
                    Message 9 of 13 , Jan 6, 2009
                      On Wed, 07 Jan 2009 10:55:33 +0800, Tony Mechelynck wrote:

                      >
                      > On 07/01/09 03:38, Yue Wu wrote:
                      > [...]
                      >> I always think that a normal operator is only used for
                      >> normal mode by keyboard,[...]
                      >
                      > Oh? I have the opposite impression. For normal mode by keyboard, I don't
                      > use
                      >
                      > :normal yl<Enter>
                      >
                      > but
                      >
                      > yl
                      >
                      > To me, the ":normal" command is _only_ useful in scripts, in order to
                      > run in Ex mode the key sequences meant for Normal mode.

                      I mean that I avoid running yl through :normal when there is a command such as :yank :)

                      --
                      Regards,
                      Van.

                    • Tony Mechelynck
                      Message 10 of 13 , Jan 6, 2009
                        On 07/01/09 04:17, Yue Wu wrote:
                        > On Wed, 07 Jan 2009 10:55:33 +0800, Tony Mechelynck wrote:
                        >
                        >> On 07/01/09 03:38, Yue Wu wrote:
                        >> [...]
                        >>> I always think that a normal operator is only used for
                        >>> normal mode by keyboard,[...]
                        >> Oh? I have the opposite impression. For normal mode by keyboard, I don't
                        >> use
                        >>
                        >> :normal yl<Enter>
                        >>
                        >> but
                        >>
                        >> yl
                        >>
                        >> To me, the ":normal" command is _only_ useful in scripts, in order to
                        >> run in Ex mode the key sequences meant for Normal mode.
                        >
                        > I mean I prevent using yl from :normal if there is a function :yank :)
                        >

                        There is a ":yank" command but it acts linewise. Here we want a
                        characterwise yank, so we cannot use :yank.

                        The function you proposed is so complex that I would run a much greater
                        risk of getting it wrong when constructing it than with ":normal yl".
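For comparison, a slightly more defensive take on the ":normal yl" approach might look like the sketch below. It is not the function posted in this thread: the name CursorCharYank is made up, and getreg(), getregtype() and setreg() are used so that the type of the unnamed register (characterwise or linewise) is restored along with its contents. Note that yl still overwrites register 0.

```vim
" Hypothetical variant (name not from the thread) of the yl-based function.
function! CursorCharYank() abort
  " On an empty line there is nothing to yank, and the register would be
  " returned unchanged, so bail out early.
  if getline('.') ==# ''
    return ''
  endif
  " Save the unnamed register together with its type.
  let l:saved = getreg('"')
  let l:saved_type = getregtype('"')
  " 'normal!' ignores user mappings; yl yanks one character, not one byte.
  normal! yl
  let l:char = getreg('"')
  " Put the unnamed register back exactly as it was.
  call setreg('"', l:saved, l:saved_type)
  return l:char
endfunction
```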

                        If the complexity is similar, I use the ex-command in scripts, for
                        instance ":wincmd k" rather than ":normal ^Wk" where ^W would be
                        obtained by hitting Ctrl-V followed by Ctrl-W.
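The two forms mentioned above can be put side by side as follows. This is only an illustration: the "\<C-W>" notation inside a double-quoted string, combined with :execute, is one way to avoid embedding a literal ^W in the script.

```vim
" Two ways to move to the window above from a script.

" 1. The Ex command:
wincmd k

" 2. The Normal-mode keys. "\<C-W>" in a double-quoted string stands for
"    the Ctrl-W key, so no literal control character is needed:
execute "normal! \<C-W>k"
```

Both lines are harmless when there is no window above the current one.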

                        Best regards,
                        Tony.
                        --
                        ARTHUR: Shut up! Will you shut up!
                        DENNIS: Ah, now we see the violence inherent in the system.
                        ARTHUR: Shut up!
                        DENNIS: Oh! Come and see the violence inherent in the system!
                        HELP! HELP! I'm being repressed!
                        The Quest for the Holy Grail (Monty Python)

                      • Matt Wozniski
                        Message 11 of 13 , Jan 6, 2009
                          On 1/6/09, Tony Mechelynck wrote:
                          > On 1/6/09, Matt Wozniski wrote:
                          >> echo matchstr(getline('.'), '\%' . col('.') . 'c.')
                          >
                          > Again, col('.') is a byte index, not a column. What about virtcol('.')
                          > instead?

                          Nope. \%15c is also a byte index, not a column (which is also
                          counter-intuitive, and brings us back to the problem - that however
                          well documented it is, even experienced vimscript programmers get this
                          stuff wrong regularly.)
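The difference can be checked with a small experiment, sketched here under the assumption that 'encoding' is utf-8 and that the sample characters are single-width. The g: variable names are made up for the demonstration. On the second character of 'äö', col('.') is 3 (a byte index) while virtcol('.') is 2 (a screen column), so only the col() value is a usable argument for \%c.

```vim
" Assumes 'encoding' is utf-8. Opens a scratch window for the experiment.
new
call setline(1, 'äö')
call cursor(1, 3)        " byte 3 is where the second character, ö, starts

let g:byte_col = col('.')          " byte index of the cursor
let g:screen_col = virtcol('.')    " screen column of the cursor

" \%Nc matches at byte index N. Byte 2 lies inside the first character,
" so the virtcol()-based pattern never gets a chance to match.
let g:by_col = matchstr(getline('.'), '\%' . g:byte_col . 'c.')
let g:by_virtcol = matchstr(getline('.'), '\%' . g:screen_col . 'c.')

echo [g:byte_col, g:screen_col, g:by_col, g:by_virtcol]
```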

                          >>> Changing an existing construct from byte-oriented to
                          >>> multibyte-character-oriented would probably break a lot of existing
                          >>> scripts. I don't believe Bram would ever accept that.
                          >>
                          >> But sometimes, breaking things is required to make progress. The fact
                          >> that we're having a conversation with both of us suggesting (fairly
                          >> complicated) things that haven't worked is a perfect proof for the
                          >> fact that the current system is counterintuitive and hard to use...
                          >
                          > That's no reason for breaking what does work. I don't mind
                          > counterintuitive as long as it's documented.

                          See above. If no one can remember how to use it, or the workarounds
                          needed to make it work cost the author more trouble than simply not
                          supporting multibyte input, I'd say that it _doesn't_
                          work as is.

                          In fact, I'd argue that having string indexing be byte-oriented after
                          multibyte was added was a regression that broke things that did work:
                          before, getline('.')[col('.')-1] was a valid way to get the character
                          under the cursor, and afterwards it was not. Changing this behavior
                          would probably break very few scripts, since I doubt most scripters
                          are defensive about doing it correctly, and would mean that all the
                          broken code that already exists, and even the code that was written
                          before proper multibyte support was added (I believe it was added
                          after vimscript, right?), would continue to work *unless* it was
                          written intentionally to work around this issue. And I think that
                          authors who knew enough to work around this would, by and large, be
                          happy to see it fixed. I think that the advantages of having new
                          scripts work the way that they should, instead of the way that they
                          do, would greatly outweigh the disadvantages of breaking scripts
                          depending on the broken behavior. But, Bram's opinion is the final
                          answer, so we'll see if he weighs in.
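To make the byte-oriented behaviour concrete, here is a small sketch assuming 'encoding' is utf-8. CharAt and g:demo_text are hypothetical names; the helper simply builds on the split(..., '\zs') idiom used earlier in the thread to get character-oriented indexing without changing what string[index] means.

```vim
" Hypothetical helper: character-oriented indexing via split('\zs').
" Returns the character at 0-based index a:idx, or '' when out of range.
function! CharAt(str, idx) abort
  return get(split(a:str, '\zs'), a:idx, '')
endfunction

let g:demo_text = 'äö'

" strlen() counts bytes, and each of these characters takes two bytes.
echo strlen(g:demo_text)

" Index 1 is the second byte of 'ä', not the character 'ö'.
echo g:demo_text[1] ==# 'ö'

" The helper indexes by character instead.
echo CharAt(g:demo_text, 1)
```

Splitting the whole string on every call is wasteful for long lines, so this is only meant to illustrate the behaviour being discussed.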

                          ~Matt
