52925 Re: Is vim really fully unicoded?

  • Tony Mechelynck
    Jan 6, 2009
      On 07/01/09 02:10, Yue Wu wrote:
      > On Wed, 07 Jan 2009 08:25:35 +0800, Tony Mechelynck wrote:
      >> On 07/01/09 00:39, Matt Wozniski wrote:
      >>> On Tue, Jan 6, 2009 at 6:10 PM, Tony Mechelynck wrote:
      >>>> On 06/01/09 12:31, anhnmncb wrote:
      >>>>> Hi, list, as the title says: if so, why do many functions
      >>>>> still not handle Unicode correctly? For example, the expression
      >>>>> getline('.')[col('.')-1]
      >>>>> can't return a character outside the ASCII range.
      >>>> because string[index] returns a byte value, not a character value: see
      >>>> ":help expr8".
      >>> *Nod*
      >>>> If the character at the cursor is > U+007F, you'll get
      >>>> the first byte (in the range 0xC0-0xFD, or in practice in the range
      >>>> 0xC0-0xF4) of its UTF-8 representation.
      >>> No, you could get some byte of some entirely different character. Ie,
      >>> on a line with two 2-byte characters, getline('.')[col('.')-1] on the
      >>> second character would return the 2nd byte of the first character.
      >> col() gives a one-based byte ordinal. [] takes a zero-based argument. I
      >> stand by what I said.
      >>>> The _character_ at the cursor is obtained as follows:
      >>>> let i0 = byteidx(getline('.'), virtcol('.') - 1)
      >>>> let i1 = byteidx(getline('.'), virtcol('.'))
      >>>> let character = strpart(getline('.'), i0, i1 - i0)
      >>> Using virtcol() there seems broken... what if you're in the middle of
      >>> a tab, for example, with virtualedit=all?
      >>> :echo join(split("áéíóú", '\zs')[1:3], '')
      >> OK, I didn't think of virtual editing, nor even, it seems, of
      >> multi-column characters such as tabs and fullwidth CJK. However, [1:3]
      >> wouldn't work because the idea is that we're in a script, we don't know
      >> that we're in the 1st, 2nd or 3rd column, just that we want "whatever is
      >> at the cursor". I might do it with
      >> function CursorChar()
      >> normal yl
      >> return @@
      >> endfunction
      >>> is how I would do it... but, is there any real reason why indexing
      >>> into a string *should* be byte oriented instead of character oriented,
      >>> apart from backwards compatibility? It seems drastically less easy to
      >>> use the thing that more people want to use more of the time; and in
      >>> fact some of the snippets in the vim help (like the example given at
      >>> :help expr-8) won't work on multibyte lines given the way that string
      >>> indexing works now. It seems like a place where the cost of losing
      >>> backwards compatibility might be outweighed by the cost of keeping
      >>> things the way they are...
      >>> ~Matt
      >> Changing an existing construct from byte-oriented to
      >> multibyte-character-oriented would probably break a lot of existing
      >> scripts. I don't believe Bram would ever accept that.
      >> Best regards,
      >> Tony.
      > Hmm, I think I got the point.
      > btw, I tested your func on a line with "测试"(test)
      > let i0 = byteidx(getline('.'), virtcol('.') - 1)
      > let i1 = byteidx(getline('.'), virtcol('.'))
      > let character = strpart(getline('.'), i0, i1 - i0)
      > Then echo character got nothing.

      Try the function in my next post. If you don't want to clobber the
      unnamed register, here is a variant:

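      function CursorChar()
      " save the unnamed register so the yank below does not clobber it
      let unnamed = @@
      " yank the single character under the cursor into the unnamed register
      normal yl
      let retval = @@
      " put the saved contents back
      let @@ = unnamed
      return retval
      endfunction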
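
For anyone who would rather not touch a register at all, here is a minimal sketch of two alternatives. It is not something posted in this thread: CharAtByte(), CharAtIndex() and CursorCharNoYank() are hypothetical names, and the code assumes 'encoding' is utf-8. It relies on matchstr() accepting a byte offset as its third argument, on '.' matching one whole character rather than one byte, and on the split(..., '\zs') idiom quoted above.

```vim
scriptencoding utf-8

" Hypothetical helper: return the whole character that starts at the
" zero-based byte offset a:byteoff of a:str, or '' if there is none.
function! CharAtByte(str, byteoff)
  " matchstr() begins matching at the given byte offset, and '.' matches
  " one full character however many bytes it occupies.
  return matchstr(a:str, '.', a:byteoff)
endfunction

" Hypothetical helper: return the character at the zero-based *character*
" index a:idx of a:str, or '' if the index is past the end.
function! CharAtIndex(str, idx)
  " split() on '\zs' yields a List with one item per character.
  return get(split(a:str, '\zs'), a:idx, '')
endfunction

" Hypothetical register-free counterpart of CursorChar().
" col('.') is a one-based byte column, hence the - 1.
function! CursorCharNoYank()
  return CharAtByte(getline('.'), col('.') - 1)
endfunction
```

With these definitions, :echo CharAtByte('测试', 3) should print the second character, whereas '测试'[3] yields only its first byte, which is the behaviour described under ":help expr8".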

      Best regards,
      If you had any brains, you'd be dangerous.


      You received this message from the "vim_dev" maillist.
      For more information, visit http://www.vim.org/maillist.php