52922 Re: Is vim really fully unicoded?

  • Matt Wozniski
    Jan 6, 2009
      On 1/6/09, Tony Mechelynck wrote:
      >
      > On 07/01/09 00:39, Matt Wozniski wrote:
      > > On Tue, Jan 6, 2009 at 6:10 PM, Tony Mechelynck wrote:
      > >> On 06/01/09 12:31, anhnmncb wrote:
      > >>> Hi, list, as title, if so, why can't many functions
      > >>> still handle correctly with unicode? For example the func:
      > >>>
      > >>> getline('.')[col('.')-1]
      > >>>
      > >>> Can't return a character outside the ASCII range.
      > >>>
      > >> because string[index] returns a byte value, not a character value: see
      > >> ":help expr8".
      > >
      > > *Nod*
      > >
      > >> If the character at the cursor is > U+007F, you'll get
      > >> the first byte (in the range 0xC0-0xFD, or in practice in the range
      > >> 0xC0-0xF4) of its UTF-8 representation.
      > >
      > > No, you could get some byte of some entirely different character. I.e.,
      > > on a line with two 2-byte characters, getline('.')[col('.')-1] on the
      > > second character would return the 2nd byte of the first character.
      >
      > col() gives a one-based byte ordinal. [] takes a zero-based argument. I
      > stand by what I said.

      Ooh, you're right - I forgot col() returned a byte index, and not the
      column as its name would imply...
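
      To make the byte indexing concrete, here is a small sketch. It
      assumes 'encoding' is utf-8, and the sample string is made up for
      illustration:

      ```vim
      scriptencoding utf-8

      " Two characters, each taking 2 bytes in UTF-8.
      let sample = "áé"

      echo strlen(sample)               " length in bytes: 4
      echo len(split(sample, '\zs'))    " length in characters: 2

      " With the cursor on 'é', col('.') would be 3 (one-based byte index),
      " so col('.') - 1 is 2, the offset of the first byte of 'é'.
      echo strlen(sample[2])            " 1: a single byte, not a character
      echo sample[2:3] ==# 'é'          " 1: both bytes together make up 'é'
      ```

      So the index lands on the right character, but [] only hands back
      its first byte.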

      > >> The _character_ at the cursor is obtained as follows:
      > >> let i0 = byteidx(getline('.'), virtcol('.') - 1)
      > >> let i1 = byteidx(getline('.'), virtcol('.'))
      > >> let character = strpart(getline('.'), i0, i1 - i0)
      > >
      > > Using virtcol() there seems broken... what if you're in the middle of
      > > a tab, for example, with virtualedit=all?
      > >
      > > :echo join(split("áéíóú", '\zs')[1:3], '')
      >
      > OK, I didn't think of virtual editing, nor even, it seems, of
      > multi-column characters such as tabs and fullwidth CJK. However, [1:3]
      > wouldn't work because the idea is that we're in a script, we don't know
      > that we're in the 1st, 2nd or 3rd column, just that we want "whatever is
      > at the cursor". I might do it with
      >
      > function CursorChar()
      > normal yl
      > return @@
      > endfunction

      echo matchstr(getline('.'), '\%' . col('.') . 'c.')

      does the same thing without clobbering the unnamed register...
      slightly more elegant, imho.
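
      Wrapped up for use in a script, that could look roughly like the
      sketch below. The function name is hypothetical, and it assumes the
      cursor is on an actual character (on an empty line it returns an
      empty string):

      ```vim
      " Hypothetical helper: return the (possibly multibyte) character
      " under the cursor without touching any register.
      function! CharUnderCursor() abort
        " \%Nc anchors the match at byte column N, which is exactly what
        " col('.') returns; '.' then matches one whole character.
        return matchstr(getline('.'), '\%' . col('.') . 'c.')
      endfunction

      echo CharUnderCursor()
      ```

      Since both col() and \%c count bytes, the two agree with each other
      and the tab/virtualedit problem with virtcol() doesn't come up.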

      > > is how I would do it... but, is there any real reason why indexing
      > > into a string *should* be byte oriented instead of character oriented,
      > > apart from backwards compatibility? It seems drastically less easy to
      > > use the thing that more people want to use more of the time; and in
      > > fact some of the snippets in the vim help (like the example given at
      > > :help expr-8) won't work on multibyte lines given the way that string
      > > indexing works now. It seems like a place where the cost of losing
      > > backwards compatibility might be outweighed by the cost of keeping
      > > things the way they are...
      >
      > Changing an existing construct from byte-oriented to
      > multibyte-character-oriented would probably break a lot of existing
      > scripts. I don't believe Bram would ever accept that.

      But sometimes, breaking things is required to make progress. The fact
      that both of us have now suggested (fairly complicated) approaches
      that turned out not to work is good evidence that the current system
      is counterintuitive and hard to use...

      ~Matt

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_dev" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---