Re: UTF-8 and Alt-key

  • DervishD
    Message 1 of 37 , Feb 19, 2008
      Hi Ben :)

      * Ben Schmidt <mail_ben_schmidt@...> dixit:
      > > This is what happens on my Linux console (the non-X terminal)
      > >
      > > ================================================
      > > System in iso-8859-15
      > > ================================================
      > > export LC_ALL=es_ES.iso-8859-15
      > > loadkeys es
      > >
      > > setmetamode bit -----------------------------> For alt not
      > > generate ESC vim
      >
      > This explains your behaviour. Using setmetamode bit will not work with
      > UTF-8. I am not familiar with the command, but am sure that
      > setmetamode bit will be using the highest bit of a byte to signify
      > that the Meta key is used. Unfortunately, UTF-8 also uses this bit,
      > but uses it to indicate that a sequence of characters longer than one
      > byte is being used. So it is not surprising that it takes more than
      > one key to make Vim see something with Alt when you have this mode. If
      > you use setmetamode esc and the appropriate Vim options, I expect it
      > will work. There might be some other convention that uses bits rather
      > than Esc to do it for UTF-8, but I don't know of it, and wouldn't
      > count on it! I think in this case setmetamode esc and appropriate
      > mappings in Vim will be better (or use gvim which handles keys itself
      > rather than relying on the Terminal interpreting them).

      This is exactly the explanation I gave to Mario yesterday night. The
      example with "Alt-h" is as follows:

      - The system is in UTF-8 mode, so the console expects UTF-8 sequences
      that will be passed down to the kernel.

      - You set the metamode to bit, forcing Alt-whatever combos to issue
      bytes with the high bit set.

      - You hit Alt-h, producing byte "0xE8". In UTF-8 that means you are
      sending a THREE byte sequence: 1110xxxx 10yyyyyy 10zzzzzz. You already
      have sent "11101000", so the Unicode codepoint you're sending is
      "00000000000000001000yyyyyyzzzzzz".

      - Until you press more keys, you are NOT sending the other two bytes, so
      the console is waiting for you to complete the UTF-8 thingie you just
      started.

      - As soon as you send another byte whose first two bits are NOT "10"
      (that is, a byte that is not a UTF-8 continuation byte), the console
      discards both your initial "0xE8" (Alt-h) and the key you pressed
      after it.

      - You press another key and that is inserted, as long as it is a valid
      UTF-8 code.

      Things can get very weird if after "Alt-h" you press, for example, "ñ",
      which gets sent to the console as 0xc3+0xb1, because the 0xc3 will be
      discarded (it starts with "11", so it cannot be the continuation byte of
      the first 0xe8) and the 0xb1 is incorrect UTF-8 too...
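
The byte arithmetic in the steps above can be checked with a short Python
sketch. The helper names (meta_bit_byte, utf8_sequence_length,
is_continuation) are made up for illustration, and "set bit 7 of the key's
ASCII code" is an assumed model of what setmetamode bit does. Python's decoder
is only a stand-in: it shows that 0xE8 opens a three-byte sequence and that
0xC3 cannot continue it, not how the Linux console recovers from the error.

```python
import codecs


def meta_bit_byte(key: str) -> int:
    """Byte for Alt-<key> when Meta is signalled by setting the high bit."""
    return ord(key) | 0x80


def utf8_sequence_length(lead: int) -> int:
    """Sequence length announced by a UTF-8 lead byte, or 0 if it is not one."""
    if lead < 0x80:
        return 1  # plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte or invalid lead


def is_continuation(byte: int) -> bool:
    """True for bytes of the form 10xxxxxx."""
    return (byte & 0xC0) == 0x80


alt_h = meta_bit_byte("h")   # 0x68 with bit 7 set
enye = "ñ".encode("utf-8")   # the two bytes sent for "ñ"

# An incremental decoder buffers the lone lead byte and emits nothing yet.
decoder = codecs.getincrementaldecoder("utf-8")()
pending_output = decoder.decode(bytes([alt_h]))

# Alt-h followed by "ñ" is not a valid UTF-8 stream.
try:
    (bytes([alt_h]) + enye).decode("utf-8")
    stream_is_valid = True
except UnicodeDecodeError:
    stream_is_valid = False

print(f"Alt-h byte: {alt_h:#04x}, announces {utf8_sequence_length(alt_h)} bytes")
print(f"ñ bytes: {enye.hex()}, first is continuation: {is_continuation(enye[0])}")
print(f"decoder output so far: {pending_output!r}, stream valid: {stream_is_valid}")
```

The low four bits of the lead byte are 1000, so whatever follows, the
code point being started lies somewhere in U+8000..U+8FFF.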

      Raúl Núñez de Arenas Coronado
      --
      Linux Registered User 88736 | http://www.dervishd.net
      It's my PC and I'll cry if I want to... RAmen!
      We are waiting for 13 Feb 2009 23:31:30 +0000 ...

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_use" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • John Thomas
      Message 37 of 37 , Mar 19, 2008
        Hello,

        --- Tony Mechelynck <antoine.mechelynck@...> wrote:

        >
        > John Thomas wrote:
        > > Hello people,
        > >
        > > A little late but perhaps I could comment on this.
        > >
        > > What happens here is that vim and xterm have a special "agreement"
        > > as to how they should function in UTF-8 mode. Xterm translates the
        > > would-be ISO-8859 high-bit-char to its UTF-8 representation, and vim
        > > catches that.
        > >
        > > Problem is, other apps haven't adhered to this convention. Bash for
        > > instance could behave the same way regarding convert-meta, but it doesn't,
        > > so it's useless in UTF-8 mode. It seems the same applies to the linux
        > > console, it doesn't follow this agreement and so vim and it can't
        > > communicate properly.
        > >
        > > It's a matter of convincing other maintainers to adhere to this standard,
        > > if indeed it's proven itself solid already.
        > >
        > > Regards
        >
        > With setmetamode bit, the keyboard is obviously not sending UTF-8, at
        > least for Alt-something combos, for the reasons you explained. If you
        > can find out what the keyboard is actually sending (such as Latin1), you
        > can tell it to Vim. Here are a few examples:
        >
        > 1) 'encoding' is set at startup "correctly for the keyboard", which is
        > UTF-8 in gvim but not in Console Vim
        >
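        > if has('multi_byte')
        >   " Locale encoding is not Unicode: remember it as the terminal
        >   " encoding, then switch Vim's internal encoding to UTF-8.
        >   if &enc !~? '^u'
        >     if &tenc == ''
        >       let &tenc = &enc
        >     endif
        >     set enc=utf-8
        >   endif
        >   set fencs=ucs-bom,utf-8,latin1
        >   setg bomb fenc=latin1
        > endif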
        >
        > 2) 'encoding' is always UTF-8 at startup (even before your vimrc touches
        > it) but this is not what the Console (not GUI) keyboard sends -- and
        > 'termencoding' is usually empty, which in that case it shouldn't be
        >
        > if !has('gui_running') && &tenc == ''
        >   set tenc=latin1
        > endif
        >
        > Note: You cannot tell Console Vim that the keyboard and the screen
        > represent the data using two different charsets. If your keyboard sends
        > Latin1 (or something) with high bit set for Alt, and your screen expects
        > UTF-8, you're stuck: anything else than 7-bit US-ASCII will give you
        > insuperable problems then. In such circumstances you're better off with
        > "Esc for Alt", however clumsy that might be otherwise; the timeout
        > options can help you though (e.g. ":set timeout timeoutlen=2000
        > ttimeoutlen=100").

        The terminal interprets the keypresses to make up byte sequences that
        it then sends to the underlying application. As you say, if the console
        is functioning in UTF-8 mode, sending a single byte for Meta-key
        combinations is not viable, and that's exactly what's happening here.
        We have this interface that vim and xterm agreed on. Bash's
        convert-meta is on the receiving end, like vim. The Linux console is
        on the sending end, like xterm. My opinion, agreeing with what you said
        above, is that such behaviour for the console is useless, so it should
        probably be hacked to function the way xterm already does in UTF-8.
        The same goes for bash and convert-meta: it is useless to interpret a
        single byte as a Meta-keypress when that byte could be part of a UTF-8
        sequence, so it should expect the "UTF-8 Meta-sequences".
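
To make the sending side concrete, here is a rough Python sketch of the three
ways a terminal could report Alt-h that have come up in this thread. The
function names are invented for illustration, and each one is a simplified
model of the convention as described in these messages, not code taken from
xterm or the kernel.

```python
ESC = b"\x1b"


def meta_as_esc(key: str) -> bytes:
    """'setmetamode esc' style: prefix the key with an ESC byte."""
    return ESC + key.encode("ascii")


def meta_as_bit(key: str) -> bytes:
    """'setmetamode bit' style: one byte with the high bit set."""
    return bytes([ord(key) | 0x80])


def meta_as_utf8(key: str) -> bytes:
    """xterm-style convention described above: take the would-be
    ISO-8859 high-bit character and send its UTF-8 representation."""
    return chr(ord(key) | 0x80).encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes form a complete, well-formed UTF-8 string."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for name, encode in [("esc", meta_as_esc),
                     ("bit", meta_as_bit),
                     ("utf8", meta_as_utf8)]:
    sent = encode("h")
    print(f"{name:5} {sent.hex(' ')}  valid UTF-8: {is_valid_utf8(sent)}")
```

Only the "bit" variant puts bytes on the wire that a UTF-8 receiver cannot
decode by themselves, which is the core of the problem.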

        The one downside of this convention is that you can't differentiate genuine
        characters made up by key compositions from Meta-keypresses, so you need to
        use Ctrl-v whenever you want a mapped character to actually appear on the
        screen and not be caught as a Meta-keypress. Still, I think this is the best
        option we have.
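
A small Python sketch of that ambiguity on the receiving side. The interpret
function is a toy receiver invented for this example, and treating Ctrl-v as
a 0x16 "take the next character literally" prefix is an assumption about how
the escape hatch would work, not a description of Vim's actual input code.

```python
CTRL_V = "\x16"  # assumed "literal next" prefix, standing in for Ctrl-v


def interpret(data: bytes) -> list:
    """Toy receiver: characters in U+0080..U+00FF are read as Meta + key."""
    events = []
    literal_next = False
    for ch in data.decode("utf-8"):
        if literal_next:
            events.append(ch)          # Ctrl-v was pressed: keep the character
            literal_next = False
        elif ch == CTRL_V:
            literal_next = True
        elif 0x80 <= ord(ch) <= 0xFF:
            events.append("Meta-" + chr(ord(ch) & 0x7F))
        else:
            events.append(ch)
    return events


typed_e_grave = "è".encode("utf-8")            # the user really typed "è"
alt_h = chr(ord("h") | 0x80).encode("utf-8")   # Alt-h under the convention

print(typed_e_grave == alt_h)                  # same bytes on the wire
print(interpret(alt_h))
print(interpret(typed_e_grave))
print(interpret(CTRL_V.encode("utf-8") + typed_e_grave))
```

Because both inputs arrive as the same two bytes, the receiver has no way to
tell them apart without the extra prefix.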

        > Gvim doesn't have that limitation, since it displays its output
        > graphically without going through a text terminal.
        >
        >
        > Best regards,
        > Tony.

        Best regards,
        John.

