
Re: utf-8, combining characters, and 'x' -- a workaround for Hebrew/Arabic, etc...

  • Bram Moolenaar
    Message 1 of 2, Dec 27, 2000
      Ron Aaron wrote:

      > Some of you might have started using vim with utf-8, and languages like
      > Hebrew or Arabic which have 'combining class' characters. What this means
      > is that in these languages, there are characters which 'overprint' the
      > preceding character. In Hebrew, for example, all the vowels are marks which
      > overprint the consonant. In normal cp1255 encoded Hebrew with vowels, one
      > sees the consonants followed by the vowels, which looks weird and is very
      > hard to read.
      > Now in vim, with utf-8 support (and an appropriate font!), one can simply:
      > :set cc=utf-8
      > :e ++cc=cp1255 myfile_with_vowels.txt
      > and voila! the text appears, with the vowels correctly displayed overtop the
      > consonants! This is really wonderful!
      > Sadly, the normal-mode 'x' command deletes the consonant and the vowels,
      > which is not what someone editing voweled Hebrew or Arabic would like to
      > have happen. Rather, one expects the delete to affect the last added vowel.
      > So here is a function/mapping which takes a pile-up of characters and
      > combining-characters, and removes the last one, and overwrites the original
      > character. It works pretty much as I would expect; the only problem is that
      > if the character is the last one on the line, it doesn't work correctly (:-<
      > but I can live with that. (it puts the pasted character one char before
      > where it belongs, as one can see by how I put the char back in!).

      The current choice to have "x" delete both the starting character and the
      following composing characters wasn't really a deliberate choice. But it
      does do the obvious thing for characters with an overprinting accent.
      That's probably wrong for Hebrew. And in some situations you would like to be
      able to delete the accent only (Thai?).
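
The two behaviors discussed above can be sketched outside of Vim. This is only an illustration in Python, not Vim's implementation: the function names are made up for this example, and "composing character" is approximated here as any code point whose Unicode general category is a mark (Mn, Mc, Me), which may not match Vim's own table of composing characters.

```python
import unicodedata


def is_composing(ch):
    # Assumption: treat every Unicode "Mark" category as a composing character.
    return unicodedata.category(ch).startswith("M")


def cluster_end(text, start):
    """Index just past the base character at `start` and its composing characters."""
    end = start + 1
    while end < len(text) and is_composing(text[end]):
        end += 1
    return end


def delete_cluster(text, start):
    """What "x" does now: remove the base character and everything stacked on it."""
    return text[:start] + text[cluster_end(text, start):]


def delete_last_composing(text, start):
    """The requested behavior: remove only the last composing character.

    If nothing is stacked on the base character, the base itself is removed.
    """
    end = cluster_end(text, start)
    return text[:end - 1] + text[end:]


# Hebrew bet + dagesh + qamats, followed by alef.
word = "\u05d1\u05bc\u05b8\u05d0"
whole_cluster_gone = delete_cluster(word, 0)
last_mark_gone = delete_last_composing(word, 0)
```

Here `whole_cluster_gone` keeps only the alef, while `last_mark_gone` still has the bet with its dagesh and drops just the qamats.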

      Defining new commands like "gx" won't be sufficient. You would also want this
      to work when using <BS> in Insert mode and perhaps a few other commands.
      Since this probably depends on the language you are using, wouldn't it be
      better to set an option for this behavior? With that option set, if you
      still want to delete the base character and its composing characters, you
      would have to hit backspace two or three times.
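
To make the "two or three times" concrete, here is a rough Python sketch of a backspace governed by such an option. The `per_char` flag is a hypothetical stand-in for the proposed option, not an existing Vim setting name, and composing characters are again approximated by the Unicode mark categories.

```python
import unicodedata


def is_composing(ch):
    # Assumption: any Unicode "Mark" category counts as a composing character.
    return unicodedata.category(ch).startswith("M")


def backspace(text, per_char):
    """Delete backwards from the end of `text`.

    per_char=True  : remove one code point per press (the proposed option).
    per_char=False : remove the base character together with its composing characters.
    """
    if not text:
        return text
    if per_char:
        return text[:-1]
    end = len(text) - 1
    while end > 0 and is_composing(text[end]):
        end -= 1
    return text[:end]


def presses_to_clear(text, per_char):
    """Count how many backspaces it takes to empty `text`."""
    presses = 0
    while text:
        text = backspace(text, per_char)
        presses += 1
    return presses


# Hebrew bet + dagesh + qamats: one base character with two composing characters.
cluster = "\u05d1\u05bc\u05b8"
with_option = presses_to_clear(cluster, per_char=True)
without_option = presses_to_clear(cluster, per_char=False)
```

With the flag on, the cluster takes one press per code point; with it off, a single press removes the whole cluster.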

      If you really want to use "x" and "gx" in the same text, an option is not the
      right solution.

      hundred-and-one symptoms of being an internet addict:
      115. You are late picking up your kid from school and try to explain
      to the teacher you were stuck in Web traffic.

      /// Bram Moolenaar -- Bram@... -- http://www.moolenaar.net \\\
      ((( Creator of Vim - http://www.vim.org -- ftp://ftp.vim.org/pub/vim )))
      \\\ Help me helping AIDS orphans in Uganda - http://iccf-holland.org ///