Re: utf-8, combining characters, and 'x' -- a workaround for Hebrew/Arabic, etc...
- Ron Aaron wrote:
> Some of you might have started using vim with utf-8, and languages like
> Hebrew or Arabic which have 'combining class' characters. What this means
> is that in these languages, there are characters which 'overprint' the
> preceding character. In Hebrew, for example, all the vowels are marks which
> overprint the consonant. In normal cp1255 encoded Hebrew with vowels, one
> sees the consonants followed by the vowels, which looks weird and is very
> hard to read.
> Now in vim, with utf-8 support (and an appropriate font!), one can simply:
> :set cc=utf-8
> :e ++cc=cp1255 myfile_with_vowels.txt
> and voila! the text appears, with the vowels correctly displayed overtop the
> consonants! This is really wonderful!
> Sadly, the normal-mode 'x' command deletes the consonant and the vowels,
> which is not what someone editing voweled Hebrew or Arabic would like to
> have happen. Rather, one expects the delete to affect the last added vowel.
> So here is a function/mapping which takes a pile-up of characters and
> combining-characters, and removes the last one, and overwrites the original
> character. It works pretty much as I would expect; the only problem is that
> if the character is the last one on the line, it doesn't work correctly (:-<
> but I can live with that. (it puts the pasted character one char before
> where it belongs, as one can see by how I put the char back in!).
The current choice to have "x" delete both the starting character and the
following composing characters wasn't really a deliberate choice. But it
does do the obvious thing for characters with an overprinting accent.
That's probably wrong for Hebrew. And in some situations you would like to be
able to delete the accent only (Thai?).
Defining new commands like "gx" won't be sufficient. You would also want this
to work when using <BS> in Insert mode and perhaps a few other commands.
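To make the limitation concrete, here is a minimal sketch of what a "gx"-style command could look like. It is not Ron's function (that isn't reproduced here); the name DropLastComposing() is made up for illustration, and it assumes 'encoding' is utf-8 and a Vim recent enough to have str2list() and list2str(). It only covers Normal mode, so <BS> in Insert mode is left untouched, which is exactly the gap mentioned above.

```vim
" Hypothetical helper: remove only the last composing character from the
" character under the cursor. Assumes 'encoding' is utf-8 and a Vim that
" provides str2list() and list2str().
function! DropLastComposing() abort
  let l:line = getline('.')
  " Byte index of the character under the cursor.
  let l:start = col('.') - 1
  " '.' matches the base character together with its composing characters.
  let l:cell = matchstr(l:line, '.', l:start)
  " Code points: base character first, then the composing characters.
  let l:points = str2list(l:cell, 1)
  if len(l:points) < 2
    " Nothing is composed onto this character; leave the text alone.
    return
  endif
  " Rebuild the character without its last composing character.
  let l:shorter = list2str(l:points[:-2], 1)
  let l:before = strpart(l:line, 0, l:start)
  let l:after = strpart(l:line, l:start + len(l:cell))
  call setline('.', l:before . l:shorter . l:after)
endfunction

" "gx" is the command name floated in this thread; pick another key if it
" clashes with an existing mapping.
nnoremap <silent> gx :call DropLastComposing()<CR>
```

Because the line is rewritten with setline() instead of deleting and pasting, the position of the character on the line should not matter in this sketch.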
Since this probably depends on the language you are using, wouldn't it be
better to set an option for this behavior? If you still want to delete the
base character and composing characters you would have to hit backspace two or
If you really want to use "x" and "gx" in the same text, an option is not the
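As a rough illustration of the option approach, assuming a later Vim that provides the 'delcombine' option (nothing above depends on it), the behavior could be switched per language like this:

```vim
" Assumes a Vim that has the 'delcombine' option and 'encoding' set to utf-8.
" With it set, "x" and <BS> remove composing characters one at a time.
set delcombine

" Back to the default: "x" removes the base character together with
" everything composed onto it.
set nodelcombine
```

One thing to watch for with this kind of option: "xx" and "2x" may no longer do the same thing while it is set.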
--
hundred-and-one symptoms of being an internet addict:
115. You are late picking up your kid from school and try to explain
to the teacher you were stuck in Web traffic.
/// Bram Moolenaar -- Bram@... -- http://www.moolenaar.net \\\
((( Creator of Vim - http://www.vim.org -- ftp://ftp.vim.org/pub/vim )))
\\\ Help me helping AIDS orphans in Uganda - http://iccf-holland.org ///