Re: multibyte and 'encoding' and 'fileencoding'
- Antoine J. Mechelynck wrote:
> Benji Fisher <benji@...> wrote:
>> I am surprised that a file is not supposed to contain a raw
>> "\xe4". What is to stop me from doing
> If your file is in UTF-8, then obviously it must obey UTF-8 encoding rules;
> and these rules say (among other things) that:
> - Codepoints from 0000 to 007F are compatible with us-ascii and are
> encoded as one byte, with high bit off
> - Codepoints from 0080 upwards are encoded as a string of 2 or more
> bytes; the first of those is 0xC0 or greater, the other(s) lie in the
> range 0x80-0xBF. The number of leading 1 bits in the first byte gives the
> total number of bytes in the sequence
> So there is a strict separation between single-bytes (0x00-0x7F),
> first-bytes (0xC0-0xFF, and not all values in that range are legal) and
> following-bytes (0x80-0xBF) to avoid context ambiguity.
> Details can be found somewhere on the Unicode site, whose entry page is at
> http://www.unicode.org/ . And don't forget that if 'encoding' is set to
> utf-8, then all files will be internally represented as UTF-8 while editing,
> with translation when reading or writing non-UTF-8 files. So typing (in
> Insert mode) Ctrl-V followed by xE4 will enter the 00E4 codepoint into
> memory as two bytes, 0xC3 0xA4, but show it as one character, small a with
> umlaut; and pressing x once in Normal mode with the cursor on that
> character deletes both bytes.
Thanks for the details. That's already more than I think I need
to know (for now) so I am not going to follow the link.
Perhaps my :put command should also insert the 00E4 codepoint, the
same as <C-V>xE4 in Insert mode.
On another thread (multibyte in patterns) Bram suggests a new "\uab"
instead of "\xab". Maybe that is the way to go...
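A minimal sketch of the UTF-8 rules quoted above, written in Python rather than Vim script. The function names utf8_encode and byte_role are hypothetical, chosen only for illustration; this is not how Vim implements it.

```python
def utf8_encode(codepoint: int) -> bytes:
    """Encode one Unicode codepoint by hand (illustrative helper, not a Vim API)."""
    if codepoint < 0 or codepoint > 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        raise ValueError("not an encodable Unicode codepoint")
    if codepoint < 0x80:
        # 0000-007F: one byte, high bit off (same as us-ascii)
        return bytes([codepoint])
    if codepoint < 0x800:
        # two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (codepoint >> 6),
                      0x80 | (codepoint & 0x3F)])
    if codepoint < 0x10000:
        # three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (codepoint >> 12),
                      0x80 | ((codepoint >> 6) & 0x3F),
                      0x80 | (codepoint & 0x3F)])
    # four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (codepoint >> 18),
                  0x80 | ((codepoint >> 12) & 0x3F),
                  0x80 | ((codepoint >> 6) & 0x3F),
                  0x80 | (codepoint & 0x3F)])


def byte_role(b: int) -> str:
    """Classify a byte by the ranges given in the quoted message."""
    if b < 0x80:
        return "single"
    if b < 0xC0:
        return "following"
    return "first"  # 0xC0-0xFF; not every value in this range is legal


print(utf8_encode(0xE4).hex())   # c3a4
print(byte_role(0xE4))           # first
```

By these rules a lone 0xE4 byte is a first-byte with no following-bytes after it, which is why a raw "\xe4" is not valid in a UTF-8 file, while the 00E4 codepoint is stored as 0xC3 0xA4.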
- Benji Fisher <benji@...> wrote:
> Perhaps my :put command should also insert the 00E4 codepoint, the
> same as <C-V>xE4 in Insert mode.
That would, if done correctly, avoid putting invalid byte-sequences into
> <later>
> On another thread (multibyte in patterns) Bram suggests a new "\uab"
> instead of "\xab". Maybe that is the way to go...
I saw that message from Bram, and noticed a patch that went with it. I think
it's a good idea; but since I lack a vim-compile facility, I shall wait
until it is incorporated into a (supposedly stable) binary distribution. (At
the moment I am using gvim 6.1.243 +win32 +ole.)
> --Benji Fisher