
Multibyte bugs

  • Tony Mechelynck
    Message 1 of 8, Apr 2, 2010
      Hi Bram,

      1. (Minor bug): On this system (gvim 7.2.411, Huge version with
      GTK2-GNOME GUI), typing Ctrl-K in Insert mode followed by two spaces
      doesn't give the expected result: instead of U+00A0 ("Alt-space", the
      non-breaking space) I get U+E000, a CJK character. Ctrl-K NS works
      correctly.

      2. U+E000 is displayed in gvim as CJK halfwidth. Shouldn't it be fullwidth?

      3. "\<Char-nnnn>" gives wrong results for some Unicode codepoints. I
      tried to find examples and counterexamples, as follows (in the comment
      after the :echo statements, the UTF-8 expansion in hex):

      :echo "«\<Char-0x40>»" | " 40
      «@»
      :echo "«\<Char-0x80>»" | " C2 80
      «<80><fe>X»
      :echo "«\<Char-0x100>»" | " C4 80
      «Ā<fe>X»
      :echo "«\<Char-0x101>»" | " C4 81
      «ā»
      :echo "«\<Char-0x180>»" | " C6 80
      «ƀ<fe>X»
      :echo "«\<Char-0x190>»" | " C6 90
      «Ɛ»
      :echo "«\<Char-0x1A0>»" | " C6 A0
      «Ơ»
      :echo "«\<Char-0x1C0>»" | " C7 80
      «ǀ<fe>X»
      :echo "«\<Char-0x4E00>»" | " E4 B8 80
      «一<fe>X»
      :echo "«\<Char-0x4E01>»" | " E4 B8 81
      «丁»
      :echo "«\<Char-0x4E20>»" | " E4 B8 A0
      «丠»
      :echo "«\<Char-0x4E40>»" | " E4 B9 80
      «乀<fe>X»
      :echo "«\<Char-0xE000>»" | " EE 80 80
      «<ee><80><fe>X<80><fe>X»
      :echo "«\<Char-57344>»" | " EE 80 80
      «<ee><80><fe>X<80><fe>X»
      :echo "«\<Char-0xE001>»" | " EE 80 81
      «<ee><80><fe>X<81>»
      :echo "«\<Char-0xE040>»" | " EE 81 80
      «<fe>X»

      This seems to indicate that the extra bytes 0xFE 0x58 appear after any
      0x80 in the UTF-8 expansion of the character. (I added the « »
      characters to "bound" the display so that any extra whitespace would be
      visible, but they have no effect on the bug.)
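
The hypothesis can be checked against the byte sequences listed above. The following is a minimal Python sketch, not Vim code; the helper name `has_0x80` and the `observed` table are introduced here for illustration only. It encodes each tested codepoint as UTF-8 and compares "contains a 0x80 byte" with "showed the spurious <fe>X in the report".

```python
# Codepoints from the :echo tests above, paired with whether the
# spurious <fe>X bytes were reported in the output.
observed = {
    0x40: False, 0x80: True, 0x100: True, 0x101: False,
    0x180: True, 0x190: False, 0x1A0: False, 0x1C0: True,
    0x4E00: True, 0x4E01: False, 0x4E20: False, 0x4E40: True,
    0xE000: True, 0xE001: True, 0xE040: True,
}

def has_0x80(codepoint):
    """Return True if the UTF-8 encoding of codepoint contains a 0x80 byte."""
    return 0x80 in chr(codepoint).encode("utf-8")

for cp, seen in observed.items():
    utf8 = chr(cp).encode("utf-8")
    print(f"U+{cp:04X}  {utf8.hex(' ').upper():<9}  "
          f"0x80 present: {has_0x80(cp)}  <fe>X reported: {seen}")

# Codepoints where the hypothesis and the report disagree.
mismatches = [cp for cp, seen in observed.items() if has_0x80(cp) != seen]
print("mismatches:", mismatches)
```

An empty `mismatches` list means every listed test case is consistent with the "extra bytes after each 0x80" reading.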

      The bug does not occur after Ctrl-V u in Insert mode or when using
      <Char-...> in an Insert-mode mapping. It does occur when "\<Char-...>"
      is used in commands other than :echo. Note the following:

      :let j = "\<Char-0xE000>"
      :let j
      j <ee><80><fe>X<80><fe>X
      i<Ctrl-R>=j<Enter>
      î<t_þ>X<t_þ>X

      (where <Ctrl-R> and <Enter> are one keystroke each, not counting
      modifiers). Apparently gvim tries to interpret 0x80 0xFE as a "special
      key", and "resolves" it (incorrectly) as <t_þ>.

      Two very big files were loaded when I first noticed bug #3, but
      restarting gvim without them reproduced the bug again with the same
      spurious bytes.


      Best regards,
      Tony.
      --
      Alimony is a system by which, when two people make a mistake, one of
      them keeps paying for it.
      -- Peggy Joyce

      --
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php

      To unsubscribe, reply using "remove me" as the subject.
    • Tony Mechelynck
      Message 2 of 8, Apr 7, 2010
        On 03/04/10 06:36, Tony Mechelynck wrote:
        > [...]

        Update: There is a second case which triggers incorrect behaviour in
        "\<Char-nnnn>" when 'encoding' is UTF-8:

        - As noted above, after every 0x80 byte in the UTF-8 representation, the
        bytes 0xFE 0x58 are spuriously added. If the 0x80 is the last byte, two
        invalid bytes follow the correct multibyte glyph; if a 0x80 occurs
        earlier, the insertion falls in the middle of the sequence and makes the
        whole multibyte sequence invalid. (The 0x80 can never be the first byte,
        because the first byte of a multibyte UTF-8 sequence is >= 0xC0 [0xC2
        actually, except for "overlong" sequences representing ASCII bytes], and
        it cannot be an "only byte" because single-byte sequences are <= 0x7F.)

        - In addition, after every 0x9B byte, the bytes 0xFD 0x4F are added,
        also immediately after that byte, breaking the UTF-8 sequence if it
        isn't the last byte.

        - The above are repeatable "every time", even from one run of gvim to
        the next, and I always get 0x80 0xFE 0x58 instead of 0x80, and 0x9B 0xFD
        0x4F instead of 0x9B, in all the UTF-8 sequences generated by the
        "\<Char-nnnn>" construct.

        - Removing the spurious bytes (including those in the middle of a byte
        sequence) makes the correct multibyte glyph appear immediately (I'm
        assuming, of course, that 'encoding' is still set to UTF-8).

        - The fact that those two byte values, 0x80 (aka Alt-Null) and 0x9B (aka
        Alt-Escape, aka CSI), play special roles in gvim's representation of
        special keys might help to spot where the bug comes from. (Did I mention
        it? I tested all this in GUI mode, in my usual "Huge" gvim with
        GTK2/Gnome GUI and, of course, with +multi_byte among others, currently
        at patchlevel 7.2.411.)
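
The behaviour in the list above can be modelled at the byte level. This is a rough Python sketch of the observed symptom only, under the assumption that the two suffixes are exactly as reported; it is not Vim's actual code, and the names `SUFFIX`, `add_spurious` and `strip_spurious` are made up for the example.

```python
# Assumed model of the reported symptom: each 0x80 byte gains the
# suffix FE 58 and each 0x9B byte gains the suffix FD 4F.
SUFFIX = {0x80: b"\xfe\x58", 0x9B: b"\xfd\x4f"}

def add_spurious(data: bytes) -> bytes:
    """Insert the reported two-byte suffix after every 0x80 or 0x9B byte."""
    out = bytearray()
    for b in data:
        out.append(b)
        out += SUFFIX.get(b, b"")
    return bytes(out)

def strip_spurious(data: bytes) -> bytes:
    """Undo add_spurious by dropping a suffix that directly follows its byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        out.append(b)
        suffix = SUFFIX.get(b)
        if suffix and data[i + 1:i + 3] == suffix:
            i += 2  # skip the two spurious bytes
        i += 1
    return bytes(out)

clean = "\ue000".encode("utf-8")   # EE 80 80
broken = add_spurious(clean)        # suffix added after each 0x80
print(broken.hex(" "))
print(strip_spurious(broken) == clean)
```

Because the bytes 0xFD and 0xFE never occur in well-formed UTF-8, removing them this way cannot damage a valid sequence, which fits the observation that the correct glyph reappears once the extra bytes are deleted.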


        I'm crossposting this update to vim_dev because my first post (in
        vim_multibyte) got no reply whatsoever; but it was only four days ago,
        and the Easter holiday is upon us; maybe I wasn't patient enough.


        Have a nice holiday, and Happy Vimming!
        Tony.
        --
        Immortality -- a fate worse than death.
        -- Edgar A. Shoaff

      • Bram Moolenaar
        Message 3 of 8, Apr 10, 2010
          Tony Mechelynck wrote:

          > 1. (Minor bug): On this system (gvim 7.2.411, Huge version with
          > GTK2-GNOME GUI), typing Ctrl-K in Insert mode followed by two spaces
          > doesn't give the expected result: instead of U+00A0 ("Alt-space", the
          > non-breaking space) I get U+E000, a CJK character. Ctrl-K NS works
          > correctly.

          Why do you expect CTRL-K <space> <space> to produce 0xa0? According to
          http://www.faqs.org/rfcs/rfc1345.html it's 0xe000.

          > 2. U+E000 is displayed in gvim as CJK halfwidth. Shouldn't it be fullwidth?

          Why would it be a double-width character? In
          http://unicode.org/Public/UNIDATA/EastAsianWidth.txt it's marked as
          "private use".

          > 3. "\<Char-nnnn>" gives wrong results for some Unicode codepoints. I
          > tried to find examples and counterexamples, as follows (in the comment
          > after the :echo statements, the UTF-8 expansion in hex):
          >
          > :echo "«\<Char-0x40>»" | " 40
          > «@»
          > :echo "«\<Char-0x80>»" | " C2 80
          > «<80><fe>X»
          > :echo "«\<Char-0x100>»" | " C4 80
          > «Ā<fe>X»
          > :echo "«\<Char-0x101>»" | " C4 81
          > «ā»
          > :echo "«\<Char-0x180>»" | " C6 80
          > «ƀ<fe>X»
          > :echo "«\<Char-0x190>»" | " C6 90
          > «Ɛ»
          > :echo "«\<Char-0x1A0>»" | " C6 A0
          > «Ơ»
          > :echo "«\<Char-0x1C0>»" | " C7 80
          > «ǀ<fe>X»
          > :echo "«\<Char-0x4E00>»" | " E4 B8 80
          > «一<fe>X»
          > :echo "«\<Char-0x4E01>»" | " E4 B8 81
          > «丁»
          > :echo "«\<Char-0x4E20>»" | " E4 B8 A0
          > «丠»
          > :echo "«\<Char-0x4E40>»" | " E4 B9 80
          > «乀<fe>X»
          > :echo "«\<Char-0xE000>»" | " EE 80 80
          > «<ee><80><fe>X<80><fe>X»
          > :echo "«\<Char-57344>»" | " EE 80 80
          > «<ee><80><fe>X<80><fe>X»
          > :echo "«\<Char-0xE001>»" | " EE 80 81
          > «<ee><80><fe>X<81>»
          > :echo "«\<Char-0xE040>»" | " EE 81 80
          > «<fe>X»
          >
          > This seems to indicate that the extra bytes 0xFE 0x58 appear after any
          > 0x80 in the UTF-8 expansion of the character. (I added the « »
          > characters to "bound" the display so that any extra whitespace would be
          > visible, but they have no effect on the bug.)

          The form "\<xxx>" is for special keys, not characters. For the character
          itself use \x or \u or \U. See ":help expr-string".
          The special keys are escaped for use in a mapping.

          > The bug does not occur after Ctrl-V u in Insert mode or when using
          > <Char-...> in an Insert-mode mapping. It does occur when "\<Char-...>"
          > is used in commands other than :echo. Note the following:
          >
          > :let j = "\<Char-0xE000>"
          > :let j
          > j <ee><80><fe>X<80><fe>X
          > i<Ctrl-R>=j<Enter>
          > î<t_þ>X<t_þ>X
          >
          > (where <Ctrl-R> and <Enter> are one keystroke each, not counting
          > modifiers). Apparently gvim tries to interpret 0x80 0xFE as a "special
          > key", and "resolves" it (incorrectly) as <t_þ>.
          >
          > Two very big files were loaded when I first noticed bug #3, but
          > restarting gvim without them reproduced the bug again with the same
          > spurious bytes.

          --
          SUPERIMPOSE "England AD 787". After a few more seconds we hear hoofbeats in
          the distance. They come slowly closer. Then out of the mist comes KING
          ARTHUR followed by a SERVANT who is banging two half coconuts together.
          "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

          /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
          /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
          \\\ download, build and distribute -- http://www.A-A-P.org ///
          \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

        • Tony Mechelynck
          Message 4 of 8, Apr 10, 2010
            On 10/04/10 23:43, Bram Moolenaar wrote:
            >
            > Tony Mechelynck wrote:
            >
            >> 1. (Minor bug): On this system (gvim 7.2.411, Huge version with
            >> GTK2-GNOME GUI), typing Ctrl-K in Insert mode followed by two spaces
            >> doesn't give the expected result: instead of U+00A0 ("Alt-space", the
            >> non-breaking space) I get U+E000, a CJK character. Ctrl-K NS works
            >> correctly.
            >
            > Why do you expect CTRL-K <space> <space> to produce 0xa0? According to
            > http://www.faqs.org/rfcs/rfc1345.html it's 0xe000.

            Because of the following paragraph at lines 99-100 of digraph.txt:

            ----8<----
            > For CTRL-K, there is one general digraph: CTRL-K <Space> {char} will enter
            > {char} with the highest bit set. You can use this to enter meta-characters.
            ---->8----

            When {char} is 0x20 i.e. <Space>, the above tells me that CTRL-K <Space>
            <Space> gives 0xA0 i.e. the non-breaking space, which is useful to enter
            the "meta-character" Meta-Space if I don't remember the NS digraph. If
            U+E000 is a "private use" character, I don't see why it needs a digraph
            of its own anyway.
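
As a small illustration of that reading of the digraph.txt rule, here is a Python sketch (not Vim code; `set_high_bit` is a made-up helper, and it assumes {char} is a 7-bit ASCII character) that sets the highest bit of a character code:

```python
def set_high_bit(char: str) -> str:
    """Return the character whose code is ord(char) with bit 7 (0x80) set."""
    return chr(ord(char) | 0x80)

nbsp = set_high_bit(" ")           # <Space> is 0x20
print(f"U+{ord(nbsp):04X}")        # code of the resulting character
```

Under this reading, <Space> (0x20) maps to 0xA0, the non-breaking space, rather than to U+E000.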

            On reading that RFC, which states in its beginning paragraph that it has
            no normative value whatsoever, I see (at the very end of section 3)
            quite a number of digraphs and trigraphs assigned to U+E000 to U+E028,
            in what Unicode calls a "private use area": see for instance the very
            start of http://www.unicode.org/charts/pdf/UE000.pdf:

            ----8<----
            Private Use Area
            Range: E000–F8FF
            The Private Use Area does not contain any character assignments,
            consequently no character code charts or namelists are provided for this
            area.
            ---->8----

            At least some of the characters listed there in the RFC have a different
            Unicode codepoint assigned to them, but maybe Unicode assigned them
            after the RFC (dated June 1992) was published. Personally I have strong
            doubts as to the usefulness of any Vim digraph for a "private use"
            character. U+E000 is listed as "indicates unfinished (Mnemonic)". I'm
            not sure what that means, unless maybe that a blank space in a charset
            chart (further down in the same RFC) indicates that the chart is unfinished?

            >
            >> 2. U+E000 is displayed in gvim as CJK halfwidth. Shouldn't it be fullwidth?
            >
            > Why would it be a double-width character? In
            > http://unicode.org/Public/UNIDATA/EastAsianWidth.txt it's marked as
            > "private use".

            Ah, I see. FWIW my usual 'guifont' has a glyph for it, which AFAICT is a
            fullwidth CJK glyph. OTOH the Unihan database does not mention it.
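
The properties under discussion can also be looked up programmatically. A minimal Python sketch using the standard `unicodedata` module follows; the values it reports depend on the Unicode version bundled with the Python in use, so treat the output as indicative only.

```python
import unicodedata

ch = "\ue000"
category = unicodedata.category(ch)        # general category ("Co" = private use)
width = unicodedata.east_asian_width(ch)   # East Asian Width property
print(f"U+{ord(ch):04X}: category={category}, east_asian_width={width}")

# "W" (wide) and "F" (fullwidth) are the values that would call for
# a double-width cell.
is_double_width = width in ("W", "F")
print("double width by property:", is_double_width)
```

A font may still draw a wide glyph at a private-use codepoint; the property only describes what the Unicode data files say, not what a given 'guifont' contains.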

            >
            >> 3. "\<Char-nnnn>" gives wrong results for some Unicode codepoints.
            [...]
            >
            > The form "\<xxx>" is for special keys, not characters. For the character
            > itself use \x or \u or \U. See ":help expr-string".
            > The special keys are escaped for use in a mapping.

            The example given at |expr-string| is "\<C-W>" which is the "<control>"
            character defined by ASCII as 0x17 ("\x17") and by Unicode as U+0017
            ("\u0017"), not a "special" non-ASCII key like <F8>, <Home> or
            <PageDown>. I had always thought that _every_ <> name could be used in a
            double-quoted string with a backslash prefix, and indeed I have verified
            that it works for all the <Char-nnnn> or <Char-0xnnnn> that I tested
            _except_ those whose UTF-8 expansion includes either or both of the
            bytes 0x80 and 0x9B, in which case two spurious bytes are inserted
            immediately after every occurrence of a 0x80 or 0x9B byte.
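
For reference, the correspondence between "\<C-W>" and 0x17 mentioned above follows from the usual ASCII control-character rule (keep the low five bits of the letter). A short Python sketch, with `ctrl` as a hypothetical helper name:

```python
def ctrl(letter: str) -> str:
    """Map an ASCII letter to its control character (keep the low 5 bits)."""
    return chr(ord(letter.upper()) & 0x1F)

ctrl_w = ctrl("W")                 # 'W' is 0x57
print(hex(ord(ctrl_w)))
```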

            If this bug is WONTFIX, I suggest mentioning explicitly at the bottom of
            the list under |expr-quote| that the \<xxx> form does not apply if xxx
            is Char-nnnn or Char-0xnnnn.


            Best regards,
            Tony.
            --
            "To whoever finds this note -
            I have been imprisoned by my father who wishes me to marry
            against my will. Please please please please come and rescue me.
            I am in the tall tower of Swamp Castle."
            SIR LAUNCELOT's eyes light up with holy inspiration.
            "Monty Python and the Holy Grail" PYTHON (MONTY)
            PICTURES LTD

          • Bram Moolenaar
            Message 5 of 8, Apr 11, 2010
              Tony Mechelynck wrote:

              > >> 1. (Minor bug): On this system (gvim 7.2.411, Huge version with
              > >> GTK2-GNOME GUI), typing Ctrl-K in Insert mode followed by two spaces
              > >> doesn't give the expected result: instead of U+00A0 ("Alt-space", the
              > >> non-breaking space) I get U+E000, a CJK character. Ctrl-K NS works
              > >> correctly.
              > >
              > > Why do you expect CTRL-K <space> <space> to produce 0xa0? According to
              > > http://www.faqs.org/rfcs/rfc1345.html it's 0xe000.
              >
              > Because of the following paragraph at lines 99-100 of digraph.txt:
              >
              > ----8<----
              > > For CTRL-K, there is one general digraph: CTRL-K <Space> {char} will enter
              > > {char} with the highest bit set. You can use this to enter meta-characters.
              > ---->8----
              >
              > When {char} is 0x20 i.e. <Space>, the above tells me that CTRL-K <Space>
              > <Space> gives 0xA0 i.e. the non-breaking space, which is useful to enter
              > the "meta-character" Meta-Space if I don't remember the NS digraph. If
              > U+E000 is a "private use" character, I don't see why it needs a digraph
              > of its own anyway.

              Ah, OK.

              > On reading that RFC, which states in its beginning paragraph that it has
              > no normative value whatsoever, I see (at the very end of section 3)
              > quite a number of digraphs and trigraphs assigned to U+E000 to U+E028,
              > in what Unicode calls a "private use area": see for instance the very
              > start of http://www.unicode.org/charts/pdf/UE000.pdf:
              >
              > ----8<----
              > Private Use Area
              > Range: E000–F8FF
              > The Private Use Area does not contain any character assignments,
              > consequently no character code charts or namelists are provided for this
              > area.
              > ---->8----
              >
              > At least some of the characters listed there in the RFC have a different
              > Unicode codepoint assigned to them, but maybe Unicode assigned them
              > after the RFC (dated June 1992) was published. Personally I have strong
              > doubts as to the usefulness of any Vim digraph for a "private use"
              > character. U+E000 is listed as "indicates unfinished (Mnemonic)". I'm
              > not sure what that means, unless maybe that a blank space in a charset
              > chart (further down in the same RFC) indicates that the chart is unfinished?

              It's weird that digraphs are defined for an area that doesn't have
              characters assigned to it. I wonder what happened here. Perhaps this
              changed at some point in time? If we know the reason we may want to
              drop all the digraphs for 0xexxx.


              > >> 2. U+E000 is displayed in gvim as CJK halfwidth. Shouldn't it be fullwidth?
              > >
              > > Why would it be a double-width character? In
              > > http://unicode.org/Public/UNIDATA/EastAsianWidth.txt it's marked as
              > > "private use".
              >
              > Ah, I see. FWIW my usual 'guifont' has a glyph for it, which AFAICT is a
              > fullwidth CJK glyph. OTOH the Unihan database does not mention it.
              >
              > >
              > >> 3. "\<Char-nnnn>" gives wrong results for some Unicode codepoints.
              > [...]
              > >
              > > The form "\<xxx>" is for special keys, not characters. For the character
              > > itself use \x or \u or \U. See ":help expr-string".
              > > The special keys are escaped for use in a mapping.
              >
              > The example given at |expr-string| is "\<C-W>" which is the "<control>"
              > character defined by ASCII as 0x17 ("\x17") and by Unicode as U+0017
              > ("\u0017"), not a "special" non-ASCII key like <F8>, <Home> or
              > <PageDown>. I had always thought that _every_ <> name could be used in a
              > double-quoted string with a backslash prefix, and indeed I have verified
              > that it works for all the <Char-nnnn> or <Char-0xnnnn> that I tested
              > _except_ those whose UTF-8 expansion includes either or both of the
              > bytes 0x80 and 0x9B, in which case two spurious bytes are inserted
              > immediately after every occurrence of a 0x80 or 0x9B byte.
              >
              > If this bug is WONTFIX, I suggest mentioning explicitly at the bottom of
              > the list under |expr-quote| that the \<xxx> form does not apply if xxx
              > is Char-nnnn or Char-0xnnnn.

              Yes.

              --
              SOLDIER: Where did you get the coconuts?
              ARTHUR: Through ... We found them.
              SOLDIER: Found them? In Mercea. The coconut's tropical!
              "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

              /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
              /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
              \\\ download, build and distribute -- http://www.A-A-P.org ///
              \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

            • Tony Mechelynck
              Message 6 of 8, Apr 11, 2010
                On 11/04/10 16:33, Bram Moolenaar wrote:
                [...]
                > It's weird that digraphs are defined for an area that doesn't have
                > characters assigned to it. I wonder what happened here. Perhaps this
                > changed at some point in time? If we know the reason we may want to
                > drop all the digraphs for 0xexxx.
                [...]

                My guess is that when that RFC was drafted in 1992, some of the charsets
                they wanted to list used a few characters which, at that time, weren't
                clearly assigned to one Unicode codepoint, and that the RFC authors
                arbitrarily (and maybe temporarily) placed these characters in a
                "private use area", which is the only place where "characters not yet
                assigned a Unicode codepoint" may go. This is only a guess, however. I'm
                not sure how many people are reading this (extremely low-volume) ML, but
                maybe someone knows the history of those mnemonics from RFC 1345 better
                than you and I do? If someone with that knowledge is reading this,
                please speak up.

                IMHO it makes no sense to have digraphs in Vim for "private use"
                characters. I propose to drop any of them that cannot be usefully
                reassigned to some "official" Unicode codepoint elsewhere. E000 to E028
                means forty-one codepoints, it ought not to be a big problem.
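
The count and the "private use" status of that range are easy to double-check. A minimal Python sketch, assuming the range bounds quoted in this thread (E000 to E028 for the digraphs, E000 to F8FF for the Private Use Area):

```python
import unicodedata

FIRST, LAST = 0xE000, 0xE028            # digraph targets named in the thread
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF    # Private Use Area range quoted above

codepoints = range(FIRST, LAST + 1)     # inclusive range
count = len(codepoints)
all_in_pua = all(PUA_FIRST <= cp <= PUA_LAST for cp in codepoints)
# General category "Co" marks private-use codepoints.
all_private = all(unicodedata.category(chr(cp)) == "Co" for cp in codepoints)
print(count, all_in_pua, all_private)
```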


                Best regards,
                Tony.
                --
                LAUNCELOT: At last! A call! A cry of distress ...
                (he draws his sword, and turns to CONCORDE)
                Concorde! Brave, Concorde ... you shall not have died in vain!
                CONCORDE: I'm not quite dead, sir ...
                "Monty Python and the Holy Grail" PYTHON (MONTY)
                PICTURES LTD

              • Bram Moolenaar
                Message 7 of 8, Apr 11, 2010
                  Tony Mechelynck wrote:

                  > On 11/04/10 16:33, Bram Moolenaar wrote:
                  > [...]
                  > > It's weird that digraphs are defined for an area that doesn't have
                  > > characters assigned to it. I wonder what happened here. Perhaps this
                  > > changed at some point in time? If we know the reason we may want to
                  > > drop all the digraphs for 0xexxx.
                  > [...]
                  >
                  > My guess is that when that RFC was drafted in 1992, some of the charsets
                  > they wanted to list used a few characters which, at that time, weren't
                  > clearly assigned to one Unicode codepoint, and that the RFC authors
                  > arbitrarily (and maybe temporarily) placed these characters in a
                  > "private use area", which is the only place where "characters not yet
                  > assigned a Unicode codepoint" may go. This is only a guess, however. I'm
                  > not sure how many people are reading this (extremely low-volume) ML, but
                  > maybe someone knows the history of those mnemonics from RFC 1345 better
                  > than you and I do? If someone with that knowledge is reading this,
                  > please speak up.
                  >
                  > IMHO it makes no sense to have digraphs in Vim for "private use"
                  > characters. I propose to drop any of them that cannot be usefully
                  > reassigned to some "official" Unicode codepoint elsewhere. E000 to E028
                  > means forty-one codepoints, it ought not to be a big problem.

                  Searching revealed a few proposals for these character ranges. And
                  this page has a confusing summary:
                  http://en.wikibooks.org/wiki/Unicode/Character_reference/E000-EFFF
                  It says "private use" but it does have a table with characters.

                  Let's remove these digraphs. I can't imagine anyone is using them.

                  --
                  Clothes make the man. Naked people have little or no influence on society.
                  -- Mark Twain (Samuel Clemens) (1835-1910)

                  /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                  /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
                  \\\ download, build and distribute -- http://www.A-A-P.org ///
                  \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

                • Tony Mechelynck
                  Message 8 of 8, Apr 12, 2010
                    On 11/04/10 17:33, Bram Moolenaar wrote:
                    >
                    > Tony Mechelynck wrote:
                    [...]
                    >> IMHO it makes no sense to have digraphs in Vim for "private use"
                    >> characters. I propose to drop any of them that cannot be usefully
                    >> reassigned to some "official" Unicode codepoint elsewhere. E000 to E028
                    >> means forty-one codepoints, it ought not to be a big problem.
                    >
                    > Searching revealed a few proposals for these character ranges. And
                    > this page has a confusing summary:
                    > http://en.wikibooks.org/wiki/Unicode/Character_reference/E000-EFFF
                    > It says "private use" but it does have a table with characters.

                    Yes; in my browser and with my usual font most (but not all) of them are
                    CJK fullwidth ideograms and full-width counterparts of halfwidth math
                    symbols etc. A few are (halfwidth) Latin accented letters which even
                    exist in Latin1 i.e. below U+0100 !!! For instance (in my browser)
                    U+E023 to U+E081 look like duplicates of ASCII 0x21 to 0x7E in the same
                    order. Note however the last sentence immediately before the table:

                    «The repertoire seen with your computer's font will most likely not be
                    the same as with other computers or fonts.»

                    And indeed I see a different glyph for those codepoints in gvim with my
                    usual 'guifont', which is not the same as my browser's usual serif and
                    sans-serif fonts.

                    >
                    > Let's remove these digraphs. I can't imagine anyone is using them.
                    >

                    Neither can I.


                    Best regards,
                    Tony.
                    --
                    LAUNCELOT leaps into SHOT with a mighty cry and runs the GUARD
                    through and
                    hacks him to the floor. Blood. Swashbuckling music (perhaps).
                    LAUNCELOT races through into the castle screaming.
                    SECOND SENTRY: Hey!
                    "Monty Python and the Holy Grail" PYTHON (MONTY)
                    PICTURES LTD
