Re: Combining characters U+035x are not supported?

  • Tony Mechelynck
    Message 1 of 11, Jan 1, 2010
      On 02/01/10 04:47, Iông Chun wrote:
      > Happy new year, everyone!
      >
      > I have been using Vim to edit an input method table these days, and
      > found that it doesn't work well with one combining character, U+0358.
      > Using g8 on "capital O with dot above right" shows "4f" only,
      > and the dot is treated as a separate character ("cd 98").
      >
      > I looked at the source to see how Vim checks whether a character is a
      > combining character, and found the following code:
      >
      > <code>
      > /*
      >  * Return TRUE if "c" is a composing UTF-8 character.  This means it will be
      >  * drawn on top of the preceding character.
      >  * Based on code from Markus Kuhn.
      >  */
      >     int
      > utf_iscomposing(c)
      >     int         c;
      > {
      >     /* sorted list of non-overlapping intervals */
      >     static struct interval combining[] =
      >     {
      >         {0x0300, 0x034f}, {0x0360, 0x036f}, {0x0483, 0x0486}, {0x0488, 0x0489},
      > </code>
      >
      > Is there a reason why the characters from 0x0350 to 0x035f are skipped?
      > It looks to me as if all characters in the Unicode block
      > 'Combining Diacritical Marks', ranging from 0x0300 to 0x036f,
      > should be combining characters.
      >
      > A small patch like this works well for me;
      > using g8 on "capital O with dot above right" now shows "4f + cd 98":
      >
      > <patch>
      > --- work/vim72/src/mbyte.c~     2010-01-02 10:18:01.000000000 +0800
      > +++ work/vim72/src/mbyte.c      2010-01-02 11:19:24.000000000 +0800
      > @@ -1976,7 +1976,7 @@
      >       /* sorted list of non-overlapping intervals */
      >       static struct interval combining[] =
      >       {
      > -       {0x0300, 0x034f}, {0x0360, 0x036f}, {0x0483, 0x0486}, {0x0488, 0x0489},
      > +       {0x0300, 0x036f}, {0x0483, 0x0486}, {0x0488, 0x0489},
      >          {0x0591, 0x05a1}, {0x05a3, 0x05b9}, {0x05bb, 0x05bd}, {0x05bf, 0x05bf},
      >          {0x05c1, 0x05c2}, {0x05c4, 0x05c4}, {0x0610, 0x0615}, {0x064b, 0x0658},
      >          {0x0670, 0x0670}, {0x06d6, 0x06dc}, {0x06de, 0x06e4}, {0x06e7, 0x06e8},
      > </patch>
      >
      > Regards,
      > Iông Chun
      >
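
For readers wondering where the "cd 98" in the g8 output comes from, here is a minimal sketch of UTF-8 encoding in C. The function name `utf8_encode` is made up for this illustration and is not taken from the Vim source; it only shows that U+0358 encodes to the two bytes cd 98, while "O" (U+004F) stays a single byte 4f.

```c
#include <stddef.h>

/*
 * Illustrative helper (hypothetical name, not Vim's code): encode code
 * point "c" as UTF-8 into "buf", which must have room for 4 bytes.
 * Returns the number of bytes written, or 0 if "c" is above U+10FFFF.
 * Surrogates are not rejected in this sketch.
 */
static size_t utf8_encode(unsigned long c, unsigned char *buf)
{
    if (c < 0x80UL)
    {
        buf[0] = (unsigned char)c;                       /* 0xxxxxxx */
        return 1;
    }
    if (c < 0x800UL)
    {
        buf[0] = (unsigned char)(0xC0 | (c >> 6));       /* 110xxxxx */
        buf[1] = (unsigned char)(0x80 | (c & 0x3F));     /* 10xxxxxx */
        return 2;
    }
    if (c < 0x10000UL)
    {
        buf[0] = (unsigned char)(0xE0 | (c >> 12));      /* 1110xxxx */
        buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c < 0x110000UL)
    {
        buf[0] = (unsigned char)(0xF0 | (c >> 18));      /* 11110xxx */
        buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling `utf8_encode(0x0358, buf)` fills `buf` with 0xCD 0x98, which matches the bytes g8 reports for the dot.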

      According to the current version (5.2) of the Unicode Standard, all
      codepoints U+0300 to U+036F are indeed combining characters AFAICT, see
      http://www.unicode.org/charts/PDF/U0300.pdf

      However, those in the range U+0350 to U+035F are particularly
      "esoteric", and I believe that it is possible that they were added in a
      relatively recent version of the Standard; previous versions (including,
      maybe, the one which was current when that module was written) would
      then have these codepoints "undefined".
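
To make the effect of the gap concrete, here is a rough sketch in C of a lookup over a sorted table of non-overlapping intervals. It is not the actual Vim implementation: `struct cp_interval`, `in_intervals`, `TABLE_LEN` and the two short tables are names invented for this example, and the tables hold only the first few ranges quoted above.

```c
#include <stddef.h>

/* Inclusive range of code points (hypothetical type for this sketch). */
struct cp_interval
{
    int first;
    int last;
};

/* First ranges of the table as quoted in the original message. */
static const struct cp_interval old_table[] =
{
    {0x0300, 0x034f}, {0x0360, 0x036f}, {0x0483, 0x0486}, {0x0488, 0x0489},
};

/* The same ranges after the proposed one-line patch. */
static const struct cp_interval new_table[] =
{
    {0x0300, 0x036f}, {0x0483, 0x0486}, {0x0488, 0x0489},
};

#define TABLE_LEN(t) (sizeof(t) / sizeof((t)[0]))

/*
 * Binary search in a sorted table of non-overlapping intervals.
 * Returns 1 when "c" lies inside one of the intervals, 0 otherwise.
 */
static int in_intervals(const struct cp_interval *table, size_t count, int c)
{
    size_t lo = 0;
    size_t hi = count;          /* search the half-open range [lo, hi) */

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (c < table[mid].first)
            hi = mid;
        else if (c > table[mid].last)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}
```

With these tables, `in_intervals(old_table, TABLE_LEN(old_table), 0x0358)` returns 0 because U+0358 falls into the gap between 0x034f and 0x0360, while the same call on `new_table` returns 1.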

      BTW, I notice in a comment at lines 29-30 of that same module:

      > * To make things complicated, up to two composing characters
      > * are allowed. These are drawn on top of the first char.

      This is now only true with the default settings. The 'maxcombine' option
      was added (relatively recently) to allow displaying (if the user sets a
      non-default value) up to 6 combining characters on top of each spacing
      character; even more than that can be "edited but not displayed".
      Shouldn't that comment be updated?


      Best regards,
      Tony.
      --
      Children are unpredictable. You never know what inconsistency they're
      going to catch you in next.
      -- Franklin P. Jones

      --
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php
    • Iông Chun
      Message 2 of 11, Jan 2, 2010
        Hi Tony and list,

        I understand now.
        The characters from U+0350 to U+0357, and from U+035D to U+035F,
        were added in Unicode 4.0.
        The characters from U+0358 (which I used) to U+035C were added even
        later, in Unicode 4.1.
        You can check this in http://www.unicode.org/Public/UNIDATA/DerivedAge.txt

        I use Vim version 7.2.323; its combining character table seems to
        contain most of Unicode 4.0,
        but not U+0350 to U+0357 and U+035D to U+035F.

        I will make a patch to add those additional combining characters,
        according to Unicode 5.2.

        Iông Chun

      • Iông Chun
        Message 3 of 11, Jan 2, 2010
          Hi all,

          This is a patch that updates the combining character table to Unicode 5.2.
          I added the ranges by hand, so there may be errors ;)

          <patch>
          --- work/vim72/src/mbyte.c.bak 2010-01-02 13:33:40.000000000 +0800
          +++ work/vim72/src/mbyte.c 2010-01-02 18:33:49.000000000 +0800
          @@ -1976,35 +1976,64 @@
               /* sorted list of non-overlapping intervals */
               static struct interval combining[] =
               {
          -        {0x0300, 0x034f}, {0x0360, 0x036f}, {0x0483, 0x0486}, {0x0488, 0x0489},
          -        {0x0591, 0x05a1}, {0x05a3, 0x05b9}, {0x05bb, 0x05bd}, {0x05bf, 0x05bf},
          -        {0x05c1, 0x05c2}, {0x05c4, 0x05c4}, {0x0610, 0x0615}, {0x064b, 0x0658},
          -        {0x0670, 0x0670}, {0x06d6, 0x06dc}, {0x06de, 0x06e4}, {0x06e7, 0x06e8},
          -        {0x06ea, 0x06ed}, {0x0711, 0x0711}, {0x0730, 0x074a}, {0x07a6, 0x07b0},
          -        {0x0901, 0x0903}, {0x093c, 0x093c}, {0x093e, 0x094d}, {0x0951, 0x0954},
          +        {0x0300, 0x036f},
          +        {0x0483, 0x0487}, {0x0488, 0x0489},
          +        {0x0591, 0x05bd}, {0x05bf, 0x05bf}, {0x05c1, 0x05c2}, {0x05c4, 0x05c5},
          +        {0x05c7, 0x05c7},
          +        {0x0610, 0x061a}, {0x064b, 0x065e}, {0x0670, 0x0670}, {0x06d6, 0x06dc},
          +        {0x06de, 0x06e4}, {0x06e7, 0x06e8}, {0x06ea, 0x06ed},
          +        {0x0711, 0x0711}, {0x0730, 0x074a}, {0x07a6, 0x07b0}, {0x07eb, 0x07f3},
          +        {0x0816, 0x0819}, {0x081b, 0x0823}, {0x0825, 0x0827}, {0x0829, 0x082d},
          +        {0x0900, 0x0903}, {0x093c, 0x093c}, {0x093e, 0x094e}, {0x0951, 0x0955},
                   {0x0962, 0x0963}, {0x0981, 0x0983}, {0x09bc, 0x09bc}, {0x09be, 0x09c4},
                   {0x09c7, 0x09c8}, {0x09cb, 0x09cd}, {0x09d7, 0x09d7}, {0x09e2, 0x09e3},
                   {0x0a01, 0x0a03}, {0x0a3c, 0x0a3c}, {0x0a3e, 0x0a42}, {0x0a47, 0x0a48},
          -        {0x0a4b, 0x0a4d}, {0x0a70, 0x0a71}, {0x0a81, 0x0a83}, {0x0abc, 0x0abc},
          -        {0x0abe, 0x0ac5}, {0x0ac7, 0x0ac9}, {0x0acb, 0x0acd}, {0x0ae2, 0x0ae3},
          -        {0x0b01, 0x0b03}, {0x0b3c, 0x0b3c}, {0x0b3e, 0x0b43}, {0x0b47, 0x0b48},
          -        {0x0b4b, 0x0b4d}, {0x0b56, 0x0b57}, {0x0b82, 0x0b82}, {0x0bbe, 0x0bc2},
          -        {0x0bc6, 0x0bc8}, {0x0bca, 0x0bcd}, {0x0bd7, 0x0bd7}, {0x0c01, 0x0c03},
          -        {0x0c3e, 0x0c44}, {0x0c46, 0x0c48}, {0x0c4a, 0x0c4d}, {0x0c55, 0x0c56},
          -        {0x0c82, 0x0c83}, {0x0cbc, 0x0cbc}, {0x0cbe, 0x0cc4}, {0x0cc6, 0x0cc8},
          -        {0x0cca, 0x0ccd}, {0x0cd5, 0x0cd6}, {0x0d02, 0x0d03}, {0x0d3e, 0x0d43},
          -        {0x0d46, 0x0d48}, {0x0d4a, 0x0d4d}, {0x0d57, 0x0d57}, {0x0d82, 0x0d83},
          -        {0x0dca, 0x0dca}, {0x0dcf, 0x0dd4}, {0x0dd6, 0x0dd6}, {0x0dd8, 0x0ddf},
          -        {0x0df2, 0x0df3}, {0x0e31, 0x0e31}, {0x0e34, 0x0e3a}, {0x0e47, 0x0e4e},
          -        {0x0eb1, 0x0eb1}, {0x0eb4, 0x0eb9}, {0x0ebb, 0x0ebc}, {0x0ec8, 0x0ecd},
          +        {0x0a4b, 0x0a4d}, {0x0a51, 0x0a51}, {0x0a70, 0x0a71}, {0x0a75, 0x0a75},
          +        {0x0a81, 0x0a83}, {0x0abc, 0x0abc}, {0x0abe, 0x0ac5}, {0x0ac7, 0x0ac9},
          +        {0x0acb, 0x0acd}, {0x0ae2, 0x0ae3},
          +        {0x0b01, 0x0b03}, {0x0b3c, 0x0b3c}, {0x0b3e, 0x0b44}, {0x0b47, 0x0b48},
          +        {0x0b4b, 0x0b4d}, {0x0b56, 0x0b57}, {0x0b62, 0x0b63}, {0x0b82, 0x0b82},
          +        {0x0bbe, 0x0bc2}, {0x0bc6, 0x0bc8}, {0x0bca, 0x0bcd}, {0x0bd7, 0x0bd7},
          +        {0x0c01, 0x0c03}, {0x0c3e, 0x0c44}, {0x0c46, 0x0c48}, {0x0c4a, 0x0c4d},
          +        {0x0c55, 0x0c56}, {0x0c62, 0x0c63}, {0x0c82, 0x0c83}, {0x0cbc, 0x0cbc},
          +        {0x0cbe, 0x0cc4}, {0x0cc6, 0x0cc8}, {0x0cca, 0x0ccd}, {0x0cd5, 0x0cd6},
          +        {0x0ce2, 0x0ce3},
          +        {0x0d02, 0x0d03}, {0x0d3e, 0x0d43}, {0x0d46, 0x0d48}, {0x0d4a, 0x0d4d},
          +        {0x0d57, 0x0d57}, {0x0d82, 0x0d83}, {0x0dca, 0x0dca}, {0x0dcf, 0x0dd4},
          +        {0x0dd6, 0x0dd6}, {0x0dd8, 0x0ddf}, {0x0df2, 0x0df3},
          +        {0x0e31, 0x0e31}, {0x0e34, 0x0e3a}, {0x0e47, 0x0e4e}, {0x0eb1, 0x0eb1},
          +        {0x0eb4, 0x0eb9}, {0x0ebb, 0x0ebc}, {0x0ec8, 0x0ecd},
                   {0x0f18, 0x0f19}, {0x0f35, 0x0f35}, {0x0f37, 0x0f37}, {0x0f39, 0x0f39},
                   {0x0f3e, 0x0f3f}, {0x0f71, 0x0f84}, {0x0f86, 0x0f87}, {0x0f90, 0x0f97},
          -        {0x0f99, 0x0fbc}, {0x0fc6, 0x0fc6}, {0x102c, 0x1032}, {0x1036, 0x1039},
          -        {0x1056, 0x1059}, {0x1712, 0x1714}, {0x1732, 0x1734}, {0x1752, 0x1753},
          -        {0x1772, 0x1773}, {0x17b6, 0x17d3}, {0x17dd, 0x17dd}, {0x180b, 0x180d},
          -        {0x18a9, 0x18a9}, {0x1920, 0x192b}, {0x1930, 0x193b}, {0x20d0, 0x20ea},
          -        {0x302a, 0x302f}, {0x3099, 0x309a}, {0xfb1e, 0xfb1e}, {0xfe00, 0xfe0f},
          -        {0xfe20, 0xfe23},
          +        {0x0f99, 0x0fbc}, {0x0fc6, 0x0fc6},
          +        {0x102b, 0x103e}, {0x1056, 0x1059}, {0x105e, 0x1060}, {0x1062, 0x1064},
          +        {0x1067, 0x106d}, {0x1071, 0x1074}, {0x1082, 0x108d}, {0x108f, 0x108f},
          +        {0x109a, 0x109d},
          +        {0x135f, 0x135f},
          +        {0x1712, 0x1714}, {0x1732, 0x1734}, {0x1752, 0x1753}, {0x1772, 0x1773},
          +        {0x17b6, 0x17d3}, {0x17dd, 0x17dd},
          +        {0x180b, 0x180d}, {0x18a9, 0x18a9},
          +        {0x1920, 0x192b}, {0x1930, 0x193b}, {0x19b0, 0x19c0}, {0x19c8, 0x19c9},
          +        {0x1a17, 0x1a1b}, {0x1a55, 0x1a5e}, {0x1a60, 0x1a7c}, {0x1a7f, 0x1a7f},
          +        {0x1b00, 0x1b04}, {0x1b34, 0x1b44}, {0x1b6b, 0x1b73}, {0x1b80, 0x1b82},
          +        {0x1ba1, 0x1baa},
          +        {0x1c24, 0x1c37}, {0x1cd0, 0x1cd2}, {0x1cd4, 0x1ce8}, {0x1ced, 0x1ced},
          +        {0x1cf2, 0x1cf2},
          +        {0x1dc0, 0x1de6}, {0x1dfd, 0x1dff},
          +        {0x20d0, 0x20f0},
          +        {0x2cef, 0x2cf1},
          +        {0x2de0, 0x2dff},
          +        {0x302a, 0x302f}, {0x3099, 0x309a},
          +        {0xa66f, 0xa672}, {0xa67c, 0xa67d}, {0xa6f0, 0xa6f1},
          +        {0xa802, 0xa802}, {0xa806, 0xa806}, {0xa80b, 0xa80b}, {0xa823, 0xa827},
          +        {0xa880, 0xa881}, {0xa8b4, 0xa8c4}, {0xa8e0, 0xa8f1},
          +        {0xa926, 0xa92d}, {0xa947, 0xa953}, {0xa980, 0xa983}, {0xa9b3, 0xa9c0},
          +        {0xaa29, 0xaa36}, {0xaa43, 0xaa43}, {0xaa4c, 0xaa4d}, {0xaa7b, 0xaa7b},
          +        {0xaab0, 0xaab0}, {0xaab2, 0xaab4}, {0xaab7, 0xaab8}, {0xaabe, 0xaabf},
          +        {0xaac1, 0xaac1},
          +        {0xabe3, 0xabea}, {0xabec, 0xabed},
          +        {0xfb1e, 0xfb1e},
          +        {0xfe00, 0xfe0f}, {0xfe20, 0xfe26},
               };

               return intable(combining, sizeof(combining), c);
          </patch>


          Regards,
          Iông Chun

        • Tony Mechelynck
          Message 4 of 11, Jan 2, 2010
            On 02/01/10 09:30, Iông Chun wrote:
            > Hi Tony and list,
            >
            > I understand now.
            > The characters from U+0350 to U+0357, and from U+035D to U+035F,
            > were added in Unicode 4.0.
            > The characters from U+0358 (which I used) to U+035C were added even
            > later, in Unicode 4.1.
            > You can check this in http://www.unicode.org/Public/UNIDATA/DerivedAge.txt
            >
            > I use Vim version 7.2.323; its combining character table seems to
            > contain most of Unicode 4.0,
            > but not U+0350 to U+0357 and U+035D to U+035F.
            >
            > I will make a patch to add those additional combining characters,
            > according to Unicode 5.2.
            >
            > Iông Chun
            >

            I'm attaching an extract from the current UnicodeData.txt file where
            I've extracted all codepoints with a nonzero Canonical_Combining_Class
            (field 3, counting the first field [codepoint number] as field 0). I'm
            *not* sure that this property coincides with the "combining character"
            property in the Vim sense, but it's the best I've found. You can check
            any discrepancies by means of
            http://www.unicode.org/Public/UNIDATA/UnicodeData.txt (where the first
            two fields are the codepoint number and name).

            This was obtained by applying :redir to the output of

            silent %g/^\%([^;]*;\)\{3}\%(0;\)\@!/p

            meaning: print all lines containing, at the start of a line, three times
            (zero or more non-semicolons plus one semicolon) not followed by (a zero
            then a semicolon).
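
The same filter can be sketched outside Vim. The C fragment below is only an illustration of the idea, not part of any existing tool: `ucd_field` and `has_nonzero_ccc` are hypothetical names, and the code assumes the semicolon-separated layout of UnicodeData.txt described above, with Canonical_Combining_Class in field 3 when counting from 0.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to field "n" (counting from 0) of a semicolon-separated
 * line and store its length in "*len".  Returns NULL when the line has
 * fewer than n + 1 fields.
 */
static const char *ucd_field(const char *line, int n, size_t *len)
{
    const char *p = line;

    while (n > 0)
    {
        p = strchr(p, ';');
        if (p == NULL)
            return NULL;
        ++p;                    /* skip the semicolon */
        --n;
    }
    *len = strcspn(p, ";\n");   /* field ends at ';', newline or NUL */
    return p;
}

/*
 * Return 1 when field 3 (Canonical_Combining_Class) is something other
 * than "0".  A missing or empty field is treated as zero here; that is an
 * assumption of this sketch, the regex quoted above would print such a line.
 */
static int has_nonzero_ccc(const char *line)
{
    size_t len;
    const char *f = ucd_field(line, 3, &len);

    if (f == NULL || len == 0)
        return 0;
    return !(len == 1 && f[0] == '0');
}
```

Feeding every line of UnicodeData.txt through `has_nonzero_ccc` and printing the lines for which it returns 1 should give roughly the same extract as the `:g` command.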


            Best regards,
            Tony.
            --
            "I do not know myself, and God forbid that I should."
            -- Johann Wolfgang von Goethe

          • Iông Chun
            Message 5 of 11, Jan 2, 2010
              Hi Tony,

              On 2010-01-02 07:01 PM, Tony Mechelynck wrote:
              > I'm attaching an extract from the current UnicodeData.txt file where
              > I've extracted all codepoints with a nonzero Canonical_Combining_Class
              > (field 3, counting the first field [codepoint number] as field 0). I'm
              > *not* sure that this property coincides with the "combining character"
              > property in the Vim sense, but it's the best I've found. You can check
              > any discrepancies by means of
              > http://www.unicode.org/Public/UNIDATA/UnicodeData.txt (where the first
              > two fields are the codepoint number and name).
              >
              > This was obtained by applying :redir to the output of
              >
              > silent %g/^\%([^;]*;\)\{3}\%(0;\)\@!/p
              >
              > meaning: print all lines containing, at the start of a line, three
              > times (zero or more non-semicolons plus one semicolon) not followed by
              > (a zero then a semicolon).
              >
              >
              > Best regards,
              > Tony.

              I should have used UnicodeData.txt too, instead of going through every
              added code point and checking the code charts ;)

              About Canonical_Combining_Class: in the Standard, version 5.2, D52,
              item #2, I read:
              <quote>
              All characters with non-zero canonical combining class are combining
              characters, but the reverse is not the case: there are combining
              characters with a zero canonical combining class.
              </quote>

              and item #1:
              <quote>
              Combining characters consist of all characters with the General Category
              values of Spacing Combining Mark (Mc), Nonspacing Mark (Mn), and
              Enclosing Mark (Me).
              </quote>

              and D53:
              <quote>
              Nonspacing mark: A combining character with the General Category of
              Nonspacing Mark (Mn) or Enclosing Mark (Me).
              </quote>

              I don't know whether Vim uses different rules for display and for
              semantics when checking for combining characters. If not, I think the
              table could contain just the nonspacing ones for now.

              I attach the list of the Mn and Me characters, excluding code points
              above U+FFFF.
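
As a rough idea of how such a list could be turned into table entries, here is a small C sketch. It assumes UnicodeData.txt lines with the General Category in field 2 and input sorted by code point; `struct cp_range`, `is_mn_or_me`, `add_code_point` and `collect_ranges` are names invented for this example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Inclusive range of code points (hypothetical type for this sketch). */
struct cp_range
{
    long first;
    long last;
};

/* Return 1 when the General Category (field 2) is Mn or Me. */
static int is_mn_or_me(const char *line)
{
    const char *p = strchr(line, ';');      /* end of field 0 */

    if (p == NULL)
        return 0;
    p = strchr(p + 1, ';');                 /* end of field 1 */
    if (p == NULL)
        return 0;
    ++p;                                    /* start of field 2 */
    return strncmp(p, "Mn;", 3) == 0 || strncmp(p, "Me;", 3) == 0;
}

/*
 * Append "cp" to "ranges": extend the last range when "cp" directly
 * follows it, otherwise start a new one.  Returns the new range count.
 * Code points must arrive in ascending order.
 */
static size_t add_code_point(struct cp_range *ranges, size_t count, long cp)
{
    if (count > 0 && ranges[count - 1].last + 1 == cp)
        ranges[count - 1].last = cp;
    else
    {
        ranges[count].first = cp;
        ranges[count].last = cp;
        ++count;
    }
    return count;
}

/*
 * Collect the Mn/Me code points of "lines" into merged ranges.
 * "ranges" must have room for "nlines" entries.
 */
static size_t collect_ranges(const char *const *lines, size_t nlines,
                             struct cp_range *ranges)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < nlines; ++i)
        if (is_mn_or_me(lines[i]))
            /* field 0 is the code point in hex; strtol stops at ';' */
            count = add_code_point(ranges, count,
                                   strtol(lines[i], NULL, 16));
    return count;
}
```

Each resulting range corresponds to one `{first, last}` entry of the kind used in the combining table; Mc characters are skipped, in line with the nonspacing-only suggestion above.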

              Regards,
              Iông Chun

            • Tony Mechelynck
              Message 6 of 11, Jan 2, 2010
                On 02/01/10 15:47, Iông Chun wrote:
                > Hi Tony,
                >
                > On 2010-01-02 07:01 PM, Tony Mechelynck wrote:
                >> I'm attaching an extract from the current UnicodeData.txt file where
                >> I've extracted all codepoints with a nonzero Canonical_Combining_Class
                >> (field 3, counting the first field [codepoint number] as field 0). I'm
                >> *not* sure that this property coincides with the "combining character"
                >> property in the Vim sense, but it's the best I've found. You can check
                >> any discrepancies by means of
                >> http://www.unicode.org/Public/UNIDATA/UnicodeData.txt (where the first
                >> two fields are the codepoint number and name).
                >>
                >> This was obtained by applying :redir to the output of
                >>
                >> silent %g/^\%([^;]*;\)\{3}\%(0;\)\@!/p
                >>
                >> meaning: print all lines containing, at the start of a line, three
                >> times (zero or more non-semicolons plus one semicolon) not followed by
                >> (a zero then a semicolon).
                >>
                >>
                >> Best regards,
                >> Tony.
                >
                > I should have used UnicodeData.txt too, instead of going through every
                > added code point and checking the code charts ;)
                >
                > About Canonical_Combining_Class: in the Standard, version 5.2, D52,
                > item #2, I read:
                > <quote>
                > All characters with non-zero canonical combining class are combining
                > characters, but the reverse is not the case: there are combining
                > characters with a zero canonical combining class.
                > </quote>
                >
                > and item #1:
                > <quote>
                > Combining characters consist of all characters with the General Category
                > values of Spacing Combining Mark (Mc), Nonspacing Mark (Mn), and
                > Enclosing Mark (Me).
                > </quote>
                >
                > and D53:
                > <quote>
                > Nonspacing mark: A combining character with the General Category of
                > Nonspacing Mark (Mn) or Enclosing Mark (Me).
                > </quote>
                >
                > I don't know whether Vim uses different rules for display and for
                > semantics when checking for combining characters. If not, I think the
                > table could contain just the nonspacing ones for now.
                >
                > I attach the list of the Mn and Me characters, excluding code points
                > above U+FFFF.
                >
                > Regards,
                > Iông Chun
                >

                Why without codepoint values higher than U+FFFF? Nowadays gvim can
                display them (which wasn't the case when I started studying Unicode with
                gvim 6.x).


                Best regards,
                Tony.
                --
                hundred-and-one symptoms of being an internet addict:
                236. You start saving URL's in your digital watch.

              • Iông Chun
                Message 7 of 11, Jan 2, 2010
                  On 2010/01/03 00:24, Tony Mechelynck wrote:
                  > Why without codepoint values higher than U+FFFF? Nowadays gvim can
                  > display them (which wasn't the case when I started studying Unicode
                  > with gvim 6.x).
                  >
                  >
                  > Best regards,
                  > Tony.

                  Because:
                  <code>
                  struct interval
                  {
                      unsigned short first;
                      unsigned short last;
                  };
                  </code>
                  ;)

                  I guess the type can be "int" instead of "unsigned short" now.
                  The patch with all Mn and Me character ranges is attached.
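
To illustrate why the narrow type is a problem, here is a small C sketch. The struct and function names (`interval16`, `interval32`, `make_interval16`, `in_interval16`, `in_interval32`) are invented for the example, and it assumes the common case of a 16-bit `unsigned short` and an `int` of at least 32 bits. The range U+1D167 to U+1D169 is just an arbitrary range above U+FFFF picked for the demonstration.

```c
#include <limits.h>

/* Layout as in the struct quoted above: 16 bits per bound on most systems. */
struct interval16
{
    unsigned short first;
    unsigned short last;
};

/* Widened variant, assuming "int" has at least 32 bits. */
struct interval32
{
    int first;
    int last;
};

/*
 * Store two code points in the narrow struct.  Values above USHRT_MAX are
 * reduced modulo USHRT_MAX + 1, so 0x1D167 becomes 0xD167 when
 * unsigned short has 16 bits.
 */
static struct interval16 make_interval16(long first, long last)
{
    struct interval16 iv;

    iv.first = (unsigned short)first;
    iv.last = (unsigned short)last;
    return iv;
}

/* Return 1 when "c" lies inside the narrow interval. */
static int in_interval16(struct interval16 iv, int c)
{
    return c >= iv.first && c <= iv.last;
}

/* Return 1 when "c" lies inside the wide interval. */
static int in_interval32(struct interval32 iv, int c)
{
    return c >= iv.first && c <= iv.last;
}
```

With a 16-bit `unsigned short`, an interval built with `make_interval16(0x1D167L, 0x1D169L)` no longer contains 0x1D168 and wrongly contains 0xD168, whereas the `interval32` version behaves as expected.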

                  Regards,
                  Iông Chun

                • Tony Mechelynck
                  Message 8 of 11 , Jan 2, 2010
                    On 03/01/10 03:54, Iông Chun wrote:
                    > On 2010/01/03 00:24, Tony Mechelynck wrote:
                    >> Why without codepoint values higher than U+FFFF? Nowadays gvim can
                    >> display them (which wasn't the case when I started studying Unicode
                    >> with gvim 6.x).
                    >>
                    >>
                    >> Best regards,
                    >> Tony.
                    >
                    > Because:
                    > <code>
                    > struct interval
                    > {
                    > unsigned short first;
                    > unsigned short last;
                    > };
                    > </code>
                    > ;)
                    >
                    > I guess the type can be "int" instead of "unsigned short" now.
                    > The patch with all Mn and Me character ranges is attached.
                    >
                    > Regards,
                    > Iông Chun
                    >

                    I see. I suspect other size changes may have to be done then, not only
                    where the structure is defined but possibly where it is used. I hope
                    Bram is following this whole thread.

                    Best regards,
                    Tony.
                    --
                    "A Mormon is a man that has the bad taste and the religion to do what a
                    good many other people are restrained from doing by conscientious
                    scruples and the police."
                    -- Mr. Dooley

                  • Bram Moolenaar
                    Message 9 of 11 , Jan 4, 2010
                      Tony Mechelynck wrote:

                      > On 03/01/10 03:54, Iông Chun wrote:
                      > > On 2010/01/03 00:24, Tony Mechelynck wrote:
                      > >> Why without codepoint values higher than U+FFFF? Nowadays gvim can
                      > >> display them (which wasn't the case when I started studying Unicode
                      > >> with gvim 6.x).
                      > >>
                      > >>
                      > >> Best regards,
                      > >> Tony.
                      > >
                      > > Because:
                      > > <code>
                      > > struct interval
                      > > {
                      > > unsigned short first;
                      > > unsigned short last;
                      > > };
                      > > </code>
                      > > ;)
                      > >
                      > > I guess the type can be "int" instead of "unsigned short" now.
                      > > The patch with all Mn and Me character ranges is attached.
                      > >
                      > > Regards,
                      > > Iông Chun
                      > >
                      >
                      > I see. I suspect other size changes may have to be done then, not only
                      > where the structure is defined but possibly where it is used. I hope
                      > Bram is following this whole thread.

                      There is a script to generate these tables from the Unicode table.
                      I think Markus Kuhn had this. But it should be easy to reproduce with
                      Vim script.

                      Changing all these tables from short to int makes the memory use higher.
                      But adding code to handle two tables won't be much smaller.
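
The trade-off can be put in rough numbers with a small sketch. Everything here is an assumption made for illustration: the struct names interval16 and interval32, the two helper functions, and the idea of splitting the ranges into a BMP table (16-bit fields) and a second table for code points above U+FFFF (int fields). The entry counts passed in are whatever the generated tables turn out to contain; none are taken from Vim.

```c
#include <stddef.h>

/* Current layout: 16-bit fields, enough for U+0000..U+FFFF only. */
struct interval16
{
    unsigned short first;
    unsigned short last;
};

/* Widened layout: "int" fields (assumed at least 32 bits). */
struct interval32
{
    int first;
    int last;
};

/* Bytes used when all ranges live in a single table of int pairs. */
size_t
bytes_single_int_table(size_t n_bmp, size_t n_above_bmp)
{
    return (n_bmp + n_above_bmp) * sizeof(struct interval32);
}

/* Bytes used when BMP ranges stay in a 16-bit table and only the
 * ranges above U+FFFF go into a second table of int pairs. */
size_t
bytes_split_tables(size_t n_bmp, size_t n_above_bmp)
{
    return n_bmp * sizeof(struct interval16)
	+ n_above_bmp * sizeof(struct interval32);
}
```

On a typical platform with 2-byte short and 4-byte int, the split saves about 4 bytes per BMP range in data, but it also needs a second lookup path in the code, which eats into that saving.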

                      --
                      hundred-and-one symptoms of being an internet addict:
                      77. The phone company asks you to test drive their new PBX system

                      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                      /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
                      \\\ download, build and distribute -- http://www.A-A-P.org ///
                      \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

                    • Tony Mechelynck
                      Message 10 of 11 , Jan 8, 2010
                        On 04/01/10 20:17, Bram Moolenaar wrote:
                        [...]
                        >
                        > There is a script to generate these tables from the Unicode table.
                        > I think Markus Kuhn had this. But it should be easy to reproduce with
                        > Vim script.
                        >
                        [...]

                        Yes indeed: this UnicodeData.txt file is meant to be machine-readable, and
                        with the power of Vim regexps at our disposal, extracting the needed
                        data should be a breeze.
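
As a rough idea of what such an extraction involves, here is a sketch in C rather than Vim script. It relies on the UnicodeData.txt layout, where fields are separated by semicolons, the first field is the code point in hexadecimal and the third is the General_Category. The names parse_nonspacing, add_code_point, print_ranges and struct range are hypothetical, and this is neither Markus Kuhn's script nor anything that exists in Vim.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct range
{
    long first;
    long last;
};

/* Parse one UnicodeData.txt line of the form "code;name;category;...".
 * Return 1 and store the code point in "*cp" when the category is
 * Mn or Me; return 0 for any other category or a malformed line. */
int
parse_nonspacing(const char *line, long *cp)
{
    const char *semi1 = strchr(line, ';');
    const char *semi2;
    char *end;
    long value;

    if (semi1 == NULL)
	return 0;
    semi2 = strchr(semi1 + 1, ';');
    if (semi2 == NULL)
	return 0;

    value = strtol(line, &end, 16);
    if (end != semi1)		/* first field must be pure hex */
	return 0;

    if (strncmp(semi2 + 1, "Mn;", 3) != 0
	    && strncmp(semi2 + 1, "Me;", 3) != 0)
	return 0;

    *cp = value;
    return 1;
}

/* Add "cp" to "ranges", extending the last range when "cp" directly
 * follows it.  Code points must arrive in ascending order and the
 * caller must provide room for one more entry.  Returns the new count. */
size_t
add_code_point(struct range *ranges, size_t n, long cp)
{
    if (n > 0 && ranges[n - 1].last + 1 == cp)
	ranges[n - 1].last = cp;
    else
    {
	ranges[n].first = cp;
	ranges[n].last = cp;
	++n;
    }
    return n;
}

/* Write the ranges as C initializers, one "{first, last}," per line. */
void
print_ranges(FILE *out, const struct range *ranges, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i)
	fprintf(out, "{0x%04lx, 0x%04lx},\n",
		(unsigned long)ranges[i].first,
		(unsigned long)ranges[i].last);
}
```

Feeding the lines for U+0300, U+0301 and U+0488 through parse_nonspacing() and add_code_point() in that order yields two ranges, 0300-0301 and 0488-0488, because only directly adjacent code points are merged.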


                        Best regards,
                        Tony.
                        --
                        Her locks an ancient lady gave
                        Her loving husband's life to save;
                        And men -- they honored so the dame --
                        Upon some stars bestowed her name.

                        But to our modern married fair,
                        Who'd give their lords to save their hair,
                        No stellar recognition's given.
                        There are not stars enough in heaven.