Re: Combining characters U+035x are not supported?

  • Iông Chun
    Message 1 of 11 , Jan 2, 2010
      Hi Tony,

      On 2010-01-02 07:01 PM, Tony Mechelynck wrote:
      > I'm attaching an extract from the current UnicodeData.txt file where
      > I've extracted all codepoints with a nonzero Canonical_Combining_Class
      > (field 3, counting the first field [codepoint number] as field 0). I'm
      > *not* sure that this property coincides with the "combining character"
      > property in the Vim sense, but it's the best I've found. You can check
      > any discrepancies by means of
      > http://www.unicode.org/Public/UNIDATA/UnicodeData.txt (where the first
      > two fields are the codepoint number and name).
      >
      > This was obtained by applying :redir to the output of
      >
      > silent %g/^\%([^;]*;\)\{3}\%(0;\)\@!/p
      >
      > meaning: print all lines containing, at the start of a line, three
      > times (zero or more non-semicolons plus one semicolon) not followed by
      > (a zero then a semicolon).
      >
      >
      > Best regards,
      > Tony.
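
For readers who would rather see the same filter outside Vim, here is a minimal C sketch of what the :g pattern quoted above tests. The function name has_nonzero_ccc is made up for illustration, and the sketch assumes each input is a single semicolon-separated UnicodeData.txt line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if the line has at least three
 * ';'-terminated fields and the text after them does not start
 * with "0;", i.e. field 3 (Canonical_Combining_Class) is not 0.
 * Return 0 otherwise. */
int has_nonzero_ccc(const char *line)
{
    const char *p = line;
    int field;

    assert(line != NULL);
    for (field = 0; field < 3; field++) {
        p = strchr(p, ';');     /* end of the current field */
        if (p == NULL)
            return 0;           /* fewer than three fields: no match */
        p++;                    /* step past the ';' */
    }
    return strncmp(p, "0;", 2) != 0;
}
```

Like the regexp, this only looks at the combining class, so it says nothing about the General Category of the character.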

      I should also make use of UnicodeData.txt, instead of looking into every
      added code point and checking the code charts ;)

      About Canonical_Combining_Class, from the Standard version 5.2, D52,
      item#2, I read:
      <quote>
      All characters with non-zero canonical combining class are combining
      characters, but the reverse is not the case: there are combining
      characters with a zero canonical combining class.
      </quote>

      and item#1:
      <quote>
      Combining characters consist of all characters with the General Category
      values of Spacing Combining Mark (Mc), Nonspacing Mark (Mn), and
      Enclosing Mark (Me).
      </quote>

      and D53:
      <quote>
      Nonspacing mark: A combining character with the General Category of
      Nonspacing Mark (Mn) or Enclosing Mark (Me).
      </quote>

      I don't know whether Vim uses different rules for display and for
      semantics when checking for combining characters. If not, I think the
      table could just contain the nonspacing ones for now.

      I attach the list of those Mn and Me ones, without code points of value
      larger than U+FFFF.
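
As a rough illustration of selecting by General Category rather than by combining class, the following C sketch checks field 2 of a UnicodeData.txt line for Mn or Me. The names field_start and is_nonspacing_mark are invented for this example and are not taken from Vim's sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pointer to the start of field n (0-based)
 * in a ';'-separated line, or NULL if the line has too few fields. */
static const char *field_start(const char *line, int n)
{
    const char *p = line;

    while (n-- > 0) {
        p = strchr(p, ';');
        if (p == NULL)
            return NULL;
        p++;
    }
    return p;
}

/* Return 1 if the General Category (field 2) is Mn or Me, the two
 * categories that D53 calls nonspacing marks; 0 otherwise. */
int is_nonspacing_mark(const char *line)
{
    const char *gc;

    assert(line != NULL);
    gc = field_start(line, 2);
    if (gc == NULL)
        return 0;
    return strncmp(gc, "Mn;", 3) == 0 || strncmp(gc, "Me;", 3) == 0;
}
```

A character such as U+20DD COMBINING ENCLOSING CIRCLE (Me, combining class 0) passes this test although a filter on non-zero combining class would miss it.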

      Regards,
      Iông Chun

      --
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php
    • Tony Mechelynck
      Message 2 of 11 , Jan 2, 2010
        On 02/01/10 15:47, Iông Chun wrote:
        > Hi Tony,
        >
        > On 2010-01-02 07:01 ē-po͘, Tony Mechelynck wrote:
        >> I'm attaching an extract from the current UnicodeData.txt file where
        >> I've extracted all codepoints with a nonzero Canonical_Combining_Class
        >> (field 3, counting the first field [codepoint number] as field 0). I'm
        >> *not* sure that this property coincides with the "combining character"
        >> property in the Vim sense, but it's the best I've found. You can check
        >> any discrepancies by means of
        >> http://www.unicode.org/Public/UNIDATA/UnicodeData.txt (where the first
        >> two fields are the codepoint number and name).
        >>
        >> This was obtained by applying :redir to the output of
        >>
        >> silent %g/^\%([^;]*;\)\{3}\%(0;\)\@!/p
        >>
        >> meaning: print all lines containing, at the start of a line, three
        >> times (zero or more non-semicolons plus one semicolon) not followed by
        >> (a zero then a semicolon).
        >>
        >>
        >> Best regards,
        >> Tony.
        >
        > I should also make use of UnicodeData.txt, instead of looking into every
        > added code point and checking the code charts ;)
        >
        > About Canonical_Combining_Class, from the Standard version 5.2, D52,
        > item#2, I read:
        > <quote>
        > All characters with non-zero canonical combining class are combining
        > characters, but the reverse is not the case: there are combining
        > characters with a zero canonical combining class.
        > </quote>
        >
        > and item#1:
        > <quote>
        > Combining characters consist of all characters with the General Category
        > values of Spacing Combining Mark (Mc), Nonspacing Mark (Mn), and
        > Enclosing Mark (Me).
        > </quote>
        >
        > and D53:
        > <quote>
        > Nonspacing mark: A combining character with the General Category of
        > Nonspacing Mark (Mn) or Enclosing Mark (Me).
        > </quote>
        >
        > I don't know whether Vim uses different rules for display and for
        > semantics when checking for combining characters. If not, I think the
        > table could just contain the nonspacing ones for now.
        >
        > I attach the list of those Mn and Me ones, without code points of value
        > larger than U+FFFF.
        >
        > Regards,
        > Iông Chun
        >

        Why without codepoint values higher than U+FFFF? Nowadays gvim can
        display them (which wasn't the case when I started studying Unicode with
        gvim 6.x).


        Best regards,
        Tony.
        --
        hundred-and-one symptoms of being an internet addict:
        236. You start saving URL's in your digital watch.

      • Iông Chun
        Message 3 of 11 , Jan 2, 2010
          On 2010/01/03 00:24, Tony Mechelynck wrote:
          > Why without codepoint values higher than U+FFFF? Nowadays gvim can
          > display them (which wasn't the case when I started studying Unicode
          > with gvim 6.x).
          >
          >
          > Best regards,
          > Tony.

          Because:
          <code>
          struct interval
          {
          unsigned short first;
          unsigned short last;
          };
          </code>
          ;)

          I guess the type can be "int" instead of "unsigned short" now.
          The patch with all Mn and Me character ranges is attached.
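
To make the limitation concrete: where "unsigned short" is 16 bits wide, a value such as 0x1D167 cannot be stored in it without losing its high bits. Below is a minimal sketch of the wider structure together with a binary-search lookup. It assumes "int" is at least 32 bits; sample_table holds only three illustrative ranges (not the attached patch), and in_table is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as above, but with fields wide enough for code points
 * beyond U+FFFF (assumes int has at least 32 bits). */
struct interval
{
    int first;
    int last;
};

/* Illustrative sample only: sorted, non-overlapping Mn/Me ranges. */
static const struct interval sample_table[] = {
    {0x0300, 0x036F},
    {0x20D0, 0x20F0},
    {0x1D167, 0x1D169},
};

/* Hypothetical lookup: return 1 if c lies inside any of the n ranges
 * of a sorted table, 0 otherwise. Plain binary search. */
int in_table(const struct interval *table, size_t n, int c)
{
    size_t lo = 0, hi = n;

    assert(table != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (c > table[mid].last)
            lo = mid + 1;       /* c is above this range */
        else if (c < table[mid].first)
            hi = mid;           /* c is below this range */
        else
            return 1;           /* first <= c <= last */
    }
    return 0;
}
```

A caller could test a character with in_table(sample_table, sizeof sample_table / sizeof sample_table[0], c).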

          Regards,
          Iông Chun

        • Tony Mechelynck
          Message 4 of 11 , Jan 2, 2010
            On 03/01/10 03:54, Iông Chun wrote:
            > On 2010/01/03 00:24, Tony Mechelynck wrote:
            >> Why without codepoint values higher than U+FFFF? Nowadays gvim can
            >> display them (which wasn't the case when I started studying Unicode
            >> with gvim 6.x).
            >>
            >>
            >> Best regards,
            >> Tony.
            >
            > Because:
            > <code>
            > struct interval
            > {
            > unsigned short first;
            > unsigned short last;
            > };
            > </code>
            > ;)
            >
            > I guess the type can be "int" instead of "unsigned short" now.
            > The patch with all Mn and Me character ranges is attached.
            >
            > Regards,
            > Iông Chun
            >

            I see. I suspect other size changes may have to be made then, not only
            where the structure is defined but possibly also where it is used. I
            hope Bram is following this whole thread.

            Best regards,
            Tony.
            --
            "A Mormon is a man that has the bad taste and the religion to do what a
            good many other people are restrained from doing by conscientious
            scruples and the police."
            -- Mr. Dooley

          • Bram Moolenaar
            Message 5 of 11 , Jan 4, 2010
              Tony Mechelynck wrote:

              > On 03/01/10 03:54, Iông Chun wrote:
              > > On 2010/01/03 00:24, Tony Mechelynck wrote:
              > >> Why without codepoint values higher than U+FFFF? Nowadays gvim can
              > >> display them (which wasn't the case when I started studying Unicode
              > >> with gvim 6.x).
              > >>
              > >>
              > >> Best regards,
              > >> Tony.
              > >
              > > Because:
              > > <code>
              > > struct interval
              > > {
              > > unsigned short first;
              > > unsigned short last;
              > > };
              > > </code>
              > > ;)
              > >
              > > I guess the type can be "int" instead of "unsigned short" now.
              > > The patch with all Mn and Me character ranges is attached.
              > >
              > > Regards,
              > > Iông Chun
              > >
              >
              > I see. I suspect other size changes may have to be done then, not only
              > where the structure is defined but possibly where it is used. I hope
              > Bram is following this whole thread.

              There is a script to generate these tables from the Unicode table.
              I think Markus Kuhn had this. But it should be easy to reproduce with
              Vim script.

              Changing all these tables from short to int increases memory use.
              But the extra code needed to handle two tables would not make the
              result much smaller.
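
The size of that trade-off can be estimated with a small sketch like the one below. It uses fixed-width types so the comparison does not depend on how wide "short" and "int" happen to be; the struct and function names are invented for illustration, and any entry count passed in is a made-up figure, not the size of Vim's real tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two candidate layouts for one table entry. */
struct interval16 { uint16_t first; uint16_t last; };
struct interval32 { uint32_t first; uint32_t last; };

/* Hypothetical helpers: bytes needed for a table with the given
 * number of entries in each layout. */
size_t table_bytes_16(size_t entries)
{
    assert(entries <= SIZE_MAX / sizeof(struct interval16));
    return entries * sizeof(struct interval16);
}

size_t table_bytes_32(size_t entries)
{
    assert(entries <= SIZE_MAX / sizeof(struct interval32));
    return entries * sizeof(struct interval32);
}
```

On common platforms the wider entry is twice the size of the narrow one, so each table doubles, which for tables of a few hundred ranges amounts to a few extra kilobytes in total.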

              --
              hundred-and-one symptoms of being an internet addict:
              77. The phone company asks you to test drive their new PBX system

              /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
              /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
              \\\ download, build and distribute -- http://www.A-A-P.org ///
              \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

            • Tony Mechelynck
              Message 6 of 11 , Jan 8, 2010
                On 04/01/10 20:17, Bram Moolenaar wrote:
                [...]
                >
                > There is a script to generate these tables from the Unicode table.
                > I think Markus Kuhn had this. But it should be easy to reproduce with
                > Vim script.
                >
                [...]

                Yes indeed: the UnicodeData.txt file is meant to be machine-readable,
                and with the power of Vim regexps at our disposal, extracting the
                needed data should be a breeze.
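
Once the matching code points have been extracted, the remaining step is to collapse runs of consecutive values into first/last pairs. The C sketch below shows one way that step might look. It is not the existing generator script mentioned earlier in the thread, merge_ranges is a made-up name, and the input is assumed to be sorted with no duplicates.

```c
#include <assert.h>
#include <stddef.h>

struct interval
{
    int first;
    int last;
};

/* Hypothetical helper: collapse n sorted, distinct code points into
 * ranges of consecutive values. Writes the ranges to out, which must
 * have room for n entries, and returns how many were written. */
size_t merge_ranges(const int *cps, size_t n, struct interval *out)
{
    size_t count = 0;
    size_t i;

    assert((cps != NULL && out != NULL) || n == 0);
    for (i = 0; i < n; i++) {
        if (count > 0 && cps[i] == out[count - 1].last + 1) {
            out[count - 1].last = cps[i];   /* extend the current range */
        } else {
            out[count].first = cps[i];      /* start a new range */
            out[count].last = cps[i];
            count++;
        }
    }
    return count;
}
```

Each resulting pair can then be printed as one initializer line of the table.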


                Best regards,
                Tony.
                --
                Her locks an ancient lady gave
                Her loving husband's life to save;
                And men -- they honored so the dame --
                Upon some stars bestowed her name.

                But to our modern married fair,
                Who'd give their lords to save their hair,
                No stellar recognition's given.
                There are not stars enough in heaven.