Combining characters and case-changing

  • Tony Mechelynck
    Message 1 of 5, Aug 1, 2010
      Using latest Vim 7.3c, compiled 2010-07-31 21:22:42 +0200 immediately
      after pulling.

      Hitting ~ (tilde) on a letter + combining character gives faulty result.


      Reproducible: Always.


      Steps to reproduce(1):
      1. Type text in some case-sensitive script (not Hebrew or Arabic) with
      some combining characters in it (e.g. Russian with combining acute
      accent(s)).
      2. In Normal mode, move the cursor over a letter with combining accent
      and hit ~ (tilde).

      Expected result:
      The letter should change case and keep its combining accent.

      Actual result:
      The accented letter is replaced by both its upper- and lower-case
      variants, without the accent.


      Steps to reproduce(2)
      1. Like before, type text with some combining characters in it.
      2. Select a section of text (I used V for single-line linewise-visual).
      3. Hit the tilde.

      Expected result:
      The text should change case, with the combining characters remaining
      where they belong.

      Actual result:
      - Accented characters become doubled, losing their accent.
      - At the end of the selection, the last characters (as many as there
      were accents) are not case-toggled.


      Additional info:
      I haven't tested what happens with _several_ combining characters on a
      single letter (e.g. non-precomposed Classical Greek with breathing,
      accent and/or iota-subscript/adscript on the same vowel), or with
      spacing and combining characters of different byte-length (Cyrillic
      letters and combining-acute are all two bytes per codepoint in UTF-8).
      Neither did I check that ~ in case-neutral text with combining
      characters (such as vocalised Semitic text) is a no-op.
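The expected behaviour described above can be sketched in a few lines of Python. This is only an illustration, not Vim's implementation: the helper name `toggle_case_keep_marks` is hypothetical, and it assumes that a combining character is any codepoint with a non-zero canonical combining class.

```python
import unicodedata

def toggle_case_keep_marks(text: str) -> str:
    """Toggle the case of base characters; copy combining marks unchanged."""
    out = []
    for ch in text:
        if unicodedata.combining(ch):
            # Combining mark: keep it attached to the preceding base character.
            out.append(ch)
        else:
            out.append(ch.swapcase())
    return "".join(out)

# Russian word with U+0301 COMBINING ACUTE ACCENT after the first "о".
word = "бо\u0301льшая"
toggled = toggle_case_keep_marks(word)

# Both the Cyrillic letter and the combining acute take two bytes in UTF-8.
byte_lengths = [len(ch.encode("utf-8")) for ch in "о\u0301"]
```

Under the same assumption, case-neutral text such as pointed Hebrew passes through unchanged, because `str.swapcase()` leaves caseless letters alone and the vowel points are copied as combining marks.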


      Best regards,
      Tony.
      --
      Brady's First Law of Problem Solving:
      When confronted by a difficult problem, you can solve it more
      easily by reducing it to the question, "How would the Lone Ranger have
      handled this?"

      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php
    • Jakson A. Aquino
      Message 2 of 5, Aug 1, 2010
        On Sun, Aug 1, 2010 at 5:59 AM, Tony Mechelynck
        <antoine.mechelynck@...> wrote:
        > Using latest Vim 7.3c, compiled 2010-07-31 21:22:42 +0200 immediately after
        > pulling.
        >
        > Hitting ~ (tilde) on a letter + combining character gives faulty result.
        >
        >
        > Reproducible: Always.
        >
        >
        > Steps to reproduce(1):
        > 1. Type text in some case-sensitive script (not Hebrew or Arabic) with some
        > combining characters in it (e.g. Russian with combining acute accent(s)).
        > 2. In Normal mode, move the cursor over a letter with combining accent and
        > hit ~ (tilde).
        >
        > Expected result:
        > The letter should change case and keep its combining accent.
        >
        > Actual result:
        > The accented letter is replaced by both its upper- and lower-case variants,
        > without the accent.
        >
        >
        > Steps to reproduce(2)
        > 1. Like before, type text with some combining characters in it.
        > 2. Select a section of text (I used V for single-line linewise-visual).
        > 3. Hit the tilde.
        >
        > Expected result:
        > The text should change case, with the combining characters remaining where
        > they belong.
        >
        > Actual result:
        > - Accented characters become doubled, losing their accent.
        > - At the end of the selection, the last characters (as many as there were
        > accents) are not case-toggled.
        >
        >
        > Additional info:
        > I haven't tested what happens with _several_ combining characters on a
        > single letter (e.g. non-precomposed Classical Greek with breathing, accent
        > and/or iota-subscript/adscript on the same vowel), or with spacing and
        > combining characters of different byte-length (Cyrillic letters and
        > combining-acute are all two bytes per codepoint in UTF-8). Neither did I
        > check that ~ in case-neutral text with combining characters (such as
        > vocalised Semitic text) is a no-op.

        Here it works as expected. I tested with tutor.utf-8 (fr, ru,
        el, vi). I'm using Ubuntu 10.04 in pt_BR.UTF-8 locale with Vim from
        current hg.

        Best regards,

        Jakson

      • Bram Moolenaar
        Message 3 of 5, Aug 1, 2010
          Tony Mechelynck wrote:

          > Using latest Vim 7.3c, compiled 2010-07-31 21:22:42 +0200 immediately
          > after pulling.
          >
          > Hitting ~ (tilde) on a letter + combining character gives faulty result.
          >
          >
          > Reproducible: Always.
          >
          >
          > Steps to reproduce(1):
          > 1. Type text in some case-sensitive script (not Hebrew or Arabic) with
          > some combining characters in it (e.g. Russian with combining acute
          > accent(s)).
          > 2. In Normal mode, move the cursor over a letter with combining accent
          > and hit ~ (tilde).
          >
          > Expected result:
          > The letter should change case and keep its combining accent.
          >
          > Actual result:
          > The accented letter is replaced by both its upper- and lower-case
          > variants, without the accent.

          I'll fix this. Note that for the example it's very useful to give the
          actual text to reproduce it on.

          > Steps to reproduce(2)
          > 1. Like before, type text with some combining characters in it.
          > 2. Select a section of text (I used V for single-line linewise-visual).
          > 3. Hit the tilde.
          >
          > Expected result:
          > The text should change case, with the combining characters remaining
          > where they belong.
          >
          > Actual result:
          > - Accented characters become doubled, losing their accent.
          > - At the end of the selection, the last characters (as many as there
          > were accents) are not case-toggled.

          I can't reproduce doubling the characters. Do you have 'delcombine' set
          perhaps?

          > Additional info:
          > I haven't tested what happens with _several_ combining characters on a
          > single letter (e.g. non-precomposed Classical Greek with breathing,
          > accent and/or iota-subscript/adscript on the same vowel), or with
          > spacing and combining characters of different byte-length (Cyrillic
          > letters and combining-acute are all two bytes per codepoint in UTF-8).
          > Neither did I check that ~ in case-neutral text with combining
          > characters (such as vocalised Semitic text) is a no-op.

          I have fixed one problem. Please check if you can still reproduce the
          others. If so, please include the text.

          --
          Scientists decoded the first message from an alien civilization:
          SIMPLY SEND 6 TIMES 10 TO THE 50 ATOMS OF HYDROGEN TO THE STAR
          SYSTEM AT THE TOP OF THE LIST, CROSS OFF THAT STAR SYSTEM, THEN PUT
          YOUR STAR SYSTEM AT THE BOTTOM OF THE LIST AND SEND IT TO 100 OTHER
          STAR SYSTEMS. WITHIN ONE TENTH GALACTIC ROTATION YOU WILL RECEIVE
          ENOUGH HYDROGEN TO POWER YOUR CIVILIZATION UNTIL ENTROPY REACHES ITS
          MAXIMUM! IT REALLY WORKS!

          /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
          /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
          \\\ download, build and distribute -- http://www.A-A-P.org ///
          \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

        • Tony Mechelynck
          Message 4 of 5, Aug 1, 2010
            On 01/08/10 13:43, Jakson A. Aquino wrote:
            > On Sun, Aug 1, 2010 at 5:59 AM, Tony Mechelynck
            > <antoine.mechelynck@...> wrote:
            >> Using latest Vim 7.3c, compiled 2010-07-31 21:22:42 +0200 immediately after
            >> pulling.
            >>
            >> Hitting ~ (tilde) on a letter + combining character gives faulty result.
            >>
            >>
            >> Reproducible: Always.
            >>
            >>
            >> Steps to reproduce(1):
            >> 1. Type text in some case-sensitive script (not Hebrew or Arabic) with some
            >> combining characters in it (e.g. Russian with combining acute accent(s)).
            >> 2. In Normal mode, move the cursor over a letter with combining accent and
            >> hit ~ (tilde).
            >>
            >> Expected result:
            >> The letter should change case and keep its combining accent.
            >>
            >> Actual result:
            >> The accented letter is replaced by both its upper- and lower-case variants,
            >> without the accent.
            >>
            >>
            >> Steps to reproduce(2)
            >> 1. Like before, type text with some combining characters in it.
            >> 2. Select a section of text (I used V for single-line linewise-visual).
            >> 3. Hit the tilde.
            >>
            >> Expected result:
            >> The text should change case, with the combining characters remaining where
            >> they belong.
            >>
            >> Actual result:
            >> - Accented characters become doubled, losing their accent.
            >> - At the end of the selection, the last characters (as many as there were
            >> accents) are not case-toggled.
            >>
            >>
            >> Additional info:
            >> I haven't tested what happens with _several_ combining characters on a
            >> single letter (e.g. non-precomposed Classical Greek with breathing, accent
            >> and/or iota-subscript/adscript on the same vowel), or with spacing and
            >> combining characters of different byte-length (Cyrillic letters and
            >> combining-acute are all two bytes per codepoint in UTF-8). Neither did I
            >> check that ~ in case-neutral text with combining characters (such as
            >> vocalised Semitic text) is a no-op.
            >
            > Here it works as expected. I tested with tutor.utf-8 (fr, ru,
            > el, vi). I'm using Ubuntu 10.04 in pt_BR.UTF-8 locale with Vim from
            > current hg.
            >
            > Best regards,
            >
            > Jakson
            >

            The problem I saw existed only in text containing combining characters
            (i.e. Unicode codepoints which are drawn superimposed on the preceding
            character, such as e.g. U+0301 COMBINING ACUTE ACCENT). When the accent
            is part of a spacing codepoint (e.g. é U+00E9 LATIN SMALL LETTER E WITH
            ACUTE) there was no problem. With Vim, combining characters have to be
            entered each separately, and if 'delcombine' is on they can also be
            removed one-by-one, separately from the letter over which they are drawn.
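The difference between the two encodings of an accented letter can be shown with Python's standard `unicodedata` module. This is a sketch for illustration only; the helper name `clusters` is hypothetical, and it assumes that a combining character is any codepoint with a non-zero canonical combining class.

```python
import unicodedata

def clusters(text):
    """Group each base character with the combining marks that follow it."""
    groups = []
    for ch in text:
        if groups and unicodedata.combining(ch):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups

precomposed = "\u00e9"   # é as one spacing codepoint
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# The two strings render alike but compare unequal until normalized.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
same_after_nfd = unicodedata.normalize("NFD", precomposed) == decomposed
```

Here `clusters(decomposed)` yields a single group of two codepoints, which is the unit that a case-toggling command has to treat as one character.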

            I was using gvim; if you were using Vim in an mlterm console (with
            'termbidi' on) the drawing of the full line (including displaying words
            in RTL scripts right-to-left even in LTR sentences, and displaying
            combining characters superimposed if there are any) is handled by the
            terminal, not by Vim: this might make a difference.

            In Russian, for instance, the combining acute accent is the only way to
            add an accent over a Cyrillic letter; it is used to indicate stress in
            dictionaries, in reading books for children or foreigners learning the
            language, and sometimes to avoid ambiguity as in "у белых теперь бо́льшая
            выгода" "u byelyh teper' BOLshaya vygoda", "White now has a bigger
            advantage", which I saw once in a Russian book about chess; without the
            written accent (у белых теперь большая выгода) it would be read "u
            byelyh teper' bolSHAya vygoda", "White now has a big advantage". In
            SeaMonkey I see this accent over the letter л, which is wrong; Vim
            correctly displays it over the о; but SeaMonkey, unlike Vim, has the
            possibility to place the cursor between the combining character and the
            spacing character preceding it.

            Similarly, Hebrew and Arabic (where case differences don't exist)
            optionally use combining characters to denote short vowels etc. (again,
            in dictionaries, reading books for children or foreigners, sacred
            writings, and in any text to avoid ambiguity); in many scripts the
            possibility exists to use combining characters, but precombined
            characters, when they exist, are often preferred in practice.

            Bram corrected the problem in a changeset dated 14:22:48 +0200 today
            (that would be shortly after your reply), and it works. Thanks Bram!


            Best regards,
            Tony.
            --
            Snoring is prohibited unless all bedroom windows are closed and securely
            locked.
            [real standing law in Massachusetts, United States of America]

          • Tony Mechelynck
            Message 5 of 5, Aug 1, 2010
              On 01/08/10 16:06, Bram Moolenaar wrote:
              >
              > Tony Mechelynck wrote:
              >
              >> Using latest Vim 7.3c, compiled 2010-07-31 21:22:42 +0200 immediately
              >> after pulling.
              >>
              >> Hitting ~ (tilde) on a letter + combining character gives faulty result.
              >>
              >>
              >> Reproducible: Always.
              >>
              >>
              >> Steps to reproduce(1):
              >> 1. Type text in some case-sensitive script (not Hebrew or Arabic) with
              >> some combining characters in it (e.g. Russian with combining acute
              >> accent(s)).
              >> 2. In Normal mode, move the cursor over a letter with combining accent
              >> and hit ~ (tilde).
              >>
              >> Expected result:
              >> The letter should change case and keep its combining accent.
              >>
              >> Actual result:
              >> The accented letter is replaced by both its upper- and lower-case
              >> variants, without the accent.
              >
              > I'll fix this. Note that for the example it's very useful to give the
              > actual text to reproduce it on.

              Well, previously (in the Ctrl-R case) I had done it, but the Russian
              characters in my text quoted in your answer were totally garbled (wrong
              Content-Type charset maybe?). You could have reused the same example
              text, or any text with any combining character; I used the combining
              acute accent U+0301; there are other combining codepoints in the same
              Unicode block.
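For anyone who wants to build a small reproduction text, a few codepoints from that block (Combining Diacritical Marks, U+0300 to U+036F) can be listed with Python's `unicodedata` module. The choice of samples and the variable names are assumptions made for this sketch.

```python
import unicodedata

# A few codepoints from the Combining Diacritical Marks block.
samples = ["\u0300", "\u0301", "\u0302", "\u0308"]
names = [unicodedata.name(ch) for ch in samples]

# A test string: the letter "a" followed by each combining mark in turn.
reproduction_text = "".join("a" + ch for ch in samples)
```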

              >
              >> Steps to reproduce(2)
              >> 1. Like before, type text with some combining characters in it.
              >> 2. Select a section of text (I used V for single-line linewise-visual).
              >> 3. Hit the tilde.
              >>
              >> Expected result:
              >> The text should change case, with the combining characters remaining
              >> where they belong.
              >>
              >> Actual result:
              >> - Accented characters become doubled, losing their accent.
              >> - At the end of the selection, the last characters (as many as there
              >> were accents) are not case-toggled.
              >
              > I can't reproduce doubling the characters. Do you have 'delcombine' set
              > perhaps?

              Yes I do. This way I can easily correct an error if I notice that I have
              put the accent on the wrong syllable, or the wrong diacritic on the
              right letter.

              >
              >> Additional info:
              >> I haven't tested what happens with _several_ combining characters on a
              >> single letter (e.g. non-precomposed Classical Greek with breathing,
              >> accent and/or iota-subscript/adscript on the same vowel), or with
              >> spacing and combining characters of different byte-length (Cyrillic
              >> letters and combining-acute are all two bytes per codepoint in UTF-8).
              >> Neither did I check that ~ in case-neutral text with combining
              >> characters (such as vocalised Semitic text) is a no-op.
              >
              > I have fixed one problem. Please check if you can still reproduce the
              > others. If so, please include the text.
              >

              No, the tilde now works correctly, both in Normal mode (when toggling
              one character at a time, with 'tildeop' at its default off setting), and
              in linewise Visual mode (when toggling a full line in one operation).
              The length of the Visual text operated upon is now also correct.


              Best regards,
              Tony.
              --
              "It took me fifteen years to discover that I had no talent for writing,
              but I couldn't give up because by that time I was too famous."
              -- Robert Benchley
