Re: [patch] improved equivalent classes in regular expressions

  • Christian Brabandt
    Message 1 of 11 , Jan 21, 2013
      Hi Dominique!

      On Wed, 16 Jan 2013, Dominique Pellé wrote:

      > When using an equivalence class [[=x=]], I realized that what I
      > generally want is to use it on full strings rather than on
      > single characters. Searching for "foobar" with...
      >
      > /[[=f=]][[=o=]][[=o=]][[=b=]][[=a=]][[=r=]]
      >
      > ... works but is rather unpleasant. I wish there was a flag
      > such as \q to switch on equivalence classes, which would
      > work like \c for case insensitivity. So instead of the above
      > regexp, I could search for:
      >
      > /\qfoobar
      >
      > As far as I know \q is unused in Vim regexp, so
      > that should not break compatibility.
      >
      > Maybe there could also be a function normalize({expr})
      > (any better name?) that, given a string with diacritics such as
      > "fòóbâr", returns "foobar", in a similar way to tolower({expr}),
      > which returns a lowercase version of the string.
      >
      > Before I spend time trying to do that, would it be useful
      > and accepted?

      Indeed, that looks like a useful addition.
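
To make the two proposals concrete, here is a minimal Python sketch (not Vim code, and not part of any patch). The helper names `equivalence_pattern` and `strip_diacritics` are hypothetical. The first builds the long per-character pattern that a `\q`-style flag would save typing; the second approximates the proposed normalize() by decomposing with Unicode NFD and dropping combining marks, which is only an assumption about how such a function might behave.

```python
import unicodedata


def equivalence_pattern(text: str) -> str:
    """Wrap every character in a Vim-style equivalence class.

    Hypothetical helper: no escaping is done, so characters that are
    special inside a bracket expression would need extra care.
    """
    return "".join(f"[[={ch}=]]" for ch in text)


def strip_diacritics(text: str) -> str:
    """Approximate the proposed normalize(): decompose, drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


pattern = equivalence_pattern("foobar")
plain = strip_diacritics("fòóbâr")
print(pattern)
print(plain)
```

Characters without a canonical decomposition, such as ß, pass through `strip_diacritics` unchanged, so a real implementation would probably need its own table for those.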

      I have another idea with regard to equivalence classes:
      When searching for /[[=ß=]], this should translate into /sz. But that is
      more complicated, since a search for /[s][z] wouldn't match ß (eszett)
      anymore.

      > Regarding the few characters that are no longer equivalent,
      > I find it odd from a user point of view. For example U+01e4
      > (LATIN CAPITAL LETTER G WITH STROKE) was equivalent
      > to uppercase G but it is no longer equivalent to G.
      > Yet some other letters with stroke are still equivalent.
      > For example, U+0141 (LATIN CAPITAL LETTER L WITH STROKE)
      > is still equivalent to L. It seems inconsistent, even if that's
      > what the ISO standard says. Previous behavior made more
      > sense to me for U+01e4 at least.

      Fixed with the latest patch.
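
As a side note on why the stroke letters are awkward: in the Unicode character database neither U+01E4 nor U+0141 has a decomposition, so their relation to G and L cannot be derived by stripping combining marks and would have to come from an explicit table. The Python sketch below only inspects Unicode data; it says nothing about how the patch itself decides equivalence, and `has_decomposition` is a hypothetical helper name.

```python
import unicodedata


def has_decomposition(ch: str) -> bool:
    """True if the Unicode database lists a decomposition for ch."""
    return unicodedata.decomposition(ch) != ""


# U+00C9 (E WITH ACUTE) is included as a contrast: it does decompose.
for cp in (0x01E4, 0x0141, 0x00C9):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: decomposes={has_decomposition(ch)}")
```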

      Best regards,
      Christian
      --
      Alcoholism: poison and antidote are identical.
      -- Gerhard Uhlenbruck

      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php
    • Dominique Pellé
      Message 2 of 11 , Jan 21, 2013
        Christian Brabandt wrote:

        > Hi Dominique!
        >
        > On Wed, 16 Jan 2013, Dominique Pellé wrote:
        >
        >> When using an equivalence class [[=x=]], I realized that what I
        >> generally want is to use it on full strings rather than on
        >> single characters. Searching for "foobar" with...
        >>
        >> /[[=f=]][[=o=]][[=o=]][[=b=]][[=a=]][[=r=]]
        >>
        >> ... works but is rather unpleasant. I wish there was a flag
        >> such as \q to switch on equivalence classes, which would
        >> work like \c for case insensitivity. So instead of the above
        >> regexp, I could search for:
        >>
        >> /\qfoobar
        >>
        >> As far as I know \q is unused in Vim regexp, so
        >> that should not break compatibility.
        >>
        >> Maybe there could also be a function normalize({expr})
        >> (any better name?) that, given a string with diacritics such as
        >> "fòóbâr", returns "foobar", in a similar way to tolower({expr}),
        >> which returns a lowercase version of the string.
        >>
        >> Before I spend time trying to do that, would it be useful
        >> and accepted?
        >
        > Indeed, that looks like a useful addition.

        I have no time now for that unfortunately, but maybe in a few weeks.

        > I have another idea with regard to equivalence classes:
        > When searching for /[[=ß=]], this should translate into /sz. But that is
        > more complicated, since a search for /[s][z] wouldn't match ß (eszett)
        > anymore.

        You obviously speak better German than me, but isn't the German
        ess-zett equivalent to ss rather than sz? I'm curious why /sz.

        >> Regarding the few characters that are no longer equivalent,
        >> I find it odd from a user point of view. For example U+01e4
        >> (LATIN CAPITAL LETTER G WITH STROKE) was equivalent
        >> to uppercase G but it is no longer equivalent to G.
        >> Yet some other letters with stroke are still equivalent.
        >> For example, U+0141 (LATIN CAPITAL LETTER L WITH STROKE)
        >> is still equivalent to L. It seems inconsistent, even if that's
        >> what the ISO standard says. Previous behavior made more
        >> sense to me for U+01e4 at least.
        >
        > Fixed with the latest patch.

        Yes, I saw that. Thanks!

      • Christian Brabandt
        Message 3 of 11 , Jan 23, 2013
          Hi Dominique!

          On Mon, 21 Jan 2013, Dominique Pellé wrote:

          > You obviously speak better German than me, but isn't the German
          > ess-zett equivalent to ss rather than sz? I'm curious why /sz.

          You got me ;)
          Of course esszett is, despite its name, equivalent to ss and that is
          what the standard actually demands (Although the Swiss might think
          otherwise). Sorry for the confusion.

          regards,
          Christian
          --
          Time is what one reads off the clock.
          -- Albert Einstein

        • Joachim Schmitz
          Message 4 of 11 , Jan 24, 2013
            Christian Brabandt wrote:
            > Hi Dominique!
            >
            > On Mon, 21 Jan 2013, Dominique Pellé wrote:
            >
            >> You obviously speak better German than me, but isn't the German
            >> ess-zett equivalent to ss rather than sz? I'm curious why /sz.
            >
            > You got me ;)
            > Of course esszett is, despite its name, equivalent to ss and that is
            > what the standard actually demands (Although the Swiss might think
            > otherwise). Sorry for the confusion.


            But still, while ß is equivalent to ss, the opposite is not true: only
            a few occurrences of ss are equivalent to ß.
            The same goes for ä, ö, ü and ae, oe, ue: equivalent in one direction
            but not the other.

            Bye, Jojo


          • Christian Brabandt
            Message 5 of 11 , Jan 24, 2013
              Hi Joachim!

              On Thu, 24 Jan 2013, Joachim Schmitz wrote:

              > But still, while ß is equivalent to ss, the opposite is not true,
              > only a few ss are equivalent to ß.
              > Same for ä,ö,ü and ae, oe, ue, equivalent in one direction but not
              > the other.

              Indeed, but when we are talking about equivalence classes regarding
              regular expressions, then ss and ß are equal.
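
One way to picture "ss and ß are equal" for matching purposes is to compare folded forms of both strings. The Python sketch below uses `str.casefold()`, which maps ß to ss (and also ignores case, which is more than an equivalence class does). The names `fold` and `same_under_folding` are hypothetical, and this is an illustration of the idea, not of Vim's regexp engine.

```python
def fold(text: str) -> str:
    """Fold text for comparison; casefold() expands ß to ss and lowercases."""
    return text.casefold()


def same_under_folding(a: str, b: str) -> bool:
    """True if a and b are indistinguishable after folding."""
    return fold(a) == fold(b)


print(same_under_folding("Maße", "Masse"))
print(fold("ä") == "ae")
```

The folding is lossy in exactly the way described above: once "Maße" and "Masse" fold to the same string, the original spelling cannot be recovered. The umlauts are not expanded to ae, oe, ue by `casefold()` at all, so that mapping would need a separate, language-specific table.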

              regards,
              Christian
              --
              That is the best part of beauty, which a picture cannot express.
              -- Francis Bacon

            • Tony Mechelynck
              Message 6 of 11 , Jan 24, 2013
                On 23/01/13 22:08, Christian Brabandt wrote:
                > Hi Dominique!
                >
                > On Mon, 21 Jan 2013, Dominique Pellé wrote:
                >
                >> You obviously speak better German than me, but isn't the German
                >> ess-zett equivalent to ss rather than sz? I'm curious why /sz.
                >
                > You got me ;)
                > Of course esszett is, despite its name, equivalent to ss and that is
                > what the standard actually demands (Although the Swiss might think
                > otherwise). Sorry for the confusion.
                >
                > regards,
                > Christian
                >

                What do you mean, "the Swiss may think otherwise"? IIUC, in the de_CH
                standard the eszett is not used, it is always replaced by ss, because
                the Swiss have no room for it on their trilingual (well, quadrilingual,
                even) typewriter keyboards. Hence the well-known slur against them:

                — Wie trinken die Schweizer Bier?
                ("How do the Swiss drink beer?")
                — In Masse.
                ("massively", where for any other German-speaking country except maybe
                Liechtenstein it would of course be "in Maße", "in moderation").


                Best regards,
                Tony.
                --
                Speak roughly to your little boy,
                And beat him when he sneezes:
                He only does it to annoy
                Because he knows it teases.

                Wow! wow! wow!

                I speak severely to my boy,
                And beat him when he sneezes:
                For he can thoroughly enjoy
                The pepper when he pleases!

                Wow! wow! wow!
                -- Lewis Carroll, "Alice in Wonderland"

              • Christian Brabandt
                Message 7 of 11 , Jan 24, 2013
                  Hi Tony!

                  On Thu, 24 Jan 2013, Tony Mechelynck wrote:

                  > What do you mean, "the Swiss may think otherwise"? IIUC, in the
                  > de_CH standard the eszett is not used, it is always replaced by ss,
                  > because the Swiss have no room for it on their trilingual (well,
                  > quadrilingual, even) typewriter keyboards. Hence the well-known slur
                  > against them:
                  >
                  > — Wie trinken die Schweizer Bier?
                  > ("How do the Swiss drink beer?")
                  > — In Masse.
                  > ("massively", where for any other German-speaking country except
                  > maybe Liechtenstein it would of course be "in Maße", "in
                  > moderation").

                  I thought the Swiss used to replace ß by sz but that is apparently
                  wrong, as you pointed out correctly.

                  Best regards,
                  Christian