Finding matching parentheses

  • Axel Berger
    Message 1 of 10, Dec 1, 2012
      In my documents I have this string:

      \foreignlanguage{greek}{Qr'onos}

      To convert that from TeX to HTML I need to find the greek word "Qr'onos"
      Easy:

      ^!Find "\\foreignlanguage\{greek\}\{([^\}]+)\}" RS1

      But what if the word or phrase contains an inner {} pair? How do I find
      and select from the starting outer { up to its matching closing } ?

      Danke
      Axel
    • Eb
      Message 2 of 10, Dec 1, 2012
        When matching tag pairs in HTML, I use a loop and a stack to match up pairs. For nested brackets a similar technique might work, except that there is no need for a stack, since the target is always the same. Just use a counter.


        Start with the 1st opening bracket,
        set the counter to 1
        save your cursor position GetRowStart:GetColStart.
        Search for the next bracket (\{)|(\}) (open or close).
        If it is another opening brace, increment the counter.
        If it is a closing brace, decrement the counter.
        When the counter is back to zero, you're done.
        Save the END cursor position, and select between
        start and end.

        You can track the depth of the nesting with a second counter that you set to max(counter1,counter2) at each step. After counter1 returns to zero, counter2 retains the depth of the deepest nesting.

        This technique also finds the closing brace, if there are parallel nested braces: {...{...}...{...}...}
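
The counter technique described above can also be sketched outside NoteTab. What follows is a minimal Python illustration, not a clip: the function name find_matching_brace and the sample string are assumptions made for this example, and the sketch ignores TeX-escaped braces such as \{ and \}.

```python
def find_matching_brace(text, open_pos):
    """Return the index of the '}' closing the '{' at open_pos, or -1."""
    if text[open_pos] != "{":
        raise ValueError("open_pos must point at an opening brace")
    depth = 0
    for i in range(open_pos, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1          # another opening brace: one level deeper
        elif ch == "}":
            depth -= 1          # closing brace: one level back up
            if depth == 0:      # counter back to zero: this is the match
                return i
    return -1                   # unbalanced input, no matching brace


# Hypothetical sample with two parallel inner pairs.
sample = r"\foreignlanguage{greek}{a{b}c{d}e} rest"
prefix = r"\foreignlanguage{greek}"
start = sample.index(prefix) + len(prefix)   # index of the outer '{'
end = find_matching_brace(sample, start)
inner = sample[start + 1:end]                # text between the outer braces
print(inner)
```

The slice start + 1:end drops the enclosing braces, so only the content between them is selected.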

        You'll need help from a regexpert for doing this without a loop. I suspect it's possible, but haven't a clue how.



        Cheers


        Eb




      • Axel Berger
        Message 3 of 10, Dec 1, 2012
          Eb wrote:
          > When the counter is back to zero, you're done
          > Save the END cursor position, and select between
          > Start and end.

          Yes, I was afraid I might have to resort to something like that. The
          shame is, NT already has everything needed in Search-->Match Brackets,
          but no Clips function for it.

          Still this works:

          ^!Find "\\foreignlanguage\{greek\}(\{)" RS1
          ^!Menu Search/"Match Brackets"
          ^!Set %var%=^$GetSelection$
          ^!Info ^%var%

          on this minimal file:

          Dummes Gelaber \foreignlanguage{greek}{Qr'onos} mehr Gelaber

          Axel
        • m.feichtinger
          Message 4 of 10, Dec 2, 2012
            From the help file of NTP62 "Regular Expressions, Recursive Pattern":
            \( ( [^()]++ | (?R) )* \)
            Or in a larger pattern:
            ( \( ( [^()]++ | (?1) )* \) )

            In your case:
            ^!Find "\\foreignlanguage\{greek\}(\{([^{}]++|(?1))*\})" rs1

            The same as named recursion (long line):
            ^!Find "(?x)\\foreignlanguage\{greek\} (?<braces>(?#named) \{ ( [^{}]++ | (?&braces)(?#recursion; reference by name) )* \} )" rs1

            This finds the outer braces and everything nested between them in:
            Dummes Gelaber \foreignlanguage{greek}{Qr'onos} mehr Gelaber
            or (long line):
            Dummes Gelaber \foreignlanguage{greek}{Qr'o {some nested {and some more} braces} nos} mehr Gelaber

            You can then skip the enclosing braces, i.e.
            ^!Set %var%=^$StrCopy("^$GetSelection$";2;^$Calc(^$StrSize("^$GetSelection$")$-2)$)$

            HTH

          • flo.gehrke
            Message 5 of 10, Dec 3, 2012
              --- In ntb-clips@yahoogroups.com, "m.feichtinger" <mafei@...> wrote:
              >
              > In your case:
              > ^!Find "\\foreignlanguage\{greek\}(\{([^{}]++|(?1))*\})" rs1
              >
              > Finds the outer braces and all nested braces between from:
              > Dummes Gelaber \foreignlanguage{greek}{Qr'onos} mehr Gelaber
              > (...)
              > You can then skip the enclosing braces, i.e.
              > ^!Set %var%=^$StrCopy("^$GetSelection$";2;^$Calc(^$StrSize("^$GetSelection$")$-2)$)$

              You will achieve the same result without ^$StrCopy$ when writing...

              ^!Find "(?x)\\foreignlanguage\{greek\} (\{ ( ([^{}]++|(?1) )* )\} )" RS2

              So far, however, we didn't see any "inner {} pair" in the sample strings. Also in...

              Dummes Gelaber \foreignlanguage{greek}{Qr'onos} mehr Gelaber

              there is no nesting of outer and inner braces, only two brace-delimited strings in sequence. So I suppose it's about something like...

              \foreignlanguage{greek}{Axel{Berger}Odenthal}

              Since there isn't much recursion needed here I think a simple pattern like...

              ^!Find "\\foreignlanguage\{greek}\{(.+)}" RS1

              will suffice here.

              If the outer braces should be enclosed in the match (as with Axel's latest clip) try...

              ^!Find "\\foreignlanguage\{greek}(\{.+})" RS1

              Regards,
              Flo
            • Axel Berger
              Message 6 of 10, Dec 3, 2012
                "flo.gehrke" wrote:
                > If the outer braces should be enclosed in the match
                > (as with Axel's latest clip)

                I'd rather not include them, that's just what Match Brackets does. I'd
                have had to eliminate them later.

                > So I suppose it's about something like...
                > \foreignlanguage{greek}{Axel{Berger}Odenthal}

                Yes. Or rather some LaTeX construct like \v{s} for an accented s.

                > I think a simple pattern like...
                > ^!Find "\\foreignlanguage\{greek}\{(.+)}" RS1
                > will suffice here.

                No, that'll find "Axel{Berger" in the example above, not
                "Axel{Berger}Odenthal"

                Danke
                Axel
              • flo.gehrke
                Message 7 of 10, Dec 4, 2012
                  --- In ntb-clips@yahoogroups.com, Axel Berger <Axel-Berger@...> wrote:
                  >
                  > > So I suppose it's about something like...
                  > > \foreignlanguage{greek}{Axel{Berger}Odenthal}
                  > > (...)
                  > > I think a simple pattern like...
                  > > ^!Find "\\foreignlanguage\{greek}\{(.+)}" RS1
                  > > will suffice here.
                  >
                  > No, that'll find "Axel{Berger" in the example above, not
                  > "Axel{Berger}Odenthal"

                  How come? For me, it's perfectly matching the whole string 'Axel{Berger}Odenthal'. Please test it again.

                  I don't have your complete data, so it's difficult to decide this, but the only problem could possibly be the dot, which might cause a lot of backtracking. So it might be more efficient to define the characters that occur between those brackets:

                  ^!Find "\\foreignlanguage\{greek}\{([\w{}']+)}" RS1

                  Regards,
                  Flo
                • Axel Berger
                  Message 8 of 10, Dec 4, 2012
                    "flo.gehrke" wrote:
                    > Please test it again.

                    To be honest, I had not tested, just looked at it, and you're right. It
                    works because you use the greedy find, something I almost never do as a
                    matter of course. The simple reason:
                    Take

                    \foreignlanguage{greek}{Axel{Berger}Odenthal} some more waffle
                    \foreignlanguage{greek}{Axel{Berger}Odenthal}

                    without any hard line break and that search falls flat on its face.
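
The failure mode can be reproduced outside NoteTab. This is a hedged Python sketch using the standard re module rather than NoteTab's regex engine. The pattern is the one proposed earlier in the thread with every literal brace escaped, and the non-greedy variant is added here only for comparison.

```python
import re

# The simple pattern from the thread, with all literal braces escaped.
greedy_pattern = re.compile(r"\\foreignlanguage\{greek\}\{(.+)\}")
# Non-greedy variant, added here for comparison only.
lazy_pattern = re.compile(r"\\foreignlanguage\{greek\}\{(.+?)\}")

# Two occurrences on one line, with no hard line break between them.
line = (r"\foreignlanguage{greek}{Axel{Berger}Odenthal} some more waffle "
        r"\foreignlanguage{greek}{Axel{Berger}Odenthal}")

greedy = greedy_pattern.search(line).group(1)
lazy = lazy_pattern.search(line).group(1)

print(greedy)  # runs on to the last '}' of the line, spanning both occurrences
print(lazy)    # stops at the first '}', cutting the nested group short
```

Neither quantifier counts braces, so the greedy form overshoots and the non-greedy form undershoots whenever braces are nested.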

                    > I don't have your complete data, so it's difficult to decide this

                    To use that clip at all I need it to cover a very general case. Once I
                    begin using the foreignlanguage notation in my database in earnest and
                    begin to rely on automatic HTML conversion, I've no idea what might crop
                    up in there. The only thing I do know is, that nested curly braces are
                    one of the most frequent TeX constructs of all.

                    Axel
                  • flo.gehrke
                    Message 9 of 10, Dec 4, 2012
                      --- In ntb-clips@yahoogroups.com, Axel Berger <Axel-Berger@...> wrote:
                      >
                      > "flo.gehrke" wrote:
                      > > Please test it again.
                      >
                      > To be honest, I had not tested, just looked at it, and you're
                      > right. (..) Have
                      >
                      > \foreignlanguage{greek}{Axel{Berger}Odenthal} some more waffle
                      > \foreignlanguage{greek}{Axel{Berger}Odenthal}
                      >
                      > without any hard line break and that search falls flat on its face.

                      That's why I said there could be a problem with the dot and made a second proposal. Would you mind testing this? Thanks.

                      Flo

                      P.S. It's always a problem to deal with an issue without having the data and all details. And it's even less fun if the conditions are changed with each message :-(
                    • Axel Berger
                      Message 10 of 10, Dec 5, 2012
                        "flo.gehrke" wrote:
                        > And it's even less fun if the conditions are changed
                        > with each message :-(

                        Yes, I'm very sorry about that. I had /meant/ to make clear from the
                        outset that those {} may contain just about "anything" and still be a
                        legal TeX construct. Just about the only constraint is that inner curly
                        braces have to be paired and come in the right order.

                        The example given was an extremely simple one. I already knew Match
                        Brackets copes with anything thrown at it so did not need to try to trip
                        it up. I only needed to make sure that my sequence of Find, Match,
                        GetSelection worked as I wanted it to and for that a simple word
                        sufficed. Your second Regex won't find

                        \foreignlanguage{greek}{Axel {Berger} Odenthal}
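
A quick check of this claim, as a sketch with Python's standard re module rather than NoteTab itself (literal braces are escaped here, and the variable names are illustrative): the character class [\w{}'] contains no space, so the match cannot get past the first blank.

```python
import re

# The character-class pattern from the thread, with literal braces escaped.
pattern = re.compile(r"\\foreignlanguage\{greek\}\{([\w{}']+)\}")

no_space = r"\foreignlanguage{greek}{Axel{Berger}Odenthal}"
with_space = r"\foreignlanguage{greek}{Axel {Berger} Odenthal}"

hit = pattern.search(no_space)     # matches; group 1 is the braced content
miss = pattern.search(with_space)  # the space is not in the class: no match

print(hit.group(1))
print(miss)
```

Adding the space to the class would fix this one sample, but the class would still have to list every character that can legally appear between the braces.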

                        I see now that I had not stressed from the outset that a _general_ solution
                        was needed. What I have is a database of books and articles and I
                        automatically generate lists of references in LaTeX and HTML. TeX is the
                        main format and NT translates to HTML. What I am working on right now is
                        the best format for incorporating titles in non-latin script into that
                        database.

                        Danke
                        Axel