
Re: GetDocMatchAll

  • Flo
    Message 1 of 19 , Dec 6, 2008
      --- In ntb-clips@yahoogroups.com, "ebbtidalflats" <ebbtidalflats@...>
      wrote:
      >
      > Verrry Interrrestingg,
      >
      > Thanks Sheri. Your test lines work as desired.
      > I see the code working, when I single-step. But I do not understand
      > what is happening...

      Eb, Sheri, and all,

      This is another instructive solution presented by Sheri! Regarding
      Eb's questions, I would like to try the following reply:

      >>"^$GetDocListAll("(?i)(^.*\bTarget\b.*)|(^.+)";"$2\r\n")$"
      >
      > What is the purpose of "|(^.+)"?
      > What is the purpose of "$2\r\n"?

      I would say: The RegEx matches either a line that
      contains "Target" OR it matches any line. If "Target" is found, the
      RegEx Machine immediately stops the search and doesn't care for the
      rest. It replaces the match with $2 and a CRNL. Since the search has
      been stopped, $2 is empty, and the line is replaced with a CRNL only
      (that's why it outputs empty lines in this case). If "Target" isn't
      found, the whole line is captured with $2 (those are the lines you
      want to select since they don't contain "Target").
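The alternation described above can be tried outside NoteTab. What follows is only a minimal sketch: Python's `re` module stands in for NoteTab's PCRE engine (the two differ in details), `(?m)` is added so that `^` matches at every line start, and the four sample lines are made up for illustration rather than taken from the thread.

```python
import re

# Same alternation as in the clip. (?m) is an addition needed in Python so
# that ^ matches at the start of every line, not only at the start of the text.
PATTERN = re.compile(r"(?im)(^.*\bTarget\b.*)|(^.+)")

def list_group_two(text):
    """Return one entry per matched line: group 2, or '' when group 1 matched."""
    return [(m.group(2) or "") for m in PATTERN.finditer(text)]

# Hypothetical sample lines, not the ones used in the thread.
sample = "A target line\nSubject\nAnother Target here\nInteresting\n"
rows = list_group_two(sample)
print(rows)  # ['', 'Subject', '', 'Interesting']
```

Lines containing the word produce an empty entry because only group 1 took part in the match; all other lines come back through group 2.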

      So, choosing Sheri's four lines, the function returns...

      [empty][CRNL]
      Subject[CRNL]
      [empty][CRNL]
      Interesting[CRNL]

      This is inserted at the Doc_End and selected.

      Next,...

      ^!Set %result%=^$GetDocReplaceAll("^\R+";"")$

      assigns the selected text to %result%, that is: the hits only,
      removing those empty lines that were produced by ^$GetDocListAll$.
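The clean-up step can be sketched the same way. This is an approximation, not NoteTab's implementation: Python's `re` has no `\R`, so an explicit alternation of line-break sequences is used instead, and the input string is a made-up stand-in for the list produced in the previous step.

```python
import re

def drop_empty_lines(text):
    # \R is not available in Python's re module, so the line-break
    # sequences are spelled out. ^ with (?m) anchors at each line start.
    return re.sub(r"(?m)^(?:\r\n|\r|\n)+", "", text)

# Hypothetical list output: empty entries where "Target" lines were matched.
listed = "\r\nSubject\r\n\r\nInteresting\r\n"
result = drop_empty_lines(listed)
print(repr(result))  # 'Subject\r\nInteresting\r\n'
```

Note that this also removes blank lines that were already present in the text, since they cannot be told apart from the ones produced by the list step.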

      Finally, the whole text is selected and replaced with the contents of
      %result%, that is: the two matching lines only (if we don't cancel
      the job). I get into trouble with...

      ^!InsertText ^%Empty%

      It removes the hits, but - in the end - those are removed anyway.
      Maybe it's useful to see these hits in case we cancel the replacing of
      the whole text with the hits?

      *** I hope this interpretation will survive Sheri's critical eye ;-)
      ***

      Eb, in case you insist on ^$GetDocMatchAll$, I would like to present
      to you another version based on Sheri's concept...


      ^!SetClipboard ^$GetDocMatchAll("(^.*TARGET.*$)|(^.+$)";2)$
      ^!SetClipboard ^$StrReplace(";";"^P";^$GetClipboard$;0;0)$
      ^!Toolbar Paste New
      ^!Replace "^\r\n" >> "" AWRS


      Want some more? What about this...


      ^!SetArray %Lines%=^$GetDocMatchAll(^.+$)$
      ^!Set %Count%=1
      :Loop
      ^!IfMatch "^.*TARGET.*$" "^%Lines^%Count%%" Skip
      ^!Append %Select%=^%Lines^%Count%%^%NL%
      ^!Inc %Count%
      ^!If ^%Count% > ^%Lines0% Out
      ^!Goto Loop
      :Out
      ^!Toolbar New Document
      ^!InsertText ^%Select%
      ^!ClearVariable %Select%


      Eb, maybe it's interesting to understand why your first idea...

      ^$GetDocMatchAll("^[^\r\n]*(?!TARGET)[^\r\n]*")$

      didn't work. I think combining a Negative Lookahead Assertion
      with "[^\r\n]*" can't work.

      The RegEx says: "Find a position from where you don't see "TARGET"
      when looking ahead. This position must be preceded or not preceded or
      followed or not followed by anything that is no CRNL".

      That means: This RegEx is always true. That's why your concept
      selects all lines without excluding lines which contain "TARGET".
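A small experiment makes this easy to check. The sketch below uses Python's `re` in place of NoteTab's PCRE, with made-up sample text. The second pattern, which anchors the lookahead at the start of the line, is a common alternative and does not come from the thread.

```python
import re

# Hypothetical sample text (no trailing line break, to keep the output simple).
text = "keep me\nhas TARGET here\nalso keep"

# Eb's pattern: the greedy [^\r\n]* first runs to the end of the line, and
# from there the lookahead never sees "TARGET", so every line matches.
always = re.findall(r"(?m)^[^\r\n]*(?!TARGET)[^\r\n]*", text)

# Alternative (an assumption, not from the thread): test the lookahead once,
# at the line start, and let it scan the whole line for the keyword.
filtered = re.findall(r"(?m)^(?!.*TARGET).+$", text)

print(always)    # ['keep me', 'has TARGET here', 'also keep']
print(filtered)  # ['keep me', 'also keep']
```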

      I think in a more conventional way we just would remove lines
      containing "TARGET" in order to get lines NOT containing that word.
      This still works with "good old" ^!Replace...

      ^!Replace "^.*TARGET.*(\r\n|\Z)" >> "" AWRS

      This is my homework for today...

      Flo :-P

      Note: Never post to this forum unless you are prepared to make a
      fool of yourself!
       
    • Sheri
      Message 2 of 19 , Dec 6, 2008
        --- In ntb-clips@yahoogroups.com, "ebbtidalflats" <ebbtidalflats@...>
        wrote:

        > the code does not work when I
        > remove the New Document command and the test text, then
        > run it on an open document.
        >
        > It still jumps to the end of the doc, and runs the same
        > from then on. And the clip removes 65 lines from the
        > document. But only 24 lines contained the target word,
        > some of which targets were NOT removed!

        You would need to present your doc and clip for help explaining that.
        Two things come to mind. First, in the pattern I used:

        (?i)(^.*\bTarget\b.*)|(^.+)

        the \b's require the target text to be a whole word, so "targets",
        "targeting" and "targeted" are not matched.

        Second, if you change the target text, you need to make sure any
        metacharacters it contains are escaped. You can't for example use a
        variable that might contain metacharacters like "^" without escaping
        them, like "\^".
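Both points, whole-word matching and escaping, can be illustrated with a short sketch. It is written in Python rather than clip code, so `re.escape` does the escaping that would have to be done by hand in a NoteTab pattern. The helper name `lines_containing` and the sample text are made up for the example.

```python
import re

def lines_containing(keyword, text, whole_word=False):
    """Return the lines that contain keyword as literal text."""
    literal = re.escape(keyword)  # e.g. "^" becomes "\^"
    if whole_word:
        literal = r"\b" + literal + r"\b"
    return re.findall(r"(?im)^.*" + literal + r".*$", text)

# Hypothetical sample text.
text = "one target\ntwo targets\ncost is 5^2\nnothing"

whole = lines_containing("target", text, whole_word=True)
loose = lines_containing("target", text)
caret = lines_containing("5^2", text)

print(whole)  # ['one target']
print(loose)  # ['one target', 'two targets']
print(caret)  # ['cost is 5^2']
```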

        Flo has given a very good explanation of what the pattern does.
        Regular expressions search for text that affirmatively matches a
        pattern, not one that doesn't. Even when you use a negative assertion,
        you are actually searching for something that follows from a given
        position.

        Regular expressions always work from left to right. Both of the
        subpatterns match whole lines that are not empty lines. If a line
        fails to match the first subpattern (because it doesn't contain
        "target"), it must match the second subpattern because that subpattern
        matches anything (except linebreaks). The vertical bar says "or"
        between the subpatterns. But if it matches the first subpattern, it
        doesn't try the second. In the format string, we ask only for the
        second subpattern, aka $2 in GetDocListAll or just 2 in GetDocMatchAll.

        There was a long standing bug until the latest NoteTab version that
        prevented this type of pattern from working properly in
        GetDocMatchall. Cheers to Eric that it's fixed now. :)

        Regards,
        Sheri
      • Sheri
        Message 3 of 19 , Dec 7, 2008
          Hi Flo,

          > I think in a more conventional way we just would remove lines
          > containing "TARGET" in order to get lines NOT containing that word.
          > This still works with "good old" ^!Replace...
          >
          > ^!Replace "^.*TARGET.*(\r\n|\Z)" >> "" AWRS

          That is probably the most direct approach. If for some reason we
          wanted the result in a variable instead of the document we could use
          the new ^$GetDocReplaceall$ function instead. But I think you meant to
          use \z instead of \Z... :D

          Also my list pattern could have been improved resource-wise by not
          capturing substring #1 (since it wasn't needed), e.g.,

          "(?i)(?:^.*\bTarget\b.*)|(^.+)";"$1\r\n"
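The effect of the non-capturing group can be checked with a minimal sketch, again using Python's `re` as a stand-in for NoteTab's PCRE and made-up sample lines. The wanted text moves from group 2 to group 1, and the compiled pattern reports one capture group instead of two.

```python
import re

capturing = re.compile(r"(?im)(^.*\bTarget\b.*)|(^.+)")
non_capturing = re.compile(r"(?im)(?:^.*\bTarget\b.*)|(^.+)")

# Hypothetical sample lines.
sample = "A target line\nSubject\n"

old = [m.group(2) or "" for m in capturing.finditer(sample)]
new = [m.group(1) or "" for m in non_capturing.finditer(sample)]

print(capturing.groups, non_capturing.groups)  # 2 1
print(old == new)                              # True
```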

          >
          > This is my homework for today...

          A+ :D

          Regards,
          Sheri
        • ebbtidalflats
          Message 4 of 19 , Dec 8, 2008
            Hi Flo, Sheri,


            That's a lot to digest, and it will take me awhile.
            I may not get a chance to try your suggestions until
            this weekend, thanks very much for your tips.

            The document I'm testing has 525 lines. A bit much to
            append. But it's all text, whole words, actually a
            bunch of random SQL statements, which I searched for
            "SELECT" and NOT "SELECT".



            Regards,


            Eb

            --- In ntb-clips@yahoogroups.com, "Sheri" <silvermoonwoman@...> wrote:
            >
            > > ...

            > You would need to present your doc and clip for help explaining that.
            > Two things come to mind. First, in the pattern I used:
            >
            > (?i)(^.*\bTarget\b.*)|(^.+)
            >
            > the \b's require the target text to be whole words, e.g., "targets",
            > "targeting" and "targeted" are not being in/excluded.
            >
            > Second, if you change the target text, you need to make sure any
            > metacharacters it contains are escaped. You can't for example use a
            > variable that might contain metacharacters like "^" without escaping
            > them, like "\^".
            >
            > Flo has given a very good explanation of what the pattern does.
            > Regular expressions search for text that affirmatively matches a
            > pattern, not one that doesn't. Even when you use a negative assertion,
            > you are actually searching for something that follows from a given
            > position.
            >
            > Regular expressions always work from left to right. Both of the
            > subpatterns match whole lines that are not empty lines. If a line
            > fails to match the first subpattern (because it doesn't contain
            > "target"), it must match the second subpattern because that subpattern
            > matches anything (except linebreaks). The vertical bar says "or"
            > between the subpatterns. But if it matches the first subpattern, it
            > doesn't try the second. In the format string, we ask only for the
            > second subpattern, aka $2 in GetDocListAll or just 2 in GetDocMatchAll.
            >
          • ebbtidalflats
            Message 5 of 19 , Dec 11, 2008
              Sheri, Flo,

              Thanks very much for your help and explanations.

              It turns out that there were two reasons Sheri's clip didn't work
              on MY document:

              1. I screwed up
              -- didn't replace "target" with the real keyword D=8.

              This resulted in NO lines being removed by THIS test,
              but see below

              2. The code did strip blank lines, intended to remove
              those created by the replace algorithm.

              There happened to be 25 original blank lines
              and 24 keywords in the doc. Coincident!
              25 lines removed, verifying code faked out.

              Result: Wild Goose Chase.

              When I finally pinned this down, I moved the EoLs into the search
              pattern to handle them in the same step. The working part is now down
              to 3 lines:

              ;Sheri's fix, modified to single step
              ;long line ---
              ^!Set %keepers%="^$GetDocReplaceAll("(?i)(^.*\bSELECT\b.*\r\n)|(^.+\r\n)";"$2")$"
              ;end long line ---
              ^!Select All
              ^%keepers%
              ;clip end ---


              As long as blocks of text are organized in single lines, the algorithm
              can remove sentences (lines) containing keywords.
              Its complement removes sentences NOT containing the keyword.
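The reason the single-step version no longer eats original blank lines can be seen in a small sketch. Python's `re` is used here in place of NoteTab's PCRE, and the sample document is made up. Neither alternative can match an empty line, so blank lines pass through untouched.

```python
import re

# Same alternation as in the clip, with (?m) added so that ^ matches at
# every line start in Python.
PATTERN = re.compile(r"(?im)(^.*\bSELECT\b.*\r\n)|(^.+\r\n)")

def keep_lines_without_keyword(text):
    # Replace each match with group 2, which is empty (None) whenever the
    # keyword alternative matched, so keyword lines disappear entirely.
    return PATTERN.sub(lambda m: m.group(2) or "", text)

# Hypothetical document with one original blank line.
doc = "SELECT a FROM t\r\n\r\nUPDATE t SET x = 1\r\n"
keepers = keep_lines_without_keyword(doc)
print(repr(keepers))  # '\r\nUPDATE t SET x = 1\r\n'
```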



              Regards,


              Eb
            • Sheri
              Message 6 of 19 , Dec 11, 2008
                --- In ntb-clips@yahoogroups.com, "ebbtidalflats" <ebbtidalflats@...>
                wrote:
                >
                > Sheri, Flo,
                >
                > Thanks very much for your help and explanations.
                >
                > It turns out that there were two reasons Sheri's clip didn't work
                > on MY document:
                >
                > 1. I screwed up
                > -- didn't replace "target" with the real keyword D=8.
                >
                > This resulted in NO lines being removed by THIS test,
                > but see below
                >
                > 2. The code did strip blank lines, intended to remove
                > those created by the replace algorithm.
                >
                > There happened to be 25 original blank lines
                > and 24 keywords in the doc. Coincident!
                > 25 lines removed, verifying code faked out.
                >
                > Result: Wild Goose Chase.
                >
                > When I finally pinned this down, I moved the EoLs into the search
                > pattern to handle them in the same step. The working part is now down
                > to 3 lines:
                >
                > ;Sheri's fix, modified to single step
                > ;long line ---
                > ^!Set %keepers%="^$GetDocReplaceAll("(?i)(^.*\bSELECT\b.*\r\n)";"$2")$"
                > ;end long line ---
                > ^!Select All
                > ^%keepers%
                > ;clip end ---
                >
                >
                > As long as blocks of text are organized in single lines, the
                > algorithm can remove sentences (lines) containing keywords. It's
                > complement removes sentences NOT containing the keyword.

                Hi Eb,

                It's not necessary to use the alternation and substrings if using
                ^$GetDocReplaceAll$.

                You would just replace the matching lines with an empty string, just
                like in regex ^!Replace.

                Main difference is, no special switches. If text is selected it
                applies only in the selection. Otherwise it applies to the whole
                document. Other difference is the result isn't pasted into the
                document window (unless you choose to paste it).

                e.g.,

                ^!Set %keepers%="^$GetDocReplaceAll("(?i)(^.*\bSELECT\b.*\r\n)";"")$"
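For comparison, a rough equivalent of this simpler replace in Python (a sketch only; `re.sub` stands in for ^$GetDocReplaceAll$, and the function name and sample text are made up):

```python
import re

def remove_keyword_lines(text, keyword="SELECT"):
    # Match every line containing the keyword as a whole word, including
    # its CRLF terminator, and replace it with nothing.
    pattern = r"(?im)^.*\b" + re.escape(keyword) + r"\b.*\r\n"
    return re.sub(pattern, "", text)

# Hypothetical document.
doc = "SELECT a FROM t\r\nUPDATE t SET x = 1\r\nselect b\r\nDELETE FROM t\r\n"
keepers = remove_keyword_lines(doc)
print(repr(keepers))  # 'UPDATE t SET x = 1\r\nDELETE FROM t\r\n'
```

Because the pattern requires a trailing `\r\n`, a keyword line at the very end of the text without a line break is left in place. That is the case Flo's `(\r\n|\Z)` alternative is meant to cover.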

                Regards,
                Sheri
              • ebbtidalflats
                Message 7 of 19 , Dec 11, 2008
                  --- In ntb-clips@yahoogroups.com, "Sheri" <silvermoonwoman@...> wrote:
                  >
                  > --- In ntb-clips@yahoogroups.com, "ebbtidalflats" <ebbtidalflats@>
                  > wrote:
                  > >
                  > It's not necessary to use the alternation and substrings if using
                  > ^$GetDocReplaceAll$.
                  >
                  > You would just replace the matching lines with an empty string, just
                  > like in regex ^!Replace.
                  >
                  > Main difference is, no special switches. If text is selected it
                  > applies only in the selection. Otherwise it applies to the whole
                  > document. Other difference is the result isn't pasted into the
                  > document window (unless you choose to paste it).
                  >
                  > e.g.,
                  >
                  > ^!Set %keepers%="^$GetDocReplaceAll("(?i)(^.*\bSELECT\b.*\r\n)";"")$"
                  >


                  Thanks Sheri,

                  I _LIKE_ shorter code.

                  Eb
                • hsavage
                  Message 8 of 19 , Dec 12, 2008
                    My regex education is lacking and I would appreciate assistance from our
                    regex stars if they have time.

                    I'm trying to refine a browser selection clip using,
                    ^!Set %browser%=^$GetDocMatchaLL("^\[.+\]")$
                    to collect the name headings within the browsers.dat file.

                    The line above works fine but requires 2 ^$StrReplace( lines to get rid
                    of the brackets. Is there a regex answer that will find the bracketed
                    titles and return only the text between them using a modified example of
                    the line above.

                    Help is appreciated.

                    ·············································
                    ºvº SL_day# 347 - created 2008.12.12_15.04.47

                    World's Shortest Books
                    • Different Ways To Spell Bob

                    € hrs € hsavage € pobox € com
                  • Flo
                    Message 9 of 19 , Dec 12, 2008
                      --- In ntb-clips@yahoogroups.com, hsavage <hsavage@...> wrote:
                      >
                      > I'm trying to refine a browser selection clip using,
                      > ^!Set %browser%=^$GetDocMatchaLL("^\[.+\]")$
                      > to collect the name headings within the browsers.dat file.
                      > The line above works fine but requires 2 ^$StrReplace( lines to get
                      > rid of the brackets. Is there a regex answer that will find the
                      > bracketed titles and return only the text between them...

                      Harvey,

                      Try...

                      ^!Set %browser%=^$GetDocMatchAll("^\[\K[^]]+")$
                      ^!Info ^%browser%

                      In my tests, the closing bracket "]" need not be escaped inside
                      the Character Class. If you have any problems with that, try [^\]] or
                      [^\x5D].

                      Regards,
                      Flo
                       
                    • hsavage
                      Message 10 of 19 , Dec 12, 2008
                        Flo wrote:
                        > Harvey,
                        >
                        > Try...
                        >
                        > ^!Set %browser%=^$GetDocMatchAll("^\[\K[^]]+")$
                        > ^!Info ^%browser%
                        >
                        > In my tests, the closing bracket "]" needs not to be escaped inside
                        > the Character Class. If you have any problems with that, try [^\]] or
                        > [^\x5D].
                        >
                        > Regards,
                        > Flo

                        Flo,

                        Thanks very much, it appears to work exactly as I wanted and needed.

                        ·············································
                        ºvº SL_day# 347 - created 2008.12.12_17.48.10

                        World's Shortest Books
                        • Different Ways To Spell Bob

                        € hrs € hsavage € pobox € com
                      • Sheri
                        Message 11 of 19 , Dec 12, 2008
                          --- In ntb-clips@yahoogroups.com, "Flo" <flo.gehrke@...> wrote:
                          >
                          > --- In ntb-clips@yahoogroups.com, hsavage <hsavage@> wrote:
                          > >
                          > > I'm trying to refine a browser selection clip using,
                          > > ^!Set %browser%=^$GetDocMatchaLL("^\[.+\]")$
                          > > to collect the name headings within the browsers.dat file.
                          > > The line above works fine but requires 2 ^$StrReplace( lines to get
                          > > rid of the brackets. Is there a regex answer that will find the
                          > > bracketed titles and return only the text between them...
                          >
                          > Harvey,
                          >
                          > Try...
                          >
                          > ^!Set %browser%=^$GetDocMatchAll("^\[\K[^]]+")$
                          > ^!Info ^%browser%
                          >
                          > In my tests, the closing bracket "]" needs not to be escaped inside
                          > the Character Class. If you have any problems with that, try [^\]] or
                          > [^\x5D].
                          >
                          > Regards,
                          > Flo
                          >  
                          >

                          Hi Harvey, Flo,

                          You consider adding \r\n\[ into the character class e.g.:

                          "^\[\K[^\r\n\[\]]+"

                          Although you wouldn't expect to encounter a situation where a closing
                          bracket is missing and another opening bracket exists before a closing
                          bracket, as is, the pattern would match across multiple lines right up
                          to the next closing bracket.
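The difference between the two character classes shows up as soon as a header is missing its closing bracket. The sketch below uses Python's `re`, which has no `\K`, so a capture group returns the text after the opening bracket instead. The sample file content is invented, not a real browsers.dat.

```python
import re

# Hypothetical file content with one broken header ("[Broken" has no "]").
text = "[Firefox]\nPath=x\n[Broken\nPath=y\n[Opera]\n"

# Only "]" excluded: the class also matches line breaks and "[", so the
# broken header runs on until the next closing bracket.
loose = re.findall(r"(?m)^\[([^\]]+)", text)

# Line breaks and both brackets excluded: each match stays on its own line.
strict = re.findall(r"(?m)^\[([^\r\n\[\]]+)", text)

print(loose)   # ['Firefox', 'Broken\nPath=y\n[Opera']
print(strict)  # ['Firefox', 'Broken', 'Opera']
```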

                          Regards,
                          Sheri
                        • Sheri
                          Message 12 of 19 , Dec 12, 2008
                            --- In ntb-clips@yahoogroups.com, "Sheri" <silvermoonwoman@...> wrote:
                            > Hi Harvey, Flo,
                            >
                            > You consider adding \r\n\[ into the character class e.g.:

                            Oops my fingers got ahead of me, I meant to say "You might want to
                            consider ..." :)
                          • hsavage
                            Message 13 of 19 , Dec 12, 2008
                              Sheri wrote:
                              >
                              > Hi Harvey, Flo,
                              >
                              > You consider adding \r\n\[ into the character class e.g.:
                              >
                              > "^\[\K[^\r\n\[\]]+"
                              >
                              > Although you wouldn't expect to encounter a situation where a closing
                              > bracket is missing and another opening bracket exists before a closing
                              > bracket, as is, the pattern would match across multiple lines right up
                              > to the next closing bracket.
                              >
                              > Regards,
                              > Sheri

                              Sheri, Flo,

                              This works also, thanks again.

                              For those interested, this clip picks up the titles of the browsers.dat
                              entries, creates an array from them, and uses any number of them to view
                              an HTML file.

                              Users can choose one or more browsers to view the file at the
                              same time.

                              I currently use 6 browsers and can view the file in all 6 by selecting
                              all when the clip is run.

                              Of course, the browsers.dat entries must work correctly for the clip to
                              work.

                              H="MultiBrowsers"
                              ; • Modified-Updated~Created_2008.12.12
                              ; • hrs ø hsavage·pobox·com_09:45:31p
                              ; • Uses Browsers.Dat File Entries
                              ; • create necessary entries in browsers.dat
                              ; • to exclude browsers.dat entries from clip list
                              ; • place semi-colon before name in browsers.dat
                              ^!ClearVariables
                              ^!SetScreenUpdate 0
                              ^!Set %di%=C:\Documents and Settings\User\desktop\emdoc.emd
                              ^!SetWizardWidth 100
                              ^!SetWizardTitle "Select ALTERNATE Browsers"
                              ^!SetWizardLabel "PICK A BROWSER,^%nL%MAY SELECT MORE THAN ONE -"
                              ^!SetListDelimiter |
                              ^!Open ^$GetAppPath$browsers.dat
                              ^!Set %browser%=^$GetDocMatchAll("^\[\K[^\r\n\[\]]+")$
                              ^!Close
                              ^!SetDocIndex ^$GetDocIndex(^%di%)$
                              ; • this first ^!Set %url% line is very long, may get wrapped in email.
                              ^!Set %url%=^?{(T=O;H=11)VIEW THIS FILE, OR, SELECT ANOTHER==C:\Documents and Settings\User\desktop\emdoc.emd}; %browser%="^?{(T=A;H=9)BROWSER TO VIEW FILE WITH==^%browser%}"
                              ;
                              ^!Set %url%=^$StrReplace("|";":";"^$FileToUrl(^%url%)$";0;0)$
                              ^!SetArray %browser%=^%browser%
                              ^!Set %count%=^%browser0%; %loop%=0
                              :LOOP
                              ^!Inc %loop%
                              ^!Url ["^%browser^%loop%%"] "^%url%"
                              ^!If ^%loop% < ^%count% LOOP

                              ·············································
                              ºvº SL_day# 347 - created 2008.12.12_23.14.23

                              World's Shortest Books
                              • Different Ways To Spell Bob

                              € hrs € hsavage € pobox € com
                            • hsavage
                              Message 14 of 19 , Dec 12, 2008
                                Sheri, Flo, All,

                                In the previous cut&paste some of the variables in the clip were
                                expanded, hopefully this will take care of that problem.

                                H="MultiBrowsers"
                                ; • Modified-Updated~Created_2008.12.12
                                ; • hrs ø hsavage·pobox·com_09:45:31p
                                ; • Uses Browsers.Dat File Entries
                                ; • create necessary entries in browsers.dat
                                ; • to exclude browsers.dat entries from clip list
                                ; • place semi-colon before name in browsers.dat
                                ^!ClearVariables
                                ^!SetScreenUpdate 0
                                ^!Set %di%=^##
                                ^!SetWizardWidth 100
                                ^!SetWizardTitle "Select ALTERNATE Browsers"
                                ^!SetWizardLabel "PICK A BROWSER,^%nL%MAY SELECT MORE THAN ONE -"
                                ^!SetListDelimiter |
                                ^!Open ^$GetAppPath$browsers.dat
                                ^!Set %browser%=^$GetDocMatchAll("^\[\K[^\r\n\[\]]+")$
                                ^!Close
                                ^!SetDocIndex ^$GetDocIndex(^%di%)$
                                ; • this first ^!Set %url% line is very long, may get wrapped in email.
                                ^!Set %url%=^?{(T=O;H=11)VIEW THIS FILE, OR, SELECT ANOTHER==^##}; %browser%="^?{(T=A;H=9)BROWSER TO VIEW FILE WITH==^%browser%}"
                                ;
                                ^!Set %url%=^$StrReplace("|";":";"^$FileToUrl(^%url%)$";0;0)$
                                ^!SetArray %browser%=^%browser%
                                ^!Set %count%=^%browser0%; %loop%=0
                                :LOOP
                                ^!Inc %loop%
                                ^!Url ["^%browser^%loop%%"] "^%url%"
                                ^!If ^%loop% < ^%count% LOOP

                                --
                                ·············································
                                ºvº SL_day# 347 - created 2008.12.12_23.26.11

                                World's Shortest Books
                                • Different Ways To Spell Bob

                                € hrs € hsavage € pobox € com