
Searching with Boolean Expressions

  • flo.gehrke
    Message 1 of 14, May 16, 2011
      In the past, we have often discussed the drawbacks to the 'Search | Search Disk' command (Ctrl+D). For example: "The contents of Search·Disk.fvr is incomplete except when the search returned 65 or fewer files (Sheri, 6/27/2009, NT Basic Group, message #21243)."

      Today, I would like to recall another issue: Search Disk doesn't handle Boolean expressions -- see "Searching for multiple text items", Clip Group, May 2009, message #19253 ff. Every attempt to "emulate" a kind of Boolean search with NT turned out to be rather complicated (for example: Sheri, 5/30/2009, #19274).

      Occasionally, the free file searching utility 'Agent Ransack' (AR) was mentioned as a useful work-around and add-on to NT. There is an update to version 2010 available at www.mythicsoft.com. It masters normal and Boolean expressions and RegEx.

      However, I can't see how to limit the AR output to file names only when run from the command line. Is anyone working with that tool and could give me a helping hand?

      So far, I tried the following work-around. Say, we want to find all files (names only) matching the Boolean expression

      'apples NOT (oranges OR bananas)'

      (Note that, in AR, operators must be in upper case.)

      The following clip will prompt you to enter that expression and will search all TXT files in the \Documents directory. The output will be stored in MyAgent.txt. From that file, all file names which have been found will be selected and overwrite the result:


      ; Prompt user to enter an expression
      ^!Set %Expr%=^?{Enter Boolean Expression:}
      ^!SetScreenUpdate Off
      ; Execute AR in background and save search result as MyAgent.txt
      ; Next one long line
      ^!ShellWait "E:\Agent Ransack\AgentRansack.exe" -o "E:\Notetab\Documents\MyAgent.txt" -oft -d "E:\Notetab\Documents" -c "^%Expr%" -ceb
      ; End of long line
      ; Insert search result into an empty NT document
      ^$GetFileText(E:\Notetab\Documents\MyAgent.txt)$
      ; Some delay needed before executing next command
      ^!Delay 10
      ^!SetClipboard ^$GetDocListAll("E:\\.+txt";"$0\r\n")$
      ; Overwrite search result with clipboard contents
      ^!Select All
      ^$GetClipboard$


      For me, this works fine. It needs some seconds, though. I wonder, however, if there is a better solution?

      Regards,
      Flo
    • Sheri
      Message 2 of 14, May 16, 2011
        On 5/16/2011 9:40 AM, flo.gehrke wrote:
        > Occasionally, the free file searching utility 'Agent Ransack' (AR) was mentioned as a useful work-around and add-on to NT. There is an update to version 2010 available at www.mythicsoft.com. It masters normal and Boolean expressions and RegEx.

        Interesting, I was unaware of any updates since 2003. The update is also
        available as FileLocator Lite. I chose to install that instead of
        replacing my old Agent Ransack. Some of your command line options don't
        work in the old version.

        > However, I can't see how to limit the AR output to file names only when run from the command line. Is anyone working with that tool and could give me a helping hand?
        >
        > So far, I tried the following work-around. Say, we want to find all files (names only) matching the Boolean expression
        >
        > 'apples NOT (oranges OR bananas)'
        >
        > (Note that, in AR, operators must be in upper case.)
        >
        > The following clip will prompt you to enter that expression and will search all TXT files in the \Documents directory. The output will be stored in MyAgent.txt. From that file, all file names which have been found will be selected and overwrite the result:
        >
        >
        > ; Prompt user to enter an expression
        > ^!Set %Expr%=^?{Enter Boolean Expression:}
        > ^!SetScreenUpdate Off
        > ; Execute AR in background and save search result as MyAgent.txt
        > ; Next one long line
        > ^!ShellWait "E:\Agent Ransack\AgentRansack.exe" -o "E:\Notetab\Documents\MyAgent.txt" -oft -d "E:\Notetab\Documents" -c "^%Expr%" -ceb
        > ; End of long line
        > ; Insert search result into an empty NT document
        > ^$GetFileText(E:\Notetab\Documents\MyAgent.txt)$
        > ; Some delay needed before executing next command
        > ^!Delay 10
        > ^!SetClipboard ^$GetDocListAll("E:\\.+txt";"$0\r\n")$
        > ; Overwrite search result with clipboard contents
        > ^!Select All
        > ^$GetClipboard$
        >
        >
        > For me, this works fine. It needs some seconds, though. I wonder, however, if there is a better solution?
        >
        > Regards,
        > Flo
        >

        FWIW here are a few observations that might help speed up the NoteTab
        part of it. You should use a clip command where available instead of
        outputting function results as document text. So use either ^!Open or
        ^!InsertFile instead of bare ^$GetFileText. Otherwise the content of
        ^$GetFileText gets unnecessarily evaluated for embedded functions. And
        no need for clipboard overhead; you could use ^!Select All, then
        ^!InsertText ^$GetDocListAll(...) instead. Finally, seems to me the
        pattern should begin with a caret. Otherwise, there is unnecessary
        backtracking looking for unanchored file names. Not sure if you'd still
        need the ^!Delay; I presume it's there because the ^!ShellWait ends
        before the output file is available. Possibly it would be faster and
        more reliable to test ^!IfFileExist in a loop instead of using an
        arbitrary Delay. Also, since NoteTab is not updating the screen during
        the ^!ShellWait, I think I'd not turn off screen updates, at least not
        until after the ^!ShellWait completes.

        Regards,
        Sheri
      • Sheri
        Message 3 of 14, May 16, 2011
          On 5/16/2011 9:40 AM, flo.gehrke wrote:
          > However, I can't see how to limit the AR output to file names only when run from the command line. Is anyone working with that tool and could give me a helping hand?

          It looks like you can suppress content lines with -ocn

          With that option specified, you get file names and file properties. To
          get file names only, you'd still need to manipulate the result, but
          you'd be loading and acting upon a smaller source document.

          Regards,
          Sheri
        • flo.gehrke
          Message 4 of 14, May 16, 2011
            --- In ntb-clips@yahoogroups.com, Sheri <silvermoonwoman@...> wrote:
            >
            > FWIW here are a few observations that might help speed up the
            > NoteTab part of it...

            Sheri,

            Thanks for your reply!

            Following your advice, the clip is much faster when written as...

            ^!Set %Expr%=^?{Enter Boolean Expression:}
            ^!ShellWait "E:\Agent Ransack\AgentRansack.exe" -o "E:\Notetab\Documents\MyAgent.txt" -oft -d "E:\Notetab\Documents" -c "^%Expr%" -ceb
            ^!InsertFile "E:\Notetab\Documents\MyAgent.txt"
            ^!Select All
            ^!InsertText ^$GetDocListAll("^E:.+\.txt";"$0\r\n")$

            It's even faster with the -ocn option you mentioned. In this case, we just have to remove file size, date, and time from the output if necessary...

            ^!Set %Expr%=^?{Enter Boolean Expression:}
            ^!ShellWait "E:\Agent Ransack\AgentRansack.exe" -o "E:\Notetab\Documents\MyAgent.txt" -oft -d "E:\Notetab\Documents" -c "^%Expr%" -ceb -ocn
            ^!InsertFile "E:\Notetab\Documents\MyAgent.txt"
            ^!Replace "^E:.+\.txt\K.+$" >> "" WARS

            It would be a further advantage if we could avoid the output file (MyAgent.txt) and insert the result directly into NT. But there seems to be no way to do that, is there?

            Regards,
            Flo
          • diodeom
            Message 5 of 14, May 16, 2011
              Flo wrote:
              >
              > (...) Say, we want to find all files (names only) matching
              > the Boolean expression
              >
              > 'apples NOT (oranges OR bananas)'
              >

              Possibly for many tasks the good old findstr could be a straightforward alternative, where any needed filtering is either piped to another cmd or delegated to cozy string manipulations of NoteTab/PCRE.

              To address the given Boolean case (that I have a really hard time imagining in practical application :), here's a sample compromise (with superfluous refinements left out for clarity):

              ^!ChDir E:\Notetab\Documents
              ^!InsertSelect ^$GetOutput(cmd.exe /c findstr "apples" *.txt |findstr /v "oranges bananas")$
              ^$GetDocListAll("^[^:]++";$0\r\n)$

              BTW, findstr is no slouch when it comes to basic RegEx searches through piles of files -- and it's not limited to plain text either.
            • flo.gehrke
              Message 6 of 14, May 17, 2011
                --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
                >
                > Possibly for many tasks the good old findstr could be a
                > straightforward alternative...

                diodeom,

                Thanks for your pointing out FINDSTR and your clip!

                In the past, we discussed many aspects of FINDSTR as an alternative to Search Disk. In my database, the oldest record is from Jane, 7/31/2007, #16823. If you are interested in following those discussions, you could start with #19253, for example.

                As a DOS alternative to FINDSTR, we also tested BFIND (#19276).

                But, finally, I think Agent Ransack has proved itself the most efficient tool for tasks like that.

                Testing your clip I get no correct result. I created six short files (name / contents, using 'aaaaa' etc not to mix it up with other files in that directory):

                anthony.txt / aaaaa
                bertha.txt / ooooo
                carla.txt / bbbbb
                elsa.txt / aaaaa ooooo
                fred.txt / ooooo bbbbb
                george.txt / aaaaa bbbbb

                In line with those contents, I changed your clip to...

                ^!ChDir E:\Notetab\Documents
                ^!InsertSelect ^$GetOutput(cmd.exe /c findstr "aaaaa" *.txt |findstr /v "ooooo bbbbb")$
                ^$GetDocListAll("^[^:]++";$0\r\n)$

                This ends up in an empty document. Without the third line the output is...

                notetab.txt: (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
                pcre.txt: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                pcre.txt: (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()

                What's wrong with that?

                Moreover -- as far as I can see -- FINDSTR offers no Boolean NOT and no parentheses. So how could you define an exclusion like 'x NOT (y OR z)'?

                Flo
              • flo.gehrke
                Message 7 of 14, May 17, 2011
                  --- In ntb-clips@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:
                  >
                  > This ends up in an empty document. Without the third line the
                  > output is...
                  >
                  > notetab.txt: (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
                  > pcre.txt: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                  > pcre.txt: (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()

                  I have to add...

                  A long 'aaaa...' string occurs in the two files that are being output (notetab.txt, pcre.txt). When I move those two files to another directory, the output is completely empty. That is, none of those six files is selected, whereas my Agent Ransack clips give correct results.

                  Flo
                • diodeom
                  Message 8 of 14, May 17, 2011
                    Flo wrote:
                    >
                    > Testing your clip I get no correct result. I created six short files (name / contents, using 'aaaaa' etc not to mix it up with other files in that directory):
                    >
                    > anthony.txt / aaaaa
                    > bertha.txt / ooooo
                    > carla.txt / bbbbb
                    > elsa.txt / aaaaa ooooo
                    > fred.txt / ooooo bbbbb
                    > george.txt / aaaaa bbbbb
                    >

                    Try placing at least a carriage return at each end of your sample content. Without a wholesome line to work on findstr "aaaaa" by itself produces bunched-up results in one line (anthony.txt:aaaaaelsa.txt:aaaaa ooooogeorge.txt:aaaaa bbbbb). The subsequent findstr /v "ooooo bbbbb" attempts to locate a line that *does not* contain either "ooooo" or "bbbbb" -- and obviously fails.
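The effect described above can be illustrated with a small Python sketch. It is only a rough model under stated assumptions: the `findstr_model` helper is hypothetical, treats each search word as a plain substring, and prints matches as `name:line`. It is not a reimplementation of findstr.

```python
def findstr_model(lines, words, invert=False):
    """Rough model of `findstr "w1 w2"`: keep lines containing any word.

    With invert=True (like /v), keep only lines containing none of them.
    """
    return [ln for ln in lines if any(w in ln for w in words) != invert]


# The three sample files that contain "aaaaa" (contents from the thread).
matches = {
    "anthony.txt": "aaaaa",
    "elsa.txt": "aaaaa ooooo",
    "george.txt": "aaaaa bbbbb",
}


def first_stage(terminator):
    """Emulate the 'name:line' stream of the first findstr call."""
    stream = "".join(f"{name}:{text}{terminator}" for name, text in matches.items())
    return stream.splitlines()


# Files without a line ending: everything is bunched into one line,
# so the /v stage throws the whole line away.
bunched = findstr_model(first_stage(""), ["ooooo", "bbbbb"], invert=True)

# Files ending in CR+LF: each match is its own line and can be filtered.
separated = findstr_model(first_stage("\r\n"), ["ooooo", "bbbbb"], invert=True)

print(bunched)
print(separated)
```

In this model the bunched stream yields nothing, while the properly terminated stream leaves only the anthony.txt line.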
                  • Art Kocsis
                    Message 9 of 14, May 17, 2011
                      At 05/17/2011 04:29, you wrote:
                      >--- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
                      > > Possibly for many tasks the good old findstr could be a
                      > > straightforward alternative...
                      >diodeom,
                      >
                      >Thanks for your pointing out FINDSTR and your clip!
                      Ditto, diodeom. My development was interrupted at DOS and I was not even aware
                      of FINDSTR. I have just used FIND and tried to work around its limitations.
                      I really need to spend some time looking over the XP commands!


                      >Moreover -- as far as I can see -- FINDSTR offers no Boolean NOT and no
                      >parentheses. So how could you define an exclusion like 'x NOT (y OR z)'?
                      >Flo

                      Flo, the answer to your question about exclusions is logical equivalence.

                      Your expression is actually: x AND (NOT (y OR z))

                      The logical equivalent of "NOT (y OR z)" is "(NOT y) AND (NOT z)" so your
                      expression becomes "x AND (NOT y) AND (NOT z)"

                      So to accomplish your expression use FINDSTR three times - once to
                      extract all "x"s, then to exclude all "y"s and finally to exclude all "z"s
                      piping the output each time to the next step. Each pipe is like an AND.

                      findstr x | findstr /v y | findstr /v z

                      Namaste', Art
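The equivalence Art relies on (De Morgan's law) can be checked exhaustively. A minimal Python sketch, with hypothetical function names chosen only for illustration:

```python
from itertools import product


def original(x, y, z):
    # x NOT (y OR z), read as: x AND (NOT (y OR z))
    return x and not (y or z)


def rewritten(x, y, z):
    # x AND (NOT y) AND (NOT z), the form that maps onto a chain of pipes
    return x and (not y) and (not z)


# Compare both forms for all eight combinations of truth values.
all_equal = all(
    original(*values) == rewritten(*values)
    for values in product([False, True], repeat=3)
)
print(all_equal)
```

Because the two forms agree for every combination, each `| findstr /v ...` stage in the pipeline can be read as one more AND NOT term.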
                    • diodeom
                      Message 10 of 14, May 17, 2011
                        Art wrote:
                        >
                        > So to accomplish your expression use FINDSTR three times - once to
                        > extract all "x"s, then to exclude all "y"s and finally to exclude all "z"s
                        > piping the output each time to the next step. Each pipe is like an AND.
                        >
                        > findstr x | findstr /v y | findstr /v z
                        >

                        One can find lines not containing y's or z's in one shot by findstr /v "y z" where the space serves as OR in this notation. (By contrast, the following locates any lines without the literal "y z" string: findstr /v /c:"y z")
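The difference between the two notations can be sketched in Python. This is a simplified model only: both helpers are hypothetical, use plain substring tests, and ignore findstr's regex and case options.

```python
def lines_without_any(lines, search):
    """Model of findstr /v "y z": split on spaces, words are ORed together."""
    words = search.split()
    return [ln for ln in lines if not any(w in ln for w in words)]


def lines_without_literal(lines, search):
    """Model of findstr /v /c:"y z": the whole string is one literal."""
    return [ln for ln in lines if search not in ln]


sample = [
    "apples",
    "apples oranges",
    "apples bananas",
    "apples oranges bananas",
]

either_excluded = lines_without_any(sample, "oranges bananas")
literal_excluded = lines_without_literal(sample, "oranges bananas")

print(either_excluded)
print(literal_excluded)
```

In the first case only the plain "apples" line survives; in the second, only the line containing the exact phrase "oranges bananas" is dropped.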
                      • flo.gehrke
                        Message 11 of 14, May 17, 2011
                          --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:

                          > Try placing at least a carriage return at each end of your
                          > sample content.

                          diodeom,

                          Yes, it's working with an additional CRLF. I can also see the excluding effect now.

                          May I tax your patience with another question? Does FINDSTR allow even more complex Boolean expressions? A next stage would be to add a third criterion and to exclude combined criteria. Given another file henry.txt...

                          aaaaa
                          bbbbb
                          ccccc

                          Now we want to find 'aaaaa' but exclude files containing 'bbbbb' and 'ccccc'. With Agent Ransack, it works with...

                          'aaaaa NOT (bbbbb AND ccccc)'

                          The correct output will be anthony.txt, elsa.txt, and george.txt.

                          Thanks also to Art Kocsis! Possibly, Art has explained this already but I can't see how to formulate it with FINDSTR.

                          So far, I haven't tested it with a larger number of files. But I'm sure that FINDSTR would be faster -- at least for simple evaluations. On the other hand, Agent Ransack might provide an easier solution.

                          Flo
                        • John Shotsky
                          Message 12 of 14, May 17, 2011
                            It's funny, I never thought much about Findstr before I learned regex. Now, when I see what all it can do, I think there
                            are a lot of possibilities in my clips. Here's Microsoft's reference page:

                            http://technet.microsoft.com/en-us/library/bb490907.aspx



                            Regards,

                            John



                          • diodeom
                            Message 13 of 14, May 17, 2011
                              Flo wrote:
                              >
                              > Now we want to find 'aaaaa' but exclude files containing 'bbbbb' and 'ccccc'. With Agent Ransack, it works with...
                              >
                              > 'aaaaa NOT (bbbbb AND ccccc)'
                              >

                              Findstr operates in the context of individual lines within files, so to implement "aaaaa NOT (bbbbb AND ccccc)" for entire files, I'd probably (at first glance) build the list of files containing "aaaaa" first:

                              findstr /m aaaaa

                              Then remove dupes, preserve the remnants for the last step and look among them for the first condition of exclusion:

                              findstr /m bbbbb

                              And then among the resulting list of files for the second condition:

                              findstr /m ccccc

                              Where I'd simply end up with any filenames to remove from the first list.

                              Achieving it with a single fancy long-winded call to cmd is probably not for the faint of heart, so I'd either make a .bat or just let Clip manipulate variables and act as glue between the blazing searches.

                              To answer your question, Flo, I don't see an AND provision within the Findstr itself, but command redirection operators (&, &&, | and ||) afford plenty of Boolean logic.
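The file-level steps outlined above amount to set operations on lists of file names. A minimal Python sketch, assuming the sample files from earlier in the thread are held in a dictionary, and using a hypothetical `files_containing` helper as a stand-in for `findstr /m`:

```python
# Sample files and contents as described in the thread.
files = {
    "anthony.txt": "aaaaa",
    "bertha.txt": "ooooo",
    "carla.txt": "bbbbb",
    "elsa.txt": "aaaaa ooooo",
    "fred.txt": "ooooo bbbbb",
    "george.txt": "aaaaa bbbbb",
    "henry.txt": "aaaaa\r\nbbbbb\r\nccccc\r\n",
}


def files_containing(term, names):
    """Stand-in for `findstr /m term`: names of files whose text contains term."""
    return {name for name in names if term in files[name]}


# Step 1: all files containing "aaaaa".
with_a = files_containing("aaaaa", files)
# Step 2: among those, the files that also contain "bbbbb".
with_a_and_b = files_containing("bbbbb", with_a)
# Step 3: among those, the files that also contain "ccccc".
with_a_b_and_c = files_containing("ccccc", with_a_and_b)

# Remove the doubly excluded files from the first list:
# aaaaa NOT (bbbbb AND ccccc)
result = sorted(with_a - with_a_b_and_c)
print(result)  # ['anthony.txt', 'elsa.txt', 'george.txt']
```

Narrowing the candidate list at each step keeps the later searches small, which is the same idea as feeding each findstr call only the file names left over from the previous one.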
                            • Art Kocsis
                              Message 14 of 14, May 17, 2011
                                Duh! I need more sleep. Or need to grow younger.
                                I had just read that an hour earlier.
                                In one eye and out the hole in my head!!!

                                Art

                                At 05/17/2011 07:49, diodeom wrote:
                                >Art wrote:
                                > > So to accomplish your expression use FINDSTR three times - once to
                                > > extract all "x"s, then to exclude all "y"s and finally to exclude all "z"s
                                > > piping the output each time to the next step. Each pipe is like an AND.
                                > >
                                > > findstr x | findstr /v y | findstr /v z
                                >
                                >One can find lines not containing y's or z's in one shot by findstr /v "y
                                >z" where the space serves as OR in this notation. (By contrast, the
                                >following locates any lines without the literal "y z" string: findstr /v
                                >/c:"y z")