Count Occurrences Command - Found

  • Art Kocsis
    Message 1 of 7 , Sep 8, 2009
      Found it: ^$StrCount("SubStr";"Str";CaseSensitive;WholeWord)$

      Hello,

      Is there any command or RegEx that will count the occurrences of a ^!Find,
      similar to the "Count Occurrences" checkbox of the Find & Replace toolbar
      command? Obviously I could write a clip to accomplish this, but I am looking
      for a single-line statement. All of the options of the toolbar command are
      replicated in the clip command except this one.

      Namaste', Art

      We could beat the odds, if we finally gave up our addiction to getting even
      and got odd instead. -NW

    • ebbtidalflats
      Message 2 of 7 , Sep 10, 2009
        It would not be easy to do this as a one-liner:

        ^!SetArray %list%=^$GetDocMatchAll...

        The count is in "^%list0%"

        Cheers,


        Eb


      • flo.gehrke
        Message 3 of 7 , Sep 11, 2009
          --- In ntb-clips@yahoogroups.com, "ebbtidalflats" <ebbtidalflats@...> wrote:
          >
          > It would not be easy to do this as a one-liner:
          >
          > ^!SetArray %list%=^$GetDocMatchAll...
          >
          > The count is in "^%list0%"
          >
          > Cheers,
          > Eb

          Hi Eb,

          I agree with you. Using ^$GetDocMatchAll$ is the better solution. It is just as fast as ^$StrCount$ and it allows the use of RegEx. A more refined approach would be...


          ^!Set %Term%=^?[Enter search term:]; %Mode%=^?[Search for:==_Whole words^=1|Substrings^=0]; %Ignore%=^?[Ignore case:==_Yes^=1|No^=0]
          ^!Set %Expr%=^%Term%
          ^!IfTrue ^%Mode% Next Else Skip
          ^!Set %Expr%=\b^%Term%\b
          ^!IfFalse ^%Ignore% Next Else Skip_2
          ^!Set %Expr%=(?-i)^%Expr%
          ^!Goto Skip
          ^!Set %Expr%=(?i)^%Expr%
          ^!SetArray %Occurrences%=^$GetDocMatchAll("^%Expr%")$
          ^!If ^%Occurrences0%="" Next Else Skip_2
          ^!Info The search term does not occur in the document
          ^!Goto End
          ^!Info ^%Term% occurs ^%Occurrences0% times


          With regard to compounds, it would probably need some more improvements. For example, when counting "back" as a whole word only, "back-end" will also produce a match, because \b treats the hyphen as a word boundary.
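
The compound-word effect comes from how \b works in most regex flavors rather than from NoteTab itself. As a rough illustration outside NoteTab, the Python sketch below shows the effect and one possible workaround: lookarounds that also reject a hyphen next to the term. The helper name count_whole_word and the sample text are made up for illustration, and whether the same lookarounds behave identically in NoteTab's regex engine is an assumption that would need testing.

```python
import re

def count_whole_word(term: str, text: str, hyphen_is_word_char: bool = False) -> int:
    """Count case-sensitive whole-word occurrences of term in text."""
    escaped = re.escape(term)
    if hyphen_is_word_char:
        # Reject matches that touch a word character or a hyphen on either side.
        pattern = rf"(?<![\w-]){escaped}(?![\w-])"
    else:
        # Plain \b treats the hyphen as a boundary, so "back-end" matches "back".
        pattern = rf"\b{escaped}\b"
    return len(re.findall(pattern, text))

sample = "Go back to the back-end, then step back."
plain = count_whole_word("back", sample)
strict = count_whole_word("back", sample, hyphen_is_word_char=True)
print(plain, strict)  # 3 2
```

The stricter pattern trades recall for precision: it also skips forms such as "back-" at a line-end hyphenation, which may or may not be wanted.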

          Another problem is counting a complete list of search terms within a large file. This could easily be integrated in that clip but the clip execution will soon be unacceptably slow. This pertains to both ^$StrCount$ and ^$GetDocMatchAll$.

          Is there a solution for that? The only way I could think of is using the Text Statistics and extracting a list of your search terms from its result.

          Nevertheless, there's still an issue with the Text Statistics: it doesn't distinguish between upper and lower case. For example, if a document contains three occurrences of "Report" and two occurrences of "report", the Text Statistics will count five occurrences of "report". That's why I'm still using a text analysis tool for jobs like that. AntConc, for example, lets you use a list of search terms saved on your hard disk.
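
For the list-of-terms case, a common way to avoid one full scan per term is to tokenize the document once and look each word up in a set. The Python sketch below illustrates that idea outside NoteTab; it is not a clip, and count_terms and the sample data are hypothetical. It is case-sensitive, so "Report" and "report" are tallied separately, and it only handles single-word terms.

```python
import re
from collections import Counter

def count_terms(text: str, terms: list[str]) -> dict[str, int]:
    """Count whole-word, case-sensitive occurrences of each term in one pass."""
    wanted = set(terms)
    # Tokenize once; each token is a run of word characters.
    counts = Counter(w for w in re.findall(r"\w+", text) if w in wanted)
    # Report every requested term, including those with zero hits.
    return {t: counts[t] for t in terms}

doc = "Report one. report two. Report three. A report, and a Report."
result = count_terms(doc, ["Report", "report", "summary"])
print(result)  # {'Report': 3, 'report': 2, 'summary': 0}
```

Because the document is scanned only once, the cost grows with the document size rather than with the number of search terms.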

          Flo
        • silvermoonwoman2001
          Message 4 of 7 , Sep 11, 2009
            --- In ntb-clips@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:
            >
            > --- In ntb-clips@yahoogroups.com, "ebbtidalflats" <ebbtidalflats@> wrote:
            > >
            > > It would not be easy to do this as a one-liner:
            > >
            > > ^!SetArray %list%=^$GetDocMatchAll...
            > >
            > > The count is in "^%list0%"
            > >
            > > Cheers,
            > > Eb
            >
            > Hi Eb,
            >
            > I agree with you. Using ^$GetDocMatchAll$ is the better solution. It is just as fast as ^$StrCount$ and it allows the use of RegEx.

            I prefer to use a combination of ^$GetDocListAll$ and ^$StrCount$. I've had some poor results with ^!SetArray. Before ^$GetDocListAll$, I did something similar with ^$GetDocMatchAll$ (counted delimiters, but since the last match in GDMA doesn't get a delimiter, I had to add one to the result).

            With large and numerous matches, assigning ^$GetDocMatchAll$ to an array is slow and consumes excessive resources. In one extreme example (matching series of lines in a 30+ MB document), a combination of ^$StrCount$ and ^$GetDocListAll$ gave me the result in seconds, whereas with the same pattern in ^$GetDocMatchAll$ fed to ^!SetArray I had to terminate the run via Task Manager after many minutes. Also, if you use ^!SetArray you need to be confident that none of the matches contains anything that "looks like" clip code, particularly a semicolon followed by a percent sign. That particular combination causes a truncated array and a correspondingly bad count in the array's item zero.

            Regards,
            Sheri
          • flo.gehrke
            Message 5 of 7 , Sep 11, 2009
              --- In ntb-clips@yahoogroups.com, "silvermoonwoman2001" <silvermoonwoman@...> wrote:


              > I prefer to use a combination of ^$GetDocListAll$ and
              > ^$StrCount$. I've had some poor results with ^!SetArray...

              Thanks for that hint, Sheri!

              So far, I've never tested it with such a big file. Now I took a file of just 15 MB and chose a search term that occurs about 32,000 times.

              I tried the following clip (hoping the combination is correct)...

              ^!Info ^$StrCount("X";"^$GetDocListAll("(?-i)\bterm\b";"X")$";1;0)$

              When comparing that clip with...

              ^!SetArray %Array%=^$GetDocMatchAll("(?-i)\bterm\b")$
              ^!Info ^%Array0%

              ...both clips came to the same result but the first one actually was about five times faster than the second one! So the combination of ^$StrCount$ and ^$GetDocListAll$ obviously is the best solution.

              Flo
            • Art Kocsis
              Message 6 of 7 , Sep 11, 2009
                At 09-11-2009 08:33, Sheri wrote:

                > > At 09-11-2009 6:47:05, flo wrote:
                > > I agree with you. Using ^$GetDocMatchAll$ is the better solution.
                > > It is just as fast as ^$StrCount$ and it allows the use of RegEx.
                >
                >I prefer to use a combination of ^$GetDocListAll$ and ^$StrCount$.
                >I've had some poor results with ^!SetArray. Before ^$GetDocListAll$, I did
                >something similar with ^$GetDocMatchAll$ (counted delimiters, but since
                >the last match in GDMA doesn't get a delimiter, I had to add one to the
                >result).

                Wow, lots of responses for what I thought was a simple question.

                I was quite happy with ^$StrCount$ for its convenience alone, but seeing all
                the discussion about speed I thought I should at least test it. On half a
                dozen files of 3.6 MB each, the execution time was 0 seconds (according to
                NTB's time function).

                That's good enough for me! <g>

                ^!Set %start%=^$GetDate(hh:nn:ss)$
                ^!Set %nimg%=^$StrCount(".gif";"^$GetText$";False;False)$
                ^!Set %finsh%=^$GetDate(hh:nn:ss)$
                ^!Continue %nimg% = "^%nimg%", %start% = "^%start%", %finsh% = "^%finsh%"

                Namaste', Art

                Never go skinny dippin' in a crawdad hole.
                Sage of the Catskills

              • ebbtidalflats
                Message 7 of 7 , Sep 12, 2009
                  --- In ntb-clips@yahoogroups.com, "silvermoonwoman2001" <silvermoonwoman@...> wrote:
                  >> I prefer to use a combination of ^$GetDocListAll$ and ^$StrCount$. I've had
                  >> some poor results with ^!SetArray. Before ^$GetDocListAll$, I did something
                  >> similar with ^$GetDocMatchAll$ (counted delimiters, but since the last match in
                  >> GDMA doesn't get a delimiter, I had to add one to the result).


                  To get the correct count in one operation, just prefix
                  the string with an extra delimiter:

                  ^$StrCount("^%delim%";"^%delim%^%targetString%";0;0)$
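
The arithmetic behind the extra delimiter is that n items joined by a delimiter contain only n - 1 delimiters, so prefixing one more makes the delimiter count equal the item count. The Python sketch below illustrates this outside NoteTab; count_items and the sample data are hypothetical, and it assumes at least one match and that no match contains the delimiter itself.

```python
def count_items(joined: str, delim: str) -> int:
    """Count items in a delimiter-joined string by counting delimiters.

    Assumes the string holds at least one item and no item contains the delimiter.
    """
    # n items are separated by n - 1 delimiters; prefixing one more gives n.
    return (delim + joined).count(delim)

matches = ["term", "term", "term"]
joined = "|".join(matches)
raw = joined.count("|")
fixed = count_items(joined, "|")
print(raw, fixed)  # 2 3
```

Note that with an empty input the prefixed count would be 1 rather than 0, so the no-match case still needs a separate check.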



                  Cheers


                  Eb