
Re: What command for Count Occurences?

  • hpaulissen
    Message 1 of 4 , Mar 1, 2002
      Hello Dick in Singapore,

      I assume you have better weather conditions than we have over here in
      Europe...

      > I have fairly large files (30 - 100 MB; 500K - 1,500K lines), which
      > contain multiple occurrences of a string. I would like to know the
      > number of occurrences, using a clip. The Replace command does not
      > have a "Count Occurrences" option. What command(s) should I use?

      I have a clip here that works for me (tested on smaller files than
      yours ;) - I don't know whether you can use it. Here is what it
      does: it counts the occurrences of a given string in your document.
      I didn't want to use a RegEx search here because of the size of your
      files (regex is not ideal in these circumstances). For every hit the
      count is incremented. At the end of this process the result is shown
      in a window where you can enter how many subdocuments you want to
      create.

      The Variable %SecondCount% is 'Nr. of Hits' divided by 'Nr. of
      Documents you want'.

      Then the clip goes back to the top and starts searching for
      %SecondCount% occurrences of your search string. Once it has found
      %SecondCount% occurrences, everything before the current line is
      selected and appended to a new file. The selection is then cut and
      the next %SecondCount% occurrences are found (ditto)...

      The active document gets smaller and smaller, at the end all that
      remains is selected and appended to the last file.
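
For readers who want to try the same count-then-split idea outside NoteTab, here is a minimal Python sketch. It is not the clip itself: the function name `split_by_occurrences` and its parameters are hypothetical, it works on a list of lines held in memory, and it counts lines that contain the string rather than individual hits, which is equivalent only if the string appears at most once per line (as a header line would).

```python
def split_by_occurrences(lines, needle, parts):
    """Split `lines` into up to `parts` chunks holding roughly equal
    numbers of lines that contain `needle`.

    Returns (total_hits, chunks). Every chunk after the first starts
    on a line containing `needle`.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")

    # First pass: count the lines that contain the search string.
    total = sum(1 for line in lines if needle in line)
    # Hits per chunk, mirroring 'Nr. of Hits' / 'Nr. of Documents'.
    per_part = max(1, round(total / parts))

    chunks, current, hits = [], [], 0
    # Second pass: cut just before the hit that would exceed per_part.
    for line in lines:
        if needle in line:
            if hits == per_part and len(chunks) < parts - 1:
                chunks.append(current)
                current, hits = [], 0
            hits += 1
        current.append(line)

    # Whatever remains goes into the last chunk.
    chunks.append(current)
    return total, chunks
```

Unlike the clip, this sketch never deletes anything from the source; it only collects the chunks, which could then be written out one file per chunk.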

      Hope this works for you, let us know...

      Hugo

      ^!Jump Doc_START
      ^!Set %Count%=0
      ^!Set %DocNr%=1
      ^!Set %String%=^?{Find string=_Any text you like|Some other text|}
      ^!Set %Path%=^$GetPath(^$GetDocName$)$
      ^!Set %Name%=^$GetName(^$GetDocName$)$
      ^!SetScreenUpdate OFF
      :FIND
      ^!Find "^%String%" S
      ^!IfError DIVIDE
      ^!Inc %Count%
      ^!GoTo FIND
      :DIVIDE
      ^!SetWizardLabel ^%Count% Occurrences of [^%String%]
      ^!Set %NrDocuments%=^?{Number of Documents to create...=_No, don't split^=1|2|3|4|}
      ^!If ^%NrDocuments%=1 END
      ^!Set %SecondCount%=^$Calc(Round(^%Count%/^%NrDocuments%);0)$
      ^!Jump Doc_START
      ^!Set %Count%=0
      ;Set DocNr
      :FindAGAIN
      ^!Find "^%String%" S
      ^!Inc %Count%
      ^!If ^%Count%>^%SecondCount% SPLIT
      ^!GoTo FINDAGAIN
      :SPLIT
      ^!Jump Line_START
      ^!SelectTo 1:1
      ^!AppendToFile "^%Path%^%Name%_Part^%DocNr%.txt" ^$GetSelection$
      ^!Toolbar Cut
      ^!Inc %DocNr%
      ^!Dec %NrDocuments%
      ^!Set %Count%=0
      ^!If ^%NrDocuments%=1 SelectRest Else FindAGAIN
      :SelectRest
      ^!Select ALL
      ^!If ^$StrSize(^%DocNr%)$=1 ^!Set %DocNR%=0^%DocNR%
      ^!AppendToFile "^%Path%^%Name%_Part^%DocNr%.txt" ^$GetSelection$
      ^!ToolBar Reload Document
      ^!Wait
      ^!Keyboard ENTER
      ^!Info Created ^$Calc(^%DocNR%)$ documents...^%NL%(^%Path%^%Name%_Part[x].txt)
    • Martyn Folkes
      Message 2 of 4 , Mar 1, 2002
        This should do what you want (it is 3 lines):

        ^!Set %string%=^?{String to count}; %filename%=^?{(T=O)Filename}
        ^!Set %count%=^$StrCount("^%string%";"^$GetFileText("^%filename%")$";N;N)$
        ^!Prompt There are ^%count% occurrences of your search string.
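
The clip above reads the whole file into memory before counting. For files in the 30 - 100 MB range, a line-by-line count may be gentler on memory. The following is only a Python sketch of that idea, not a NoteTab feature: the function name `count_occurrences` is hypothetical, UTF-8 is an assumed encoding, and the search string is assumed never to span a line break.

```python
def count_occurrences(path, needle, encoding="utf-8"):
    """Count non-overlapping occurrences of `needle` in the file at
    `path`, reading one line at a time so the file is never held in
    memory as a whole."""
    if not needle:
        raise ValueError("needle must not be empty")

    total = 0
    # errors="replace" keeps the count going if a byte cannot be decoded.
    with open(path, "r", encoding=encoding, errors="replace") as fh:
        for line in fh:
            total += line.count(needle)
    return total
```

A call such as `count_occurrences("spool.txt", "HEADER")` (both names are placeholders) returns the count as an integer.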

        If you only want to split the file into 2, it may be easiest to open the file
        and cut and paste half of it into a new file.

        Martyn


        > -----Original Message-----
        > From: bobbit_singapore [mailto:dick.gascoigne@...]
        > Sent: 01 March 2002 07:25
        > To: ntb-clips@yahoogroups.com
        > Subject: [Clip] What command for Count Occurences?
        >
        >
        > Using NoteTab Pro 4.86d:
        >
        > I have fairly large files (30 - 100 MB; 500K - 1,500K lines), which
        > contain multiple occurrences of a string. I would like to know the
        > number of occurrences, using a clip. The Replace command does not
        > have a "Count Occurrences" option. What command(s) should I use?
        >
        > BTW:
        > The String I'm searching for represents the header line which starts a
        > new document, of which there are several thousand in a file. My end
        > objective is to divide the file into 'N' files, each with an
        > approximately equal number of documents. IE: if the count is 8,200
        > documents, and 'N' = 2, I want to wind up with two files of 4,100
        > documents each. Is there a best way in NTB to do this, or should I
        > be looking at another tool?
        >
        > Best Regards,
        >
        > Dick Gascoigne
        > dick.gascoigne@...
        > --
        > Appic (S) Pte Ltd
        > 74A Amoy Street; Singapore 069893
        > Tel: (+65) 225-9908 Fax: (+65) 225-9092
        > Email: dick.gascoigne@...
        > Web: www.appic.com
      • Dick Gascoigne
        Message 3 of 4 , Mar 3, 2002
          Thanks to all who responded -- all the ideas were good !

          Results feedback follows:

          The purpose is to divide a huge spool file of telephone bills into multiple
          files to be printed in parallel on multiple printers. Each bill starts with
          a standard header line.

          One of the controlling factors is that the file has to be split on a
          document boundary, but the files don't have to be of exactly equal number of
          documents.

          Method:

          I first made a clip combining the techniques: The Martyn-Tyrell method for
          counting the number of documents,
          and then splitting the file with the "Hugo technique" of using Find to count
          up to some number of documents, then Selecting those lines, Appending to a
          file, and then cutting them out and again count/Select/Append/Cut until the
          file is all split.

          For a file of 8,200 documents, and 500,000 lines, the whole process took 50
          min on a 700MHz, 128K box. Not fast enough for production use. Most of the
          time seemed to be the repetitive Finds, and particularly the Cut (deleting
          some 150,000 lines at a whack).

          Solution: I load the file, Get the total linecount (not document count),
          and divide by the number of files I want to split it into, yielding
          something I call SegmentSize.

          Then I jump to line number SegmentSize, and search forward for the end of
          the document I landed in, getting EndLine. Then Select from StartLine
          (initially 1) to EndLine, and Append the Selection to create a file.

          Then I reset StartLine to be EndLine, Jump forward SegmentSize lines,
          Search, Select, Append, ... repeating until done.

          The Search only ever has to go to the next document start, and I never Cut.
          Time now: 56 seconds to split the 8,200 bill file into four files !!!
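
The steps above can be sketched in Python as follows. This is an illustration of the approach, not the actual clip: the names `split_at_boundaries` and `is_header` are hypothetical, the lines are assumed to be in memory as a list, and `is_header` stands in for whatever test recognises the standard header line.

```python
def split_at_boundaries(lines, is_header, parts):
    """Split `lines` into at most `parts` chunks of roughly equal line
    count, cutting only where `is_header(line)` is true so that no
    document is divided between two chunks."""
    if parts < 1:
        raise ValueError("parts must be at least 1")

    # Target number of lines per chunk (the "SegmentSize").
    segment_size = max(1, len(lines) // parts)
    chunks = []
    start = 0

    while len(chunks) < parts - 1:
        # Jump forward by one segment, then scan to the next header.
        probe = start + segment_size
        end = next(
            (i for i in range(probe, len(lines)) if is_header(lines[i])),
            None,
        )
        if end is None:
            break  # no further boundary; the rest goes in the last chunk
        chunks.append(lines[start:end])
        start = end

    # Remaining lines form the final chunk.
    chunks.append(lines[start:])
    return chunks
```

Each search only runs from the jump target to the next header, and nothing is ever removed from the source, which matches the two points credited for the speed-up.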

          Beautiful ! Thank you all.

          I'll be pleased to post the clip or send it privately if anyone would find
          it useful.

          (And yes, the weather is better here than in Europe -- always 24 - 34 deg.
          C, but 80% humidity)
          (But the skiing sucks!)

          Best Regards,

          Dick Gascoigne
          --
          Appic (S) Pte Ltd
          74A Amoy Street; Singapore 069893
          Tel: (+65) 6225-9908 Fax: (+65) 6225-9092
          Email: dick.gascoigne@...
          Web: www.appic.com
