
What command for Count Occurrences?

  • bobbit_singapore
    Message 1 of 4 , Feb 28, 2002
      Using NoteTab Pro 4.86d:

      I have fairly large files (30 - 100 MB; 500K - 1,500K lines), which
      contain multiple occurrences of a string. I would like to know the
      number of occurrences, using a clip. The Replace command does not
      have a "Count Occurrences" option. What command(s) should I use?

      BTW:
      The string I'm searching for represents the header line which starts a
      new document, of which there are several thousand in a file. My end
      objective is to divide the file into 'N' files, each with an
      approximately equal number of documents. E.g., if the count is 8,200
      documents, and 'N' = 2, I want to wind up with two files of 4,100
      documents each. Is there a best way in NTB to do this, or should I
      be looking at another tool?

      Best Regards,

      Dick Gascoigne
      dick.gascoigne@...
      --
      Appic (S) Pte Ltd
      74A Amoy Street; Singapore 069893
      Tel: (+65) 225-9908 Fax: (+65) 225-9092
      Email: dick.gascoigne@...
      Web: www.appic.com
    • hpaulissen
      Message 2 of 4 , Mar 1, 2002
        Hello Dick in Singapore,

        I assume you have better weather conditions than we have over here in
        Europe...

        > I have fairly large files (30 - 100 MB; 500K - 1,500K lines), which
        > contain multiple occurrences of a string. I would like to know the
        > number of occurrences, using a clip. The Replace command does not
        > have a "Count Occurrences" option. What command(s) should I use?

        I have a clip here that works for me (tested on smaller files
        than yours ;) - I don't know whether you can use it. It counts
        the occurrences of a given string in your document. I didn't want
        to use a RegEx search here because of the size of your files (regex
        is not ideal in these circumstances). The count is incremented for
        every hit. At the end of this process the result is shown in a
        dialog where you can enter how many subdocuments you want to create.

        The Variable %SecondCount% is 'Nr. of Hits' divided by 'Nr. of
        Documents you want'.

        Then the clip goes back to the top and searches for
        %SecondCount% occurrences of your search string. Once it has found
        them, everything before the current line is selected and appended
        to a new file. The selection is then cut from the document and
        the next %SecondCount% occurrences are found (ditto)...

        The active document gets smaller and smaller; at the end, all that
        remains is selected and appended to the last file.

        Hope this works for you, let us know...

        Hugo

        ^!Jump Doc_START
        ^!Set %Count%=0
        ^!Set %DocNr%=1
        ^!Set %String%=^?{Find string=_Any text you like|Some other text|}
        ^!Set %Path%=^$GetPath(^$GetDocName$)$
        ^!Set %Name%=^$GetName(^$GetDocName$)$
        ^!SetScreenUpdate OFF
        :FIND
        ^!Find "^%String%" S
        ^!IfError DIVIDE
        ^!Inc %Count%
        ^!GoTo FIND
        :DIVIDE
        ^!SetWizardLabel ^%Count% Occurrences of [^%String%]
        ^!Set %NrDocuments%=^?{Number of Documents to create...=_No, don't split^=1|2|3|4|}
        ^!If ^%NrDocuments%=1 END
        ^!Set %SecondCount%=^$Calc(Round(^%Count%/^%NrDocuments%);0)$
        ^!Jump Doc_START
        ^!Set %Count%=0
        ;Set DocNr
        :FindAGAIN
        ^!Find "^%String%" S
        ; Fewer headers left than expected: write out whatever remains
        ^!IfError SelectRest
        ^!Inc %Count%
        ^!If ^%Count%>^%SecondCount% SPLIT
        ^!GoTo FINDAGAIN
        :SPLIT
        ^!Jump Line_START
        ^!SelectTo 1:1
        ^!AppendToFile "^%Path%^%Name%_Part^%DocNr%.txt" ^$GetSelection$
        ^!Toolbar Cut
        ^!Inc %DocNr%
        ^!Dec %NrDocuments%
        ^!Set %Count%=0
        ^!If ^%NrDocuments%=1 SelectRest Else FindAGAIN
        :SelectRest
        ^!Select ALL
        ^!If ^$StrSize(^%DocNr%)$=1 ^!Set %DocNR%=0^%DocNR%
        ^!AppendToFile "^%Path%^%Name%_Part^%DocNr%.txt" ^$GetSelection$
        ^!ToolBar Reload Document
        ^!Wait
        ^!Keyboard ENTER
        ^!Info Created ^$Calc(^%DocNR%)$ documents...^%NL%(^%Path%^%Name%_Part[x].txt)
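
For anyone weighing another tool, the count-then-split idea described above can also be sketched outside NoteTab. The following is a minimal Python sketch, not a translation of the clip: the function name, the `_PartN.txt` naming, the Latin-1 encoding, and the use of rounding up (the clip uses Round) are all assumptions made for illustration. It counts the header lines in one pass, then copies the file in a second pass and starts a new output file every `per_file` documents, so nothing is ever cut from the source.

```python
import math
import os


def split_by_document_count(path, header, parts, encoding="latin-1"):
    """Split `path` into at most `parts` files with roughly equal document counts.

    A document is assumed to start at any line beginning with `header`.
    Returns the list of output file names that were actually written.
    """
    # Pass 1: count the header lines (one per document).
    with open(path, encoding=encoding, newline="") as src:
        total = sum(1 for line in src if line.startswith(header))

    # Documents per output file; rounding up is an assumption of this sketch.
    per_file = max(1, math.ceil(total / parts))

    stem, _ = os.path.splitext(path)
    names = [f"{stem}_Part{n}.txt" for n in range(1, parts + 1)]

    part = 0
    docs_in_part = 0
    out = open(names[part], "w", encoding=encoding, newline="")
    try:
        # Pass 2: copy lines, switching files only on a document boundary.
        with open(path, encoding=encoding, newline="") as src:
            for line in src:
                if line.startswith(header):
                    if docs_in_part == per_file and part < parts - 1:
                        out.close()
                        part += 1
                        out = open(names[part], "w", encoding=encoding, newline="")
                        docs_in_part = 0
                    docs_in_part += 1
                out.write(line)
    finally:
        out.close()
    return names[: part + 1]
```

Because the source file is only read, never modified, there is no need to reload the document afterwards.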
      • Martyn Folkes
        Message 3 of 4 , Mar 1, 2002
          This should do what you want (it is 3 lines):

          ^!Set %string%=^?{String to count}; %filename%=^?{(T=O)Filename}
          ^!Set %count%=^$StrCount("^%string%";"^$GetFileText("^%filename%")$";N;N)$
          ^!Prompt There are ^%count% occurrences of your search string.
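
The same count can be sketched outside NoteTab for comparison. This is a hedged Python sketch, not an equivalent of the clip above: the function name and the Latin-1 encoding are assumptions, and it reads the file line by line rather than loading the whole text at once, which assumes the search string never spans a line break.

```python
def count_occurrences(path, needle, encoding="latin-1"):
    """Count non-overlapping occurrences of `needle`, reading one line at a time."""
    total = 0
    with open(path, "r", encoding=encoding, newline="") as handle:
        for line in handle:
            # str.count returns the number of non-overlapping matches in the line.
            total += line.count(needle)
    return total
```

Reading line by line keeps memory use flat even for files in the 30 - 100 MB range.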

          If you only want to split the file into 2, it may be easiest to open the file
          and cut and paste half of it into a new file.

          Martyn


        • Dick Gascoigne
          Message 4 of 4 , Mar 3, 2002
            Thanks to all who responded -- all the ideas were good !

            Results feedback follows:

            The purpose is to divide a huge spool file of telephone bills into multiple
            files to be printed in parallel on multiple printers. Each bill starts with
            a standard header line.

            One of the controlling factors is that the file has to be split on a
            document boundary, but the files don't have to contain exactly the same
            number of documents.

            Method:

            I first made a clip combining the techniques: the Martyn-Tyrell method for
            counting the number of documents, and then the "Hugo technique" for
            splitting the file: use Find to count up to some number of documents,
            Select those lines, Append them to a file, Cut them out, and repeat
            count/Select/Append/Cut until the file is all split.

            For a file of 8,200 documents, and 500,000 lines, the whole process took 50
            min on a 700MHz, 128K box. Not fast enough for production use. Most of the
            time seemed to be the repetitive Finds, and particularly the Cut (deleting
            some 150,000 lines at a whack).

            Solution: I load the file, Get the total linecount (not document count),
            and divide by the number of files I want to split it into, yielding
            something I call SegmentSize.

            Then I jump to line number SegmentSize, and search forward for the end of
            the document I landed in, getting EndLine. Then Select from StartLine
            (initially 1) to EndLine, and Append the Selection to create a file.

            Then I reset StartLine to be EndLine, Jump forward SegmentSize lines,
            Search, Select, Append, ... repeating until done.

            The Search only ever has to go to the next document start, and I never Cut.
            Time now: 56 seconds to split the 8,200 bill file into four files !!!
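
The line-count approach described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the clip itself: the function name, the `_SegN.txt` output names, and the Latin-1 encoding are hypothetical, and a document is assumed to start at any line beginning with the header string. SegmentSize is the total line count divided by the number of files; each segment is then extended to the next header so every cut lands on a document boundary.

```python
import os


def split_by_line_count(path, header, parts, encoding="latin-1"):
    """Split `path` into at most `parts` files of roughly equal line counts.

    A new file is started only at a header line, and only once the current
    file already holds at least `segment_size` lines.
    """
    # Total line count (not document count), as in the description above.
    with open(path, encoding=encoding, newline="") as src:
        total_lines = sum(1 for _ in src)
    segment_size = max(1, total_lines // parts)

    stem, _ = os.path.splitext(path)
    names = []
    out = None
    lines_in_part = 0
    try:
        with open(path, encoding=encoding, newline="") as src:
            for line in src:
                # First document start at or past the segment size.
                boundary = line.startswith(header) and lines_in_part >= segment_size
                if out is None or (boundary and len(names) < parts):
                    if out is not None:
                        out.close()
                    names.append(f"{stem}_Seg{len(names) + 1}.txt")
                    out = open(names[-1], "w", encoding=encoding, newline="")
                    lines_in_part = 0
                out.write(line)
                lines_in_part += 1
    finally:
        if out is not None:
            out.close()
    return names
```

As in the description, nothing is searched beyond the next document start and nothing is deleted from the source; the last file simply takes whatever lines remain.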

            Beautiful ! Thank you all.

            I'll be pleased to post the clip or send it privately if anyone would find
            it useful.

            (And yes, the weather is better here than in Europe -- always 24 - 34 deg.
            C, but 80% humidity)
            (But the skiing sucks!)

            Best Regards,

            Dick Gascoigne
            --
            Appic (S) Pte Ltd
            74A Amoy Street; Singapore 069893
            Tel: (+65) 6225-9908 Fax: (+65) 6225-9092
            Email: dick.gascoigne@...
            Web: www.appic.com
