  • Dick Gascoigne
    Mar 3, 2002
      Thanks to all who responded -- all the ideas were good !

      Results feedback follows:

      The purpose is to divide a huge spool file of telephone bills into multiple
      files to be printed in parallel on multiple printers. Each bill starts with
      a standard header line.

      One of the controlling factors is that the file has to be split on a
      document boundary, but the resulting files don't have to be exactly equal
      in size.

      I first made a clip combining two techniques: the Martyn-Tyrell method for
      counting the number of documents, and the "Hugo technique" for splitting
      the file. The latter uses Find to count up to some number of documents,
      Selects those lines, Appends them to a file, and Cuts them out, repeating
      count/Select/Append/Cut until the whole file is split.

      For a file of 8,200 documents, and 500,000 lines, the whole process took 50
      min on a 700MHz, 128K box. Not fast enough for production use. Most of the
      time seemed to be the repetitive Finds, and particularly the Cut (deleting
      some 150,000 lines at a whack).

      Solution: I load the file, Get the total linecount (not document count),
      and divide by the number of files I want to split it into, yielding
      something I call SegmentSize.

      Then I jump to line number SegmentSize, and search forward for the end of
      the document I landed in, getting EndLine. Then Select from StartLine
      (initially 1) to EndLine, and Append the Selection to create a file.

      Then I reset StartLine to be EndLine, Jump forward SegmentSize lines,
      Search, Select, Append, ... repeating until done.

      The Search only ever has to go to the next document start, and I never Cut.
      Time now: 56 seconds to split the 8,200 bill file into four files !!!
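
      For anyone who wants to try the idea outside of a clip, here is a rough
      Python sketch of the jump-and-search split described above. It is not the
      clip itself: the names split_points and split_file, the header prefix
      argument, and the ".partN" output names are made up for illustration. It
      assumes a bill's header line can be recognised by a fixed prefix and that
      the whole file fits in memory. SegmentSize is rounded up here so that no
      more than the requested number of files is produced.

```python
def split_points(lines, parts, is_header):
    """Return (start, end) index pairs (0-based, end exclusive).

    Every pair except possibly the last ends just before a header line,
    so each piece finishes on a document boundary.
    """
    total = len(lines)
    # SegmentSize: total line count divided by the number of files, rounded up.
    segment_size = max(1, -(-total // parts))
    ranges = []
    start = 0
    while start < total:
        # Jump forward SegmentSize lines from the current start.
        end = min(start + segment_size, total)
        # Search forward only as far as the next document header.
        while end < total and not is_header(lines[end]):
            end += 1
        ranges.append((start, end))
        # The next piece starts where this one ended; nothing is cut.
        start = end
    return ranges


def split_file(path, parts, header_prefix):
    """Write the pieces to path.part1, path.part2, ... (hypothetical names)."""
    # latin-1 maps every byte to a character, so the content round-trips as is.
    with open(path, encoding="latin-1", newline="") as src:
        lines = src.readlines()
    ranges = split_points(lines, parts, lambda s: s.startswith(header_prefix))
    out_paths = []
    for n, (start, end) in enumerate(ranges, start=1):
        out_path = f"{path}.part{n}"
        with open(out_path, "w", encoding="latin-1", newline="") as out:
            out.writelines(lines[start:end])
        out_paths.append(out_path)
    return out_paths
```

      With the 500,000-line file and four output files, SegmentSize works out
      to 125,000 lines, so each output file holds roughly that many lines plus
      whatever is needed to finish the bill the jump landed in.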

      Beautiful ! Thank you all.

      I'll be pleased to post the clip or send it privately if anyone would find
      it useful.

      (And yes, the weather is better here than in Europe -- always 24 - 34 deg.
      C, but 80% humidity)
      (But the skiing sucks!)

      Best Regards,

      Dick Gascoigne
      Appic (S) Pte Ltd
      74A Amoy Street; Singapore 069893
      Tel: (+65) 6225-9908 Fax: (+65) 6225-9092
      Email: dick.gascoigne@...
      Web: www.appic.com

      > > I have fairly large files (30 - 100 MB; 500K - 1,500K lines), which
      > > contain multiple occurrences of a string. I would like to know the
      > > number of occurrences, using a clip. The Replace command does not
      > > have a "Count Occurrences" option. What command(s) should I use?
