Thanks to all who responded -- all the ideas were good!
Here's how it turned out:
The purpose is to divide a huge spool file of telephone bills into multiple
files to be printed in parallel on multiple printers. Each bill starts with
a standard header line.
One of the controlling factors is that the file has to be split on a
document boundary, but the files don't have to be exactly equal in size.
I first made a clip combining two of the suggested techniques: the
Martyn-Tyrell method for counting the number of documents, and then the
"Hugo technique" for splitting the file: use Find to count up to some number
of documents, Select those lines, Append them to a file, then Cut them out,
and repeat count/Select/Append/Cut until the whole file is split.
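For comparison, here is a rough Python analogue of that first approach. It is a sketch, not the actual clip. It assumes each bill's header line starts with a fixed prefix, here the hypothetical `HEADER`, and uses `del` on the front of a list to play the role of Cut. Every deletion shifts all the remaining lines, so each pass costs time proportional to what is left of the file, which helps explain why the Cuts were expensive.

```python
# Hypothetical marker: assume every bill's header line starts with this text.
HEADER = "BILL-HEADER"


def is_header(line: str) -> bool:
    return line.startswith(HEADER)


def split_by_document_count(lines: list[str], n_files: int) -> list[list[str]]:
    """Count documents, then repeatedly take the first k documents and cut them off.

    Mirrors the count/Select/Append/Cut loop: each `del` shifts the
    remaining lines, so every pass does work proportional to what is left.
    """
    lines = list(lines)  # work on a copy; the caller's list is not modified
    total_docs = sum(1 for line in lines if is_header(line))
    docs_per_file = -(-total_docs // n_files)  # ceiling division
    chunks = []
    while lines:
        # "Find" step: locate the header of document number docs_per_file + 1.
        seen = 0
        cut_at = len(lines)
        for i, line in enumerate(lines):
            if is_header(line):
                seen += 1
                if seen == docs_per_file + 1:
                    cut_at = i
                    break
        chunks.append(lines[:cut_at])  # Select + Append
        del lines[:cut_at]             # Cut: the expensive step
    return chunks
```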
For a file of 8,200 documents and 500,000 lines, the whole process took 50
minutes on a 700MHz, 128K box. That's not fast enough for production use.
Most of the time seemed to go into the repeated Finds, and particularly the
Cuts (deleting some 150,000 lines at a whack).
Solution: I load the file, get the total line count (not the document
count), and divide it by the number of files I want to split it into,
yielding something I call SegmentSize.
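For the 500,000-line example split four ways, the arithmetic is simply the following. Each output file then ends somewhat past its 125,000-line mark, at the next document boundary.

```latex
\text{SegmentSize} = \frac{\text{total lines}}{\text{number of files}}
                   = \frac{500{,}000}{4} = 125{,}000 \text{ lines}
```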
Then I jump to line number SegmentSize and search forward for the end of
the document I landed in, giving EndLine. Then I Select from StartLine
(initially 1) to EndLine and Append the Selection to create a file.
Then I reset StartLine to EndLine, jump forward SegmentSize lines, and
Search, Select, Append, ... repeating until done.
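The same idea as a minimal Python sketch, not the original clip. It works on an in-memory list of lines, uses 0-based indexes with an exclusive end (so each new piece starts exactly at the header that ended the previous one), and assumes headers are recognizable by the hypothetical `HEADER` prefix. Writing each chunk to its own file is left out.

```python
# Hypothetical marker: assume every bill's header line starts with this text.
HEADER = "BILL-HEADER"


def split_by_segments(lines: list[str], n_files: int) -> list[list[str]]:
    """Split on document boundaries into at most n_files pieces of similar line count."""
    total = len(lines)
    # SegmentSize, rounded up so the pieces never outnumber n_files.
    segment_size = max(1, -(-total // n_files))
    chunks = []
    start = 0  # StartLine (0-based here)
    while start < total:
        # Jump forward SegmentSize lines from the current start.
        end = start + segment_size
        # Search forward for the next header; the document we landed in ends just before it.
        while end < total and not lines[end].startswith(HEADER):
            end += 1
        chunks.append(lines[start:end])  # Select StartLine..EndLine and Append
        start = end  # the next piece begins at that header; nothing is ever cut
    return chunks
```

Each forward search covers at most the rest of one document, and every line is copied once, so the total work grows linearly with the file. Rounding SegmentSize up is a choice made for this sketch; the description above doesn't say how the clip rounds.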
The Search only ever has to go to the next document start, and I never Cut.
Time now: 56 seconds to split the 8,200-bill file into four files!!!
Beautiful! Thank you all.
I'll be pleased to post the clip or send it privately if anyone would find
it useful.
(And yes, the weather is better here than in Europe -- always 24-34 deg.
C, but 80% humidity.)
(But the skiing sucks!)
Appic (S) Pte Ltd
74A Amoy Street; Singapore 069893
Tel: (+65) 6225-9908 Fax: (+65) 6225-9092
> > I have fairly large files (30-100 MB; 500K-1,500K lines) that
> > contain multiple occurrences of a string. I would like to know the
> > number of occurrences, using a clip. The Replace command does not
> > have a "Count Occurrences" option. What command(s) should I use?
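Outside the editor, the count asked for in the quoted question takes only a few lines of Python. This is a sketch, not a clip: it counts non-overlapping matches (the semantics of Python's `str.count`) and reads the file line by line, so memory stays small even for 100 MB files. It rejects search strings containing a newline, because a per-line scan would miss matches that span lines. The UTF-8 default encoding is an assumption.

```python
def count_occurrences(path: str, needle: str, encoding: str = "utf-8") -> int:
    """Count non-overlapping occurrences of needle in a text file, one line at a time."""
    # A per-line scan cannot see matches that cross a line break, so refuse them.
    if not needle or "\n" in needle:
        raise ValueError("needle must be non-empty and must not contain a newline")
    total = 0
    with open(path, encoding=encoding) as f:
        for line in f:  # streams the file; it is never loaded whole
            total += line.count(needle)
    return total
```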