20471 Re: modifying format of stats tool output & concatenating stats from many files
- Mar 1, 2010
"Sheri" <silvermoonwoman@...> wrote:
>Pretty cool. Sorting this final list of about 780,000 lines took only half a minute. Then killing all matches of "\R(?!\x2A)[^\t]++" (I had an asterisk in front of each word00<tab>value pair) fixed the table up in no time.
> Do it for each subsidiary document. As each is done, tack the
> content of the temporary document onto the end of a consolidated
> variable. A separate process would do it for the whole
> document, using 00 as the suffix, before the subdocument
> loop. At the conclusion of the loop, sort the many, many
> rows. Paste the sorted
> result into a temporary document. Do something to preserve the
> words that had 00 suffix. Then replace all the linebreak-plus-
> words-with-their-suffixes with the empty string. Fix up the 00
> items one last time. Should end up with one column with words
> (formerly the 00 items), one with the "All" frequencies, and one column
> for each subdocument's frequencies.
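
For anyone who wants to try the same sort-and-join idea outside the editor, here is a rough Python sketch of the steps quoted above. It is only an illustration, not the clip that was actually used. The function name consolidate and its two arguments are made up for the example. Python's re module has no \R, so \n stands in for it, and a plain + replaces the possessive ++. One assumption of mine that the quoted steps do not spell out: a word missing from a subdocument gets a 0 row, so the columns stay aligned.

```python
import re


def consolidate(whole, subdocs):
    """Build a tab-separated table: word, "All" count, one count per subdocument.

    whole   -- dict mapping word -> count for the entire document (hypothetical input)
    subdocs -- list of dicts, one per subdocument, mapping word -> count
    """
    # Rows for the whole document carry suffix 00.
    rows = [(word, 0, count) for word, count in whole.items()]

    # Rows for each subdocument carry suffix 01, 02, ...
    # Assumption: a missing word is recorded as 0 so every word has one row per subdocument.
    for n, counts in enumerate(subdocs, start=1):
        for word in whole:
            rows.append((word, n, counts.get(word, 0)))

    # Sort the many rows so each word's 00 row is followed by its subdocument rows.
    rows.sort()

    # Mark the 00 rows with a leading asterisk to preserve them.
    lines = []
    for word, n, count in rows:
        marker = "*" if n == 0 else ""
        lines.append(f"{marker}{word}{n:02d}\t{count}")
    text = "\n".join(lines)

    # Remove each line break plus the word-and-suffix that follows it,
    # unless the next line starts with an asterisk. The tab and value remain
    # and so join onto the preceding line.
    text = re.sub(r"\n(?!\*)[^\t]+", "", text)

    # Fix up the 00 items: drop the asterisk and the 00 suffix.
    text = re.sub(r"^\*([^\t]+)00\t", r"\1\t", text, flags=re.MULTILINE)
    return text


if __name__ == "__main__":
    table = consolidate(
        {"the": 5, "cat": 2},
        [{"the": 3, "cat": 2}, {"the": 2}],
    )
    print(table)
```

The rows are sorted as (word, suffix) pairs rather than as raw text lines, which sidesteps any ordering surprises from words that themselves end in digits.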