20471 Re: modifying format of stats tool output & concatenating stats from many files

  • diodeom
    Mar 1, 2010
      "Sheri" <silvermoonwoman@...> wrote:
      > Do it for each subsidiary document. As each is done, tack the
      > content of the temporary document onto the end of a consolidated
      > variable. At the conclusion of the loop, A separate process
      > would do it for whole document, using 00 as the suffix, before
      > the subdocument loop. Sort the many, many rows. Paste the sorted
      > result into a temporary document. Do something to preserve the
      > words that had 00 suffix. Then replace all the linebreak-plus-
      > words-with-their-suffixes with the empty string. Fix up the 00
      > items one last time. Should end up with one column with words
      > (formerly the 00 items) the "All" frequencies, and one column
      > for each subdocument's frequencies.

      Pretty cool. Sorting this final list of about 780,000 lines took only half a minute. Then deleting every match of "\R(?!\x2A)[^\t]++" (I had put an asterisk in front of each word00<tab>value pair, so those rows keep their line break and their word) fixed up the table in no time.
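
For anyone who wants to try the collapsing step outside the editor, here is a rough Python sketch of the same idea. It is only an approximation. Python's re module has no \R, so the sketch assumes plain "\n" line endings, and it uses an ordinary + instead of the possessive ++. The function name collapse_rows, the sample words, and the two-digit suffixes are made up for illustration. The sketch also assumes every word occurs in every subdocument; otherwise the columns would not line up.

```python
import re


def collapse_rows(sorted_text: str) -> str:
    """Fold sorted 'wordNN<tab>count' rows into one row per word.

    Assumed input layout (hypothetical, modelled on the post):
    the whole-document row for each word looks like '*word00<tab>count'
    and is followed by that word's subdocument rows 'word01<tab>count', ...
    """
    # Delete each line break plus the label that follows it, unless the
    # next line starts with '*'. What remains of a deleted row is its
    # '<tab>count', which ends up appended to the preceding '*word00' row.
    merged = re.sub(r"\n(?!\*)[^\t]+", "", sorted_text)
    # Final fix-up of the 00 items: drop the '*' marker and the '00' suffix.
    return re.sub(r"^\*([^\t]+)00\t", r"\1\t", merged, flags=re.MULTILINE)


sample = (
    "*apple00\t10\n"
    "apple01\t4\n"
    "apple02\t6\n"
    "*bear00\t3\n"
    "bear01\t3\n"
)

print(collapse_rows(sample))
```

With this sample, each word ends up on one line: the word, then the "All" count, then one count per subdocument, all tab-separated.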
