Re: modifying format of stats tool output & concatenating stats from many files

  • Sheri
    Message 1 of 26, Mar 1 6:04 AM
      --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
      >
      > Both columns combined and sorted:
      >
      > 56
      > 56<tab>1
      > abolish
      > abounds
      > absurd
      > absurd<tab>3
      > abyss
      > accept
      > accident
      > accident<tab>1
      > accord
      > account
      > achieved
      > acknowledge
      > acknowledge<tab>22
      >
      > After:
      > ^!Replace "^([^\t\r]++)\R\1\b" >> "" WARS
      > (Tabs preserved as unique markers for the next step)
      >
      > <tab>1
      > abolish
      > abounds
      > <tab>3
      (etc.)

      [snip]

      I think I had a better idea than trying to create a paste-able column of frequencies. Instead of above replacement, keep the word column, but append an index number (subsidiary document occurrence) to the words.

      ^!Set %suffix%="01"
      ^!Replace "^([^\t\r]++)\R\1\b" >> "$1" RAWS
      ^!Replace "^([^\t\r]++)\K(?=\R)" >> "\t0" RAWS
      ^!Replace "^[^\t\r]++(?=\t)" >> "$0\^%suffix%" RAWS

      For above items, you should have

      5601<tab>1
      abolish01<tab>0
      abounds01<tab>0
      absurd01<tab>3
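
The three replacements above can be approximated outside NoteTab. What follows is only a sketch in Python, not the clip itself: Python's `re` module has no `\R` or `\K`, so it assumes plain `\n` line breaks and uses capture groups instead, and the function name `tag_counts` is made up for illustration.

```python
import re

def tag_counts(combined: str, suffix: str) -> str:
    """Approximate the three ^!Replace steps on a sorted word / word<tab>count list."""
    # Step 1: a bare word immediately followed by the same word carrying a count
    # collapses into the single "word<tab>count" line.
    text = re.sub(r"^([^\t\n]+)\n\1\b", r"\1", combined, flags=re.M)
    # Step 2: any line that still has no tab was absent from this document,
    # so it gets an explicit count of 0.
    text = re.sub(r"^([^\t\n]+)$", lambda m: m.group(1) + "\t0", text, flags=re.M)
    # Step 3: append the document suffix to the word in front of the tab.
    text = re.sub(r"^([^\t\n]+)(?=\t)", lambda m: m.group(1) + suffix, text, flags=re.M)
    return text

sample = "56\n56\t1\nabolish\nabounds\nabsurd\nabsurd\t3"
print(tag_counts(sample, "01"))
```

On the four sample items this produces the same rows as listed above (5601, abolish01, abounds01, absurd01, each followed by a tab and its count).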

      Do it for each subsidiary document. As each is done, tack the content of the temporary document onto the end of a consolidated variable. A separate process would do it for the whole document, using 00 as the suffix, before the subdocument loop. At the conclusion of the loop, sort the many, many rows. Paste the sorted result into a temporary document. Do something to preserve the words that had the 00 suffix. Then replace all the linebreak-plus-words-with-their-suffixes with the empty string. Fix up the 00 items one last time. You should end up with one column of words (formerly the 00 items), one column with the "All" frequencies, and one column for each subdocument's frequencies.

      Let me know if you were able to follow that :D

      BTW, it might be faster to insert the clipboard contents (from text statistics) under clip control into one quasi-permanent temporary document than creating and destroying multitudes of new documents.

      For sorting, we have ^$StrSort with various options.

      It would be nice if we had a clip function to capture text statistics so the ^!Keyboard tricks and clipboard were not necessary.

      Regards,
      Sheri
    • diodeom
      Message 2 of 26, Mar 1 8:47 AM
        "Sheri" <silvermoonwoman@...> wrote:
        >
        > Do it for each subsidiary document. As each is done, tack the
        > content of the temporary document onto the end of a consolidated
        > variable. At the conclusion of the loop, A separate process
        > would do it for whole document, using 00 as the suffix, before
        > the subdocument loop. Sort the many, many rows. Paste the sorted
        > result into a temporary document. Do something to preserve the
        > words that had 00 suffix. Then replace all the linebreak-plus-
        > words-with-their-suffixes with the empty string. Fix up the 00
        > items one last time. Should end up with one column with words
        > (formerly the 00 items) the "All" frequencies, and one column
        > for each subdocument's frequencies.
        >

        Pretty cool. Sorting of this final list of about 780,000 lines took only half a minute. Then killing all "\R(?!\x2A)[^\t]++" (I had an asterisk in front of each word00<tab>value pair) fixed the table up in no time.
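
The sort-then-collapse step can be sketched roughly as follows. This is a Python approximation rather than the clip that was actually run: it assumes every word has a row for every document (with a 0 count where absent), that suffixes are always two digits, and that `\n` is the line break; the function name `consolidate` is hypothetical. The rows are sorted by word and then by suffix, instead of as plain text, so that words ending in digits cannot interleave.

```python
import re

def consolidate(rows: list[str]) -> str:
    """Merge 'word+suffix<tab>count' rows into one line per word."""
    def key(row: str):
        tagged = row.split("\t", 1)[0]
        return (tagged[:-2], tagged[-2:])   # (word, two-digit suffix)

    text = "\n".join(sorted(rows, key=key))
    # Mark the whole-document (00) rows with an asterisk so they survive.
    text = re.sub(r"^([^\t\n]+00)(?=\t)", r"*\1", text, flags=re.M)
    # Kill each line break plus tagged word that does not start with the marker,
    # which leaves its "<tab>count" appended to the preceding 00 row.
    text = re.sub(r"\n(?!\*)[^\t\n]+", "", text)
    # Final fix-up: drop the marker and the 00 suffix.
    text = re.sub(r"^\*([^\t\n]+)00(?=\t)", r"\1", text, flags=re.M)
    return text

rows = ["absurd01\t3", "abolish00\t2", "absurd00\t5",
        "abolish01\t0", "abolish02\t2", "absurd02\t2"]
print(consolidate(rows))
```

Each output line then holds the word, the "All" frequency, and one frequency per subdocument, separated by tabs.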
      • diodeom
        Message 3 of 26, Mar 1 9:18 AM
          "Sheri" <silvermoonwoman@...> wrote:
          >
          > ;2007-01-14 created by Sheri Pierce
          > ;2010-03-01 Revision by Sheri Pierce
          > ;revision tested with NoteTab 6.2 (PCRE 8.01)
          > ;use to help create clearvariable statements for the clip being edited
          > ^!If ^$GetSelSize$>0 Next Else Skip
          > ^!Continue Some text is highlighted. Only variables set within the selection will be considered.
          > ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"
          > ^!IfEmpty ^%varnames% Next Else Skip_2
          > ^!Info No variables found
          > ^!Goto Clear
          > ^!Set %varnames%="^$StrSort(^%varnames%;No;Yes;Yes)$"
          > ^!Info ^%varnames%
          > ^!Set %varnames%=""
          > :Clear
          > ^!ClearVariable %varnames%
          > ;end of clip
          >

          I suppose there is no harm in this minute redundancy where any array elements that were (re)set individually somewhere in the target code are offered in the results as well.

          [\d\pL_] could be \w, I believe. And I'd guess ^!Set %varnames%="" may be a leftover from before :Clear existed?
        • Don - HtmlFixIt.com
          Message 4 of 26, Mar 1 10:05 AM
            On 3/1/2010 12:18 PM, diodeom wrote:
            > ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"

            Could one of you break this line down item by item (on the right side of
            the = sign anyway) for those of us who are slower ...

            ?i = case insensitive?
            ^ means at start of a line?
            \^ means a caret actually exists in the text?
            \! means an ! actually exists in the text?
            Set either Array or Code? is actually in the text?
            So this is to update a clip?

            ? means find first ... non-greedy?
            \x20 is a space?
            I'm pretty well lost after that ...

            I even bottom posted cause I want to know.
          • Sheri
            Message 5 of 26, Mar 1 12:21 PM
              --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
              >
              >
              > I suppose there is no harm in this minute redundancy where any
              > array elements that were (re)set individually somewhere in the
              > target code are offered in the results as well.
              >
              > [\d\pL_] could be \w, I believe.

              \w doesn't match high ascii characters, which tho I don't use them in variable names myself, are not prohibited.
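
As an analogy only (this is Python's `re`, not the PCRE build inside NoteTab), the difference between an ASCII-only `\w` and a Unicode-aware one can be seen like this. Python's `re` does not support `\pL`, so the sketch compares `\w` with and without the `re.ASCII` flag; the variable name `%café%` is invented for the example.

```python
import re

name = "%café%"

# With re.ASCII, \w covers only [A-Za-z0-9_], so the accented letter breaks the match.
ascii_only = re.fullmatch(r"%\w+%", name, flags=re.ASCII)

# Without the flag, \w is Unicode-aware for str patterns and accepts the accented letter.
unicode_aware = re.fullmatch(r"%\w+%", name)

print(ascii_only is None, unicode_aware is not None)
```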

              And I'd guess ^!Set %varnames%="" may be a leftover from before :Clear existed?

              There is this in the docs:

              "If you assign an empty value to an array variable, or use the ^!Set command to assign a new value to it, the array is automatically removed from memory."

              ... so out of an abundance of caution, I set the array variable equal to empty string before using ClearVariable on it.

              Regards,
              Sheri
            • diodeom
              Message 6 of 26, Mar 1 12:32 PM
                "Don - HtmlFixIt.com" <don@...> wrote:
                >
                > > ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"
                >
                > Could one of you break this line down item by item (on the right side of
                > the = sign anyway)

                Locate (case-insensitively) either ^!Set, ^!SetArray, or ^!SetCode followed by a space, OR locate a semicolon optionally followed by a space. The pattern is looking here for one of the four possible ways of declaring variables, the last one being when more than one is set on the same line. Next, \K discards what was matched so far, and the subsequent part ungreedily demands a string of word characters (digits, letters, or underscores) between two percent signs (that is, a variable, the only string captured here) which precedes an equal sign.

                These captured strings are then inserted ($0 references them) by GetDocListAll in the stored, ready to display ^!ClearVariable statements, each on its own line.
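
For readers without NoteTab at hand, the same idea can be tried in Python. This is a rough equivalent, not the clip: Python's `re` lacks `\K` and `\pL`, so the prefix sits in a non-capturing group, the variable is captured explicitly, and `\w` stands in for `[\d\pL_]`. The function name `clear_statements` and the sample clip text are invented, and the sort with duplicate removal only mimics what the clip asks of ^$StrSort.

```python
import re

# Prefix: "^!Set", "^!SetArray" or "^!SetCode" plus a space at line start,
# or a semicolon with an optional space. Then %name% followed by "=".
PATTERN = re.compile(
    r"(?:^\^!Set(?:Array|Code)?\x20|;\x20?)(%\w+?%)(?==)",
    re.IGNORECASE | re.MULTILINE,
)

def clear_statements(clip_text: str) -> list[str]:
    """Return one ^!ClearVariable line per distinct variable set in the clip."""
    names = sorted(set(PATTERN.findall(clip_text)), key=str.lower)
    return ["^!ClearVariable " + name for name in names]

clip = "^!Set %a%=1; %b%=2\n^!SetArray %list%=x;y\n^!Info ^%a%\n^!set %a%=3"
for line in clear_statements(clip):
    print(line)
```

Note that `^%a%` on the ^!Info line is ignored, because it neither follows a Set command nor precedes an equal sign.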
              • Sheri
                Message 7 of 26, Mar 1 12:41 PM
                  --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                  >
                  > On 3/1/2010 12:18 PM, diodeom wrote:
                  > > ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"
                  >
                  > Could one of you break this line down item by item (on the right
                  > side of the = sign anyway) for those of us who are slower ...
                  >
                  > ?i = case insensitive?
                  > ^ means at start of a line?
                  > \^ means a carrot actually exists on the text?
                  > \! means an ! actually exists in the text?
                  > Set either Array or Code? is actually in the text?
                  > So this is to update a clip?
                  >
                  > ? means find first ... non-greedy?
                  > \x20 is a space?
                  > I'm pretty well lost after that ...
                  >
                  > I even bottom posted cause I want to know.
                  >

                  Hi Don, I've missed you!

                  Maybe it would help to know that the clip is meant to be run from the clipbar while clipedit is the active document. It shows the names of variables found in the clip currently being edited. GetDocListAll gives the opportunity to format the matches, so they are formatted with the ClearVariable command in front of each one. The purpose is just to help make the list of ClearVariable commands needed to clear only the variables actually Set by the clip being written.

                  It only finds %xxx% if:
                  - it follows a line start followed by ^!Set or ^!SetCode or ^!SetArray and one space, OR if it follows a semicolon (anywhere on a line) plus an optional space
                  - it is followed by an equal sign

                  "(?=" signifies a lookahead assertion. There is an equal sign inside the look ahead assertion, (?=\=). It may not need that backslash, but it makes it easier to see (at least for me). So, an equal sign must follow (looking ahead) the %xxx% to be a match. The equal sign is not actually part of the matched text.

                  The search is case insensitive so that ^!set works as well as ^!Set or any other mixed case that might be found.

                  Regards,
                  Sheri
                • Don - HtmlFixIt.com
                  Message 8 of 26, Mar 1 2:29 PM
                    > Hi Don, I've missed you!
                    Guest appearance ;-) Prodigal son?

                    > "(?=" signifies a lookahead assertion. There is an equal sign inside the look ahead assertion, (?=\=). It may not need that backslash, but it makes it easier to see (at least for me). So, an equal sign must follow (looking ahead) the %xxx% to be a match. The equal sign is not actually part of the matched text.

                    Look ahead and look back assertions are currently over my head -- I need
                    to solve that. It's funny in one way because regex won't search
                    backwards, but it will look back ...

                    So that bit that kind of looks like a rear end ... (?=\=) ... really is
                    just to say "followed by an equals sign." It has three characters (?=
                    to say look ahead ... find an equals sign (escaped for good measure) and
                    closing parenthesis.

                    If we wanted the search pattern followed by a letter R it would be (?=R)
                    if I follow correctly ... can it also accept or patterns (?=r|R) and so
                    forth?
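
On the question above: a lookahead can hold any sub-pattern, including alternation or a character class. A small sketch in Python (its lookahead syntax is the same as PCRE's for this case; the sample strings are made up) shows that `(?=r|R)` and `(?=[rR])` behave alike, and that the looked-ahead character is not part of the match:

```python
import re

sample = "car cat caR cab"

# Start positions of "ca" that are followed by an upper-case R only.
single = [m.start() for m in re.finditer(r"ca(?=R)", sample)]
# Alternation inside the lookahead: followed by r OR R.
either = [m.start() for m in re.finditer(r"ca(?=r|R)", sample)]
# Same condition written as a character class.
char_class = [m.start() for m in re.finditer(r"ca(?=[rR])", sample)]

# The equal sign is required but not included in the matched text.
match = re.search(r"%\w+%(?==)", "^!Set %name%=value")

print(single, either, char_class, match.group())
```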
                  • diodeom
                    Message 9 of 26, Mar 2 5:56 AM
                      "Sheri" <silvermoonwoman@...> wrote:
                      >
                      > I wrote:
                      > >
                      > > And I'd guess ^!Set %varnames%="" may be a leftover from
                      > > before :Clear existed?
                      >
                      > There is this in the docs:
                      >
                      > "If you assign an empty value to an array variable, or use the
                      > ^!Set command to assign a new value to it, the array is
                      > automatically removed from memory."
                      >
                      > ... so out of an abundance of caution, I set the array variable
                      > equal to empty string before using ClearVariable on it.
                      >

                      My overly concise suggestion didn't imply it well: it was because I'm aware of what setting a variable to empty does — and of your high standards — that I assumed you didn't mean to do the same thing twice. I speculated that maybe as you were writing this clip, initially ^!Set %varnames%="" was the last line, and after adding a provision for cases where no variables are captured, which called for a label :Clear, its removal was overlooked (or an alternative of just placing the label right above it). Sorry, Sheri; I'm sure it's not the last time I guessed incorrectly. (I'm also quite certain it won't stop me from trying... :)
                    • Sheri
                      Message 10 of 26, Mar 2 8:05 AM
                        On 3/2/2010 8:56 AM, diodeom wrote:
                        > "Sheri"<silvermoonwoman@...> wrote:
                        >
                        >> I wrote:
                        >>
                        >>> And I'd guess ^!Set %varnames%="" may be a leftover from
                        >>> before :Clear existed?
                        >>>
                        >> There is this in the docs:
                        >>
                        >> "If you assign an empty value to an array variable, or use the
                        >> ^!Set command to assign a new value to it, the array is
                        >> automatically removed from memory."
                        >>
                        >> ... so out of an abundance of caution, I set the array variable
                        >> equal to empty string before using ClearVariable on it.
                        >>
                        >>
                        > My overly concise suggestion didn't imply it well: it was because I'm aware of what setting a variable to empty does — and of your high standards — that I assumed you didn't mean to do the same thing twice. I speculated that maybe as you were writing this clip, initially ^!Set %varnames%="" was the last line, and after adding a provision for cases where no variables are captured, which called for a label :Clear, its removal was overlooked (or an alternative of just placing the label right above it). Sorry, Sheri; I'm sure it's not the last time I guessed incorrectly. (I'm also quite certain it won't stop me from trying... :)
                        >

                        LOL. I'm deleting the line, this clip doesn't even create an array for
                        varnames. Probably an earlier version did. Will try to pay better
                        attention next time.

                        Regards,
                        Sheri
                      • Alec Burgess
                        Message 11 of 26, Mar 5 4:39 PM
                          Sheri (silvermoonwoman@...) wrote (in part) (on 2010-03-01 at
                          15:41):
                          > Maybe it would help to know that that clip is meant to be run from the
                          > clipbar while clipedit is the active document. It shows the names of
                          > variables found in clip currently being edited. GetDocListAll gives
                          > the opportunity to format the matches, so they are formatted with the
                          > ClearVariable command in front of each one. The purpose is just to
                          > help make the list of ClearVariable commands needed to clear only the
                          > variables actually Set by the clip being written.
                          >
                          > It only finds %xxx% if:
                          > - it follows a line start followed by ^!Set or ^!SetCode or ^!SetArray
                          > and one space, OR if it follows a semicolon (anywhere on a line) plus
                          > an optional space
                          > - it is followed by an equal sign
                          I was on holiday (got back yesterday) and am just noodling thru the
                          large number of posts during the past week.

                          I read this thread but didn't test any of the example code.

                          > > > ^!Set %varnames%="
                          >
                          > ^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable
                          > $0\r\n")$"
                          >


                          Sheri: wrt your "find variables for clearing" clip. Judging from the
                          comments (I haven't checked the regex itself), you appear to be
                          expecting a variable xxx to be defined as:
                          ^!set %xxx%=asdf
                          The following also works, though I use the construct without %...%
                          only accidentally :-[ :
                          ^!set xxx=asdf
                          ^!info ^%xxx%

                          Will your clip capture this usage?

                          >
                          > "(?=" signifies a lookahead assertion. There is an equal sign inside
                          > the look ahead assertion, (?=\=). It may not need that backslash, but
                          > it makes it easier to see (at least for me). So, an equal sign must
                          > follow (looking ahead) the %xxx% to be a match. The equal sign is not
                          > actually part of the matched text.
                          >
                          > The search is case insensitive so that ^!set works as well as ^!Set or
                          > any other mixed case that might be found.


                          --
                          Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)




                        • Sheri
                          Message 12 of 26, Mar 6 6:24 AM
                            --- In ntb-clips@yahoogroups.com, Alec Burgess <buralex@...> wrote:
                            >
                            > Sheri (silvermoonwoman@...) wrote (in part) (on 2010-03-01 at
                            > 15:41):
                            > >
                            > > ^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable
                            > > $0\r\n")$"
                            > >
                            >
                            > Sheri: wrt to your "find variables for clearing" clip. You appear
                            > from the comments but haven't checked the regex itself, to be
                            > expecting a
                            >
                            > variable xxx to be defined as:
                            > ^!set %xxx%=asdf
                            > Following works though I use the construct without %...% only
                            > accidentally :-[ :
                            > ^!set xxx=asdf
                            > ^!info ^%xxx%
                            >
                            > Will your clip capture this usage?
                            >

                            No, of course not. :)

                            But (sigh) I suppose by failing to capture them, it is remotely possible that some "accidental" variables would fail to get released.

                            So perhaps it would be better to modify the capture part as follows:

                            ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K(%?)([\d\pL_]+?)\3(?=\=)";"^!ClearVariable %$4%\r\n")$"

                            Regards,
                            Sheri
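
The revised pattern makes the leading percent sign optional and uses a backreference so that the closing one must mirror it. A rough Python rendering of that idea (again without `\K` or `\pL`, so the group numbers differ from the clip's \3 and $4; the name `clear_statements` is hypothetical) could look like this:

```python
import re

# Group 1: optional leading "%". Group 2: the bare variable name.
# \1 demands the same thing (a "%" or nothing) after the name.
PATTERN = re.compile(
    r"(?:^\^!Set(?:Array|Code)?\x20|;\x20?)(%?)(\w+?)\1(?==)",
    re.IGNORECASE | re.MULTILINE,
)

def clear_statements(clip_text: str) -> list[str]:
    """List ^!ClearVariable lines for names set with or without %...%."""
    names = {m.group(2) for m in PATTERN.finditer(clip_text)}
    return ["^!ClearVariable %" + n + "%" for n in sorted(names, key=str.lower)]

clip = "^!set %xxx%=asdf\n^!set yyy=asdf\n^!info ^%xxx%"
for line in clear_statements(clip):
    print(line)
```

Both spellings end up in the output wrapped in percent signs, which is what the `%$4%` replacement in the clip is meant to achieve.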