
Re: modifying format of stats tool output & concatenating stats from many files

  • Sheri
    Message 1 of 26, Mar 1, 2010
      --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:

      > While on this topic: your handy "List Variable Names" clip could
      > maybe include a provision to account for multiple variables
      > declared in a single line, e.g. ^!Set %Var1%=one; %Var2%=two;
      > %Var3%=three.)

      Thanks, I didn't realize you could use a space in front of additional variable names.

      Here's a revision (there are a couple of long lines):

      ;2007-01-14 created by Sheri Pierce
      ;2010-03-01 Revision by Sheri Pierce
      ;revision tested with NoteTab 6.2 (PCRE 8.01)
      ;use to help create clearvariable statements for the clip being edited
      ^!If ^$GetSelSize$>0 Next Else Skip
      ^!Continue Some text is highlighted. Only variables set within the selection will be considered.
      ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"
      ^!IfEmpty ^%varnames% Next Else Skip_2
      ^!Info No variables found
      ^!Goto Clear
      ^!Set %varnames%="^$StrSort(^%varnames%;No;Yes;Yes)$"
      ^!Info ^%varnames%
      ^!Set %varnames%=""
      :Clear
      ^!ClearVariable %varnames%
      ;end of clip

      Regards,
      Sheri
    • Sheri
      Message 2 of 26, Mar 1, 2010
        --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
        >
        > Both columns combined and sorted:
        >
        > 56
        > 56<tab>1
        > abolish
        > abounds
        > absurd
        > absurd<tab>3
        > abyss
        > accept
        > accident
        > accident<tab>1
        > accord
        > account
        > achieved
        > acknowledge
        > acknowledge<tab>22
        >
        > After:
        > ^!Replace "^([^\t\r]++)\R\1\b" >> "" WARS
        > (Tabs preserved as unique markers for the next step)
        >
        > <tab>1
        > abolish
        > abounds
        > <tab>3
        (etc.)

        [snip]

        I think I had a better idea than trying to create a paste-able column of frequencies. Instead of the above replacement, keep the word column, but append an index number (the subsidiary document's occurrence number) to the words.

        ^!Set %suffix%="01"
        ^!Replace "^([^\t\r]++)\R\1\b" >> "$1" RAWS
        ^!Replace "^([^\t\r]++)\K(?=\R)" >> "\t0" RAWS
        ^!Replace "^[^\t\r]++(?=\t)" >> "$0\^%suffix%" RAWS

        For the above items, you should have:

        5601<tab>1
        abolish01<tab>0
        abounds01<tab>0
        absurd01<tab>3
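
For anyone who wants to try the idea outside NoteTab, here is a minimal Python sketch of the three replacements above. It is a translation rather than the clip itself: it assumes plain "\n" line breaks instead of \R, uses ordinary quantifiers instead of the possessive ++, and the function name tag_with_suffix is made up for illustration.

```python
import re


def tag_with_suffix(sorted_lines: str, suffix: str) -> str:
    """Turn a sorted mix of 'word' and 'word<tab>count' lines into 'word+suffix<tab>count' lines."""
    # Step 1: a bare word directly followed by its own counted line collapses into that line.
    text = re.sub(r"^([^\t\n]+)\n\1\b", r"\1", sorted_lines, flags=re.MULTILINE)
    # Step 2: a line that still has no tab did not occur in this subdocument, so its count is 0.
    text = re.sub(r"^([^\t\n]+)$", r"\1\t0", text, flags=re.MULTILINE)
    # Step 3: append the suffix to every word (\g<1> keeps the group number apart from the digits).
    return re.sub(r"^([^\t\n]+)(?=\t)", r"\g<1>" + suffix, text, flags=re.MULTILINE)


combined = "56\n56\t1\nabolish\nabounds\nabsurd\nabsurd\t3"
print(tag_with_suffix(combined, "01"))
```

With the sample lines this yields the same four rows listed above (5601, abolish01, abounds01, absurd01 with counts 1, 0, 0, 3).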

        Do it for each subsidiary document. As each is done, tack the content of the temporary document onto the end of a consolidated variable. A separate process would do the same for the whole document, using 00 as the suffix, before the subdocument loop. At the conclusion of the loop, sort the many, many rows. Paste the sorted result into a temporary document. Do something to preserve the words that had the 00 suffix. Then replace all the linebreak-plus-words-with-their-suffixes with the empty string. Fix up the 00 items one last time. You should end up with one column of words (formerly the 00 items), one column of the "All" frequencies, and one column for each subdocument's frequencies.

        Let me know if you were able to follow that :D

        BTW, it might be faster to insert the clipboard contents (from text statistics) under clip control into one quasi-permanent temporary document than creating and destroying multitudes of new documents.

        For sorting, we have ^$StrSort with various options.

        It would be nice if we had a clip function to capture text statistics so the ^!Keyboard tricks and clipboard were not necessary.

        Regards,
        Sheri
      • diodeom
        Message 3 of 26, Mar 1, 2010
          "Sheri" <silvermoonwoman@...> wrote:
          >
          > Do it for each subsidiary document. As each is done, tack the
          > content of the temporary document onto the end of a consolidated
          > variable. At the conclusion of the loop, A separate process
          > would do it for whole document, using 00 as the suffix, before
          > the subdocument loop. Sort the many, many rows. Paste the sorted
          > result into a temporary document. Do something to preserve the
          > words that had 00 suffix. Then replace all the linebreak-plus-
          > words-with-their-suffixes with the empty string. Fix up the 00
          > items one last time. Should end up with one column with words
          > (formerly the 00 items) the "All" frequencies, and one column
          > for each subdocument's frequencies.
          >

          Pretty cool. Sorting of this final list of about 780,000 lines took only half a minute. Then killing all "\R(?!\x2A)[^\t]++" (I had an asterisk in front of each word00<tab>value pair) fixed the table up in no time.
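
The sort-mark-collapse sequence described here can be sketched in Python as follows. This is only an illustration under assumptions: rows are already in word+suffix<tab>count form, "\n" stands in for \R, the asterisk marker is added after sorting, and build_table is a hypothetical name.

```python
import re


def build_table(tagged_rows: list[str]) -> str:
    """Merge 'word+suffix<tab>count' rows into one tab-separated row per word."""
    text = "\n".join(sorted(tagged_rows))
    # Mark each whole-document row (suffix 00) with an asterisk so it survives the next step.
    text = re.sub(r"^(?=[^\t\n]+00\t)", "*", text, flags=re.MULTILINE)
    # Delete every line break plus word+suffix that does not start with the marker;
    # what remains of such a row (<tab>count) lands on the end of the marked row before it.
    text = re.sub(r"\n(?!\*)[^\t\n]+", "", text)
    # Final fix-up: drop the marker and the 00 suffix.
    return re.sub(r"^\*([^\t\n]+)00(?=\t)", r"\1", text, flags=re.MULTILINE)


rows = [
    "absurd01\t3", "abolish01\t0",   # subdocument 01
    "absurd02\t1", "abolish02\t2",   # subdocument 02
    "absurd00\t4", "abolish00\t2",   # whole document
]
print(build_table(rows))
```

Each output row is the word, then the "All" count, then one count per subdocument. Note that a plain sort of word+suffix can interleave words that themselves end in digits, so the sketch sticks to alphabetic words.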
        • diodeom
          Message 4 of 26, Mar 1, 2010
            "Sheri" <silvermoonwoman@...> wrote:
            >
            > ;2007-01-14 created by Sheri Pierce
            > ;2010-03-01 Revision by Sheri Pierce
            > ;revision tested with NoteTab 6.2 (PCRE 8.01)
            > ;use to help create clearvariable statements for the clip being edited
            > ^!If ^$GetSelSize$>0 Next Else Skip
            > ^!Continue Some text is highlighted. Only variables set within the selection will be considered.
            > ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"
            > ^!IfEmpty ^%varnames% Next Else Skip_2
            > ^!Info No variables found
            > ^!Goto Clear
            > ^!Set %varnames%="^$StrSort(^%varnames%;No;Yes;Yes)$"
            > ^!Info ^%varnames%
            > ^!Set %varnames%=""
            > :Clear
            > ^!ClearVariable %varnames%
            > ;end of clip
            >

            I suppose there is no harm in this minute redundancy where any array elements that were (re)set individually somewhere in the target code are offered in the results as well.

            [\d\pL_] could be \w, I believe. And I'd guess ^!Set %varnames%="" may be a leftover from before :Clear existed?
          • Don - HtmlFixIt.com
            Message 5 of 26, Mar 1, 2010
              On 3/1/2010 12:18 PM, diodeom wrote:
              > ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"

              Could one of you break this line down item by item (on the right side of
              the = sign anyway) for those of us who are slower ...

              ?i = case insensitive?
              ^ means at start of a line?
              \^ means a caret actually exists in the text?
              \! means an ! actually exists in the text?
              Set either Array or Code? is actually in the text?
              So this is to update a clip?

              ? means find first ... non-greedy?
              \x20 is a space?
              I'm pretty well lost after that ...

              I even bottom posted cause I want to know.
            • Sheri
              Message 6 of 26, Mar 1, 2010
                --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
                >
                >
                > I suppose there is no harm in this minute redundancy where any
                > array elements that were (re)set individually somewhere in the
                > target code are offered in the results as well.
                >
                > [\d\pL_] could be \w, I believe.

                \w doesn't match high-ASCII characters, which, though I don't use them in variable names myself, are not prohibited.
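
The difference is easy to see in a small Python sketch. It rests on an assumption: Python's re.ASCII flag is used here to imitate an ASCII-only \w (PCRE with its default character tables), while Python's default Unicode-aware \w plays the role of [\d\pL_]. The variable names are invented for the demonstration.

```python
import re

# ASCII-only \w, standing in for \w under PCRE's default tables (an assumption).
ascii_word = re.compile(r"%\w+%", re.ASCII)
# Unicode-aware \w, roughly comparable to [\d\pL_].
unicode_word = re.compile(r"%\w+%")

plain_name = "%total%"
accented_name = "%caf\u00e9%"  # contains e-acute, a character above 127

print(bool(ascii_word.fullmatch(plain_name)))       # True
print(bool(ascii_word.fullmatch(accented_name)))    # False
print(bool(unicode_word.fullmatch(accented_name)))  # True
```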

                And I'd guess ^!Set %varnames%="" may be a leftover from before :Clear existed?

                There is this in the docs:

                "If you assign an empty value to an array variable, or use the ^!Set command to assign a new value to it, the array is automatically removed from memory."

                ... so out of an abundance of caution, I set the array variable equal to empty string before using ClearVariable on it.

                Regards,
                Sheri
              • diodeom
                Message 7 of 26, Mar 1, 2010
                  "Don - HtmlFixIt.com" <don@...> wrote:
                  >
                  > > ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"
                  >
                  > Could one of you break this line down item by item (on the right side of
                  > the = sign anyway)

                  Locate (case insensitive) either ^!Set or ^!SetArray or ^!SetCode at the start of a line, followed by a space, OR locate a semicolon followed by an optional space. The pattern is looking here for one of the four possible ways of declaring variables, the last one being when more than one is set on the same line. Next, \K discards what was matched so far, and the rest of the pattern ungreedily demands a string of word characters (digits, letters or underscores) between two percent signs (that is, a variable, which is the only text reported as the match), followed by an equal sign.

                  These captured strings are then inserted ($0 references them) by GetDocListAll in the stored, ready to display ^!ClearVariable statements, each on its own line.
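
For readers who want to experiment with the pattern outside NoteTab, here is a rough Python equivalent. It is a sketch under stated assumptions, not the clip: Python's re has neither \K nor \pL, so a capture group replaces \K and the Unicode-aware \w stands in for [\d\pL_]; the three StrSort flags are taken to mean case-insensitive, ascending, duplicates removed; and clear_statements is a hypothetical helper name.

```python
import re

VAR_PATTERN = re.compile(
    r"(?:^\^!Set(?:Array|Code)?\x20|;\x20?)"  # ^!Set, ^!SetArray or ^!SetCode plus a space, or ';' plus optional space
    r"(%\w+?%)"                               # the variable itself, e.g. %Var1% (captured instead of using \K)
    r"(?==)",                                 # lookahead: an equal sign must follow, but is not consumed
    re.IGNORECASE | re.MULTILINE,
)


def clear_statements(clip_text: str) -> list[str]:
    """Return one ^!ClearVariable line per distinct variable set in clip_text."""
    names = VAR_PATTERN.findall(clip_text)
    unique = dict.fromkeys(names)            # drop duplicates, keep first-seen order
    ordered = sorted(unique, key=str.lower)  # case-insensitive ascending sort
    return [f"^!ClearVariable {name}" for name in ordered]


sample_clip = "\n".join([
    "^!Set %Var1%=one; %Var2%=two; %Var3%=three",
    "^!SetArray %arr%=a;b",
    "^!set %lower%=x",
    "^!Info ^%Var1%",
])
print("\n".join(clear_statements(sample_clip)))
```

The ^!Info line contributes nothing because its %Var1% is neither preceded by a Set command or semicolon nor followed by an equal sign.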
                • Sheri
                  Message 8 of 26, Mar 1, 2010
                    --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                    >
                    > On 3/1/2010 12:18 PM, diodeom wrote:
                    > > ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"
                    >
                    > Could one of you break this line down item by item (on the right
                    > side of the = sign anyway) for those of us who are slower ...
                    >
                    > ?i = case insensitive?
                    > ^ means at start of a line?
                    > \^ means a caret actually exists in the text?
                    > \! means an ! actually exists in the text?
                    > Set either Array or Code? is actually in the text?
                    > So this is to update a clip?
                    >
                    > ? means find first ... non-greedy?
                    > \x20 is a space?
                    > I'm pretty well lost after that ...
                    >
                    > I even bottom posted cause I want to know.
                    >

                    Hi Don, I've missed you!

                    Maybe it would help to know that the clip is meant to be run from the clipbar while clipedit is the active document. It shows the names of variables found in the clip currently being edited. GetDocListAll gives the opportunity to format the matches, so they are formatted with the ClearVariable command in front of each one. The purpose is just to help make the list of ClearVariable commands needed to clear only the variables actually Set by the clip being written.

                    It only finds %xxx% if:
                    - it follows a line start followed by ^!Set or ^!SetCode or ^!SetArray and one space, OR if it follows a semicolon (anywhere on a line) plus an optional space
                    - it is followed by an equal sign

                    "(?=" signifies a lookahead assertion. There is an equal sign inside the look ahead assertion, (?=\=). It may not need that backslash, but it makes it easier to see (at least for me). So, an equal sign must follow (looking ahead) the %xxx% to be a match. The equal sign is not actually part of the matched text.

                    The search is case insensitive so that ^!set works as well as ^!Set or any other mixed case that might be found.

                    Regards,
                    Sheri
                  • Don - HtmlFixIt.com
                    Message 9 of 26, Mar 1, 2010
                      > Hi Don, I've missed you!
                      Guest appearance ;-) Prodigal son?

                      > "(?=" signifies a lookahead assertion. There is an equal sign inside the look ahead assertion, (?=\=). It may not need that backslash, but it makes it easier to see (at least for me). So, an equal sign must follow (looking ahead) the %xxx% to be a match. The equal sign is not actually part of the matched text.

                      Look ahead and look back assertions are currently over my head -- I need
                      to solve that. It's funny in one way because regex won't search
                      backwards, but it will look back ...

                      So that bit that kind of looks like a rear end ... (?=\=) ... really is
                      just to say "followed by an equals sign." It has three characters (?=
                      to say look ahead ... find an equals sign (escaped for good measure) and
                      closing parenthesis.

                      If we wanted the search pattern followed by a letter R it would be (?=R)
                      if I follow correctly ... can it also accept or patterns (?=r|R) and so
                      forth?
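
On the alternation question: lookahead assertions do accept alternatives in PCRE-style engines. A small Python sketch (Python's re is used here only as a convenient stand-in for PCRE, and the test string is invented) shows (?=r|R) and the equivalent character-class form (?=[rR]) behaving the same way.

```python
import re

# The variable must be followed by r or R; the lookahead checks this without consuming the letter.
with_alternation = re.compile(r"%\w+%(?=r|R)")
with_char_class = re.compile(r"%\w+%(?=[rR])")

text = "%a%r %b%R %c%x"
print(with_alternation.findall(text))  # ['%a%', '%b%']
print(with_char_class.findall(text))   # ['%a%', '%b%']
```

In both cases the r or R is required but is not part of the matched text, just as the equal sign is not part of the match in (?=\=).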
                    • diodeom
                      Message 10 of 26, Mar 2, 2010
                        "Sheri" <silvermoonwoman@...> wrote:
                        >
                        > I wrote:
                        > >
                        > > And I'd guess ^!Set %varnames%="" may be a leftover from
                        > > before :Clear existed?
                        >
                        > There is this in the docs:
                        >
                        > "If you assign an empty value to an array variable, or use the
                        > ^!Set command to assign a new value to it, the array is
                        > automatically removed from memory."
                        >
                        > ... so out of an abundance of caution, I set the array variable
                        > equal to empty string before using ClearVariable on it.
                        >

                        My overly concise suggestion didn't imply it well: it was because I'm aware of what setting a variable to empty does — and of your high standards — that I assumed you didn't mean to do the same thing twice. I speculated that maybe as you were writing this clip, initially ^!Set %varnames%="" was the last line, and after adding a provision for cases where no variables are captured, which called for a label :Clear, its removal was overlooked (or an alternative of just placing the label right above it). Sorry, Sheri; I'm sure it's not the last time I guessed incorrectly. (I'm also quite certain it won't stop me from trying... :)
                      • Sheri
                        Message 11 of 26, Mar 2, 2010
                          On 3/2/2010 8:56 AM, diodeom wrote:
                          > "Sheri"<silvermoonwoman@...> wrote:
                          >
                          >> I wrote:
                          >>
                          >>> And I'd guess ^!Set %varnames%="" may be a leftover from
                          >>> before :Clear existed?
                          >>>
                          >> There is this in the docs:
                          >>
                          >> "If you assign an empty value to an array variable, or use the
                          >> ^!Set command to assign a new value to it, the array is
                          >> automatically removed from memory."
                          >>
                          >> ... so out of an abundance of caution, I set the array variable
                          >> equal to empty string before using ClearVariable on it.
                          >>
                          >>
                          > My overly concise suggestion didn't imply it well: it was because I'm aware of what setting a variable to empty does — and of your high standards — that I assumed you didn't mean to do the same thing twice. I speculated that maybe as you were writing this clip, initially ^!Set %varnames%="" was the last line, and after adding a provision for cases where no variables are captured, which called for a label :Clear, its removal was overlooked (or an alternative of just placing the label right above it). Sorry, Sheri; I'm sure it's not the last time I guessed incorrectly. (I'm also quite certain it won't stop me from trying... :)
                          >

                          LOL. I'm deleting the line, this clip doesn't even create an array for
                          varnames. Probably an earlier version did. Will try to pay better
                          attention next time.

                          Regards,
                          Sheri
                        • Alec Burgess
                          Message 12 of 26, Mar 5, 2010
                            Sheri (silvermoonwoman@...) wrote (in part) (on 2010-03-01 at
                            15:41):
                            > Maybe it would help to know that that clip is meant to be run from the
                            > clipbar while clipedit is the active document. It shows the names of
                            > variables found in clip currently being edited. GetDocListAll gives
                            > the opportunity to format the matches, so they are formatted with the
                            > ClearVariable command in front of each one. The purpose is just to
                            > help make the list of ClearVariable commands needed to clear only the
                            > variables actually Set by the clip being written.
                            >
                            > It only finds %xxx% if:
                            > - it follows a line start followed by ^!Set or ^!SetCode or ^!SetArray
                            > and one space, OR if it follows a semicolon (anywhere on a line) plus
                            > an optional space
                            > - it is followed by an equal sign
                            I was on holiday (got back yesterday) and am just noodling thru the
                            large number of posts during the past week.

                            I read this thread but didn't test any of the example code.

                            > > > ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"
                            >


                            Sheri: wrt your "find variables for clearing" clip. Judging from the
                            comments (I haven't checked the regex itself), you appear to be
                            expecting a variable xxx to be defined as:
                            ^!set %xxx%=asdf
                            The following also works, though I use the construct without %...%
                            only accidentally :-[ :
                            ^!set xxx=asdf
                            ^!info ^%xxx%

                            Will your clip capture this usage?

                            >
                            > "(?=" signifies a lookahead assertion. There is an equal sign inside
                            > the look ahead assertion, (?=\=). It may not need that backslash, but
                            > it makes it easier to see (at least for me). So, an equal sign must
                            > follow (looking ahead) the %xxx% to be a match. The equal sign is not
                            > actually part of the matched text.
                            >
                            > The search is case insensitive so that ^!set works as well as ^!Set or
                            > any other mixed case that might be found.


                            --
                            Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)




                          • Sheri
                            Message 13 of 26, Mar 6, 2010
                              --- In ntb-clips@yahoogroups.com, Alec Burgess <buralex@...> wrote:
                              >
                              > Sheri (silvermoonwoman@...) wrote (in part) (on 2010-03-01 at
                              > 15:41):
                              > >
                              > > ^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"
                              > >
                              >
                              > Sheri: wrt to your "find variables for clearing" clip. You appear
                              > from the comments but haven't checked the regex itself, to be
                              > expecting a
                              >
                              > variable xxx to be defined as:
                              > ^!set %xxx%=asdf
                              > Following works though I use the construct without %...% only
                              > accidentally :-[ :
                              > ^!set xxx=asdf
                              > ^!info ^%xxx%
                              >
                              > Will your clip capture this usage?
                              >

                              No, of course not. :)

                              But (sigh) I suppose by failing to capture them, it is remotely possible that some "accidental" variables would fail to get released.

                              So perhaps it would be better to modify the capture part as follows:

                              ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K(%?)([\d\pL_]+?)\3(?=\=)";"^!ClearVariable %$4%\r\n")$"
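
To check how the optional-percent version behaves, here is a Python approximation. The same caveats apply as for any translation: a non-capturing prefix replaces \K, so the optional percent sign and the name are groups 1 and 2 here rather than 3 and 4; \w stands in for [\d\pL_]; and clear_statements_any_form is a made-up name.

```python
import re

OPTIONAL_PERCENT = re.compile(
    r"(?:^\^!Set(?:Array|Code)?\x20|;\x20?)"  # Set command plus space, or ';' plus optional space
    r"(%?)(\w+?)\1"                           # optional leading %, the name, then the same % (or nothing) again
    r"(?==)",                                 # an equal sign must follow
    re.IGNORECASE | re.MULTILINE,
)


def clear_statements_any_form(clip_text: str) -> list[str]:
    """List ^!ClearVariable lines for names set with or without surrounding percent signs."""
    return [f"^!ClearVariable %{m.group(2)}%" for m in OPTIONAL_PERCENT.finditer(clip_text)]


sample = "\n".join([
    "^!set %xxx%=asdf",
    "^!set yyy=qwer",
    "^!Info ^%xxx%",
    "^!Set %broken=oops",  # unbalanced percent sign: the backreference rejects it
])
print(clear_statements_any_form(sample))
```

One side effect worth keeping in mind: once bare names are allowed, the semicolon branch becomes more permissive, so text such as ";x=1" anywhere in a line is also reported as a variable.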

                              Regards,
                              Sheri