modifying format of stats tool output & concatenating stats from many files

  • arthurkendall80
    Message 1 of 26, Feb 22, 2010
      This is my third day with NoteTab so please let me know if I am re-inventing the wheel.


      I am getting started on statistical exploration of text data.
      Diodeom on this list was kind enough to create a clip to break out a .txt file that was a concatenation of documents. Diodeom also wrote a clip to run the text statistics tool on each .txt file in a folder and create a file of stats for each.

      Are either of the following reasonable things for a newbie to try to do with Notetab?

      My short-term goal is to have a general procedure to use on sets of documents. In the long term I want to apply this to sets of documents like the reports countries make to the UN on their promotion of Human Rights activities. I would like to create a tab-separated file with no blanks in the labels. This will end up as a table with 1) a column for the word or total label, 2) a column for the frequency over all the files in the folder, 3 to K) frequencies for each of K separate files in the folder.

      The application I'll be passing the file to cannot have blanks in the column header name or row labels in the first column. The subsequent rows would be based on the list found for the concatenation of all the files with zeros in the cells when a word does not occur in a file. There would be a header row and a row for every word/item found in the whole set of files.

      The table would be something like this where "<tab>" indicates a tab character.

      Word<tab>All_files<tab>file_name_1<tab>file_name_2<tab>file_name_3 ...<tab>file_name_last
      (<tab>80<tab>5<tab>6<tab>7...<tab>11
      )<tab>80<tab>5<tab>6<tab>7...<tab>11
      a<tab>500<tab>15<tab>16<tab>17...<tab>21
      ...
      zebra<tab>8<tab>5<tab>0<tab>0...<tab>0
      words_items<tab>620<tab>123<tab>456<tab>122...<tab>77
      Total_Words<tab>1608<tab>123<tab>456<tab>122...<tab>77
      Total_Punctuation<tab>163<tab>123<tab>456<tab>122...<tab>77
      Total_Other_Text<tab>3<tab>123<tab>456<tab>122...<tab>77
      Total_Characters<tab>9346<tab>123<tab>456<tab>122...<tab>77
      Total_Paragraphs<tab>108<tab>123<tab>456<tab>122...<tab>77
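
For readers who want to check the target layout outside NoteTab, here is a minimal Python sketch of the table described above. It assumes the per-file word counts already exist; `build_table` and the sample data are hypothetical names invented for illustration, and the totals rows (words_items, Total_Words, and so on) are left out.

```python
from collections import Counter

def build_table(per_file):
    """per_file maps a column label to a Counter of word frequencies."""
    totals = Counter()
    for counts in per_file.values():
        totals.update(counts)          # frequency over all files
    header = ["Word", "All_files"] + list(per_file)
    rows = ["\t".join(header)]
    for word in sorted(totals):
        cells = [word, str(totals[word])]
        # A word missing from a file gets an explicit zero in that column.
        cells += [str(counts.get(word, 0)) for counts in per_file.values()]
        rows.append("\t".join(cells))
    return "\n".join(rows)

# Hypothetical miniature data set, not taken from any real file.
sample = {
    "file_name_1": Counter({"a": 15, "zebra": 5}),
    "file_name_2": Counter({"a": 16}),
}
table = build_table(sample)
print(table)
```

Every word found anywhere in the set gets a row, which is why the zero-filling step is needed.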

      If assembling all the frequencies into a table is horrendous,
      how hard would it be to modify the output of the text statistics tool to
      do this for each file? I could then do the assembly of the table with SPSS, a statistical package.

      Word<tab>file_name
      (<tab>80
      )<tab>80
      a<tab>11
      ...
      zebra<tab>2
      words_items<tab>620
      Total_Words<tab>1608
      Total_Punctuation<tab>163
      Total_Other_Text<tab>3
      Total_Characters<tab>9346
      Total_Paragraphs<tab>108

      If neither of the above is practical, I could manually edit the stat files and drop the percent columns when reading the files into the stat package.
    • Sheri
      Message 2 of 26, Feb 22, 2010
        On 2/22/2010 9:08 AM, arthurkendall80 wrote:
        [snip]
        > If assembling all the frequencies into a table is horrendous,
        > how hard would it be to modify the output of the text statistics tool to
        > do this for each file? I could then do the assembly of the table with SPSS a statistical package.
        >
        > Word<tab>file_name
        > (<tab>80
        > )<tab>80
        > a<tab>11
        > ...
        > zebra<tab>2
        > words_items<tab>620
        > Total_Words<tab>1608
        > Total_Punctuation<tab>163
        > Total_Other Text<tab>3
        > Total_Characters<tab>9346
        > Total_Paragraphs<tab>108
        >
        > If neither of the above is practical, I could manually edit the stat files and drop the percent columns when reading the files into the stat package.
        >
        >

        To me, the first sounded horrendous, the second, easy.

        You can open one of your text statistics files and try this clip. If it
        does what you want, this clip could be run in a loop, e.g., for each of
        the target files. Instructions would need to be added for opening,
        saving, and closing the files in sequence.

        ^!Replace "^PDifferent words/items counted" >> "words_items" TAWS
        ^!Replace "\t[\d\.]++$" >> "" RAWS
        ^!Replace ":\x20(\d++)$" >> "\t$1" RAWS
        ^!Replace "(?i)[a-z]++\K\x20" >> "_" RAWS
        ^!Replace "^Word\t\KFrequency\t%\R">>"^$GetDocName$" RWS
        ;end of clip
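
As a cross-check of what the five replacements are meant to do, here is a rough Python equivalent. The layout of the statistics report (a `Word<tab>Frequency<tab>%` header, three-column rows, then `Label: number` summary lines) is an assumption inferred from the patterns in the clip, and `reshape_stats` is a hypothetical name.

```python
import re

def reshape_stats(report, label):
    """Turn an assumed stats report into two columns: word and frequency."""
    # Rename the "different words" line and close up the blank line before it.
    text = report.replace("\n\nDifferent words/items counted", "\nwords_items")
    # Drop the trailing percent column from each word row.
    text = re.sub(r"\t[\d.]+$", "", text, flags=re.M)
    # "Total Words: 1608" becomes "Total Words<tab>1608".
    text = re.sub(r": (\d+)$", r"\t\1", text, flags=re.M)
    # Blanks that follow a letter become underscores.
    text = re.sub(r"(?<=[A-Za-z]) ", "_", text)
    # Replace the header; a function avoids escape handling in the label.
    text = re.sub(r"^Word\tFrequency\t%$", lambda m: "Word\t" + label,
                  text, flags=re.M)
    return text

# Hypothetical sample in the assumed report layout.
report = ("Word\tFrequency\t%\na\t11\t1.77\nzebra\t2\t0.32\n\n"
          "Different words/items counted: 620\nTotal Words: 1608\n"
          "Total Other Text: 3\n")
result = reshape_stats(report, "paper1")
print(result)
```

Note that the second substitution removes whatever numeric column is last, so running the procedure twice on the same text would strip the frequencies as well.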

        BTW, if you right click in the clip panel there is an option to "Add
        from clipboard". That's the most convenient way to transfer clips from
        email to NoteTab. The H= line (if any) provides the clip name. Otherwise
        you have to enter one.

        If you actually paste or type an H= line into the clip-edit window, when
        the clip is run it will paste the line into the active document (because
        the line doesn't start with a command). You would have to be editing the
        .clb clip library as a document to be able to actually paste or type the
        H= line and have it behave as a clip header instead of one of the text
        or instruction lines in the clip. Your personalized .clb files are stored
        in the Application Data area, e.g., for XP and NoteTab Pro, they are at
        C:\Documents and Settings\User\Application Data\NoteTab Pro\Libraries

        Regards,
        Sheri
      • arthurkendall80
        Message 3 of 26, Feb 22, 2010
          I appreciate your help.

          I tried using the clip. It is getting there.
          It put in the whole filespec instead of just the filename and had a hardpage in the middle of the pasted filespec.
          It lost the column of frequencies.

          Also, where in the "Stats Please" clip that Diodeom posted would I insert these commands?

          Thank you.

          Art

        • diodeom
          Message 4 of 26, Feb 22, 2010
            "arthurkendall80" <art@...> wrote:
            >
            > It lost the column of frequencies.
            >
            > Also, where in the "Stats Please" clip that Diodeom posted would I insert these commands?
            >

            Your sample of desired output shows only frequencies kept while percentages are absent. Do you want both values now?

            It may be helpful to set up a separate folder for the stats files (it may be easier to process them later that way). If you decide to do that, uncomment the second line (remove the semicolon in front of it), delete the entire ^!Save AS line, and uncomment the line below it.

            ^!Set %Path%=^?{(T=D)Folder=C:\Users\Art\Desktop\fed\separate\}
            ;^!Set %OutPath%=^?{(T=D)Folder=C:\Users\Art\Desktop\fed\stats\}
            ^!SetScreenUpdate 0
            ^!Set %File%=^$GetFileFirst(^%Path%;*.txt)$
            ^!Goto Skip_2
            :Loop
            ^!Set %File%=^$GetFileNext$
            ^!IfEmpty ^%File% Done
            ^!Open "^%File%"
            ^!Keyboard Alt+T S &100 M &500 Ctrl+A Ctrl+C Alt+C
            ^!Close
            ^!Menu File/New
            ^!Paste
            ^!Replace "^PDifferent words/items counted" >> "words_items" TAWS
            ^!Replace "\t[\d\.]++$" >> "" RAWS
            ^!Replace ":\x20(\d++)$" >> "\t$1" RAWS
            ^!Replace "(?i)[a-z]++\K\x20" >> "_" RAWS
            ^!Replace "^Word\t\KFrequency\t%\R">>"^$GetFileName(^%File%)$" RWS
            ^!Save AS ^%Path%^$GetName(^%File%)$_stats.txt
            ;^!Save AS ^%OutPath%^$GetName(^%File%)$_stats.txt
            ^!Close
            ^!Goto Loop
            :Done
            ^!CloseFileFind
            ^!ClearVariable %Path%
            ^!ClearVariable %OutPath%
            ^!ClearVariable %File%
            ^!SetScreenUpdate 1
            ^!Prompt Done!

            By the way, sorry, Art: in the "splitting" clip I didn't foresee that the "mother" file would have its closing paragraph(s) ("End of the Project Gutenberg EBook...") removed. The paper-finding pattern was written to end its capture either before the next FED heading or before the line quoted above.
          • arthurkendall80
            Message 5 of 26, Feb 22, 2010
              Thank you for your help.
              I'll give the new clip a try in the morning.

              The suggestion about a separate folder is a good one. I thought I would have to move them "by hand" to a different folder.

              I guess there are different versions on Gutenberg. I first downloaded the Federalist Papers when I retired in 2001 and am just now getting around to learning about computer-aided analysis of text.
              Mine does not mention "EBook".

              I do not need the percentages, just the frequencies.

              Art

            • Sheri
              Message 6 of 26, Feb 24, 2010
                On 2/22/2010 2:43 PM, arthurkendall80 wrote:
                > It put in the whole filespec instead of just the filename and had a hardpage in the middle of the pasted filespec.
                >
                Sorry about that, I was working with a temporary file and ^$GetDocName$
                doesn't return a complete filespec in that case. A complete filespec has
                backslashes, and backslashes serve as escape characters in the
                replacement text seen by clipcode's regex replace command (^!Replace,
                with an "R" option specified).
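
The same class of problem can be reproduced in Python's `re` module, shown here only as an analogy and not as a description of how NoteTab handles replacements. The path is hypothetical; `\f` in it is read as a form feed (a hard page break) and `\n` as a newline when the path is passed as a replacement string.

```python
import re

header = "Word\tFrequency"
path = r"C:\fed\notes.txt"   # hypothetical filespec containing \f and \n

# As a replacement string, the backslash sequences are treated as escapes.
mangled = re.sub(r"Frequency", path, header)

# As a function result, the text is inserted literally.
literal = re.sub(r"Frequency", lambda m: path, header)

print(repr(mangled))   # 'Word\tC:\x0ced\notes.txt'
print(repr(literal))   # 'Word\tC:\\fed\\notes.txt'
```

Either using only the bare file name or escaping the backslashes before substitution avoids the damage.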

                > It lost the column of frequencies.
                >
                I think you must have run it twice on the same document. It was removing
                the last column with numeric data in it.

                > Also, where in the "Stats Please" clip that Diodeom posted would I insert these commands?
                >
                Hopefully Dio fixed you up.
                > Thank you.
                >
                You're welcome.

                Regards,
                Sheri
              • arthurkendall80
                Message 7 of 26, Feb 24, 2010
                  Thank you.

                  I guess I used the wrong terminology. I meant that it put in "filename.txt" rather than just "filename" as the column header. All the files in the folder the clip works on are .txt files.

                  Yes it is possible that I ran it twice.

                  Diodeom fixed me up wrt where to place the commands in the earlier clip.


                  Art
                • diodeom
                  Message 8 of 26, Feb 27, 2010
                    "arthurkendall80" <art@...> wrote:
                    >
                    > I would like to create a tab separated file with no blanks in the labels. This will end up as a table with 1)a column for the word or total label, 2) a column for the frequency over all the files in the folder, 3 to K) frequencies for each of K separate files in the folder.
                    >
                    > The application I'll be passing the file to cannot have blanks in the column header name or row labels in the first column. The subsequent rows would be based on the list found for the concatenation of all the files with zeros in the cells when a word does not occur in a file. There would be a header row and a row for every word/item found in the whole set of files.
                    >
                    > The table would be something like this where "<tab>" indicates a tab character.
                    >
                    > Word<tab>All_files<tab>file_name_1<tab>file_name_2<tab>file_name_3 ...<tab>file_name_last
                    > (<tab>80<tab>5<tab>6<tab>7...<tab>11
                    > )<tab>80<tab>5<tab>6<tab>7...<tab>11
                    > a<tab>500<tab>15<tab>16<tab>17...<tab>21
                    > ...
                    > zebra<tab>8<tab>5<tab>0<tab>0...<tab>0
                    > words_items<tab>620<tab>123<tab>456<tab>122...<tab>77
                    > Total_Words<tab>1608<tab>123<tab>456<tab>122...<tab>77
                    > Total_Punctuation<tab>163<tab>123<tab>456<tab>...122<tab>77
                    > Total_Other Text<tab>3<tab>123<tab>456<tab>122...<tab>77
                    > Total_Characters<tab>9346<tab>123<tab>456<tab>...122<tab>77
                    > Total_Paragraphs<tab>108<tab>123<tab>456<tab>122...<tab>77
                    >

                    On my comp the following clip averages about 20 seconds per paper as it produces the TSV table (as you described it) directly from the concatenated set of eighty-some Federalist Papers. It's a snail's pace, I'd imagine, for any serious production work on a large number of similar compilations (and there are better tools to accomplish this objective fast), but I provide it here nevertheless as a potentially useful record of an enjoyable notetabbing exercise. To my understanding it does exactly what you want(ed), just not rapidly at all — in its current draft. As is, building this table of text statistics (of eighty-some columns by nine thousand-something rows, from a 1.13 MB file) takes on my system nearly 30 minutes, mostly due to the limitations of my spare time and (more likely) my gray matter.

                    Again, the millisecond delay values at ^!Keyboard actions may have to be adjusted for various sizes of projects. And to make things more universal, a unique identifier (in this case the heading "FED...") could be placed in a user-entered variable instead of being hard-coded; similar provisions could be made for naming of column headers.

                    If you'd like to give this clip a try, have your "mother" file as the current document. For a short test, maybe it wouldn't hurt to truncate it to just a few papers.

                    ^!Set %Concat%=^$GetDocIndex$
                    ^!Select 0
                    ^!SetScreenUpdate 0
                    ^!Keyboard Alt+T S &500 M &4000 Ctrl+A Ctrl+C Alt+C
                    ^!Jump 1
                    ^!Toolbar Paste New
                    ^!SetWordWrap 0
                    ^!Replace "\t[^\t\r]++$" >> "" WARS
                    ^!Replace "Word^tFrequency^p" >> "Word^tAll" WS
                    ^!Replace "\R\RDi\D++" >> "\r\nWords_Items\t" WRS
                    ^!Set %TotPos%=^$Calc(^$GetRow$-1)$
                    ^!Replace ": " >> "^t" AS
                    ^!Replace " " >> "_" WAS
                    ^!Replace "^\t\d++\R" >> "" WARS
                    ^!Set %Stats%=^$GetDocIndex$
                    :PaperLoop
                    ^!SetDocIndex ^%Concat%
                    ^!Find "^FED.+?\K\d+" RS
                    ^!IfError Done
                    ^!Set %Name%=^$GetSelection$
                    ^!Jump +1
                    ^!Find "(?s).+?(?=(\R++FED|\R*\Z))" RS
                    ^!Keyboard Alt+T S &100 M &500 Ctrl+A Ctrl+C Alt+C
                    ^!Jump +1
                    ^!Toolbar Paste New
                    ^!SelectTo ^$Calc(^$GetRow$-6)$:1
                    ^!SetArray %Tot%=^$GetDocMatchAll(\d++)$
                    ^!Replace "(?s)\R\RDiff.++\Z" >> "\r\n" WRS
                    ^!Replace "\A.++\R\R" >> "" WRS
                    ^!Replace "^\t.++\R" >> "" WARS
                    ^!SetListDelimiter ^p
                    ^!SetArray %Word%=^$GetDocMatchAll("^[^\t]++")$
                    ^!SetArray %Freq%=^$GetDocMatchAll("^[^\t]++\t\K[^\t]++")$
                    ^!DestroyDoc ""
                    ^!SetDocIndex ^%Stats%
                    ^!StatusShow Appending stats for Fed. No. ^%Name%...
                    ^!Replace "^p" >> "^t*^p" WAS
                    ^!Jump 1
                    ^!Replace "*" >> "F^%Name%" S
                    ^!Jump ^%TotPos%
                    ^!Replace "*^p" >> "^%Tot1%^p" S
                    ^!Replace "*^p" >> "^%Tot2%^p" S
                    ^!Replace "*^p" >> "^%Tot3%^p" S
                    ^!Replace "*^p" >> "^%Tot4%^p" S
                    ^!Replace "*^p" >> "^%Tot5%^p" S
                    ^!Replace "*^p" >> "^%Tot6%^p" S
                    ^!Set %No%=0
                    ^!Jump 1
                    :WordLoop
                    ^!Inc %No%
                    ^!If ^%No%>^%Word0% Aster
                    ;long line start
                    ^!Replace "^\Q^%Word^%No%%\E\t[^\x2A]++\K\x2A$" >> "^%Freq^%No%%" RIS
                    ;long line end
                    ^!Goto WordLoop
                    :Aster
                    ^!Replace "*" >> "0" WAS
                    ^!StatusClose
                    ^!Goto PaperLoop
                    :Done
                    ^!SetDocIndex ^%Stats%
                    ^!Jump 1
                    ^!ClearVariables
                    ^!SetScreenUpdate 1
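
The paper-splitting step of the clip above can be sketched in Python as follows. It assumes every paper starts with a line that begins with "FED" and contains the paper number, and that no body line looks like that; `split_papers` and the sample text are hypothetical.

```python
import re

# A heading line starts with "FED" and carries the paper number.
HEADING = re.compile(r"^FED.*?(\d+).*$", re.M)

def split_papers(text):
    """Map column labels such as 'F10' to the text of each paper."""
    matches = list(HEADING.finditer(text))
    papers = {}
    for i, match in enumerate(matches):
        # A paper runs from the end of its heading to the next heading,
        # or to the end of the text for the last paper.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        papers["F" + match.group(1)] = text[match.end():end].strip()
    return papers

# Hypothetical two-paper sample.
sample = ("FEDERALIST No. 1\nGeneral Introduction\n\n"
          "FEDERALIST No. 2\nConcerning Dangers\n")
papers = split_papers(sample)
print(list(papers))   # ['F1', 'F2']
```

Any introductory text before the first heading is ignored by this sketch, since only text that follows a heading is collected.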
                  • Sheri
                    Message 9 of 26, Feb 28, 2010
                      --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
                      >
                      > I provide it here nevertheless as a potentially useful record of
                      > an enjoyable notetabbing exercise. To my understanding it does
                      > exactly what you want(ed), just not rapidly at all — in its
                      > current draft. As is, building this table of text statistics (of
                      > eighty-some columns by nine thousand-something rows, from a 1.13
                      > MB file) takes on my system nearly 30 minutes, mostly due to the
                      > limitations of my spare time and (more likely) my gray matter.

                      LOL! Bravo, looks like it gets the job done. Possibly the keystrokes could be reduced a little bit by eliminating the Ctrl+C, because in this case Ctrl+A alone seems to do the job. You can see that keyboard shortcut if you look at the context menu for the Text Statistics window. One thought impacting the possible accuracy of your result is, it might be best to construct the "concatenated" word list based on a selection from the first "FED..." to the end of the collection. Otherwise the text of any superfluous introductory material (e.g., as exists in the original download) is also considered.

                      I have a few ill-formed ideas rattling around, but don't know how much (if any) faster alternative approaches might be. For example, initialize the entire consolidated table up-front with the "All" column plus numerous tabs and zeros? Construct an entire column of info and do a columnar paste (Modify->Block->Paste) for each subsidiary set of frequencies into the consolidated table?

                      Window dressing perhaps, but I like to avoid ^!ClearVariables and clear only specified variables. Also, if a clip uses the clipboard, I like to restore its original contents at the end. If a *lot* of info goes on the clipboard, might speed things up to free the memory by clearing the clipboard once the info is no longer needed.

                      Also for whatever help it might be, here's my low-tech clip timer. This timer doesn't steal any clock cycles :D

                      ^!Set %starttime%="Start Time: ^$GetDate(tt)$"
                      ;..rest of clip goes here..
                      ^!Set %endtime%="End Time: ^$GetDate(tt)$"
                      ;start long line
                      ^!Prompt ^$GetClipName$ Complete^%NL%^%NL%Start: ^%starttime%^%NL%End: ^%endtime%
                      ;end long line
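
For anyone timing an equivalent script outside NoteTab, the same start/end bookkeeping looks roughly like this in Python. `run_timed` is a hypothetical helper, and the task passed to it is only a stand-in for the real work.

```python
import time
from datetime import datetime

def run_timed(task):
    """Run task() and report wall-clock start, end, and elapsed time."""
    started = datetime.now()
    t0 = time.perf_counter()
    result = task()
    elapsed = time.perf_counter() - t0
    finished = datetime.now()
    report = (f"Start: {started:%H:%M:%S}\n"
              f"End: {finished:%H:%M:%S}\n"
              f"Elapsed: {elapsed:.3f} s")
    return result, elapsed, report

# Stand-in task; replace the lambda with the real job.
result, elapsed, report = run_timed(lambda: sum(range(1000)))
print(report)
```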

                      Regards,
                      Sheri
                    • diodeom
                      Message 10 of 26, Feb 28, 2010
                        "Sheri" <silvermoonwoman@...> wrote:
                        >
                        > LOL! Bravo, looks like it gets the job done.
                        >

                        I was kind of hoping it worked even before you tested it. :D

                        > Possibly the keystrokes could be reduced a little bit by
                        > eliminating the Ctrl+C, because in this case Ctrl+A alone seems
                        > to do the job. You can see that keyboard shortcut if you look at
                        > the context menu for the Text Statistics window.
                        >

                        Great to know; thank you, Sheri. (I also found where this issue was brought up three years ago: <http://tech.groups.yahoo.com/group/ntb-clips/message/16004>)

                        > One thought impacting the possible accuracy of your result is,
                        > it might be best to construct the "concatenated" word list
                        > based on a selection from the first "FED..." to the end of the
                        > collection. Otherwise the text of any superfluous introductory
                        > material (e.g., as exists in the original download) is also
                        > considered.

                        The thing is, I've already made the mistake of anticipating Gutenberg comments in Art's file before; now I believe it to be stripped of fluff. Still, I'd second the warning that this clip runs text stats on the entirety of the current document.
                      • diodeom
                        Message 11 of 26, Feb 28, 2010
                          "Sheri" <silvermoonwoman@...> wrote:
                          >
                          > I have a few ill-formed ideas rattling around, but don't know
                          > how much (if any) alternative approaches might be. For example,
                          > initialize the entire consolidated table up-front with the "All"
                          > column plus numerous tabs and zeros? Construct an entire column
                          > of info and do a columnar paste (Modify->Block->Paste) for each
                          > subsidiary set of frequencies into the consolidated table?
                          >

                          To construct a column for each subsidiary set of frequencies that would place values in the same rows as their corresponding words occupy in the overall table (the crux for me), I've given some consideration to the concept of combining both word columns (the "long," complete one, stripped of any frequencies + the "short" one of a given subdocument, including its frequencies), sorting them, then preserving just the frequency wherever it has a preceding "long" column word match (while removing this match), and swapping content of all other lines for zeroes. Shoot, it's probably better if I illustrate it rather than subject you to my poor attempts at describing what I mean:

                          "Long" column (stripped & sorted*):

                          56
                          abolish
                          abounds
                          absurd
                          abyss
                          accept
                          accident
                          accord
                          account
                          achieved
                          acknowledge

                          * Apparently plain sorting doesn't follow the same rules as text stats in regard to the order of punctuation characters. For the purposes of this experiment I re-sorted the stats of the "mother" file (showing here just a simplified smidgen of it) to temporarily avoid addressing this incompatibility.

                          "Short" column:

                          56<tab>1
                          absurd<tab>3
                          accident<tab>1
                          acknowledge<tab>22

                          Both columns combined and sorted:

                          56
                          56<tab>1
                          abolish
                          abounds
                          absurd
                          absurd<tab>3
                          abyss
                          accept
                          accident
                          accident<tab>1
                          accord
                          account
                          achieved
                          acknowledge
                          acknowledge<tab>22

                          After:
                          ^!Replace "^([^\t\r]++)\R\1\b" >> "" WARS
                          (Tabs preserved as unique markers for the next step)

                          <tab>1
                          abolish
                          abounds
                          <tab>3
                          abyss
                          accept
                          <tab>1
                          accord
                          account
                          achieved
                          <tab>22

                          After:
                          ^!Replace "^(?!\t).++" >> "0" WARS
                          ^!Replace "^t" >> "" WAS

                          1
                          0
                          0
                          3
                          0
                          0
                          1
                          0
                          0
                          0
                          22

                          Which now matches its values' row placements with proper words in the main table, and — I imagine — could be suitable for columnar pasting. Seems pretty rapid, though I sense that there ought to be even easier ways to arrive at this objective. (I never sorted anything in my life!) Any suggestions? :)
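For anyone who wants to check the combine-sort-and-mark idea outside NoteTab, here is a minimal Python sketch of the same logic. It is not clip code: the function name and the dictionary input are made up for illustration, and it assumes every word in the "short" column also appears in the (already sorted) "long" column.

```python
def aligned_frequency_column(all_words, sub_freqs):
    """Return one count per word of the sorted "long" column, 0 where absent.

    all_words: sorted list of every word in the overall table (hypothetical input).
    sub_freqs: dict mapping a word to its frequency in one subdocument.
    """
    # Combine both columns; None marks a line that came from the "long" column.
    combined = [(word, None) for word in all_words] + list(sub_freqs.items())
    # Sort so that each "long" word comes right before its "short" twin, if any.
    combined.sort(key=lambda pair: (pair[0], pair[1] is not None))

    column = []
    for word, count in combined:
        if count is None:
            column.append(0)      # unmatched so far: placeholder zero
        else:
            column[-1] = count    # matched: keep the frequency instead
    return column


words = ["56", "abolish", "abounds", "absurd", "abyss", "accept",
         "accident", "accord", "account", "achieved", "acknowledge"]
short = {"56": 1, "absurd": 3, "accident": 1, "acknowledge": 22}
print(aligned_frequency_column(words, short))
```

With the sample lists above this gives the same column of values as the two ^!Replace steps.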
                        • diodeom
                          Message 12 of 26 , Feb 28, 2010
                            "Sheri" <silvermoonwoman@...> wrote:
                            >
                            > Window dressing perhaps, but I like to avoid ^!ClearVariables
                            > and clear only specified variables. Also, if a clip uses the
                            > clipboard, I like to restore its original contents at the end.
                            > If a *lot* of info goes on the clipboard, might speed things up
                            > to free the memory by clearing the clipboard once the info is no
                            > longer needed.
                            >

                            Yes, ma'am! On the stone tablets of Sheri's pet peeves I also recognize top posting and regex backtracking. :)

                            So far for me the occasions where there is actually a need to preserve some variable for or from another clip (or for any other purpose) are exceptions significant enough to warrant this otherwise perhaps overly cautious treatment only when it is actually needed. (While on this topic: your handy "List Variable Names" clip could maybe include a provision to account for multiple variables declared in a single line, e.g. ^!Set %Var1%=one; %Var2%=two; %Var3%=three.)

                            Regarding the clipboard, if there is something there I'd care to preserve, I'd prefer to do that explicitly, rather than to rely on every clip to keep it intact with ^!ClipBoardSave and ^!ClipBoardRestore, "just in case." If I made no conscious effort to keep it, I shouldn't (and won't) complain if I lose it.
                            And in this case, relieving the clipboard of its burden immediately after it's pasted doesn't appear to make any measurable difference to me, I'm sad to report.
                          • diodeom
                            Message 13 of 26 , Feb 28, 2010
                              "Sheri" <silvermoonwoman@...> wrote:
                              >
                              > Also for whatever help it might be, here's my low-tech clip
                              > timer. This timer doesn't steal any clock cycles :D

                              > ^!Set %starttime%="Start Time: ^$GetDate(tt)$"
                              > ;..rest of clip goes here..
                              > ^!Set %endtime%="End Time: ^$GetDate(tt)$"
                              > ;start long line
                              > ^!Prompt ^$GetClipName$ Complete^%NL%^%NL%Start: ^%starttime%^%NL%End: ^%endtime%
                              > ;end long line
                              >

                              There certainly ain't nothin' hi-tech about what I use (and it's really only for the "hair-splitting stuff", in often-needless centiseconds), and, despite what I may have joked about, it's not cumbersome at all. Precisely for simplicity I call it with the very same command twice from within a clip to be timed, ^!Clip TimeIt: once for the start of the time capture and once for its end (and the subsequent calculation of the result). Here it is, commented:

                              H="TimeIt"
                              ;Grab current time first, ask questions later
                              ^!Set %2Time%=^$GetDosOutput(echo %time:~3,2%;%time:~6,2%.%time:~9,2%)$
                              ;Check if start time was recorded earlier
                              ^!IfEmpty ^%1Time% GetStart
                              ;--------------------------------------------------------E-V-A-L-U-A-T-E
                              ;Target code was timed so there is no hurry now
                              ;Remove format constraints, set arrays
                              ^!SetArray %1Time%=^$StrCopy(^%1Time%;1;8)$
                              ^!SetArray %2Time%=^$StrCopy(^%2Time%;1;8)$
                              ;If more start than finish minutes, add 60 to the latter
                              ^!If ^%1Time1%>^%2Time1% ^!Inc %2Time1% 60
                              ;Convert to centiseconds, sum up, ditch arrays
                              ^!Set %1Time%=^$Calc((^%1Time1%*6000)+(^%1Time2%*100))$
                              ^!Set %2Time%=^$Calc((^%2Time1%*6000)+(^%2Time2%*100))$
                              ;Store result in 'recycled' %1Time%
                              ^!Set %1Time%=^$Calc(^%2Time%-^%1Time%)$
                              ;-START------------------------------------------------L-O-N-G---L-I-N-E
                              ^!Info ^$Calc((^%1Time%/100)DIV60)$:^$Calc((^%1Time%/100)MOD60)$.^$StrCopyRight(^%1Time%;2)$^%nl%m:s.cc^%nl%
                              ;-END--------------------------------------------------L-O-N-G---L-I-N-E
                              :Done
                              ^!ClearVariable %1Time%
                              ^!ClearVariable %2Time%
                              ^!Goto End
                              :GetStart
                              ;Get start time just prior to executing target code in the parent clip
                              ^!Set %1Time%=^$GetDosOutput(echo %time:~3,2%;%time:~6,2%.%time:~9,2%)$

                              If there are any bugs in the clip being timed, or if anything goes awry with its execution, the stored times easily get mixed up and it may be necessary to simply run:

                              H="Reset TimeIt"
                              ^!ClearVariable %1Time%
                              ^!ClearVariable %2Time%

                              (As this clip was meant for my own use, I chose simplicity of implementing it over the mentioned vulnerability.)
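The arithmetic in the EVALUATE part may be easier to follow outside clip syntax. Below is a rough Python sketch of the same steps, assuming time stamps in the "mm;ss.cc" form that the echo command produces. The function names are invented for this illustration, and unlike the ^!Info line it zero-pads the seconds and centiseconds.

```python
def parse_stamp(stamp):
    """Split "mm;ss.cc" into (minutes, centiseconds within that minute)."""
    minutes, rest = stamp.split(";")
    seconds, centis = rest.split(".")
    return int(minutes), int(seconds) * 100 + int(centis)


def elapsed_centiseconds(start, end):
    """Difference between two stamps, allowing for one wrap past the hour."""
    start_min, start_cs = parse_stamp(start)
    end_min, end_cs = parse_stamp(end)
    # If more start than finish minutes, add 60 to the latter (as the clip does).
    if start_min > end_min:
        end_min += 60
    return (end_min * 6000 + end_cs) - (start_min * 6000 + start_cs)


def format_elapsed(cs):
    """Render centiseconds as m:ss.cc."""
    whole_seconds = cs // 100
    return f"{whole_seconds // 60}:{whole_seconds % 60:02d}.{cs % 100:02d}"


print(format_elapsed(elapsed_centiseconds("59;58.50", "00;01.25")))
```

Like the clip, this only looks at minutes, seconds and centiseconds, so a timed run of an hour or more would be reported incorrectly.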
                            • Sheri
                              Message 14 of 26 , Mar 1, 2010
                                --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:

                                > While on this topic: your handy "List Variable Names" clip could
                                > maybe include a provision to account for multiple variables
                                > declared in a single line, e.g. ^!Set %Var1%=one; %Var2%=two;
                                > %Var3%=three.)

                                Thanks, I didn't realize you could use a space in front of additional variable names.

                                Here's a revision (there are a couple of long lines):

                                ;2007-01-14 created by Sheri Pierce
                                ;2010-03-01 Revision by Sheri Pierce
                                ;revision tested with NoteTab 6.2 (PCRE 8.01)
                                ;use to help create clearvariable statements for the clip being edited
                                ^!If ^$GetSelSize$>0 Next Else Skip
                                ^!Continue Some text is highlighted. Only variables set within the selection will be considered.
                                ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"
                                ^!IfEmpty ^%varnames% Next Else Skip_2
                                ^!Info No variables found
                                ^!Goto Clear
                                ^!Set %varnames%="^$StrSort(^%varnames%;No;Yes;Yes)$"
                                ^!Info ^%varnames%
                                ^!Set %varnames%=""
                                :Clear
                                ^!ClearVariable %varnames%
                                ;end of clip

                                Regards,
                                Sheri
                              • Sheri
                                Message 15 of 26 , Mar 1, 2010
                                  --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
                                  >
                                  > Both columns combined and sorted:
                                  >
                                  > 56
                                  > 56<tab>1
                                  > abolish
                                  > abounds
                                  > absurd
                                  > absurd<tab>3
                                  > abyss
                                  > accept
                                  > accident
                                  > accident<tab>1
                                  > accord
                                  > account
                                  > achieved
                                  > acknowledge
                                  > acknowledge<tab>22
                                  >
                                  > After:
                                  > ^!Replace "^([^\t\r]++)\R\1\b" >> "" WARS
                                  > (Tabs preserved as unique markers for the next step)
                                  >
                                  > <tab>1
                                  > abolish
                                  > abounds
                                  > <tab>3
                                  (etc.)

                                  [snip]

                                  I think I had a better idea than trying to create a paste-able column of frequencies. Instead of above replacement, keep the word column, but append an index number (subsidiary document occurrence) to the words.

                                  ^!Set %suffix%="01"
                                  ^!Replace "^([^\t\r]++)\R\1\b" >> "$1" RAWS
                                  ^!Replace "^([^\t\r]++)\K(?=\R)" >> "\t0" RAWS
                                  ^!Replace "^[^\t\r]++(?=\t)" >> "$0\^%suffix%" RAWS

                                  For above items, you should have

                                  5601<tab>1
                                  abolish01<tab>0
                                  abounds01<tab>0
                                  absurd01<tab>3

                                  Do it for each subsidiary document. As each is done, tack the content of the temporary document onto the end of a consolidated variable. A separate process would do the same for the whole document, using 00 as the suffix, before the subdocument loop. At the conclusion of the loop, sort the many, many rows and paste the sorted result into a temporary document. Do something to preserve the words that had the 00 suffix. Then replace all the linebreak-plus-words-with-their-suffixes with the empty string. Fix up the 00 items one last time. You should end up with one column of words (formerly the 00 items), one column with the "All" frequencies, and one column for each subdocument's frequencies.

                                  Let me know if you were able to follow that :D

                                  BTW, it might be faster to insert the clipboard contents (from text statistics) under clip control into one quasi-permanent temporary document than creating and destroying multitudes of new documents.

                                  For sorting, we have ^$StrSort with various options.

                                  It would be nice if we had a clip function to capture text statistics so the ^!Keyboard tricks and clipboard were not necessary.

                                  Regards,
                                  Sheri
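To make the suffix idea concrete, here is a small Python sketch of the same tag-sort-collapse sequence. It is only an illustration under assumptions: the frequencies are passed in as dictionaries (hypothetical inputs, not what the text statistics tool produces), and a numeric index stands in for the 00, 01, 02 suffixes.

```python
from itertools import groupby


def consolidate(all_freqs, sub_docs):
    """Build tab-separated rows: word, "All" count, then one count per subdocument.

    all_freqs: dict of word -> frequency over the whole set of files.
    sub_docs:  list of dicts, one per subdocument, word -> frequency.
    """
    words = sorted(all_freqs)
    # Index 0 plays the role of the 00 suffix for the whole document.
    rows = [(word, 0, all_freqs[word]) for word in words]
    # Every subdocument contributes one row per word, zero when the word is absent.
    for index, freqs in enumerate(sub_docs, start=1):
        rows.extend((word, index, freqs.get(word, 0)) for word in words)
    # Sorting brings all rows for one word together, ordered by suffix.
    rows.sort()
    table = []
    for word, group in groupby(rows, key=lambda row: row[0]):
        table.append("\t".join([word] + [str(row[2]) for row in group]))
    return table


all_freqs = {"absurd": 3, "abyss": 2, "56": 1}
subs = [{"absurd": 3, "56": 1}, {"abyss": 2}]
for line in consolidate(all_freqs, subs):
    print(line)
```

The grouping step corresponds to removing the linebreak-plus-word-with-suffix from every row except the 00 one.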
                                • diodeom
                                  Message 16 of 26 , Mar 1, 2010
                                    "Sheri" <silvermoonwoman@...> wrote:
                                    >
                                    > Do it for each subsidiary document. As each is done, tack the
                                    > content of the temporary document onto the end of a consolidated
                                    > variable. At the conclusion of the loop, A separate process
                                    > would do it for whole document, using 00 as the suffix, before
                                    > the subdocument loop. Sort the many, many rows. Paste the sorted
                                    > result into a temporary document. Do something to preserve the
                                    > words that had 00 suffix. Then replace all the linebreak-plus-
                                    > words-with-their-suffixes with the empty string. Fix up the 00
                                    > items one last time. Should end up with one column with words
                                    > (formerly the 00 items) the "All" frequencies, and one column
                                    > for each subdocument's frequencies.
                                    >

                                    Pretty cool. Sorting of this final list of about 780,000 lines took only half a minute. Then killing all "\R(?!\x2A)[^\t]++" (I had an asterisk in front of each word00<tab>value pair) fixed the table up in no time.
                                  • diodeom
                                    Message 17 of 26 , Mar 1, 2010
                                      "Sheri" <silvermoonwoman@...> wrote:
                                      >
                                      > ;2007-01-14 created by Sheri Pierce
                                      > ;2010-03-01 Revision by Sheri Pierce
                                      > ;revision tested with NoteTab 6.2 (PCRE 8.01)
                                      > ;use to help create clearvariable statements for the clip being edited
                                      > ^!If ^$GetSelSize$>0 Next Else Skip
                                      > ^!Continue Some text is highlighted. Only variables set within the selection will be considered.
                                      > ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"
                                      > ^!IfEmpty ^%varnames% Next Else Skip_2
                                      > ^!Info No variables found
                                      > ^!Goto Clear
                                      > ^!Set %varnames%="^$StrSort(^%varnames%;No;Yes;Yes)$"
                                      > ^!Info ^%varnames%
                                      > ^!Set %varnames%=""
                                      > :Clear
                                      > ^!ClearVariable %varnames%
                                      > ;end of clip
                                      >

                                      I suppose there is no harm in this minute redundancy where any array elements that were (re)set individually somewhere in the target code are offered in the results as well.

                                      [\d\pL_] could be \w, I believe. And I'd guess ^!Set %varnames%="" may be a leftover from before :Clear existed?
                                    • Don - HtmlFixIt.com
                                      Message 18 of 26 , Mar 1, 2010
                                        On 3/1/2010 12:18 PM, diodeom wrote:
                                        > ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"

                                        Could one of you break this line down item by item (on the right side of
                                        the = sign anyway) for those of us who are slower ...

                                        ?i = case insensitive?
                                        ^ means at start of a line?
                                        \^ means a caret actually exists in the text?
                                        \! means an ! actually exists in the text?
                                        Set either Array or Code? is actually in the text?
                                        So this is to update a clip?

                                        ? means find first ... non-greedy?
                                        \x20 is a space?
                                        I'm pretty well lost after that ...

                                        I even bottom posted cause I want to know.
                                      • Sheri
                                        Message 19 of 26 , Mar 1, 2010
                                          --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
                                          >
                                          >
                                          > I suppose there is no harm in this minute redundancy where any
                                          > array elements that were (re)set individually somewhere in the
                                          > target code are offered in the results as well.
                                          >
                                          > [\d\pL_] could be \w, I believe.

                                          \w doesn't match high ascii characters, which tho I don't use them in variable names myself, are not prohibited.

                                          > And I'd guess ^!Set %varnames%="" may be a leftover from before :Clear existed?

                                          There is this in the docs:

                                          "If you assign an empty value to an array variable, or use the ^!Set command to assign a new value to it, the array is automatically removed from memory."

                                          ... so out of an abundance of caution, I set the array variable equal to empty string before using ClearVariable on it.

                                          Regards,
                                          Sheri
                                        • diodeom
                                          Message 20 of 26 , Mar 1, 2010
                                            "Don - HtmlFixIt.com" <don@...> wrote:
                                            >
                                            > > ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"
                                            >
                                            > Could one of you break this line down item by item (on the right side of
                                            > the = sign anyway)

                                            Locate (case insensitive) either ^!Set or ^!SetArray or ^!SetCode at the start of a line followed by a space, OR locate a semicolon followed by an optional space. The pattern is looking here for one of the four possible scenarios of declaring variables, the last one being when more than one is set on the same line. Next, \K discards what was matched so far, and the subsequent expression ungreedily demands a string of word characters (digits, letters or underscores) between two percent signs (that is, a variable, the only string actually returned here), which must be followed by an equal sign.

                                            These captured strings are then inserted ($0 references them) by GetDocListAll in the stored, ready to display ^!ClearVariable statements, each on its own line.
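For experimenting with the pattern outside NoteTab, here is a rough Python equivalent. It is an approximation, not the clip: Python's re module has no \K or \pL, so a capturing group stands in for \K and \w (Unicode-aware in Python 3) stands in for [\d\pL_]. The function name and sample text are invented for the example, and the sort is a plain case-insensitive one that is not guaranteed to match ^$StrSort$.

```python
import re

# A capture group replaces \K: only the %name% part is returned by findall.
VAR_PATTERN = re.compile(
    r"(?:^\^!Set(?:Array|Code)?\x20|;\x20?)(%\w+?%)(?==)",
    re.IGNORECASE | re.MULTILINE,
)


def clear_statements(clip_text):
    """List one ^!ClearVariable line per distinct variable set in clip_text."""
    names = sorted(set(VAR_PATTERN.findall(clip_text)), key=str.lower)
    return [f"^!ClearVariable {name}" for name in names]


sample = (
    "^!Set %Var1%=one; %Var2%=two; %Var3%=three\n"
    "^!SetArray %list%=a;b\n"
    "^!Info ^%Var1%\n"
)
print("\n".join(clear_statements(sample)))
```

The last sample line is ignored because ^%Var1% there is a reference, not an assignment: nothing matching the first part of the pattern precedes it and no equal sign follows it.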
                                          • Sheri
                                            Message 21 of 26 , Mar 1, 2010
                                              --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                                              >
                                              > On 3/1/2010 12:18 PM, diodeom wrote:
                                              > > ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"
                                              >
                                              > Could one of you break this line down item by item (on the right
                                              > side of the = sign anyway) for those of us who are slower ...
                                              >
                                              > ?i = case insensitive?
                                              > ^ means at start of a line?
                                              > \^ means a caret actually exists in the text?
                                              > \! means an ! actually exists in the text?
                                              > Set either Array or Code? is actually in the text?
                                              > So this is to update a clip?
                                              >
                                              > ? means find first ... non-greedy?
                                              > \x20 is a space?
                                              > I'm pretty well lost after that ...
                                              >
                                              > I even bottom posted cause I want to know.
                                              >

                                              Hi Don, I've missed you!

                                              Maybe it would help to know that this clip is meant to be run from the clipbar while clipedit is the active document. It shows the names of variables found in the clip currently being edited. GetDocListAll gives the opportunity to format the matches, so they are formatted with the ClearVariable command in front of each one. The purpose is just to help make the list of ClearVariable commands needed to clear only the variables actually Set by the clip being written.

                                              It only finds %xxx% if:
                                              - it follows a line start followed by ^!Set or ^!SetCode or ^!SetArray and one space, OR if it follows a semicolon (anywhere on a line) plus an optional space
                                              - it is followed by an equal sign

                                              "(?=" signifies a lookahead assertion. There is an equal sign inside the look ahead assertion, (?=\=). It may not need that backslash, but it makes it easier to see (at least for me). So, an equal sign must follow (looking ahead) the %xxx% to be a match. The equal sign is not actually part of the matched text.

                                              The search is case insensitive so that ^!set works as well as ^!Set or any other mixed case that might be found.

                                              Regards,
                                              Sheri
                                            • Don - HtmlFixIt.com
                                              Message 22 of 26 , Mar 1, 2010
                                                > Hi Don, I've missed you!
                                                Guest appearance ;-) Prodigal son?

                                                > "(?=" signifies a lookahead assertion. There is an equal sign inside the look ahead assertion, (?=\=). It may not need that backslash, but it makes it easier to see (at least for me). So, an equal sign must follow (looking ahead) the %xxx% to be a match. The equal sign is not actually part of the matched text.

                                                Look ahead and look back assertions are currently over my head -- I need
                                                to solve that. It's funny in one way because regex won't search
                                                backwards, but it will look back ...

                                                So that bit that kind of looks like a rear end ... (?=\=) ... really is
                                                just to say "followed by an equals sign." It has three characters (?=
                                                to say look ahead ... find an equals sign (escaped for good measure) and
                                                closing parenthesis.

                                                If we wanted the search pattern followed by a letter R it would be (?=R)
                                                if I follow correctly ... can it also accept or patterns (?=r|R) and so
                                                forth?
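A quick way to try lookaheads is a small test outside NoteTab. The sketch below uses Python's re module rather than NoteTab's PCRE, and the sample strings are made up. For these simple cases the lookahead syntax is the same in both, and alternation inside the lookahead is allowed.

```python
import re

# (?=r|R) accepts either letter after the x; the letter is not part of the match.
either_case = re.findall(r"x(?=r|R)", "xr xR xs")

# A character class inside the lookahead does the same job here.
with_class = re.findall(r"x(?=[rR])", "xr xR xs")

# The thread's case: %name% only counts when an equal sign follows it.
assigned = re.findall(r"%\w+%(?==)", "%a%=1 %b% %c%=2")

print(either_case, with_class, assigned)
```

In each case the text inside the lookahead is checked but left out of the match, which is why only the x or the %name% comes back.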
                                              • diodeom
                                                Message 23 of 26 , Mar 2, 2010
                                                  "Sheri" <silvermoonwoman@...> wrote:
                                                  >
                                                  > I wrote:
                                                  > >
                                                  > > And I'd guess ^!Set %varnames%="" may be a leftover from
                                                  > > before :Clear existed?
                                                  >
                                                  > There is this in the docs:
                                                  >
                                                  > "If you assign an empty value to an array variable, or use the
                                                  > ^!Set command to assign a new value to it, the array is
                                                  > automatically removed from memory."
                                                  >
                                                  > ... so out of an abundance of caution, I set the array variable
                                                  > equal to empty string before using ClearVariable on it.
                                                  >

                                                  My overly concise suggestion didn't imply it well: it was because I'm aware of what setting a variable to empty does — and of your high standards — that I assumed you didn't mean to do the same thing twice. I speculated that maybe as you were writing this clip, initially ^!Set %varnames%="" was the last line, and after adding a provision for cases where no variables are captured, which called for a label :Clear, its removal was overlooked (or an alternative of just placing the label right above it). Sorry, Sheri; I'm sure it's not the last time I guessed incorrectly. (I'm also quite certain it won't stop me from trying... :)
                                                • Sheri
                                                  Message 24 of 26 , Mar 2, 2010
                                                    On 3/2/2010 8:56 AM, diodeom wrote:
                                                    > "Sheri"<silvermoonwoman@...> wrote:
                                                    >
                                                    >> I wrote:
                                                    >>
                                                    >>> And I'd guess ^!Set %varnames%="" may be a leftover from
                                                    >>> before :Clear existed?
                                                    >>>
                                                    >> There is this in the docs:
                                                    >>
                                                    >> "If you assign an empty value to an array variable, or use the
                                                    >> ^!Set command to assign a new value to it, the array is
                                                    >> automatically removed from memory."
                                                    >>
                                                    >> ... so out of an abundance of caution, I set the array variable
                                                    >> equal to empty string before using ClearVariable on it.
                                                    >>
                                                    >>
                                                    > My overly concise suggestion didn't imply it well: it was because I'm aware of what setting a variable to empty does — and of your high standards — that I assumed you didn't mean to do the same thing twice. I speculated that maybe as you were writing this clip, initially ^!Set %varnames%="" was the last line, and after adding a provision for cases where no variables are captured, which called for a label :Clear, its removal was overlooked (or an alternative of just placing the label right above it). Sorry, Sheri; I'm sure it's not the last time I guessed incorrectly. (I'm also quite certain it won't stop me from trying... :)
                                                    >

                                                    LOL. I'm deleting the line, this clip doesn't even create an array for
                                                    varnames. Probably an earlier version did. Will try to pay better
                                                    attention next time.

                                                    Regards,
                                                    Sheri
                                                  • Alec Burgess
                                                    Sheri (silvermoonwoman@comcast.net) wrote (in part) (on 2010-03-01 at ... I was on holiday (got back yesterday) and am just noodling thru the large number of
                                                    Message 25 of 26 , Mar 5, 2010
                                                      Sheri (silvermoonwoman@...) wrote (in part) (on 2010-03-01 at
                                                      15:41):
                                                      > Maybe it would help to know that that clip is meant to be run from the
                                                      > clipbar while clipedit is the active document. It shows the names of
                                                      > variables found in clip currently being edited. GetDocListAll gives
                                                      > the opportunity to format the matches, so they are formatted with the
                                                      > ClearVariable command in front of each one. The purpose is just to
                                                      > help make the list of ClearVariable commands needed to clear only the
                                                      > variables actually Set by the clip being written.
                                                      >
                                                      > It only finds %xxx% if:
                                                      > - it follows a line start followed by ^!Set or ^!SetCode or ^!SetArray
                                                      > and one space, OR if it follows a semicolon (anywhere on a line) plus
                                                      > an optional space
                                                      > - it is followed by an equal sign
                                                      I was on holiday (got back yesterday) and am just noodling thru the
                                                      large number of posts during the past week.

                                                      I read this thread but didn't test any of the example code.

                                                      > ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable $0\r\n")$"
                                                      >


                                                      Sheri: with regard to your "find variables for clearing" clip. Judging from
                                                      the comments (I haven't checked the regex itself), you appear to expect a
                                                      variable xxx to be defined as:
                                                      ^!set %xxx%=asdf
                                                      The following also works, though I use the form without %...% only
                                                      accidentally :-[ :
                                                      ^!set xxx=asdf
                                                      ^!info ^%xxx%

                                                      Will your clip capture this usage?

                                                      >
                                                      > "(?=" signifies a lookahead assertion. There is an equal sign inside
                                                      > the look ahead assertion, (?=\=). It may not need that backslash, but
                                                      > it makes it easier to see (at least for me). So, an equal sign must
                                                      > follow (looking ahead) the %xxx% to be a match. The equal sign is not
                                                      > actually part of the matched text.
                                                      >
                                                      > The search is case insensitive so that ^!set works as well as ^!Set or
                                                      > any other mixed case that might be found.
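
The matching rules quoted above (a ^!Set, ^!SetCode or ^!SetArray at line start, or a semicolon, then %xxx%, then an equal sign that is checked but not consumed) can be tried outside NoteTab. What follows is only a rough Python approximation, not the clip itself: Python's `re` module has no `\K` or `\pL`, so a capture group stands in for `\K` and `\w` stands in for `[\d\pL_]`. The function name and the sample clip text are made up for illustration.

```python
import re

# Approximation of the clip's pattern. Assumptions: a capture group replaces \K,
# \w replaces [\d\pL_], and MULTILINE makes ^ match at every line start.
PATTERN = re.compile(
    r"(?:^\^!Set(?:Array|Code)?\x20|;\x20?)(%\w+?%)(?==)",
    re.IGNORECASE | re.MULTILINE,
)


def find_set_variables(clip_text):
    """Return every %name% that is assigned with ^!Set, ^!SetArray or ^!SetCode."""
    return [m.group(1) for m in PATTERN.finditer(clip_text)]


# Hypothetical clip text, used only to exercise the pattern.
sample_clip = (
    "^!Set %a%=1; %b%=2\n"
    "^!set %count%=0\n"
    "^!SetArray %items%=x;y\n"
    "^!Info ^%a%\n"
    "^!Set total=5\n"
)

found = find_set_variables(sample_clip)
print(found)

# The lookahead (?==) only checks for the equal sign, so the matched text
# itself stops right after the closing percent sign.
first_match = PATTERN.search(sample_clip).group(0)
print(repr(first_match))
```

The lower-case `^!set` line is still picked up because of the case-insensitive flag, while `^!Set total=5` is skipped because this version of the pattern insists on the percent signs.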


                                                      --
                                                      Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)




                                                    • Sheri
                                                      ... No, of course not. :) But (sigh) I suppose by failing to capture them, it is remotely possible that some accidental variables would fail to get released.
                                                      Message 26 of 26 , Mar 6, 2010
                                                        --- In ntb-clips@yahoogroups.com, Alec Burgess <buralex@...> wrote:
                                                        >
                                                        > Sheri (silvermoonwoman@...) wrote (in part) (on 2010-03-01 at
                                                        > 15:41):
                                                        > >
                                                        > > ^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K%[\d\pL_]+?%(?=\=)";"^!ClearVariable
                                                        > > $0\r\n")$"
                                                        > >
                                                        >
                                                        > Sheri: wrt to your "find variables for clearing" clip. You appear
                                                        > from the comments but haven't checked the regex itself, to be
                                                        > expecting a
                                                        >
                                                        > variable xxx to be defined as:
                                                        > ^!set %xxx%=asdf
                                                        > Following works though I use the construct without %...% only
                                                        > accidentally :-[ :
                                                        > ^!set xxx=asdf
                                                        > ^!info ^%xxx%
                                                        >
                                                        > Will your clip capture this usage?
                                                        >

                                                        No, of course not. :)

                                                        But (sigh) I suppose by failing to capture them, it is remotely possible that some "accidental" variables would fail to get released.

                                                        So perhaps it would be better to modify the capture part as follows:

                                                        ^!Set %varnames%="^$GetDocListAll("(?i)(^\^\!Set(Array|Code)?\x20|;\x20?)\K(%?)([\d\pL_]+?)\3(?=\=)";"^!ClearVariable %$4%\r\n")$"
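
The idea in the modified pattern is that `(%?)` captures an optional leading percent sign and the backreference then requires the same thing (a percent sign or nothing) after the name, so both `%xxx%=` and `xxx=` are caught while a lopsided `%xxx=` is not. A rough Python sketch of that behaviour, under the same assumptions as any port away from NoteTab's regex engine: no `\K` or `\pL` in Python's `re`, so the group numbers shift (`\1` and group 2 here instead of `\3` and `$4`) and `\w` stands in for `[\d\pL_]`. The function name and sample text are hypothetical.

```python
import re

# Group 1 is the optional leading percent sign, group 2 is the bare name.
# \1 demands the same closing text as group 1: "%" or the empty string.
PATTERN = re.compile(
    r"(?:^\^!Set(?:Array|Code)?\x20|;\x20?)(%?)(\w+?)\1(?==)",
    re.IGNORECASE | re.MULTILINE,
)


def clear_commands(clip_text):
    """Build one ^!ClearVariable line per variable assigned in clip_text."""
    return ["^!ClearVariable %" + m.group(2) + "%" for m in PATTERN.finditer(clip_text)]


# Hypothetical clip text: both spellings, plus one malformed assignment.
sample_clip = (
    "^!Set %a%=1; %b%=2\n"
    "^!set total=5\n"
    "^!SetCode %x%=abc\n"
    "^!Set %broken=1\n"
)

commands = clear_commands(sample_clip)
print("\n".join(commands))
```

Because the name is captured without its percent signs, the output format adds them back, which is what `%$4%` does in the clip line above.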

                                                        Regards,
                                                        Sheri