Re: Creation of clip

  • Sheri
    Message 1 of 30, Jun 14, 2007
      --- In ntb-clips@yahoogroups.com, "idisnick" <idisnick@...> wrote:
      >
      > Can you help me create this clip, I can't figure it out.
      > I have a long list, and a second list of keywords,
      > I would like to have NoteTab take the keywords list and search the
      > first list for them and parse out all the lines (entire lines) that
      > contain those keywords. If you want a fee to do this let me know. Thanks.
      >

      How many keywords? If not more than a few hundred could possibly use
      something like this (uses regular expression matching).

      ^!Setlistdelimiter ^P
      ;next is one long line
      ^!Set %linesout%=^$GetDocMatchAll("(?-i)^.*(comprehensive|switch|system).*^%dollar%";0)$
      ;end long line
      ^!Toolbar New Document
      ^!InsertText ^%linesout%

      Run it with the What's New text file showing, and the lines containing
      any of those three words get pasted to a new buffer.

      Does not work in versions of NoteTab earlier than 5.1.

      The (?-i) part is causing the search to be case sensitive. If you want
      it to be case insensitive, use (?i) instead. If you leave it off, it
      seems to be defaulting to (?i) which is not what I expected.

      Regards,
      Sheri
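
For readers who want to try the pattern outside NoteTab, here is a minimal sketch of the same idea in Python. It is an illustration only, not the clip: the function name `lines_containing` and the sample text are assumptions, and Python's `re` module stands in for NoteTab's regex engine. The `case_sensitive` flag mirrors the (?-i) / (?i) choice described above.

```python
import re

def lines_containing(text, words, case_sensitive=True):
    """Return every full line of text that contains any of the words."""
    flags = re.MULTILINE
    if not case_sensitive:
        flags |= re.IGNORECASE
    # Same shape as the clip's pattern: ^.*(a|b|c).*$ applied line by line.
    # re.escape is an addition here; the clip passes the words through as-is.
    pattern = "^.*(?:" + "|".join(re.escape(w) for w in words) + ").*$"
    return [m.group(0) for m in re.finditer(pattern, text, flags)]

# Hypothetical sample text, not the What's New file mentioned in the post.
sample = "The System is down\nflip the switch\nnothing here\n"
words = ["comprehensive", "switch", "system"]
print(lines_containing(sample, words))
print(lines_containing(sample, words, case_sensitive=False))
```

With the case-sensitive default, "The System is down" is skipped because "System" is capitalized; passing `case_sensitive=False` picks it up as well.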
    • buralex@gmail.com
      Message 2 of 30, Jun 14, 2007
        "idisnick" <idisnick@...> said on 06/13/2007 5:47:08 PM -0400
        > Can you help me create this clip, I can't figure it out.
        > I have a long list, and a second list of keywords,
        > I would like to have NoteTab take the keywords list and search the
        > first list for them and parse out all the lines (entire lines) that
        > contain those keywords. If you want a fee to do this let me know. Thanks.
        We had a long discussion about this at the beginning of the year (search
        the list for "concordance"). It has a working clip.

        An open-source program to do this is: TextStat.
        > "TextSTAT - Free concordance software for Windows and Linux
        > TextSTAT is a simple programme for the analysis of texts. It reads
        > ASCII/ANSI texts (in different encodings) and HTML files (directly
        > from the internet) and ...
        > www.niederlandistik.fu-berlin.de/textstat/software-en.html"
        > http://www.google.com/search?q=TextStat&sourceid=navclient-ff&ie=UTF-8&rls=GGGL,GGGL:2006-39,GGGL:en

        Regards ... Alec -- buralex-gmail
      • Flo
        Message 3 of 30, Jun 15, 2007
          I understand that idisnick is working with a keyword list. So I would
          propose to extend Sheri's concept as follows...


          ^!Set %Doc%=^$GetDocIndex$
          ^!Set %Keywords%=^?{(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:}
          ^!Open ^%Keywords%
          ; Remove empty lines in keyword list
          ^!Replace "(\r\n){2,}" >> "\r\n" AWRS
          ; Put keywords into alternation
          ^!Replace "\r\n" >> "|" AWRS
          ; Remove "empty alternation" at list end
          ^!Replace "\|\Z" >> "" AWRS
          ^!Select All
          ^!Set %Search%=^$GetSelection$
          ^!Close ^%Keywords% Discard
          ^!SetDocIndex ^%Doc%
          ^!SetListDelimiter ^P
          ^!Set %linesout%=^$GetDocMatchAll("(?-i)^.*(^%Search%).*^%dollar%";0)$
          ^!Toolbar New Document
          ^!InsertText ^%linesout%

          Example: The active document is...

          Froggy Frog
          Lives in a well
          If you want him
          Pull the bell.

          The keyword list is...

          bell
          Frog
          well

          The clip outputs lines #1, 2, and 4. Maybe it needs some improvements
          to work with bigger texts and word lists ;-)

          Flo
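
The keyword-file step above can be sketched in Python for comparison. This is an assumption-laden illustration rather than a translation of the clip: the names `build_alternation` and `matching_line_numbers` are made up for the example, and `re.escape` and whitespace stripping are additions that the clip does not perform. It reproduces the Froggy example, including a blank line in the keyword list.

```python
import re

def build_alternation(keyword_text):
    """Turn a one-keyword-per-line list into a regex alternation."""
    # Skipping blank lines plays the role of the clip's three ^!Replace steps.
    keywords = [line.strip() for line in keyword_text.splitlines() if line.strip()]
    # Escaping is an addition: the clip inserts the keywords unescaped.
    return "|".join(re.escape(k) for k in keywords)

def matching_line_numbers(document, keyword_text):
    """Return 1-based numbers of lines containing any keyword (case-sensitive)."""
    pattern = re.compile(build_alternation(keyword_text))
    return [n for n, line in enumerate(document.splitlines(), start=1)
            if pattern.search(line)]

document = "Froggy Frog\nLives in a well\nIf you want him\nPull the bell.\n"
keyword_text = "bell\n\nFrog\nwell\n"
print(build_alternation(keyword_text))
print(matching_line_numbers(document, keyword_text))
```

Note that an empty keyword list would produce an empty pattern, which matches every line; a real script would want to check for that first.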
           
        • idisnick
          Message 4 of 30, Jun 19, 2007
            Hi,
            I saved this as a clb file and added it to my library bar, but I
            can't get it to work; it doesn't ask for my keyword list. Can you
            tell me how it works? Thanks!




            --- In ntb-clips@yahoogroups.com, "Flo" <flo.gehrke@...> wrote:
            >
            > I understand that idisnick is working with a keyword list. So I would
            > propose to extend Sheri's concept as follows...
          • Don - HtmlFixIt.com
            Message 5 of 30, Jun 19, 2007
              idisnick wrote:
              > Hi,
              > I saved this as a clb file and added it to my library bar, but I
              > can't get it to work; it doesn't ask for my keyword list. Can you
              > tell me how it works? Thanks!
              >

              ^!Set %Keywords%=^?{(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword
              File:}


              Most clips when sent via email need to be "unwrapped" if long lines get
              wrapped.

              The above line for example was wrapped for me ^^^

              I had to unwrap it.

              In most clips, lines should start with either ^! or ; (with few exceptions).
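
The unwrapping rule of thumb above can be sketched as a small Python helper. This is a rough heuristic under stated assumptions, not a NoteTab feature: the name `unwrap_clip` is invented, lines starting with `^`, `;`, or `:` (labels) are assumed to begin a new clip line, and continuation lines are rejoined with a single space. A token that was split in the middle, such as `^%` followed by `dollar%` on the next line, would still need fixing by hand.

```python
def unwrap_clip(lines):
    """Join email-wrapped continuation lines back onto the line they belong to."""
    starters = ("^", ";", ":")  # assumed: commands/functions, comments, labels
    unwrapped = []
    for line in lines:
        stripped = line.strip()
        starts_new_line = (
            not stripped                     # blank lines stay as separators
            or stripped.startswith(starters)
            or not unwrapped                 # nothing to attach to yet
            or not unwrapped[-1]             # previous line was blank
        )
        if starts_new_line:
            unwrapped.append(stripped)
        else:
            # Heuristic: assume the mail client broke the line at a space.
            unwrapped[-1] += " " + stripped
    return unwrapped

wrapped = [
    '^!Set %Keywords%=^?{(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword',
    "File:}",
    "^!Open ^%Keywords%",
]
for clip_line in unwrap_clip(wrapped):
    print(clip_line)
```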
            • idisnick
              Message 6 of 30, Jun 19, 2007
                Thanks, I can test it now. This clip is actually doing the opposite
                of what I want, though. I wanted it to parse out all lines that
                contain the keywords, and instead it is keeping all the lines that
                have the keywords and deleting all the other lines.



                --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                >
                > Most clips when sent via email need to be "unwrapped" if long lines get
                > wrapped.
                > ...
              • Don - HtmlFixIt.com
                Message 7 of 30, Jun 19, 2007
                  Give this a try (note one long line):
                  ;clip to remove lines that do contain keywords
                  ;keywords go in a file with one keyword per line
                  ;start with the file to be parsed open
                  ;whacked at by don at htmlfixit dot com
                  ;used regex proposed by pat

                  :StartOfClip
                  ;long line follows
                  ^!Set %Keywords%=^?{(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:}
                  ;long line precedes

                  :GetKeywords
                  ^!Open ^%Keywords%
                  ; Remove empty lines in keyword list
                  ^!Replace "(\r\n){2,}" >> "\r\n" AWRS
                  ; Put keywords into alternation
                  ^!Replace "\r\n" >> "|" AWRS
                  ; Remove "empty alternation" at list end
                  ^!Replace "\|\Z" >> "" AWRS
                  ^!Select All
                  ^!Set %Search%=^$GetSelection$
                  ^!Close Discard

                  ;be sure we are back on proper document
                  ^!Set %Doc%=^$GetDocIndex$
                  ^!SetWordWrap Off
                  ^!Jump Doc_Start
                  ^!SetDebug On


                  :Loop
                  ^!Select Eol
                  ^!Find ".*(^%Search%).*" TIHRS
                  ;this deletes line if contains keyword
                  ^!IfError Continue
                  ^!InsertText ^%EMPTY%
                  ^!Goto Loop

                  :Continue
                  ;finish if end of file is reached
                  ^!If ^$GetRow$ = ^$GetLinecount$ Finish
                  ;this moves to next line if keyword not found
                  ^!Jump +1
                  ^!Goto Loop

                  :Finish
                  ;clean up empty line at start if exists
                  ^!Jump Doc_Start
                  ^!Select Eol
                  ^!If "getselection" = "" Skip_2
                  ^!SelectTo 2:1
                  ^!InsertText ^%EMPTY%

                  :EmptyLinesOut
                  ;clean out all empty lines after deletion
                  ^!Replace "^P^P" >> "^P" ACIWS
                  ^!IfError End
                • Don - HtmlFixIt.com
                  Message 8 of 30, Jun 19, 2007
                    Ok that was not right ... try this one instead ...

                    ;clip to remove lines that do contain keywords
                    ;keywords go in a file with one keyword per line
                    ;start with the file to be parsed open
                    ;whacked at by don at htmlfixit dot com
                    ;used regex proposed by pat

                    :StartOfClip
                    ;long line follows
                    ^!Set %Keywords%=^?{(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:}
                    ;long line precedes
                    ^!Set %Doc%=^$GetDocIndex$

                    :GetKeywords
                    ^!Open ^%Keywords%
                    ; Remove empty lines in keyword list
                    ^!Replace "(\r\n){2,}" >> "\r\n" AWRS
                    ; Put keywords into alternation
                    ^!Replace "\r\n" >> "|" AWRS
                    ; Remove "empty alternation" at list end
                    ^!Replace "\|\Z" >> "" AWRS
                    ^!Select All
                    ^!Set %Search%=^$GetSelection$
                    ^!Close Discard

                    ;be sure we are back on proper document
                    ^!SetDocIndex ^%Doc%
                    ^!SetWordWrap Off
                    ^!Jump Doc_Start


                    :Loop
                    ^!Select Eol
                    ^!Find ".*(^%Search%).*" TIHRS
                    ;this deletes line if contains keyword
                    ^!IfError Continue
                    ^!InsertText ^%EMPTY%
                    ^!Goto Loop

                    :Continue
                    ;finish if end of file is reached
                    ^!If ^$GetRow$ = ^$GetLinecount$ Finish
                    ;this moves to next line if keyword not found
                    ^!Jump +1
                    ^!Goto Loop

                    :Finish
                    ;clean up any empty line(s) at start if exist(s)
                    ^!Jump Doc_Start
                    ^!Select Eol
                    ^!If "^$GetSelection$" <> "" EmptyLinesOut
                    ^!SelectTo 2:1
                    ^!InsertText ^%EMPTY%
                    ^!Goto Finish

                    :EmptyLinesOut
                    ;clean out all empty lines after deletion
                    ^!Replace "^P^P" >> "^P" ACIWS
                    ^!IfError End
                    ^!Goto EmptyLinesOut
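
For comparison, the same "remove lines that contain keywords" task can be sketched in Python as a single pass over the lines instead of a find-and-delete loop. This is an illustration, not a replacement for the clip: the function name `remove_keyword_lines` is hypothetical, the `ignore_case` default is an assumption, and keywords are escaped so they are treated as plain text. Like the clip's final step, it also drops empty lines.

```python
import re

def remove_keyword_lines(document, keywords, ignore_case=True):
    """Return the document without lines that contain any keyword."""
    keywords = [k for k in keywords if k]
    if not keywords:
        return document  # nothing to filter on; leave the text untouched
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile("|".join(re.escape(k) for k in keywords), flags)
    kept = [line for line in document.splitlines()
            if line.strip() and not pattern.search(line)]
    return "\n".join(kept)

document = "Froggy Frog\nLives in a well\n\nIf you want him\nPull the bell.\n"
print(remove_keyword_lines(document, ["bell", "Frog", "well"]))
```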
                  • idisnick
                    Message 9 of 30, Jun 19, 2007
                      Hi,
                      It worked, but it left empty spaces, unlike the last one. Can you make
                      it so it closes the space? Also, it took a few minutes to do a list of
                      24,000; is there any way to speed it up?
                      Thanks!



                      --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                      >
                      > Give this a try (note one long line):
                      > ;clip to remove lines that do contain keywords
                      > ...
                    • idisnick
                      Message 10 of 30, Jun 19, 2007
                        This one seems to work great!

                        --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                        >
                        > Ok that was not right ... try this one instead ...
                        > ...
                      • Flo
                        Message 11 of 30, Jun 20, 2007
                          --- In ntb-clips@yahoogroups.com, "idisnick" <idisnick@...> wrote:
                          > Hi,...
                          > Also it took a few minutes to do a list of
                          > 24,000, any way to speed it up?

                          Try another one. In a test, it "parsed" 16,000 lines within a few
                          seconds. This will remind some members of an earlier discussion
                          (see "Removing stopwords from word list")... ;-)

                          Flo


                          ^!Set %Doc%=^$GetDocIndex$
                          ^!Set %Keywords%=^?[(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:]
                          ^!Open ^%Keywords%
                          ^!Replace "(\r\n){2,}" >> "\r\n" AWRS
                          ^!Replace "\r\n" >> "|" AWRS
                          ^!Replace "\|\Z" >> "" AWRS
                          ^!Select All
                          ^!Set %Search%=^$GetSelection$
                          ^!Close ^%Keywords% Discard
                          ^!SetDocIndex ^%Doc%
                          ^!SetListDelimiter ^%Space%^P
                          ^!Set %linesout%=^$GetDocMatchAll("^.*(^%Search%).*^%Dollar%";0)$^%Space%
                          ^!Menu Edit/Copy All
                          ^!Toolbar Paste New
                          ^!Jump Doc_End
                          ^!IfFalse ^$IsEmpty(^$GetLine$)$ Next Else Skip
                          ^!Keyboard Enter
                          ^!Select All
                          ^!Menu Modify/Lines/Trim Blanks
                          ^!Jump Doc_End
                          ^%linesout%^%NL%
                          ^!Select All
                          ^$StrSort("^$GetSelection$";0;1;1)$
                          ^!Replace "^(.+)\r\n\1 (\r\n)" >> "" AWIRS
                          ^!Replace "^(.+) \r\n" >> "" AWRS
                          ^!Info Finished!
                          ;end of clip
                        • Jeff Scism
                          Message 12 of 30, Jun 20, 2007
                            You should probably avoid the Keyboard ENTER and other Keyboard
                            commands (they do not work for everyone, it seems). Also, things move
                            along quicker with ^!SetScreenUpdate Off.

                            Flo wrote:
                            >
                            > Try another one. In a test, it "parsed" 16,000 lines within a few
                            > seconds. This will remind some members of an earlier discussion
                            > (see "Removing stopwords from word list")...;-)
                            > ...
                          • Flo
                            Message 13 of 30, Jun 21, 2007
                              --- In ntb-clips@yahoogroups.com, Jeff Scism <Scismgenie@...> wrote:
                              >
                              > You should probably avoid the Keyboard ENTER and other Keyboard
                              > commands, (They do not work for everyone, it seems) also things move
                              > along quicker with ^!SetScreenupdate OFF.

                              OK, Jeff. So we had better add "^!SetScreenUpdate Off" and replace
                              "^!Keyboard Enter" with "^!InsertText ^%NL%".

                              I also added another prompt. Now you can choose case-sensitive
                              search, or ignore the case.


                              ^!SetScreenUpdate Off
                              ^!SetHintInfo Working...
                              ^!Set %Doc%=^$GetDocIndex$
                              ^!Set %Keywords%=^?[(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:]
                              ^!Set %Case%=^?[Case-sensitive search:==Yes^=(?-i)|_No^=(?i)]
                              ^!Open ^%Keywords%
                              ^!Replace "(\r\n){2,}" >> "\r\n" AWRS
                              ^!Replace "\r\n" >> "|" AWRS
                              ^!Replace "\|\Z" >> "" AWRS
                              ^!Select All
                              ^!Set %Search%=^$GetSelection$
                              ^!Close ^%Keywords% Discard
                              ^!SetDocIndex ^%Doc%
                              ^!SetListDelimiter ^%Space%^P
                              ^!Set %linesout%=^$GetDocMatchAll("^%Case%^.*(^%Search%).*^%Dollar%";0)$^%Space%
                              ^!Menu Edit/Copy All
                              ^!Toolbar Paste New
                              ^!Jump Doc_End
                              ^!IfFalse ^$IsEmpty(^$GetLine$)$ Next Else Skip
                              ^!InsertText ^%NL%
                              ^!Select All
                              ^!Menu Modify/Lines/Trim Blanks
                              ^!Jump Doc_End
                              ^%linesout%^%NL%
                              ^!Select All
                              ^$StrSort("^$GetSelection$";0;1;1)$
                              ^!Replace "^(.+)\r\n\1 (\r\n)" >> "" AWIRS
                              ^!Replace "^(.+) \r\n" >> "" AWRS
                              ^!Info Finished!
                              ; end of clip


                              So far, there is only one problem with this clip: in a test, it
                              worked fine with 250 keywords, but it failed with 16,000. See my
                              reply to Sheri in this thread...

                              Flo
                               
                            • Flo
                              Message 14 of 30, Jun 21, 2007
                                Sheri wrote...

                                > How many keywords? If not more than a few hundred could
                                > possibly use something like this (uses regular expression
                                > matching).
                                >
                                > ^!Setlistdelimiter ^P
                                > ;next is one long line
                                > ^!Set %linesout%=^$GetDocMatchAll("(?-i)^.*(comprehensive|switch|system).*^%dollar%";0)$
                                > ;end long line
                                > ^!Toolbar New Document
                                > ^!InsertText ^%linesout%

                                In fact, the alternation to be used with ^$GetDocMatchAll$ seems to
                                be limited. When testing this with a file of 250 keywords and a text
                                of 16,000 lines, it works fine. It fails when taking those 250
                                keywords as text, and the 16,000 words as keywords. NT5 reacts with the
                                message...

                                "Regex error: internal error: overran compiling workspace".

                                (You may test it with those files at
                                http://flogehrke.homepage.t-online.de/491/ntf-wordlist.zip
                                which we used for testing another clip some months ago.)

                                Is this limitation definable in any way?

                                Flo
                                 
                              • Sheri
                                Message 15 of 30, Jun 21, 2007
                                  Flo wrote:
                                  > Sheri wrote...
                                  >
                                  >
                                  >> How many keywords? If not more than a few hundred could
                                  >> possibly use something like this (uses regular expression
                                  >> matching).
                                  >>
                                  >> ^!Setlistdelimiter ^P
                                  >> ;next is one long line
                                  >> ^!Set %linesout%=^$GetDocMatchAll("(?-i)^.*(comprehensive|switch|system).*^%dollar%";0)$
                                  >> ;end long line
                                  >> ^!Toolbar New Document
                                  >> ^!InsertText ^%linesout%
                                  >>
                                  >
                                  > In fact, the alternation to be used with ^$GetDocMatchAll$ seems to
                                  > be limited. When testing this with a file of 250 keywords, and a text
                                  > of 16,000 lines, it works fine. It fails when taking those 250
                                  > keywords as text, and 16.000 words as keywords. NT5 reacts with the
                                  > message...
                                  >
                                  > "Regex error: internal error: overran compiling workspace".
                                  >
                                  > (You may test it with those files at http://flogehrke.homepage.t-
                                  > online.de/491/ntf-wordlist.zip we used for testing another clip some
                                  > month ago.)
                                  >
                                  > Is this limitation definable in any way?
                                  >
                                  > Flo
                                  >
                                  >
                                  >
                                  Hi Flo,

                                  I don't think it is definable per se. You could test generated patterns
                                  in clips with ^!IfRegexOK. You can retrieve the error message (if not
                                  ok) with ^$GetRegexErrorMsg$. A clip could possibly take corrective
                                  action for some errors (like reducing the number of alternatives to
                                  be processed at one time).

                                  PCRE 7.2 was just released, and it says it corrected this:

                                  "A pattern with a very large number of alternatives (more than several
                                  hundred) was running out of internal workspace during the pre-compile
                                  phase, where pcre_compile() figures out how much memory will be needed.
                                  A bit of new cunning has reduced the workspace needed for groups with
                                  alternatives. The 1000-alternative test pattern now uses 12 bytes of
                                  workspace instead of running out of the 4096 that are available."

                                  I don't think it will be too long before NoteTab incorporates the
                                  update. However, there are other factors besides "internal workspace"
                                  that affect how many alternatives will work. When working on the stop
                                  list clip, I remember an error message that the regular expression was
                                  "too long". In one of the stop list clips, I applied the keywords in
                                  approximately 10K chunks and that worked at that time (think it was pcre
                                  6.7 then).
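
A rough sketch of the chunking idea, written in Python rather than clip code so it can be run outside NoteTab. It uses Python's `re` module, not PCRE, so it will not reproduce the workspace error itself. The function names and the 10,000-character budget are assumptions chosen to mirror the "10K chunks" mentioned above; they are not taken from the stop list clip.

```python
import re

def chunk_alternations(keywords, max_chars=10_000):
    """Yield alternation strings ("a|b|c"), each at most max_chars long.

    A single keyword longer than max_chars still gets a chunk of its own.
    """
    chunk, size = [], 0
    for word in keywords:
        escaped = re.escape(word)
        # +1 accounts for the "|" separator between alternatives
        extra = len(escaped) + (1 if chunk else 0)
        if chunk and size + extra > max_chars:
            yield "|".join(chunk)
            chunk, size = [], 0
            extra = len(escaped)
        chunk.append(escaped)
        size += extra
    if chunk:
        yield "|".join(chunk)


def remove_keyword_lines(text, keywords, max_chars=10_000):
    """Delete every line containing a keyword, one chunk at a time."""
    for alternation in chunk_alternations(keywords, max_chars):
        pattern = re.compile(r"^.*(?:" + alternation + r").*\n?", re.MULTILINE)
        text = pattern.sub("", text)
    return text
```

Each pass applies only one bounded pattern, so the number of alternatives the engine sees at once stays limited at the cost of scanning the text several times.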

                                  Regards,
                                  Sheri
                                • paulmaser
                                  Message 16 of 30, Jun 21, 2007
                                    You could probably replace the first two lines below with one command,
                                    that would look something like this:
                                    ^!Replace "(\r\n)+" >> "|" AWRS


                                    > ^!Replace "(\r\n){2,}" >> "\r\n" AWRS
                                    > ^!Replace "\r\n" >> "|" AWRS
                                    > ^!Replace "\|\Z" >> "" AWRS
                                  • Sheri
                                    Message 17 of 30, Jun 21, 2007
                                      --- In ntb-clips@yahoogroups.com, "Flo" <flo.gehrke@...> wrote:
                                      >
                                      > Sheri wrote...
                                      >
                                      > > How many keywords? If not more than a few hundred could
                                      > > possibly use something like this (uses regular expression
                                      > > matching).
                                      > >
                                      > > ^!Setlistdelimiter ^P
                                      > > ;next is one long line
                                      > > ^!Set %linesout%=^$GetDocMatchAll("(?-i)^.*(comprehensive|switch|system).*^%dollar%";0)$
                                      > > ;end long line
                                      > > ^!Toolbar New Document
                                      > > ^!InsertText ^%linesout%
                                      >
                                      > In fact, the alternation to be used with ^$GetDocMatchAll$ seems to
                                      > be limited. When testing this with a file of 250 keywords, and a text
                                      > of 16,000 lines, it works fine. It fails when taking those 250
                                      > keywords as text, and 16.000 words as keywords. NT5 reacts with the
                                      > message...
                                      >
                                      > "Regex error: internal error: overran compiling workspace".
                                      >
                                      > (You may test it with those files at http://flogehrke.homepage.t-
                                      > online.de/491/ntf-wordlist.zip we used for testing another clip some
                                      > month ago.)
                                      >
                                      > Is this limitation definable in any way?
                                      >
                                      > Flo
                                      >
                                      >

                                      Hi again,

                                      I haven't been following this thread in detail, but if he just wants
                                      to remove lines having a keyword, wouldn't it be better to use a
                                      replace command (replacing keyword lines with "") instead of using
                                      getdocmatchall?

                                      Seems to me the stop word task was more complicated because you wanted
                                      to not only delete lines matching a stop word, but also eliminate
                                      duplicates that were not stop words.

                                      Using ^!Replace all(s) would be fast (though you still have to keep
                                      your alternates lists reasonably sized for PCRE).

                                      Regards,
                                      Sheri
                                    • Flo
                                      Message 18 of 30, Jun 21, 2007
                                        "paulmaser" <paul@...> wrote:
                                        >
                                        > You could probably replace the first two lines below with one
                                        > command, that would look something like this:
                                        > ^!Replace "(\r\n)+" >> "|" AWRS
                                        >
                                        >
                                        > > ^!Replace "(\r\n){2,}" >> "\r\n" AWRS
                                        > > ^!Replace "\r\n" >> "|" AWRS
                                        > > ^!Replace "\|\Z" >> "" AWRS

                                        Thanks, Paul. You are right. ^!Replace "(\r\n)+" >> "|" AWRS will
                                        do the job.

                                        By the way: The clip wouldn't even need to open and process the
                                        keyword list if we make sure from the outset that it doesn't contain
                                        any empty lines. Thus we could replace all the lines from
                                        "^!Open ^%Keywords%" to "^!Close ^%Keywords% Discard" with...

                                        ^!SetClipboard ^$GetFileText(^%Keywords%)$
                                        ^!SetClipboard ^$StrReplace("^%NL%";"|";"^$GetClipboard$";0;0)$
                                        ^!Set %Search%=^$GetClipboard$

                                        This could speed up the clip even more ;-)

                                        Flo
                                         
                                        • Flo
                                          Message 20 of 30, Jun 21, 2007
                                            Thanks for that information, Sheri!

                                            I remember those "10K chunks". Members who want to read up on that
                                            issue will find it in message #15213 (see ^!Select +10000...).

                                            Flo
                                             
                                          • Flo
                                            Message 21 of 30, Jun 21, 2007
                                              Sheri wrote...

                                              > I haven't been following this thread in detail, but if he just wants
                                              > to remove lines having a keyword, wouldn't it be better to use a
                                              > replace command (replacing keyword lines with "") instead of using
                                              > getdocmatchall?

                                              Indeed - why not this way...


                                              ^!SetScreenUpdate Off
                                              ^!SetHintInfo Working...
                                              ^!Set %Doc%=^$GetDocIndex$
                                              ^!Set %Keywords%=^?[(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:]
                                              ^!Set %Case%=^?[Case-sensitive search:==Yes^=(?-i)|_No^=(?i)]
                                              ^!Open ^%Keywords%
                                              ^!Replace "(\r\n)+" >> "|" AWRS
                                              ^!Replace "\|\Z" >> "" AWRS
                                              ^!Replace "\A\|" >> "" AWRS
                                              ^!Set %Search%=^$GetText$
                                              ^!Close ^%Keywords% Discard
                                              ^!SetDocIndex ^%Doc%
                                              ^!Menu Edit/Copy All
                                              ^!Menu Edit/Paste New
                                              ^!Replace "^%Case%^.*(^%Search%).*\r\n" >> "" AWRS
                                              ^!Info Finished!


                                              Regards,
                                              Flo
                                               
                                            • Sheri
                                              Message 22 of 30, Jun 22, 2007
                                                --- In ntb-clips@yahoogroups.com, "Flo" <flo.gehrke@...> wrote:
                                                >
                                                > Sheri wrote...
                                                >
                                                > > I haven't been following this thread in detail, but if he just wants
                                                > > to remove lines having a keyword, wouldn't it be better to use a
                                                > > replace command (replacing keyword lines with "") instead of using
                                                > > getdocmatchall?
                                                >
                                                > Indeed - why not this way...
                                                >
                                                >
                                                > ^!SetScreenUpdate Off
                                                > ^!SetHintInfo Working...
                                                > ^!Set %Doc%=^$GetDocIndex$
                                                > ^!Set %Keywords%=^?[(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:]
                                                > ^!Set %Case%=^?[Case-sensitive search:==Yes^=(?-i)|_No^=(?i)]
                                                > ^!Open ^%Keywords%
                                                > ^!Replace "(\r\n)+" >> "|" AWRS
                                                > ^!Replace "\|\Z" >> "" AWRS
                                                > ^!Replace "\A\|" >> "" AWRS
                                                > ^!Set %Search%=^$GetText$
                                                > ^!Close ^%Keywords% Discard
                                                > ^!SetDocIndex ^%Doc%
                                                > ^!Menu Edit/Copy All
                                                > ^!Menu Edit/Paste New
                                                > ^!Replace "^%Case%^.*(^%Search%).*\r\n" >> "" AWRS
                                                > ^!Info Finished!
                                                >
                                                >
                                                > Regards,
                                                > Flo
                                                >
                                                >

                                                Great! If interested in making further improvements, here are a few
                                                more enhancements to consider.

                                                When a clip makes use of the clipboard, it's nice to restore its
                                                original contents at the end.

                                                You are closing the keyword document before navigating back to the
                                                original document. You need to be sure the keyword document was not
                                                already open when the clip was started. If it gets closed from a lower
                                                docindex than the starting document, the remaining documents shift and
                                                setting your saved docindex would no longer return you to the original
                                                document. You'd have to navigate to the original docindex first and
                                                then close (discard) the keywords document.

                                                Normally it would be a good idea to reverse sort alternates when
                                                constructing a regular expression, but since whole lines containing
                                                alternates are being deleted, in this case that wouldn't make any
                                                difference. The reason they should normally be reverse sorted is,
                                                alternates are searched from left to right. If there's a keyword "be"
                                                and a keyword "before", "be|before" will never find "before" in the
                                                text. Using \b's before and after the alternates would also work, if
                                                the keywords are meant to be whole words only.
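
The left-to-right behavior can be checked quickly outside NoteTab. This is only an illustration in Python, whose `re` module tries alternatives in the same leftmost-first order as PCRE; the sample text and variable names are made up for the demonstration.

```python
import re

text = "before we begin"

# Alternatives are tried left to right, so the shorter "be" wins.
naive = re.search(r"be|before", text).group()

# A reverse sort puts "before" ahead of its own prefix "be".
ordered = "|".join(sorted(["be", "before"], reverse=True))
sorted_hit = re.search(ordered, text).group()

# Word boundaries make the order irrelevant for whole-word keywords.
bounded = re.search(r"\b(?:be|before)\b", text).group()

print(naive, sorted_hit, bounded)  # be before before
```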

                                                If there are any characters that might get interpreted by the regex
                                                engine as metacharacters in the keyword document, they should be
                                                escaped with a backslash prior to using them in the alternates.
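
To see why escaping matters, here is a small Python illustration (not clip code). The keyword `a.b` is a made-up example, and `re.escape` stands in for whatever escaping routine the clip would use.

```python
import re

keyword = "a.b"

# Unescaped, "." matches any character, so "axb" is a false hit.
raw_hit = re.search(keyword, "axb") is not None

# re.escape() turns the keyword into the literal pattern a\.b
literal = re.escape(keyword)
escaped_hit = re.search(literal, "axb") is not None
real_hit = re.search(literal, "see a.b here") is not None

print(raw_hit, escaped_hit, real_hit)  # True False True
```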

                                                When constructing a regular expression with code, it's probably a good
                                                idea to check it with ^!IfRegexOK before using the expression in a
                                                "real" statement. If there is an error, you'd have an opportunity to
                                                show a message and still do cleanup tasks (like restoring the clipboard).
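
The same check-before-use pattern, sketched in Python as an analogy to ^!IfRegexOK and ^$GetRegexErrorMsg$. The function names are hypothetical, and Python's error messages differ from PCRE's.

```python
import re

def check_pattern(pattern):
    """Return None if the pattern compiles, else the error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


def delete_matching_lines(text, pattern):
    """Delete matching lines, or return the text untouched on a bad pattern."""
    error = check_pattern(pattern)
    if error is not None:
        # Report the problem and leave the text as it was.
        print("Regex error:", error)
        return text
    return re.sub(pattern, "", text, flags=re.MULTILINE)
```

Because the generated pattern is tested first, a malformed keyword list produces a message instead of a half-finished edit.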

                                                Regards,
                                                Sheri
                                              • Flo
                                                Message 23 of 30, Jun 23, 2007
                                                  Hi Sheri,

                                                  I'm grateful to you for all these recommendations, and I tried to
                                                  apply them to this clip...

                                                  > When a clip makes use of the clipboard, its nice to restore its
                                                  > original contents at the end.

                                                  That doesn't apply here, does it? But I think it could easily be done
                                                  by saving its contents in a variable, and afterwards pasting it back
                                                  to the clipboard like...

                                                    ^!Set %Var%=^$GetClipboard$ ... ^!SetClipboard ^%Var%

                                                  > You'd have to navigate to the original docindex and then close
                                                  > discard the keywords document.

                                                  I changed the order of these command lines.

                                                  By the way: Isn't it even safer to work with the document name? Given
                                                  that the clip always gets started from the original document, we
                                                  could replace...

                                                    ^!Set %Doc%=^$GetDocIndex$  with  ^!Set %Doc%=^$GetDocName$

                                                  and

                                                    ^!SetDocIndex ^%Doc%  with  ^!Open ^%Doc%

                                                  (According to the help file, I suppose that ^!Open also selects a
                                                  document that is open already.)

                                                  > Normally it would be a good idea to reverse sort alternates...

                                                  See lines 8 and 9 now.

                                                  > metacharacters in the keyword document...should be escaped
                                                  > with a backslash

                                                  Certainly, this would be a professional solution. In message # 15199
                                                  you created a subclip GetRegEscape that would do this job.

                                                  > its probably a good idea to check ^!IfRegexOK before using the
                                                  > expression in a "real" statement.

                                                  I hope I've done it the right way.

                                                  > Using \b's before and after the alternates would also work, if
                                                  > the keywords are meant to be whole words only.

                                                  This has been added too.

                                                  In addition to that, I've combined the \b's with a negative
                                                  lookbehind and lookahead. They do not allow certain characters
                                                  directly before or after a search word that is being treated as a
                                                  whole word. This is mainly aimed at words joined with the hyphen -
                                                  (ANSI 45) and the apostrophe ' (ANSI 39). For example: If "McDonald"
                                                  is defined as a keyword, it normally matches "McDonald's" too, even
                                                  if enclosed in \b, since - and ' are interpreted as word delimiters.
                                                  Consequently, the clip would delete a line like...

                                                      "eating a hamburger at McDonald's"

                                                  although it isn't really matched by "McDonald" as a whole word.
                                                  Likewise, "self-service" would be matched by both "self" and
                                                  "service", although they may be regarded merely as substrings of
                                                  "self-service". It depends, of course, on how you look at "lexical
                                                  problems" like that, and also on the sort of text to be processed.
                                                  Certainly, this construction needs some more testing...

                                                  How to deal with compound nouns written with a space (ANSI 32)? For
                                                  example: "Express" would delete "American Express" although we
                                                  possibly don't regard it as a match of that compound. The only
                                                  solution I can see is to enter "American Express" with a
                                                  non-breaking space (ANSI 160) in order to distinguish it from the
                                                  normal space (ANSI 32). For this, we could extend the lookarounds
                                                  with \xA0 in order to match ANSI 160. Maybe there's a better solution
                                                  (or even more problems)...
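
The lookaround idea can be tried out in Python as a rough sketch. Python's `re` has no `[[:punct:]]` class, so an explicit list of "joiner" characters is used instead; the helper name `whole_word` and its `joiners` parameter are assumptions for illustration, not part of the clip below.

```python
import re

def whole_word(keyword, joiners="'-"):
    """Match keyword as a whole word that is not glued to a joiner character."""
    cls = "[" + re.escape(joiners) + "]"
    return re.compile(
        r"(?<!" + cls + r")\b" + re.escape(keyword) + r"\b(?!" + cls + r")"
    )

line = "eating a hamburger at McDonald's"

# Plain \b treats the apostrophe as a word delimiter, so this matches.
plain_hit = re.search(r"\bMcDonald\b", line) is not None

# With the lookarounds, the apostrophe right after the word blocks the match.
strict_hit = whole_word("McDonald").search(line) is not None

# Adding the non-breaking space (\xa0) protects compounds entered with it.
guard = whole_word("Express", joiners="'-\xa0")
compound_hit = guard.search("American\xa0Express") is not None
normal_hit = guard.search("American Express") is not None

print(plain_hit, strict_hit, compound_hit, normal_hit)  # True False False True
```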

                                                  Regards,
                                                  Flo


                                                  ^!SetScreenUpdate Off
                                                  ^!SetHintInfo Working...
                                                  ^!Set %Doc%=^$GetDocIndex$
                                                  ^!Set %Keywords%=^?[(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:]
                                                  ^!Set %Case%=^?[Case-sensitive search:==Yes^=(?-i)|_No^=(?i)]
                                                  ^!Set %Substr%=^?[Search whole words only:==Yes^=1|_No^=0]
                                                  ^!Open ^%Keywords%
                                                  ^!Select All
                                                  ^$StrSort("^$GetSelection$";0;0;1)$
                                                  ^!Replace "(\r\n)+" >> "|" AWRS
                                                  ^!Replace "\|\Z" >> "" AWRS
                                                  ^!Replace "\A\|" >> "" AWRS
                                                  ^!Set %Search%=^$GetText$
                                                  ^!SetDocIndex ^%Doc%
                                                  ^!Close ^%Keywords% Discard
                                                  ^!IfTrue ^%Substr% Next Else Skip_2
                                                  ;^!Set %Expr%="^%Case%^.*\b(^%Search%)\b.*\r\n"
                                                  ; start of long line
                                                  ^!Set %Expr%="^%Case%^.*\b(?<![[:punct:]])(^%Search%)(?![[:punct:]])
                                                  \b.*\r\n"
                                                  ; end of long line
                                                  ^!Goto Skip
                                                  ^!Set %Expr%="^%Case%^.*(^%Search%).*\r\n"
                                                  ; Try next line for testing RegEx error ;-)
                                                  ;^!Set %Expr%="[[:punkt:]]+"
                                                  ^!IfRegExOK "^%Expr%" Next Else Message
                                                  ^!Menu Edit/Copy All
                                                  ^!Menu Edit/Paste New
                                                  ^!Replace "^%Expr%" >> "" AWRS
                                                  ^!Info Finished!
                                                  ^!Goto End

                                                  :Message
                                                  ^!Prompt ^$GetRegexErrorMsg$
                                                • Sheri
                                                  Hi Flo, ... Well you do ^!Menu Edit/Copy All near the end so you can paste the result to a new document. As is, that ends up remaining on the clipboard after
                                                  Message 24 of 30 , Jun 24, 2007
                                                    Hi Flo,

                                                    --- In ntb-clips@yahoogroups.com, "Flo" <flo.gehrke@...> wrote:
                                                    >
                                                    > I'm grateful to you for all these recommendations, and I tried to
                                                    > apply them to this clip...
                                                    >
                                                    > > When a clip makes use of the clipboard, it's nice to restore its
                                                    > > original contents at the end.
                                                    >
                                                    > That's not given here, isn't it?

                                                    Well you do "^!Menu Edit/Copy All" near the end so you can paste the
                                                    result to a new document. As is, that ends up remaining on the
                                                    clipboard after the clip has finished.

                                                    > But I think it could easily be done by saving its contents in a
                                                    > variable, and afterwards pasting it back to the clipboard like..
                                                    >
                                                    > ^!Set %Var%=^$GetClipboard$ ... ^!SetClipboard ^%Var%

                                                    See ^!ClipboardSave and ^!ClipboardRestore

                                                    >
                                                    > > You'd have to navigate to the original docindex and then close
                                                    > > discard the keywords document.
                                                    >
                                                    > I changed the order of these command lines.
                                                    >
                                                    > By the way: Isn't it even safer to work with the document name?
                                                    > Given that the clip always gets started from the original
                                                    > document, we could replace...

                                                    >
                                                    > ^!Set %Doc%=^$GetDocIndex$^ with ^!Set %Doc%=^GetDocName
                                                    >
                                                    > and
                                                    >
                                                    > ^!SetDocIndex ^%Doc% with ^!Open ^%Doc%

                                                    Yes, that should work. But then NoteTab has to look up the docindex
                                                    from the name, so it may be slightly faster to save and restore the
                                                    docindex yourself.

                                                    >
                                                    > (According to the help file, I suppose that ^!Open also selects a
                                                    > document that is open already.)
                                                    >
                                                    > > Normally it would be a good idea to reverse sort alternates...
                                                    >
                                                    > See line #8, and 9 now
                                                    >
                                                    > > metacharacters in the keyword document...should be escaped
                                                    > > with a backslash
                                                    >
                                                    > Certainly, this would be a professional solution. In message # 15199
                                                    > you created a subclip GetRegEscape that would do this job.

                                                    Since you're using a document buffer, you could use a single ^!Replace
                                                    to replace any metacharacters (alternates -- be sure to escape them)
                                                    with "\\$0"; the GetRegEscape clip approach is necessary only when
                                                    acting on a string instead of a document. There is currently no
                                                    provision in NoteTab to do regex string operations.
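
For readers who want to try the escaping idea outside NoteTab, here is a minimal Python sketch. It is only an illustration and not part of the clip: the metacharacter set and the name `escape_keyword` are assumptions, and the replacement template mirrors the "\\$0" idea by putting a backslash in front of whatever was matched.

```python
import re

# Assumed set of regex metacharacters; adjust it to the regex flavor in use.
META = re.compile(r"[\\^$?.*+()\[\]{}|]")

def escape_keyword(keyword: str) -> str:
    """Prefix every metacharacter in the keyword with a backslash."""
    # In the template, \\ is a literal backslash and \g<0> is the whole match.
    return META.sub(r"\\\g<0>", keyword)

keywords = ["C++", "what?", "a.b"]
# Join the escaped keywords into one alternation, as the clip does with "|".
pattern = re.compile("|".join(escape_keyword(k) for k in keywords))
```

Python's standard library already offers `re.escape` for this job; the hand-written version is shown only because it parallels the single-replace approach described above.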

                                                    >
                                                    > > it's probably a good idea to check ^!IfRegexOK before using the
                                                    > > expression in a "real" statement.
                                                    >
                                                    > I hope I've done it the right way.

                                                    Haven't tried it, but it looks good to me :)

                                                    I haven't made use of classes like punct before myself, so you're
                                                    blazing a trail :)

                                                    >
                                                    > > Using \b's before and after the alternates would also work, if
                                                    > > the keywords are meant to be whole words only.
                                                    >
                                                    > This has been added too.
                                                    >
                                                    > In addition to that, I've combined the \b's with a negative
                                                    > lookbehind and lookahead. They do not allow certain characters
                                                    > before or behind a search word that is being treated as a whole
                                                    > word. This is mainly aiming at words hyphenated with - (ANSI 45)
                                                    > and the apostrophe ' (ANSI 39). For example: If "McDonald" is
                                                    > defined as a keyword it normally matches "McDonald's" too even if
                                                    > embraced with \b
                                                    > since - and ' are interpreted as word delimiters. Consequently, the
                                                    > clip would delete a line like...
                                                    >
                                                    > "eating a hamburger at McDonald's"
                                                    >
                                                    > although it isn't really matched by "McDonald" as a whole word.
                                                    > Or "self-service" would be matched by "self" and "service" as well
                                                    > although they possibly are regarded as substrings of "self-service"
                                                    > only. It depends, of course, on the way you look at "lexical
                                                    > problems" like that, and also on the sort of text to be processed.
                                                    > Certainly, this construction needs some more testing...

                                                    > How to deal with compound nouns written with a space (ANSI 32)?
                                                    > For example: "Express" would delete "American Express" although
                                                    > we possibly don't regard it as a match of that compound. The only
                                                    > solution I can see for that is to enter "American Express" with a
                                                    > protected space (ANSI 160) in order to distinguish it from the
                                                    > normal space (ANSI 32). With regard to this, we could extend the
                                                    > Lookarounds with \xA0 in order to match ANSI 160. Maybe there's a
                                                    > better solution (or even more problems)...

                                                    Hmm, you bring up some interesting points. "American Express" would
                                                    be its own keyword, as would "Express". In the case of the "Express"
                                                    alternate, it could use a negative lookbehind to make sure it is not
                                                    preceded by "American\x20". Obviously that would require some fine
                                                    tuning of the keywords or alternates before applying them, to
                                                    customize them to that extent.
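
The negative lookbehind idea can be tried in Python, whose `re` module supports fixed-width lookbehinds. This is a small sketch under assumptions: the sample lines and the name `express_alone` are made up for illustration, and NoteTab's regex engine may differ in details.

```python
import re

# "Express" counts as a keyword only when it is not preceded by "American ".
express_alone = re.compile(r"(?<!American\x20)\bExpress\b")

lines = [
    "Express delivery available",
    "Pay with American Express",
    "The Orient Express left at noon",
]

# Keep only the lines in which the stand-alone keyword does not occur.
kept = [line for line in lines if not express_alone.search(line)]
```

Each keyword that needs such an exception gets its own lookbehind, which is why the keyword list would have to be tuned by hand.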

                                                    Regards,
                                                    Sheri

                                                  • hsavage
                                                    Message 25 of 30 , Jun 25, 2007
                                                      Flo wrote:
                                                      > Hi Sheri,
                                                      >
                                                      > I'm grateful to you for all these recommendations, and I
                                                      > tried to apply them to this clip...
                                                      >
                                                      >> When a clip makes use of the clipboard, it's nice to
                                                      >> restore its original contents at the end.
                                                      >
                                                      > That's not given here, isn't it? But I think it could
                                                      > easily be done by saving its contents in a variable, and
                                                      > afterwards pasting it back to the clipboard like...
                                                      >
                                                      > ^!Set %Var%=^$GetClipboard$ ... ^!SetClipboard ^%Var%

                                                      Flo,

                                                      If you're insistent about restoring the clipboard to its previous state
                                                      after running a clip you might want to check into the following 2 clip
                                                      commands.

                                                      ^!ClipBoardSave
                                                      ^!ClipBoardRestore [+]


                                                      ºvº SL-6-199 -created- 2007.06.25 - 19.48.24

                                                      "Party Etiquette; Drinking Your Fair Share."
                                                      ¤ ø ¤ hrs ø hsavage@...
                                                    • Flo
                                                      Message 26 of 30 , Jun 27, 2007
                                                        The latest version of this clip splits the keyword list into chunks
                                                        of 500 lines in order to stay within the limits of the alternation.
                                                        In my tests, that error message (mentioned above) appeared from 818
                                                        keywords on. Now it works with an unlimited number of keywords. It's
                                                        designed to delete certain keywords (i.e. stopwords) in a word list,
                                                        or complete lines in a list that contain these keywords. In full
                                                        text, it will delete whole paragraphs containing the keyword (or
                                                        substrings).

                                                        Also metacharacters in the keyword list are escaped now (e.g.,
                                                        replace ? with \?).

                                                        H=Delete Keywords
                                                        ^!SetScreenUpdate Off
                                                        ^!SetHintInfo Working...
                                                        ; Save clipboard, and restore it later on (recommended by Sheri)
                                                        ^!ClipBoardSave
                                                        ; Store the index of active document
                                                        ^!Set %Doc%=^$GetDocIndex$
                                                        ; Choose keyword (stopword) file, case, and whole words
                                                        ^!Set %Keywords%=^?[(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword
                                                        File:]
                                                        ^!Set %Case%=^?[Case-sensitive search:==Yes^=(?-i)|_No^=(?i)]
                                                        ^!Set %WholeWords%=^?[Search whole words only:==Yes^=1|_No^=0]
                                                        ^!Open ^%Keywords%
                                                        ; Reverse sort of keywords (to put longer words before shorter words)
                                                        ^!Select All
                                                        ^$StrSort("^$GetSelection$";0;0;1)$
                                                        ; Escape metacharacters (next one long line)
                                                        ^!Replace "\\|\^|\!|\$|\?|\.|\*|\<|\>|\+|\(|\)|\[|\]|\{|\}|\=|\||\:"
                                                        >> "\\$0" AWRST
                                                        ; Divide document into chunks of 500 lines to meet the
                                                        ; restrictions of alternation
                                                        ^!Set %ChunkIndex%=1
                                                        ^!Jump 1

                                                        :Loop_1
                                                        ^!Select 500
                                                        ^!Toolbar Copy
                                                        ; Make alternation by replacing NL with vertical bar
                                                        ^!SetClipboard ^$StrReplace(^%NL%;|;^$GetClipboard$;0;0)$
                                                        ; Remove vertical bar at end of string to avoid empty
                                                        ; alternative; note: (A|B|) matches A or B or the empty
                                                        ; string, i.e. it matches anywhere.
                                                        ; You may do the same at start of string, or watch empty lines
                                                        ; at the start of keyword list
                                                        ^!IfSame "^$StrCopyRight(^$GetClipboard$;1)$" "|" Next Else Skip
                                                        ^!SetClipboard ^$StrDeleteRight(^$GetClipboard$;1)$
                                                        ; Save chunks in variables %Chunk1%, %Chunk2%, etc.
                                                        ^!Set %Chunk^%ChunkIndex%%=^$GetClipboard$
                                                        ^!Jump +1
                                                        ^!If ^$GetRow$=^$GetLineCount$ Replace
                                                        ^!Inc %ChunkIndex%
                                                        ^!Goto Loop_1

                                                        :Replace
                                                        ; Return to active document
                                                        ^!SetDocIndex ^%Doc%
                                                        ; Close keyword file and copy active document to new document
                                                        ^!Close ^%Keywords% Discard
                                                        ^!Menu Edit/Copy All
                                                        ^!Menu Edit/Paste New
                                                        ^!Set %RepIndex%=1

                                                        :Loop_2
                                                        ^!If ^%RepIndex% > ^%ChunkIndex% Finish
                                                        ; Grab %Chunk1%, %Chunk2%, etc. for search
                                                        ^!Set %Search%=^%Chunk^%RepIndex%%
                                                        ; If "whole words", use word delimiters in RegEx; lookarounds
                                                        ; prevent hyphenated words from being deleted
                                                        ^!IfTrue ^%WholeWords% Next Else Skip_2
                                                        ^!Set %Expr%="^%Case%^.*\b(?<![-])(^%Search%)(?![-])\b.*(\r\n|\z)"
                                                        ^!Goto Skip
                                                        ^!Set %Expr%="^%Case%^.*(^%Search%).*(\r\n|\z)"
                                                        ; Check syntax of RegEx
                                                        ^!IfRegExOK "^%Expr%" Next Else Message
                                                        ; Delete matching words and lines
                                                        ^!Replace "^%Expr%" >> "" AWRS
                                                        ^!Inc %RepIndex%
                                                        ^!Goto Loop_2

                                                        :Finish
                                                        ^!Info Finished!
                                                        ^!ClipBoardRestore
                                                        ^!Goto End

                                                        :Message
                                                        ^!Prompt ^$GetRegexErrorMsg$
                                                        ; end of clip
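
The chunking idea in the clip above can also be sketched in Python for comparison. This is a hedged illustration, not a translation of the clip: the names `chunked` and `delete_lines` are hypothetical, `re.escape` stands in for the metacharacter-escaping step, and Python's `re` does not share the alternation limit that made chunking necessary here.

```python
import re

def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def delete_lines(text, keywords, chunk_size=500):
    """Drop every line containing a keyword, one chunk of keywords per pass."""
    # Skip empty keywords: an empty alternative would match every line.
    keywords = [k for k in keywords if k]
    lines = text.splitlines()
    for chunk in chunked(keywords, chunk_size):
        # Build one alternation per chunk, with metacharacters escaped.
        pattern = re.compile("|".join(re.escape(k) for k in chunk))
        lines = [line for line in lines if not pattern.search(line)]
    return "\n".join(lines)
```

Calling `delete_lines(source_text, stopwords)` runs one pass per 500 keywords, which corresponds to the second loop of the clip.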


                                                        The clip prevents terms hyphenated with - (ANSI 45) from being
                                                        deleted by substrings, e.g. "self" would not delete "self-catering"
                                                        (unless you choose to delete substrings).
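
The hyphen protection can be checked in isolation with a short Python sketch. It assumes Python's `re` module rather than NoteTab's engine, and the function name `whole_word_pattern` and the sample lines are made up for illustration.

```python
import re

def whole_word_pattern(keywords):
    """Match a keyword as a whole word that is not glued to a hyphen."""
    alternation = "|".join(re.escape(k) for k in keywords)
    # The lookarounds reject a hyphen directly before or after the keyword.
    return re.compile(r"\b(?<!-)(?:" + alternation + r")(?!-)\b")

pattern = whole_word_pattern(["self"])
lines = ["help your self", "self-catering flat", "non-self test"]

# Only lines with a free-standing "self" are dropped.
kept = [line for line in lines if not pattern.search(line)]
```

Without the two lookarounds, `\bself\b` alone would match inside "self-catering", because the hyphen counts as a word boundary.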

                                                        Regarding apostrophes and compound nouns with space I've been on the
                                                        wrong track. This issue is much more complicated, and I don't think
                                                        it could be solved by a general RegEx that would match all
                                                        eventualities. The apostrophe, for example, is used in a company name
                                                        like "McDonald's". This name will be deleted by a substring "Mc", and
                                                        by "McDonald" defined as a whole word as well since the apostrophe is
                                                        interpreted as a word delimiter. On the other hand, it indicates the
                                                        genitive of a lemma that possibly should be deleted, e.g. "Dickens'
                                                        works".

                                                        Another idea is to process the source file with the following clip
                                                        before running the "Delete Keywords" clip (of course, it also may be
                                                        integrated into "Delete Keywords").

                                                        Look at the following company names...

                                                        McDonald's
                                                        General Electric
                                                        Bank of America

                                                        In order to protect these names from being deleted
                                                        by "McDonald", "electric", or "bank", the Protect Keywords clip
                                                        replaces the apostrophe and space with _apo_ and _spc_ (even more
                                                        characters may be added that function as word delimiters). Thus the
                                                        names are interpreted as whole words. After running "Delete Keywords"
                                                        we can reverse this replacement.

                                                        First of all, you have to create a PROTECT.TXT file that contains a
                                                        list of terms like those three company names mentioned above.

                                                        Please note that "Protect Keywords" is meant to be run on the source
                                                        file, not on the keyword (or stopword) list!


                                                        H=Protect Keywords
                                                        ^!SetScreenUpdate Off
                                                        ^!SetHintInfo Working...
                                                        ^!Goto=^?[Choose action:==Protect Words^=Protect|Remove Protection^=Remove]

                                                        :Protect
                                                        ^!Set %Doc%=^$GetDocIndex$
                                                        ; Choose the list of words to be protected, e.g. PROTECT.TXT
                                                        ^!Set %ProFile%=^?{(T=O;F="Textfiles (*.txt)|*.txt")Choose Protected
                                                        List:}
                                                        ^!Open ^%ProFile%
                                                        ^!Jump Doc_End
                                                        ^!IfFalse ^$IsEmpty(^$GetLine$)$ Next Else Skip
                                                        ^!InsertText ^%NL%
                                                        ^!Set %LineIndex%=^$GetTextLineCount$

                                                        :Loop_1
                                                        ^!Jump ^%LineIndex%
                                                        ^!SetClipboard ^$StrReplace("'";"_apo_";"^$GetLine$";0;0)$
                                                        ^!SetClipboard ^$StrReplace("^%Space%";"_spc_";"^$GetClipboard$";0;0)$
                                                        ^!Jump Line_End
                                                        ^!InsertText "^P^$GetClipboard$"
                                                        ^!If ^%LineIndex%=1 Replace
                                                        ^!Dec %LineIndex%
                                                        ^!Goto Loop_1

                                                        :Replace
                                                        ^!Select All
                                                        ^!SetListDelimiter ^p
                                                        ^!SetArray %Except%=^$GetSelection$
                                                        ^!SetDocIndex ^%Doc%
                                                        ^!Close ^%ProFile% Discard
                                                        ^!Jump 1
                                                        ^!Set %Count%=1

                                                        :Loop_2
                                                        ^!If ^%Count%=^%Except0% End
                                                        ^!Set %Search%="^%Except^%Count%%"
                                                        ^!Inc %Count%
                                                        ^!Set %Repl%="^%Except^%Count%%"
                                                        ^!Replace "^%Search%" >> "^%Repl%" AWRS
                                                        ^!Inc %Count%
                                                        ^!Goto Loop_2

                                                        :Remove
                                                        ^!Replace "_spc_" >> "^%Space%" AWST
                                                        ^!Replace "_apo_" >> "'" AWST

                                                        :End
                                                        ^!Info Finished!


                                                        Regards,
                                                        Flo
                                                         
                                                      • ebbtidalflats
                                                        Message 27 of 30 , Jun 28, 2007
                                                          Hi Flo,

                                                          I'm curious about a line in your clips, where you replace the text in
                                                          the document with ^$StrSort.

                                                          I see what you're doing, but am wondering why you chose the function,
                                                          rather than the menu command?

                                                          ^!Menu Modify/Lines/Sort/Descending

                                                          to select and sort all in one step, instead of using three different
                                                          functions.

                                                          > ^!Select All
                                                          > ^$StrSort("^$GetSelection$";0;0;1)$

                                                          Also, why sort the short words to the bottom? I know you put a lot of
                                                          effort into this, but didn't the original poster's (whom we haven't
                                                          heard from for some time) example call for finding partial words? If
                                                          so, wouldn't finding the partials speed up the search by eliminating a
                                                          lot of lines from the search for the longer words?

                                                          Just curious.


                                                          One more question. Do you have a specific use in mind for this keyword
                                                          manipulation? Is this a comparison of two keyword lists, or what? Or
                                                          was this just a clip-coding exercise?


                                                          Thanks,


                                                          Eb
                                                        • Flo
                                                          Message 28 of 30 , Jun 29, 2007
                                                            --- In ntb-clips@yahoogroups.com, "ebbtidalflats" <ebbtidalflats@...>
                                                            wrote:
                                                            >
                                                            > Hi Flo,
                                                            >
                                                            > I'm curious about a line in your clips,...

                                                            Eb,

                                                            > ...why you chose the function, rather than the menu command?

                                                            The menu command follows the settings in "Options | Tools".
                                                            ^$StrSort$ lets you define the sorting independently of these
                                                            settings.

                                                            > Also, why sort the short words to the bottom?

                                                            This has been described by Sheri before. Sheri also explained why
                                                            this isn't really necessary when running the clip on word lists and
                                                            lines.
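
A small Python sketch (not NoteTab clip code) of why keyword order can matter when the keywords are joined into one regular-expression alternation and removed from running text. The keyword list and sample text are invented for illustration, and Python's `re` module stands in for NoteTab's regex engine on the assumption that both try alternatives from left to right.

```python
import re

def remove_keywords(text, keywords):
    """Delete every occurrence of any keyword from text.

    Alternatives are tried left to right, so the order of the
    keywords decides which one wins when one is a prefix of another.
    """
    pattern = "|".join(re.escape(k) for k in keywords)
    return re.sub(pattern, "", text)

keywords = ["system", "systems"]   # hypothetical keyword list
text = "systems and system"        # hypothetical sample text

# Ascending order: "system" is tried first and leaves a stray "s".
ascending = remove_keywords(text, sorted(keywords))

# Descending order: "systems" is tried before "system".
descending = remove_keywords(text, sorted(keywords, reverse=True))

print(repr(ascending))
print(repr(descending))
```

When a pattern matches whole lines instead of deleting the keyword itself, the order of the alternatives does not change which lines are found, which would fit the remark that the sort is not needed for word lists and lines.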

                                                            > wouldn't finding the partials speed up the search

                                                            I think it isn't a matter of speed, and the difference would scarcely
                                                            be measurable. What really matters is what you want to achieve.
                                                            That's why you can choose substrings or whole words.
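
To make the substring versus whole-word choice concrete, here is a minimal Python sketch (again not clip code). The function name `matching_lines`, its `whole_words` flag, and the sample data are assumptions made for this illustration only.

```python
import re

def matching_lines(lines, keywords, whole_words=False):
    """Return the lines that contain at least one keyword."""
    alternation = "|".join(re.escape(k) for k in keywords)
    if whole_words:
        # \b requires a word boundary on both sides of the keyword.
        alternation = r"\b(?:" + alternation + r")\b"
    pattern = re.compile(alternation)
    return [line for line in lines if pattern.search(line)]

lines = ["a comprehensive list", "switching hosts", "one switch"]
keywords = ["switch"]

as_substrings = matching_lines(lines, keywords)
as_whole_words = matching_lines(lines, keywords, whole_words=True)

print(as_substrings)
print(as_whole_words)
```

Substring matching keeps "switching hosts" as well as "one switch"; whole-word matching keeps only "one switch".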

                                                            > Do you have a specific use in mind for this keyword
                                                            > manipulation? Is this a comparison of two keyword lists, or
                                                            > what?

                                                            One use, I suppose, has been sufficiently described (protection of
                                                            certain terms and word forms from being deleted by substrings). There
                                                            are many more applications I could think of. Why not compare two
                                                            word lists, e.g. by subtracting list A from list B in order to get
                                                            the difference? For me, dealing with word lists is mainly related to
                                                            Text Retrieval and indexing of text databases, and NT has become an
                                                            indispensable tool in this field.
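
Subtracting list A from list B can be sketched in a few lines of Python. This is only an illustration of the idea, not the clip itself; `subtract_list` and the two sample lists are hypothetical.

```python
def subtract_list(list_b, list_a):
    """Return the words of list_b that do not occur in list_a.

    The original order of list_b is preserved.
    """
    exclude = set(list_a)  # set lookup keeps the filter fast
    return [word for word in list_b if word not in exclude]

list_a = ["switch", "system"]
list_b = ["comprehensive", "switch", "system", "toolbar"]

difference = subtract_list(list_b, list_a)
print(difference)
```

The comparison here is exact and case sensitive; a case-insensitive variant would need both lists normalized first.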

                                                            Several members have contributed to this thread. I just tried to find
                                                            out how these proposals could be integrated into this clip. It isn't
                                                            more than a box of building blocks. Maybe you could pick out some
                                                            ideas matching your own needs...

                                                            Flo
                                                             
                                                          • ebbtidalflats
                                                            Message 29 of 30 , Jun 30, 2007
                                                              Flo,

                                                              --- In ntb-clips@yahoogroups.com, "Flo" <flo.gehrke@...> wrote:
                                                              >
                                                              > > Also, why sort the short words to the bottom?
                                                              >
                                                              > This has been described by Sheri before. Sheri also explained why
                                                              > this isn't really necessary when running the clip on word lists and
                                                              > lines.

                                                              I asked because that approach runs counter to the original request.
                                                              Not that there was a whole lot of input from the requester.

                                                              However, he did furnish an example that specifically searched for
                                                              partial words. Hence my curiosity.


                                                              > are many more applications I could think of. Why not compare two
                                                              > word lists, e.g. by subtracting list A from list B in order to get
                                                              > the difference?

                                                              Ahh! Good idea.

                                                              > For me, dealing with word lists is mainly related to
                                                              > Text Retrieval and indexing of text databases, and NT has become an
                                                              > indispensable tool in this field.

                                                              Hm, mine is more in the area of glossaries, but NT is just as
                                                              indispensable to me.


                                                              Thanks for your comments.


                                                              Eb