Creation of clip

  • idisnick
    Message 1 of 30, Jun 13, 2007
      Can you help me create this clip, I can't figure it out.
      I have a long list, and a second list of keywords,
      I would like to have NoteTab take the keywords list and search the
      first list for them and parse out all the lines (entire lines) that
      contain those keywords. If you want a fee to do this, let me know. Thanks.
    • Sheri
      Message 2 of 30, Jun 14, 2007
        --- In ntb-clips@yahoogroups.com, "idisnick" <idisnick@...> wrote:
        >
        > Can you help me create this clip, I can't figure it out.
        > I have a long list, and a second list of keywords,
        > I would like to have NoteTab take the keywords list and search the
        > first list for them and parse out all the lines (entire lines) that
        > contain those keywords. If you want a fee to do this let me know. Thanks.
        >

        How many keywords? If there are no more than a few hundred, you could
        possibly use something like this (it uses regular expression matching).

        ^!Setlistdelimiter ^P
        ;next is one long line
        ^!Set %linesout%=^$GetDocMatchAll("(?-i)^.*(comprehensive|switch|system).*^%dollar%";0)$
        ;end long line
        ^!Toolbar New Document
        ^!InsertText ^%linesout%

        Run it with the What's New text file showing, and the lines containing
        any of those three words get pasted into a new document.

        Does not work in versions of NoteTab earlier than 5.1.

        The (?-i) part is causing the search to be case sensitive. If you want
        it to be case insensitive, use (?i) instead. If you leave it off, it
        seems to be defaulting to (?i) which is not what I expected.

        Regards,
        Sheri
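For readers without NoteTab at hand, the same idea can be sketched in Python. This is only an analogue under stated assumptions, not clip code: the function name `matching_lines` and the sample text are made up for illustration, and Python's `re.IGNORECASE` flag stands in for the `(?i)` / `(?-i)` switch discussed above. Unlike the clip, this sketch escapes the keywords, so they are treated as literal text rather than as regex fragments.

```python
import re

def matching_lines(text, keywords, ignore_case=False):
    """Return every full line of text that contains at least one keyword."""
    # Build the alternation, e.g. comprehensive|switch|system
    alternation = "|".join(re.escape(k) for k in keywords)
    # MULTILINE makes ^ and $ match at each line boundary
    flags = re.MULTILINE | (re.IGNORECASE if ignore_case else 0)
    pattern = re.compile(rf"^.*(?:{alternation}).*$", flags)
    return [m.group(0) for m in pattern.finditer(text)]

# Hypothetical sample text
sample = "System tray icon\nnew switch added\nnothing here\n"
words = ["comprehensive", "switch", "system"]

case_sensitive = matching_lines(sample, words)
case_insensitive = matching_lines(sample, words, ignore_case=True)
print(case_sensitive)
print(case_insensitive)
```

With case sensitivity on, the line starting with a capitalised "System" is skipped; with it off, both keyword lines are returned.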
      • buralex@gmail.com
        Message 3 of 30, Jun 14, 2007
          "idisnick" <idisnick@...> said on 06/13/2007 5:47:08 PM -0400
          > Can you help me create this clip, I can't figure it out.
          > I have a long list, and a second list of keywords,
          > I would like to have NoteTab take the keywords list and search the
          > first list for them and parse out all the lines (entire lines) that
          > contain those keywords. If you want a fee to do this let me know. Thanks.
          We had a long discussion about this at the beginning of the year (search
          the list for "concordance"). It has a working clip.

          An open-source program to do this is: TextStat.
          > "TextSTAT - Free concordance software for Windows and Linux
          > TextSTAT is a simple programme for the analysis of texts. It reads
          > ASCII/ANSI texts (in different encodings) and HTML files (directly
          > from the internet) and ...
          > www.niederlandistik.fu-berlin.de/textstat/software-en.html"
          > http://www.google.com/search?q=TextStat&sourceid=navclient-ff&ie=UTF-8&rls=GGGL,GGGL:2006-39,GGGL:en

          Regards ... Alec -- buralex-gmail
          --
        • Flo
          Message 4 of 30, Jun 15, 2007
            I understand that idisnick is working with a keyword list. So I would
            propose to extend Sheri's concept as follows...


            ^!Set %Doc%=^$GetDocIndex$
            ^!Set %Keywords%=^?{(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:}
            ^!Open ^%Keywords%
            ; Remove empty lines in keyword list
            ^!Replace "(\r\n){2,}" >> "\r\n" AWRS
            ; Put keywords into alternation
            ^!Replace "\r\n" >> "|" AWRS
            ; Remove "empty alternation" at list end
            ^!Replace "\|\Z" >> "" AWRS
            ^!Select All
            ^!Set %Search%=^$GetSelection$
            ^!Close ^%Keywords% Discard
            ^!SetDocIndex ^%Doc%
            ^!SetListDelimiter ^P
            ^!Set %linesout%=^$GetDocMatchAll("(?-i)^.*(^%Search%).*^%dollar%";0)$
            ^!Toolbar New Document
            ^!InsertText ^%linesout%

            Example: The active document is...

            Froggy Frog
            Lives in a well
            If you want him
            Pull the bell.

            The keyword list is...

            bell
            Frog
            well

            The clip outputs lines #1, 2, and 4. Maybe it needs some improvements
            to work with bigger texts and word lists ;-)

            Flo
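The keyword-file preparation above (collapse blank lines, join with `|`, strip the trailing `|`) and the Froggy example can be traced in a short Python sketch. It is an illustration only, assuming Windows line endings (`\r\n`) as in the clip; the function names are hypothetical. Like the clip, it passes the keywords to the regex engine unescaped, so a keyword containing characters such as `.` or `(` would be read as regex syntax. An empty line at the very top of the keyword file is not handled here.

```python
import re

def keywords_to_alternation(keyword_text):
    """Mirror the three ^!Replace steps applied to the keyword file."""
    # Collapse runs of empty lines into a single line break
    s = re.sub(r"(\r\n){2,}", "\r\n", keyword_text)
    # Turn each line break into an alternation bar
    s = s.replace("\r\n", "|")
    # Remove the empty alternative left at the end of the list
    s = re.sub(r"\|\Z", "", s)
    return s

def matching_line_numbers(document, alternation):
    """Return the 1-based numbers of the lines that contain a keyword."""
    pattern = re.compile(alternation)  # case-sensitive, like (?-i)
    return [n for n, line in enumerate(document.splitlines(), start=1)
            if pattern.search(line)]

document = "Froggy Frog\r\nLives in a well\r\nIf you want him\r\nPull the bell.\r\n"
keyword_text = "bell\r\n\r\nFrog\r\nwell\r\n"

alternation = keywords_to_alternation(keyword_text)
hits = matching_line_numbers(document, alternation)
print(alternation, hits)
```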
             
          • idisnick
            Message 5 of 30, Jun 19, 2007
              Hi,
              I saved this as a clb file and added it to my library bar, but I
              can't get it to work; it doesn't ask for my keyword list. Can you
              tell me how it works? Thanks!




              --- In ntb-clips@yahoogroups.com, "Flo" <flo.gehrke@...> wrote:
              >
              > I understand that idisnick is working with a keyword list. So I would
              > propose to extend Sheri's concept as follows...
              > [...]
            • Don - HtmlFixIt.com
              Message 6 of 30, Jun 19, 2007
                idisnick wrote:
                > Hi,
                > I saved this as a clb file and added it to my library bar, but I
                > can't get it to work, it' doesn't ask for my keyword list. Can you
                > tell me how it works. THanks!
                >

                ^!Set %Keywords%=^?{(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword
                File:}


                Most clips when sent via email need to be "unwrapped" if long lines get
                wrapped.

                The above line for example was wrapped for me ^^^

                I had to unwrap it.

                Lines should for the most part start with either a ^! or a ; in most clips.
              • idisnick
                Message 7 of 30, Jun 19, 2007
                  Thanks, I can test it now. This clip is actually doing the opposite
                  of what I want, though. I wanted it to parse out (that is, remove)
                  all lines that contain the keywords, and instead it is keeping all
                  the lines that have the keywords and deleting all the other lines.



                  --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                  > [...]
                  > Most clips when sent via email need to be "unwrapped" if long lines
                  > get wrapped.
                  > [...]
                • Don - HtmlFixIt.com
                  Message 8 of 30, Jun 19, 2007
                    Give this a try (note one long line):
                    ;clip to remove lines that do contain keywords
                    ;keywords go in a file with one keyword per line
                    ;start with the file to be parsed open
                    ;whacked at by don at htmlfixit dot com
                    ;used regex proposed by pat

                    :StartOfClip
                    ;long line follows
                    ^!Set %Keywords%=^?{(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:}
                    ;long line precedes

                    :GetKeywords
                    ^!Open ^%Keywords%
                    ; Remove empty lines in keyword list
                    ^!Replace "(\r\n){2,}" >> "\r\n" AWRS
                    ; Put keywords into alternation
                    ^!Replace "\r\n" >> "|" AWRS
                    ; Remove "empty alternation" at list end
                    ^!Replace "\|\Z" >> "" AWRS
                    ^!Select All
                    ^!Set %Search%=^$GetSelection$
                    ^!Close Discard

                    ;be sure we are back on proper document
                    ^!Set %Doc%=^$GetDocIndex$
                    ^!SetWordWrap Off
                    ^!Jump Doc_Start
                    ^!SetDebug On


                    :Loop
                    ^!Select Eol
                    ^!Find ".*(^%Search%).*" TIHRS
                    ;this deletes line if contains keyword
                    ^!IfError Continue
                    ^!InsertText ^%EMPTY%
                    ^!Goto Loop

                    :Continue
                    ;finish if end of file is reached
                    ^!If ^$GetRow$ = ^$GetLinecount$ Finish
                    ;this moves to next line if keyword not found
                    ^!Jump +1
                    ^!Goto Loop

                    :Finish
                    ;clean up empty line at start if exists
                    ^!Jump Doc_Start
                    ^!Select Eol
                    ^!If "getselection" = "" Skip_2
                    ^!SelectTo 2:1
                    ^!InsertText ^%EMPTY%

                    :EmptyLinesOut
                    ;clean out all empty lines after deletion
                    ^!Replace "^P^P" >> "^P" ACIWS
                    ^!IfError End
                  • Don - HtmlFixIt.com
                    Message 9 of 30, Jun 19, 2007
                      Ok that was not right ... try this one instead ...

                      ;clip to remove lines that do contain keywords
                      ;keywords go in a file with one keyword per line
                      ;start with the file to be parsed open
                      ;whacked at by don at htmlfixit dot com
                      ;used regex proposed by pat

                      :StartOfClip
                      ;long line follows
                      ^!Set %Keywords%=^?{(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:}
                      ;long line precedes
                      ^!Set %Doc%=^$GetDocIndex$

                      :GetKeywords
                      ^!Open ^%Keywords%
                      ; Remove empty lines in keyword list
                      ^!Replace "(\r\n){2,}" >> "\r\n" AWRS
                      ; Put keywords into alternation
                      ^!Replace "\r\n" >> "|" AWRS
                      ; Remove "empty alternation" at list end
                      ^!Replace "\|\Z" >> "" AWRS
                      ^!Select All
                      ^!Set %Search%=^$GetSelection$
                      ^!Close Discard

                      ;be sure we are back on proper document
                      ^!SetDocIndex ^%Doc%
                      ^!SetWordWrap Off
                      ^!Jump Doc_Start


                      :Loop
                      ^!Select Eol
                      ^!Find ".*(^%Search%).*" TIHRS
                      ;this deletes line if contains keyword
                      ^!IfError Continue
                      ^!InsertText ^%EMPTY%
                      ^!Goto Loop

                      :Continue
                      ;finish if end of file is reached
                      ^!If ^$GetRow$ = ^$GetLinecount$ Finish
                      ;this moves to next line if keyword not found
                      ^!Jump +1
                      ^!Goto Loop

                      :Finish
                      ;clean up any empty line(s) at start if exist(s)
                      ^!Jump Doc_Start
                      ^!Select Eol
                      ^!If "^$GetSelection$" <> "" EmptyLinesOut
                      ^!SelectTo 2:1
                      ^!InsertText ^%EMPTY%
                      ^!Goto Finish

                      :EmptyLinesOut
                      ;clean out all empty lines after deletion
                      ^!Replace "^P^P" >> "^P" ACIWS
                      ^!IfError End
                      ^!Goto EmptyLinesOut
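The clip above walks the document one line at a time, deleting each line that matches and then cleaning up empty lines. As a point of comparison, here is a minimal Python sketch of the same outcome computed in a single pass over the text. It is not NoteTab code; the function name and sample data are hypothetical, the keywords are escaped so they match literally, and the case-insensitive default is an assumption about what the clip's search options intend.

```python
import re

def drop_lines_with_keywords(text, keywords, ignore_case=True):
    """Return text without the lines that contain any keyword (and without empty lines)."""
    alternation = "|".join(re.escape(k) for k in keywords if k)
    if not alternation:
        # No usable keywords: nothing to remove
        return text
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(alternation, flags)
    kept = [line for line in text.splitlines()
            if line.strip() and not pattern.search(line)]
    return "\n".join(kept)

text = "Froggy Frog\nLives in a well\n\nIf you want him\nPull the bell.\n"
result = drop_lines_with_keywords(text, ["bell", "frog"])
print(result)
```

Building the result in memory rather than editing the document line by line is the main reason a single-pass approach tends to be faster on long lists.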
                    • idisnick
                      Message 10 of 30, Jun 19, 2007
                        Hi,
                        It worked, but it left empty spaces, unlike the last one. Can you make
                        it so it closes the space? Also, it took a few minutes to do a list of
                        24,000. Is there any way to speed it up?
                        Thanks!



                        --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                        >
                        > Give this a try (note one long line):
                        > ;clip to remove lines that do contain keywords
                        > [...]
                      • idisnick
                        Message 11 of 30, Jun 19, 2007
                          This one seems to work great!

                          --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                          >
                          > Ok that was not right ... try this one instead ...
                          > [...]
                        • Flo
                          Message 12 of 30, Jun 20, 2007
                            --- In ntb-clips@yahoogroups.com, "idisnick" <idisnick@...> wrote:
                            > Hi,...
                            > Also it took a few minutes to do a list of
                            > 24,000, any way to speed it up?

                            Try another one. In a test, it "parsed" 16,000 lines within a few
                            seconds. This will remind some members of an earlier discussion
                            (see "Removing stopwords from word list")... ;-)

                            Flo


                            ^!Set %Doc%=^$GetDocIndex$
                            ^!Set %Keywords%=^?[(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:]
                            ^!Open ^%Keywords%
                            ^!Replace "(\r\n){2,}" >> "\r\n" AWRS
                            ^!Replace "\r\n" >> "|" AWRS
                            ^!Replace "\|\Z" >> "" AWRS
                            ^!Select All
                            ^!Set %Search%=^$GetSelection$
                            ^!Close ^%Keywords% Discard
                            ^!SetDocIndex ^%Doc%
                            ^!SetListDelimiter ^%Space%^P
                            ^!Set %linesout%=^$GetDocMatchAll("^.*(^%Search%).*^%Dollar%";0)$^%Space%
                            ^!Menu Edit/Copy All
                            ^!Toolbar Paste New
                            ^!Jump Doc_End
                            ^!IfFalse ^$IsEmpty(^$GetLine$)$ Next Else Skip
                            ^!Keyboard Enter
                            ^!Select All
                            ^!Menu Modify/Lines/Trim Blanks
                            ^!Jump Doc_End
                            ^%linesout%^%NL%
                            ^!Select All
                            ^$StrSort("^$GetSelection$";0;1;1)$
                            ^!Replace "^(.+)\r\n\1 (\r\n)" >> "" AWIRS
                            ^!Replace "^(.+) \r\n" >> "" AWRS
                            ^!Info Finished!
                            ;end of clip
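As far as the clip above can be read, it avoids the slow per-line loop with a sort-and-cancel trick: every matching line is collected with a trailing space as a marker, appended to a copy of the whole text, and the combined list is sorted so that each matching line sits directly before its marked twin; the pair is then deleted together. The Python sketch below traces that reading. It is an interpretation, not a translation: the function name is hypothetical, and the result comes out sorted and de-duplicated rather than in the original order.

```python
import re

def remaining_lines(text, alternation):
    """Return the lines that contain no keyword, via the sort-and-cancel trick."""
    # Copy of the document with trailing blanks trimmed and empty lines dropped
    lines = [l.rstrip() for l in text.splitlines() if l.strip()]
    # Matching lines, each marked with one trailing space
    marked = [l + " " for l in lines if re.search(alternation, l)]
    # Sorting places "X" immediately before "X " in ordinary text
    merged = "".join(l + "\n" for l in sorted(set(lines + marked)))
    # A line followed by its marked twin: delete both
    merged = re.sub(r"(?m)^(.+)\n\1 \n", "", merged)
    # Any marked line left over: delete it as well
    merged = re.sub(r"(?m)^.+ \n", "", merged)
    return merged.splitlines()

document = "Froggy Frog\nLives in a well\nIf you want him\nPull the bell.\n"
survivors = remaining_lines(document, "bell|Frog|well")
print(survivors)
```

One caveat of the trick, at least in this sketch: a line and its marked twin are only adjacent after sorting if no other line begins with the same text followed by a character that sorts before a space (a tab, for instance).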
                          • Jeff Scism
                            Message 13 of 30, Jun 20, 2007
                              You should probably avoid the Keyboard ENTER and other Keyboard
                              commands (they do not work for everyone, it seems). Also, things move
                              along quicker with ^!SetScreenupdate OFF.

                              Flo wrote:
                              > Try another one. In a test, it "parsed" 16,000 lines within a few
                              > seconds. This will remind some members of an earlier discussion
                              > (see "Removing stopwords from word list")... ;-)
                              > [...]
                            • Flo
                              Message 14 of 30, Jun 21, 2007
                                --- In ntb-clips@yahoogroups.com, Jeff Scism <Scismgenie@...> wrote:
                                >
                                > You should probably avoid the Keyboard ENTER and other Keyboard
                                > commands, (They do not work for everyone, it seems) also things move
                                > along quicker with ^!SetScreenupdate OFF.

                                OK, Jeff. So we'd better add "^!SetScreenUpdate Off", and replace
                                "^!Keyboard Enter" with "^!InsertText ^%NL%".

                                I also added another prompt. Now you can choose case-sensitive
                                search, or ignore the case.


                                ^!SetScreenUpdate Off
                                ^!SetHintInfo Working...
                                ^!Set %Doc%=^$GetDocIndex$
                                ^!Set %Keywords%=^?[(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:]
                                ^!Set %Case%=^?[Case-sensitive search:==Yes^=(?-i)|_No^=(?i)]
                                ^!Open ^%Keywords%
                                ^!Replace "(\r\n){2,}" >> "\r\n" AWRS
                                ^!Replace "\r\n" >> "|" AWRS
                                ^!Replace "\|\Z" >> "" AWRS
                                ^!Select All
                                ^!Set %Search%=^$GetSelection$
                                ^!Close ^%Keywords% Discard
                                ^!SetDocIndex ^%Doc%
                                ^!SetListDelimiter ^%Space%^P
                                ^!Set %linesout%=^$GetDocMatchAll("^%Case%^.*(^%Search%).*^%Dollar%";0)$^%Space%
                                ^!Menu Edit/Copy All
                                ^!Toolbar Paste New
                                ^!Jump Doc_End
                                ^!IfFalse ^$IsEmpty(^$GetLine$)$ Next Else Skip
                                ^!InsertText ^%NL%
                                ^!Select All
                                ^!Menu Modify/Lines/Trim Blanks
                                ^!Jump Doc_End
                                ^%linesout%^%NL%
                                ^!Select All
                                ^$StrSort("^$GetSelection$";0;1;1)$
                                ^!Replace "^(.+)\r\n\1 (\r\n)" >> "" AWIRS
                                ^!Replace "^(.+) \r\n" >> "" AWRS
                                ^!Info Finished!
                                ; end of clip


                                So far, there is only one problem with this clip: in a test, it
                                worked fine with 250 keywords, but it failed with 16,000. See my
                                reply to Sheri in this thread...

                                Flo
                                 
                              • Flo
                                Message 15 of 30 , Jun 21, 2007
                                  Sheri wrote...

                                  > How many keywords? If not more than a few hundred could
                                  > possibly use something like this (uses regular expression
                                  > matching).
                                  >
                                  > ^!Setlistdelimiter ^P
                                  > ;next is one long line
                                  > ^!Set %linesout%=^$GetDocMatchAll("(?-i)^.*(comprehensive|switch|system).*^%dollar%";0)$
                                  > ;end long line
                                  > ^!Toolbar New Document
                                  > ^!InsertText ^%linesout%

                                  In fact, the alternation that can be used with ^$GetDocMatchAll$
                                  seems to be limited. When testing this with a file of 250 keywords
                                  and a text of 16,000 lines, it works fine. It fails when those 250
                                  keywords are used as the text and the 16,000 words as keywords. NT5
                                  reacts with the message...

                                  "Regex error: internal error: overran compiling workspace".

                                  (You may test it with the files at
                                  http://flogehrke.homepage.t-online.de/491/ntf-wordlist.zip
                                  that we used for testing another clip some months ago.)

                                  Is this limitation definable in any way?

                                  Flo
                                   
                                • Sheri
                                  Message 16 of 30 , Jun 21, 2007
                                    Flo wrote:
                                    > Sheri wrote...
                                    >
                                    >
                                    >> How many keywords? If not more than a few hundred could
                                    >> possibly use something like this (uses regular expression
                                    >> matching).
                                    >>
                                    >> ^!Setlistdelimiter ^P
                                    >> ;next is one long line
                                    >> ^!Set %linesout%=^$GetDocMatchAll("(?-i)^.*(comprehensive|switch|system).*^%dollar%";0)$
                                    >> ;end long line
                                    >> ^!Toolbar New Document
                                    >> ^!InsertText ^%linesout%
                                    >>
                                    >
                                    > In fact, the alternation that can be used with ^$GetDocMatchAll$
                                    > seems to be limited. When testing this with a file of 250 keywords
                                    > and a text of 16,000 lines, it works fine. It fails when those 250
                                    > keywords are used as the text and the 16,000 words as keywords. NT5
                                    > reacts with the message...
                                    >
                                    > "Regex error: internal error: overran compiling workspace".
                                    >
                                    > (You may test it with the files at
                                    > http://flogehrke.homepage.t-online.de/491/ntf-wordlist.zip
                                    > that we used for testing another clip some months ago.)
                                    >
                                    > Is this limitation definable in any way?
                                    >
                                    > Flo
                                    >
                                    >
                                    >
                                    Hi Flo,

                                    I don't think it is definable per se. You could test generated patterns
                                    in clips with ^!IfRegexOK. You can retrieve the error message (if not
                                    OK) with ^$GetRegexErrorMsg$. A clip could possibly take corrective
                                    action for some errors (like reducing the number of alternatives to
                                    be processed at one time).

                                    PCRE 7.2 was just released, and it says it corrected this:

                                    "A pattern with a very large number of alternatives (more than several
                                    hundred) was running out of internal workspace during the pre-compile
                                    phase, where pcre_compile() figures out how much memory will be needed.
                                    A bit of new cunning has reduced the workspace needed for groups with
                                    alternatives. The 1000-alternative test pattern now uses 12 bytes of
                                    workspace instead of running out of the 4096 that are available."

                                    I don't think it will be too long before NoteTab incorporates the
                                    update. However, there are other factors besides "internal workspace"
                                    that affect how many alternatives will work. When working on the stop
                                    list clip, I remember an error message that the regular expression was
                                    "too long". In one of the stop list clips, I applied the keywords in
                                    chunks of approximately 10K, and that worked at the time (I think it
                                    was PCRE 6.7 then).
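The chunking idea described above can be sketched outside NoteTab. The following is a minimal Python illustration, not clip code: the function names and the 10,000-character budget are assumptions chosen for the example, and Python's re module is not PCRE, so its actual limits differ from the ones discussed in this thread.

```python
import re

def chunked_patterns(keywords, max_chars=10_000):
    """Yield alternation patterns, each roughly max_chars long at most."""
    batch, size = [], 0
    for word in keywords:
        escaped = re.escape(word)
        # Start a new batch when adding this word would exceed the budget.
        if batch and size + len(escaped) + 1 > max_chars:
            yield "|".join(batch)
            batch, size = [], 0
        batch.append(escaped)
        size += len(escaped) + 1
    if batch:
        yield "|".join(batch)

def matching_lines(text, keywords, max_chars=10_000):
    """Return the lines containing any keyword, testing one batch at a time."""
    lines = text.splitlines()
    hits = set()
    for pattern in chunked_patterns(keywords, max_chars):
        rx = re.compile(pattern)
        hits.update(i for i, line in enumerate(lines) if rx.search(line))
    # Report the hits in document order.
    return [lines[i] for i in sorted(hits)]
```

Each batch is compiled and applied on its own, so no single pattern grows with the full keyword list; the price is one pass over the text per batch.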

                                    Regards,
                                    Sheri
                                  • paulmaser
                                    Message 17 of 30 , Jun 21, 2007
                                      You could probably replace the first two lines below with one command,
                                      that would look something like this:
                                      ^!Replace "(\r\n)+" >> "|" AWRS


                                      > ^!Replace "(\r\n){2,}" >> "\r\n" AWRS
                                      > ^!Replace "\r\n" >> "|" AWRS
                                      > ^!Replace "\|\Z" >> "" AWRS
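For readers who want to check the effect of the single-pass replacement, here is a small Python sketch of the same transformation (an illustration only, not clip code; the function name is hypothetical). It also shows why a leading or trailing "|" still has to be trimmed when the keyword file starts or ends with a line break.

```python
import re

def to_alternation(raw):
    """Collapse every run of CRLF line breaks into a single '|'."""
    joined = re.sub(r"(?:\r\n)+", "|", raw)
    # A file that starts or ends with a line break leaves a stray '|',
    # which would add an empty alternative that matches everywhere.
    return joined.strip("|")
```

For example, a file containing a leading blank line, "be", an empty line, and "before" becomes "be|before".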
                                    • Sheri
                                      Message 18 of 30 , Jun 21, 2007
                                        --- In ntb-clips@yahoogroups.com, "Flo" <flo.gehrke@...> wrote:
                                        >
                                        > Sheri wrote...
                                        >
                                        > > How many keywords? If not more than a few hundred could
                                        > > possibly use something like this (uses regular expression
                                        > > matching).
                                        > >
                                        > > ^!Setlistdelimiter ^P
                                        > > ;next is one long line
                                        > > ^!Set %linesout%=^$GetDocMatchAll("(?-i)^.*(comprehensive|switch|system).*^%dollar%";0)$
                                        > > ;end long line
                                        > > ^!Toolbar New Document
                                        > > ^!InsertText ^%linesout%
                                        >
                                        > In fact, the alternation that can be used with ^$GetDocMatchAll$
                                        > seems to be limited. When testing this with a file of 250 keywords
                                        > and a text of 16,000 lines, it works fine. It fails when those 250
                                        > keywords are used as the text and the 16,000 words as keywords. NT5
                                        > reacts with the message...
                                        >
                                        > "Regex error: internal error: overran compiling workspace".
                                        >
                                        > (You may test it with the files at
                                        > http://flogehrke.homepage.t-online.de/491/ntf-wordlist.zip
                                        > that we used for testing another clip some months ago.)
                                        >
                                        > Is this limitation definable in any way?
                                        >
                                        > Flo
                                        >
                                        >

                                        Hi again,

                                        I haven't been following this thread in detail, but if he just wants
                                        to remove lines having a keyword, wouldn't it be better to use a
                                        replace command (replacing keyword lines with "") instead of using
                                        ^$GetDocMatchAll$?

                                        Seems to me the stop word task was more complicated because you wanted
                                        to not only delete lines matching a stop word, but also eliminate
                                        duplicates that were not stop words.

                                        Using ^!Replace all(s) would be fast (though you still have to keep
                                        your lists of alternates reasonably sized for PCRE).
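The difference between collecting matches and replacing them can be shown with a short Python sketch (an illustration under assumptions, not clip code: the function name and parameters are hypothetical, and Python's re stands in for NoteTab's PCRE). Every line containing a keyword is replaced with nothing, including its line break.

```python
import re

def drop_keyword_lines(text, keywords, ignore_case=True):
    """Delete every whole line that contains one of the keywords."""
    alternation = "|".join(re.escape(k) for k in keywords)
    flags = re.MULTILINE | (re.IGNORECASE if ignore_case else 0)
    # Match from line start through the CRLF (or the end of the text, so a
    # final line without a trailing line break is removed as well).
    pattern = rf"^.*(?:{alternation}).*(?:\r\n|\Z)"
    return re.sub(pattern, "", text, flags=flags)
```

What remains is the text without the keyword lines, so no separate step is needed to subtract the matches from the document.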

                                        Regards,
                                        Sheri
                                      • Flo
                                        Message 19 of 30 , Jun 21, 2007
                                          "paulmaser" <paul@...> wrote:
                                          >
                                          > You could probably replace the first two lines below with one
                                          > command, that would look something like this:
                                          > ^!Replace "(\r\n)+" >> "|" AWRS
                                          >
                                          >
                                          > > ^!Replace "(\r\n){2,}" >> "\r\n" AWRS
                                          > > ^!Replace "\r\n" >> "|" AWRS
                                          > > ^!Replace "\|\Z" >> "" AWRS

                                          Thanks, Paul. You are right: ^!Replace "(\r\n)+" >> "|" AWRS will
                                          do the job.

                                          By the way: The clip wouldn't even need to open and process the
                                          keyword list if we make sure from the outset that it doesn't contain
                                          any empty lines. Thus we could replace all the lines from
                                          "^!Open ^%Keywords%" to "^!Close ^%Keywords% Discard" with...

                                          ^!SetClipboard ^$GetFileText(^%Keywords%)$
                                          ^!SetClipboard ^$StrReplace("^%NL%";"|";"^$GetClipboard$";0;0)$
                                          ^!Set %Search%=^$GetClipboard$

                                          This could speed up the clip even more ;-)

                                          Flo
                                           
                                          • Flo
                                            Message 21 of 30 , Jun 21, 2007
                                              Thanks for that information, Sheri!

                                              I remember those "10K chunks". Members who want to read up on that
                                              issue - it's in message # 15213 (see ^!Select +10000...).

                                              Flo
                                               
                                            • Flo
                                              Message 22 of 30 , Jun 21, 2007
                                                Sheri wrote...

                                                > I haven't been following this thread in detail, but if he just wants
                                                > to remove lines having a keyword, wouldn't it be better to use a
                                                > replace command (replacing keyword lines with "") instead of using
                                                > getdocmatchall?

                                                Indeed - why not this way...


                                                ^!SetScreenUpdate Off
                                                ^!SetHintInfo Working...
                                                ^!Set %Doc%=^$GetDocIndex$
                                                ^!Set %Keywords%=^?[(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:]
                                                ^!Set %Case%=^?[Case-sensitive search:==Yes^=(?-i)|_No^=(?i)]
                                                ^!Open ^%Keywords%
                                                ^!Replace "(\r\n)+" >> "|" AWRS
                                                ^!Replace "\|\Z" >> "" AWRS
                                                ^!Replace "\A\|" >> "" AWRS
                                                ^!Set %Search%=^$GetText$
                                                ^!Close ^%Keywords% Discard
                                                ^!SetDocIndex ^%Doc%
                                                ^!Menu Edit/Copy All
                                                ^!Menu Edit/Paste New
                                                ^!Replace "^%Case%^.*(^%Search%).*\r\n" >> "" AWRS
                                                ^!Info Finished!


                                                Regards,
                                                Flo
                                                 
                                              • Sheri
                                                Message 23 of 30 , Jun 22, 2007
                                                  --- In ntb-clips@yahoogroups.com, "Flo" <flo.gehrke@...> wrote:
                                                  >
                                                  > Sheri wrote...
                                                  >
                                                  > > I haven't been following this thread in detail, but if he just wants
                                                  > > to remove lines having a keyword, wouldn't it be better to use a
                                                  > > replace command (replacing keyword lines with "") instead of using
                                                  > > getdocmatchall?
                                                  >
                                                  > Indeed - why not this way...
                                                  >
                                                  >
                                                  > ^!SetScreenUpdate Off
                                                  > ^!SetHintInfo Working...
                                                  > ^!Set %Doc%=^$GetDocIndex$
                                                  > ^!Set %Keywords%=^?[(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:]
                                                  > ^!Set %Case%=^?[Case-sensitive search:==Yes^=(?-i)|_No^=(?i)]
                                                  > ^!Open ^%Keywords%
                                                  > ^!Replace "(\r\n)+" >> "|" AWRS
                                                  > ^!Replace "\|\Z" >> "" AWRS
                                                  > ^!Replace "\A\|" >> "" AWRS
                                                  > ^!Set %Search%=^$GetText$
                                                  > ^!Close ^%Keywords% Discard
                                                  > ^!SetDocIndex ^%Doc%
                                                  > ^!Menu Edit/Copy All
                                                  > ^!Menu Edit/Paste New
                                                  > ^!Replace "^%Case%^.*(^%Search%).*\r\n" >> "" AWRS
                                                  > ^!Info Finished!
                                                  >
                                                  >
                                                  > Regards,
                                                  > Flo
                                                  >
                                                  >

                                                  Great! If interested in making further improvements, here are a few
                                                  more enhancements to consider.

                                                  When a clip makes use of the clipboard, it's nice to restore its
                                                  original contents at the end.

                                                  You are closing the keyword document before navigating to the
                                                  original document. You need to be sure the keyword document was not
                                                  already open when the clip was started. If it gets closed from a lower
                                                  docindex than the starting document, you would not return to the
                                                  original document when you set your docindex. You'd have to navigate
                                                  to the original docindex first and then close the keywords document
                                                  with Discard.

                                                  Normally it would be a good idea to reverse sort alternates when
                                                  constructing a regular expression, but since whole lines containing
                                                  alternates are being deleted, in this case that wouldn't make any
                                                  difference. The reason they should normally be reverse sorted is,
                                                  alternates are searched from left to right. If there's a keyword "be"
                                                  and a keyword "before", "be|before" will never find "before" in the
                                                  text. Using \b's before and after the alternates would also work, if
                                                  the keywords are meant to be whole words only.
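The left-to-right behaviour of alternation is easy to verify. The following Python sketch is only an illustration (Python's re module behaves like PCRE on this point, but it is not the engine NoteTab uses); the variable names are made up for the example.

```python
import re

text = "before"

# The first alternative that matches wins, so "be" shadows "before".
shadowed = re.search(r"be|before", text).group(0)

# Reverse sorting puts the longer keyword first, so it gets tried first.
keywords = sorted(["be", "before"], reverse=True)
longest_first = re.search("|".join(keywords), text).group(0)

# With word boundaries the order no longer matters: "be" fails the
# trailing \b inside "before", and the engine falls back to "before".
whole_word = re.search(r"\b(?:be|before)\b", text).group(0)
```

Here shadowed is "be", while longest_first and whole_word are both "before".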

                                                  If there are any characters that might get interpreted by the regex
                                                  engine as metacharacters in the keyword document, they should be
                                                  escaped with a backslash prior to using them in the alternates.
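As an illustration of the escaping step, here is a Python sketch (not clip code; the sample keywords are invented). Python's re.escape plays the role that a backslash-escaping subclip would play in NoteTab.

```python
import re

keywords = ["C++", "a.b", "(note)"]

# Escape each keyword so that +, ., ( and ) are matched literally.
pattern = "|".join(re.escape(k) for k in keywords)
rx = re.compile(pattern)

found = rx.search("written in C++ last year").group(0)
# Without escaping, "a.b" would also match "axb"; with escaping it does not.
no_false_hit = rx.search("axb") is None
```

Unescaped, the keyword "C++" would not even compile as a pattern in this engine, because "++" is read as a repeated quantifier.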

                                                  When constructing a regular expression with code, it's probably a good
                                                  idea to check ^!IfRegexOK before using the expression in a "real"
                                                  statement. If there is an error, you'd have an opportunity to show a
                                                  message and still do clean up tasks (like restore the clipboard).
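The same "validate first, then use" pattern looks like this in Python (a sketch only; the helper name is hypothetical, and re.error is Python's counterpart to the error text a clip would retrieve after a failed check).

```python
import re

def compile_or_report(pattern):
    """Return (compiled_regex, None) on success or (None, message) on failure."""
    try:
        return re.compile(pattern), None
    except re.error as exc:
        # The caller can show the message and still run its clean-up steps.
        return None, str(exc)

rx, err = compile_or_report("(be|before")   # unbalanced parenthesis
ok_rx, ok_err = compile_or_report("(be|before)")
```

Because the failure is returned instead of raised, the calling code keeps control and can restore any state it changed before reporting the problem.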

                                                  Regards,
                                                  Sheri
                                                • Flo
                                                  Message 24 of 30 , Jun 23, 2007
                                                    Hi Sheri,

                                                    I'm grateful to you for all these recommendations, and I tried to
                                                    apply them to this clip...

                                                    > When a clip makes use of the clipboard, its nice to restore its
                                                    > original contents at the end.

                                                    That isn't done here yet, is it? But I think it could easily be done
                                                    by saving its contents in a variable, and afterwards writing it back
                                                    to the clipboard like...

                                                      ^!Set %Var%=^$GetClipboard$ ... ^!SetClipboard ^%Var%

                                                    > You'd have to navigate to the original docindex and then close
                                                    > discard the keywords document.

                                                    I changed the order of these command lines.

                                                    By the way: Isn't it even safer to work with the document name? Given
                                                    that the clip always gets started from the original document, we
                                                    could replace...

                                                      ^!Set %Doc%=^$GetDocIndex$  with  ^!Set %Doc%=^$GetDocName$

                                                    and

                                                      ^!SetDocIndex ^%Doc%  with  ^!Open ^%Doc%

                                                    (According to the help file, I suppose that ^!Open also selects a
                                                    document that is already open.)

                                                    > Normally it would be a good idea to reverse sort alternates...

                                                    See lines #8 and #9 now.

                                                    > metacharacters in the keyword document...should be escaped
                                                    > with a backslash

                                                    Certainly, this would be a professional solution. In message # 15199
                                                    you created a subclip GetRegEscape that would do this job.

                                                    > its probably a good idea to check ^!IfRegexOK before using the
                                                    > expression in a "real" statement.

                                                    I hope I've done it the right way.

                                                    > Using \b's before and after the alternates would also work, if
                                                    > the keywords are meant to be whole words only.

                                                    This has been added too.

                                                    In addition to that, I've combined the \b's with a negative
                                                    lookbehind and lookahead. They do not allow certain characters directly
                                                    before or after a search word that is being treated as a whole word.
                                                    This mainly targets words joined with a hyphen - (ANSI 45) or an
                                                    apostrophe ' (ANSI 39). For example: if "McDonald" is defined as a
                                                    keyword, it normally matches "McDonald's" too, even when enclosed in
                                                    \b, since - and ' are interpreted as word delimiters. Consequently, the
                                                    clip would delete a line like...

                                                        "eating a hamburger at McDonald's"

                                                    although it isn't really matched by "McDonald" as a whole word.
                                                    Likewise, "self-service" would be matched by "self" and by "service",
                                                    although they might be regarded as mere substrings of "self-service".
                                                    It depends, of course, on how you look at "lexical problems" like
                                                    that, and also on the sort of text to be processed.
                                                    Certainly, this construction needs some more testing...
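The effect of the lookarounds can be checked with a small Python sketch (an illustration, not clip code). Python's re has no POSIX [[:punct:]] class, so the example assumes an explicit class containing only the hyphen and the apostrophe; the keyword list is taken from the examples above.

```python
import re

# Plain word boundaries: "McDonald" still matches inside "McDonald's".
plain = re.compile(r"\b(?:McDonald|self)\b")

# Boundaries plus negative lookarounds: reject a hyphen or apostrophe
# directly before or after the keyword.
strict = re.compile(r"\b(?<![-'])(?:McDonald|self)(?![-'])\b")

line = "eating a hamburger at McDonald's"
plain_hit = plain.search(line) is not None      # the line would be deleted
strict_hit = strict.search(line) is not None    # the line would be kept
```

With the strict pattern, "self-service" is likewise left alone, while a free-standing "self" or "McDonald" is still matched.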

                                                    How to deal with compound nouns written with a space (ANSI 32)? For
                                                    example: "Express" would delete a line containing "American Express",
                                                    although we possibly don't regard it as a match of that compound. The
                                                    only solution I can see is to enter "American Express" with a
                                                    non-breaking space (ANSI 160) in order to distinguish it from the
                                                    normal space (ANSI 32). To support this, we could extend the
                                                    lookarounds with \xA0 in order to match ANSI 160. Maybe there's a
                                                    better solution (or even more problems)...

                                                    Regards,
                                                    Flo


                                                    ^!SetScreenUpdate Off
                                                    ^!SetHintInfo Working...
                                                    ^!Set %Doc%=^$GetDocIndex$
                                                    ^!Set %Keywords%=^?[(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword File:]
                                                    ^!Set %Case%=^?[Case-sensitive search:==Yes^=(?-i)|_No^=(?i)]
                                                    ^!Set %Substr%=^?[Search whole words only:==Yes^=1|_No^=0]
                                                    ^!Open ^%Keywords%
                                                    ^!Select All
                                                    ^$StrSort("^$GetSelection$";0;0;1)$
                                                    ^!Replace "(\r\n)+" >> "|" AWRS
                                                    ^!Replace "\|\Z" >> "" AWRS
                                                    ^!Replace "\A\|" >> "" AWRS
                                                    ^!Set %Search%=^$GetText$
                                                    ^!SetDocIndex ^%Doc%
                                                    ^!Close ^%Keywords% Discard
                                                    ^!IfTrue ^%Substr% Next Else Skip_2
                                                    ;^!Set %Expr%="^%Case%^.*\b(^%Search%)\b.*\r\n"
                                                    ; start of long line
                                                    ^!Set %Expr%="^%Case%^.*\b(?<![[:punct:]])(^%Search%)(?![[:punct:]])
                                                    \b.*\r\n"
                                                    ; end of long line
                                                    ^!Goto Skip
                                                    ^!Set %Expr%="^%Case%^.*(^%Search%).*\r\n"
                                                    ; Try next line for testing RegEx error ;-)
                                                    ;^!Set %Expr%="[[:punkt:]]+"
                                                    ^!IfRegExOK "^%Expr%" Next Else Message
                                                    ^!Menu Edit/Copy All
                                                    ^!Menu Edit/Paste New
                                                    ^!Replace "^%Expr%" >> "" AWRS
                                                    ^!Info Finished!
                                                    ^!Goto End

                                                    :Message
                                                    ^!Prompt ^$GetRegexErrorMsg$
                                                  • Sheri
                                                    Message 25 of 30 , Jun 24, 2007
                                                      Hi Flo,

                                                      --- In ntb-clips@yahoogroups.com, "Flo" <flo.gehrke@...> wrote:
                                                      >
                                                      > I'm grateful to you for all these recommendations, and I tried to
                                                      > apply them to this clip...
                                                      >
                                                      > > When a clip makes use of the clipboard, its nice to restore its
                                                      > > original contents at the end.
                                                      >
                                                      > That's not given here, isn't it?

                                                      Well you do "^!Menu Edit/Copy All" near the end so you can paste the
                                                      result to a new document. As is, that ends up remaining on the
                                                      clipboard after the clip has finished.

                                                      > But I think it could easily be done by saving its contents in a
                                                      > variable, and afterwards pasting it back to the clipboard like..
                                                      >
                                                      > ^!Set %Var%=^$GetClipboard$ ... ^!SetClipboard ^%Var%

                                                      See ^!ClipboardSave and ^!ClipboardRestore

                                                      >
                                                      > > You'd have to navigate to the original docindex and then close
                                                      > > discard the keywords document.
                                                      >
                                                      > I changed the order of these command lines.
                                                      >
                                                      > By the way: Isn't it even safer to work with the document name?
                                                      > Given that the clip always gets started from the original
                                                      > document, we could replace...

                                                      >
                                                      > ^!Set %Doc%=^$GetDocIndex$^ with ^!Set %Doc%=^GetDocName
                                                      >
                                                      > and
                                                      >
                                                      > ^!SetDocIndex ^%Doc% with ^!Open ^%Doc%

                                                      Yes, that should work. But then NoteTab has to look up the docindex
                                                      itself, so it may be slightly faster to save and restore the docindex
                                                      yourself.

                                                      >
                                                      > (According to the help file, I suppose that ^!Open also selects a
                                                      > document that is open already.)
                                                      >
                                                      > > Normally it would be a good idea to reverse sort alternates...
                                                      >
                                                      > See line #8, and 9 now
                                                      >
                                                      > > metacharacters in the keyword document...should be escaped
                                                      > > with a backslash
                                                      >
                                                      > Certainly, this would be a professional solution. In message # 15199
                                                      > you created a subclip GetRegEscape that would do this job.

                                                      Since you're using a document buffer, you could use a single ^!Replace
                                                      to replace any metacharacters (alternates -- be sure to escape them)
                                                      with "\\$0"; the GetRegEscape clip approach is necessary only when
                                                      acting on a string instead of a document. There is currently no
                                                      provision in NoteTab to do regex string operations.
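
To show what that single replace does to each keyword, here is a small Python sketch (an illustration under assumptions, not NoteTab code). The metacharacter set and the function name escape_keyword are chosen for this example; adjust the set to whatever your regex flavor treats as special.

```python
import re

# Characters treated as regex metacharacters in this sketch (an assumed
# set; extend it if your keywords contain others that need escaping).
META = re.compile(r"[\\^$.|?*+()\[\]{}]")


def escape_keyword(keyword):
    """Put a backslash in front of every metacharacter (hypothetical helper)."""
    # \\ emits a literal backslash, \g<0> re-inserts the matched character,
    # which is the same idea as replacing with "\\$0" in the clip.
    return META.sub(r"\\\g<0>", keyword)


print(escape_keyword("What?"))
print(escape_keyword("C++ (beta)"))
```

The escaped keywords can then be joined with | without a stray ? or ( changing the meaning of the alternation.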

                                                      >
                                                      > > its probably a good idea to check ^!IfRegexOK before using the
                                                      > > expression in a "real" statement.
                                                      >
                                                      > I hope I've done it the right way.

                                                      Haven't tried it, but it looks good to me :)

                                                      I haven't made use of classes like punct before myself, so you're
                                                      blazing a trail :)

                                                      >
                                                      > > Using \b's before and after the alternates would also work, if
                                                      > > the keywords are meant to be whole words only.
                                                      >
                                                      > This has been added too.
                                                      >
                                                      > In addition to that, I've combined the \b's with a negative
                                                      > lookbehind and lookahead. They do not allow certain characters
                                                      > before or behind a search word that is being treated as a whole
                                                      > word. This is mainly aiming at words hyphenated with - (ANSI 45)
                                                      > and the apostrophe ' (ANSI 39). For example: If "McDonald" is
                                                      > defined as a keyword it normally matches "McDonald's" too even if
                                                      > embraced with \b
                                                      > since - and ' are interpreted as word delimiters. Consequently, the
                                                      > clip would delete a line like...
                                                      >
                                                      > "eating a hamburger at McDonald's"
                                                      >
                                                      > although it isn't really matched by "McDonald" as a whole word.
                                                      > Or "self-service" would be matched by "self" and "service" as well
                                                      > although they possibly are regarded as substrings of "self-service"
                                                      > only. It depends, of course, on the way you look at "lexical
                                                      > problems" like that, and also on the sort of text to be processed.
                                                      > Certainly, this construction needs some more testing...

                                                      > How to deal with compound nouns written with a space (ANSI 32)?
                                                      > For example: "Express" would delete "American Express" although
                                                      > we possibly don't regard it as a match of that compound. The only
                                                      > solution I can see for that is to enter "American Express" with a
                                                      > protected space (ANSI 160) in order to distinguish it from the
                                                      > normal space (ANSI 32). With regard to this, we could extend the
                                                      > Lookarounds with \xA0 in order to match ANSI 160. Maybe there's a
                                                      > better solution (or even more problems)...

                                                      Hmm, you bring up some interesting points. "American Express" would
                                                      be its own keyword, as would "Express". The "Express" alternate could
                                                      use a negative lookbehind to make sure it is not preceded by
                                                      "American\x20". Obviously, customizing the keywords or alternates to
                                                      that extent would require some fine tuning before applying them.
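
A quick way to try the lookbehind idea is a few lines of Python (a sketch only; the function name mentions_express is made up, and the pattern is hard-wired to this one pair of keywords):

```python
import re

# "Express" as a whole word, but not when it is preceded by "American"
# plus a normal space (\x20). The lookbehind has a fixed width, which
# Python's re module requires.
EXPRESS = re.compile(r"\b(?<!American\x20)Express\b")


def mentions_express(line):
    """True if the line contains "Express" outside "American Express"."""
    return EXPRESS.search(line) is not None


print(mentions_express("Pay with American Express"))
print(mentions_express("Express delivery only"))
```

The first call prints False and the second True. A line that contains both "American Express" and a free-standing "Express" would still match, because only the occurrence right after "American " is excluded.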

                                                      Regards,
                                                      Sheri

                                                      >
                                                      >
                                                      > ^!SetScreenUpdate Off
                                                      > ^!SetHintInfo Working...
                                                      > ^!Set %Doc%=^$GetDocIndex$
                                                      > ^!Set %Keywords%=^?[(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword
                                                      > File:]
                                                      > ^!Set %Case%=^?[Case-sensitive search:==Yes^=(?-i)|_No^=(?i)]
                                                      > ^!Set %Substr%=^?[Search whole words only:==Yes^=1|_No^=0]
                                                      > ^!Open ^%Keywords%
                                                      > ^!Select All
                                                      > ^$StrSort("^$GetSelection$";0;0;1)$
                                                      > ^!Replace "(\r\n)+" >> "|" AWRS
                                                      > ^!Replace "\|\Z" >> "" AWRS
                                                      > ^!Replace "\A\|" >> "" AWRS
                                                      > ^!Set %Search%=^$GetText$
                                                      > ^!SetDocIndex ^%Doc%
                                                      > ^!Close ^%Keywords% Discard
                                                      > ^!IfTrue ^%Substr% Next Else Skip_2
                                                      > ;^!Set %Expr%="^%Case%^.*\b(^%Search%)\b.*\r\n"
                                                      > ; start of long line
                                                      > ^!Set %Expr%="^%Case%^.*\b(?<![[:punct:]])(^%Search%)(?![[:punct:]])
                                                      > \b.*\r\n"
                                                      > ; end of long line
                                                      > ^!Goto Skip
                                                      > ^!Set %Expr%="^%Case%^.*(^%Search%).*\r\n"
                                                      > ; Try next line for testing RegEx error ;-)
                                                      > ;^!Set %Expr%="[[:punkt:]]+"
                                                      > ^!IfRegExOK "^%Expr%" Next Else Message
                                                      > ^!Menu Edit/Copy All
                                                      > ^!Menu Edit/Paste New
                                                      > ^!Replace "^%Expr%" >> "" AWRS
                                                      > ^!Info Finished!
                                                      > ^!Goto End
                                                      >
                                                      > :Message
                                                      > ^!Prompt ^$GetRegexErrorMsg$
                                                      >
                                                    • hsavage
                                                      Message 26 of 30 , Jun 25, 2007
                                                        Flo wrote:
                                                        > Hi Sheri,
                                                        >
                                                        > I'm grateful to you for all these recommendations, and I
                                                        > tried to apply them to this clip...
                                                        >
                                                        >> When a clip makes use of the clipboard, its nice to
                                                        >> restore its original contents at the end.
                                                        >
                                                        > That's not given here, isn't it? But I think it could
                                                        > easily be done by saving its contents in a variable, and
                                                        > afterwards pasting it back to the clipboard like...
                                                        >
                                                        > ^!Set %Var%=^$GetClipboard$ ... ^!SetClipboard ^%Var%

                                                        Flo,

                                                        If you're insistent about restoring the clipboard to its previous state
                                                        after running a clip you might want to check into the following 2 clip
                                                        commands.

                                                        ^!ClipBoardSave
                                                        ^!ClipBoardRestore [+]


                                                        ºvº SL-6-199 -created- 2007.06.25 - 19.48.24

                                                        "Party Etiquette; Drinking Your Fair Share."
                                                        ¤ ø ¤ hrs ø hsavage@...
                                                      • Flo
                                                        Message 27 of 30 , Jun 27, 2007
                                                          The latest version of this clip splits the keyword list into chunks
                                                          of 500 lines in order to stay within the limits of the alternation. In
                                                          my tests, that error message (mentioned above) appeared from 818
                                                          keywords on. Now it works with an unlimited number of keywords. It's
                                                          designed to delete certain keywords (i.e. stopwords) in a word list,
                                                          or complete lines in a list that contain these keywords. In full
                                                          text, it will delete whole paragraphs containing a keyword (or
                                                          substring).

                                                          Also metacharacters in the keyword list are escaped now (e.g.,
                                                          replace ? with \?).
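
The chunking logic may be easier to follow in a short Python sketch (an illustration under assumptions, not a translation of the clip). The function names are made up, the chunk size of 500 simply mirrors the clip, and Python's re module would not actually need the split for a list of this size.

```python
import re


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_matching_lines(text, keywords, chunk_size=500, ignore_case=False):
    """Delete every line containing a keyword, one chunk of keywords at a time."""
    flags = re.MULTILINE | (re.IGNORECASE if ignore_case else 0)
    # Skip empty entries (they would create an empty alternative) and
    # reverse sort so longer keywords come before their prefixes.
    ordered = sorted((k for k in keywords if k), reverse=True)
    for chunk in chunked(ordered, chunk_size):
        alternation = "|".join(re.escape(k) for k in chunk)
        # Whole line, including its line break (or the end of the text).
        pattern = re.compile(rf"^.*(?:{alternation}).*(?:\r?\n|\Z)", flags)
        text = pattern.sub("", text)
    return text


sample = "apple pie\nbanana split\ncherry tart\n"
print(delete_matching_lines(sample, ["banana"]))
```

Each pass deletes the lines matched by one chunk, so the result is the same as with one big alternation; only the number of passes over the text grows.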

                                                          H=Delete Keywords
                                                          ^!SetScreenUpdate Off
                                                          ^!SetHintInfo Working...
                                                          ; Save clipboard, and restore it later on (recommended by Sheri)
                                                          ^!ClipBoardSave
                                                          ; Store the index of active document
                                                          ^!Set %Doc%=^$GetDocIndex$
                                                          ; Choose keyword (stopword) file, case, and whole words
                                                          ^!Set %Keywords%=^?[(T=O;F="Textfiles (*.txt)|*.txt")Choose Keyword
                                                          File:]
                                                          ^!Set %Case%=^?[Case-sensitive search:==Yes^=(?-i)|_No^=(?i)]
                                                          ^!Set %WholeWords%=^?[Search whole words only:==Yes^=1|_No^=0]
                                                          ^!Open ^%Keywords%
                                                          ; Reverse sort of keywords (to put longer words before shorter words)
                                                          ^!Select All
                                                          ^$StrSort("^$GetSelection$";0;0;1)$
                                                          ; Escape metacharacters (next one long line)
                                                          ^!Replace "\\|\^|\!|\$|\?|\.|\*|\<|\>|\+|\(|\)|\[|\]|\{|\}|\=|\||\:"
                                                          >> "\\$0" AWRST
                                                          ; Divide document into chunks of 500 lines to meet the
                                                          ; restrictions of alternation
                                                          ^!Set %ChunkIndex%=1
                                                          ^!Jump 1

                                                          :Loop_1
                                                          ^!Select 500
                                                          ^!Toolbar Copy
                                                          ; Make alternation by replacing NL with vertical bar
                                                          ^!SetClipboard ^$StrReplace(^%NL%;|;^$GetClipboard$;0;0)$
                                                          ; Remove vertical bar at end of string to avoid an empty
                                                          ; alternative; note: (A|B|) matches A or B or the empty string,
                                                          ; so it would match every line. You may do the same at the start
                                                          ; of the string, or watch for empty lines at the start of the
                                                          ; keyword list
                                                          ^!IfSame "^$StrCopyRight(^$GetClipboard$;1)$" "|" Next Else Skip
                                                          ^!SetClipboard ^$StrDeleteRight(^$GetClipboard$;1)$
                                                          ; Save chunks in variables %Chunk1%, %Chunk2%, etc.
                                                          ^!Set %Chunk^%ChunkIndex%%=^$GetClipboard$
                                                          ^!Jump +1
                                                          ^!If ^$GetRow$=^$GetLineCount$ Replace
                                                          ^!Inc %ChunkIndex%
                                                          ^!Goto Loop_1

                                                          :Replace
                                                          ; Return to active document
                                                          ^!SetDocIndex ^%Doc%
                                                          ; Close keyword file and copy active document to new document
                                                          ^!Close ^%Keywords% Discard
                                                          ^!Menu Edit/Copy All
                                                          ^!Menu Edit/Paste New
                                                          ^!Set %RepIndex%=1

                                                          :Loop_2
                                                          ^!If ^%RepIndex% > ^%ChunkIndex% Finish
                                                          ; Grab %Chunk1%, %Chunk2%, etc. for search
                                                          ^!Set %Search%=^%Chunk^%RepIndex%%
                                                          ; If "whole words", use word delimiters in RegEx; lookarounds
                                                          ; prevent hyphenated words from being deleted
                                                          ^!IfTrue ^%WholeWords% Next Else Skip_2
                                                          ^!Set %Expr%="^%Case%^.*\b(?<![-])(^%Search%)(?![-])\b.*(\r\n|\z)"
                                                          ^!Goto Skip
                                                          ^!Set %Expr%="^%Case%^.*(^%Search%).*(\r\n|\z)"
                                                          ; Check syntax of RegEx
                                                          ^!IfRegExOK "^%Expr%" Next Else Message
                                                          ; Delete matching words and lines
                                                          ^!Replace "^%Expr%" >> "" AWRS
                                                          ^!Inc %RepIndex%
                                                          ^!Goto Loop_2

                                                          :Finish
                                                          ^!Info Finished!
                                                          ^!ClipBoardRestore
                                                          ^!Goto End

                                                          :Message
                                                          ^!Prompt ^$GetRegexErrorMsg$
                                                          ; end of clip


                                                          The clip prevents terms hyphenated with - (ANSI 45) from being
                                                          deleted by their parts, e.g. "self" would not delete "self-catering"
                                                          (unless you choose to search for substrings instead of whole words).

                                                          Regarding apostrophes and compound nouns with space I've been on the
                                                          wrong track. This issue is much more complicated, and I don't think
                                                          it could be solved by a general RegEx that would match all
                                                          eventualities. The apostrophe, for example, is used in a company name
                                                          like "McDonald's". This name will be deleted by a substring "Mc", and
                                                          by "McDonald" defined as a whole word as well since the apostrophe is
                                                          interpreted as a word delimiter. On the other hand, it indicates the
                                                          genitive of a lemma that possibly should be deleted, e.g. "Dickens'
                                                          works".

                                                          Another idea is to process the source file with the following clip
                                                          before running the "Delete Keywords" clip (of course, it also may be
                                                          integrated into "Delete Keywords").

                                                          Look at the following company names...

                                                          McDonald's
                                                          General Electric
                                                          Bank of America

                                                          In order to protect these names from being deleted
                                                          by "McDonald", "electric", or "bank", the Protect Keywords clip
                                                          replaces the apostrophe and space with _apo_ and _spc_ (even more
                                                          characters may be added that function as word delimiters). Thus the
                                                          names are interpreted as whole words. After running "Delete Keywords"
                                                          we can reverse this replacement.

                                                          First of all, you have to create a PROTECT.TXT file that contains a
                                                          list of terms like those three company names mentioned above.

                                                          Please note that "Protect Keywords" is meant to be run on the source
                                                          file, not on the keyword (or stopword) list!
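
The masking idea itself can be sketched in a few lines of Python (an illustration only, not the clip below). The function names protect and unprotect are made up, and the placeholders _apo_ and _spc_ are assumed not to occur in the source text already.

```python
import re


def protect(text, protected_terms):
    """Mask apostrophes and spaces inside each protected term."""
    for term in protected_terms:
        masked = term.replace("'", "_apo_").replace(" ", "_spc_")
        text = text.replace(term, masked)
    return text


def unprotect(text):
    """Reverse the masking after the keywords have been deleted."""
    return text.replace("_spc_", " ").replace("_apo_", "'")


source = "I ate at McDonald's near Bank of America."
masked = protect(source, ["McDonald's", "Bank of America"])
print(masked)
# The underscore is a word character, so \b no longer separates the parts.
print(re.search(r"(?i)\bbank\b", masked))
print(unprotect(masked) == source)
```

This prints the masked sentence, then None (the whole-word search for "bank" no longer finds anything), then True. Note that unprotect would also rewrite any literal _spc_ or _apo_ that was in the text before masking.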


                                                          H=Protect Keywords
                                                          ^!SetScreenUpdate Off
                                                          ^!SetHintInfo Working...
                                                          ^!Goto=^?[Choose action:==Protect Words^=Protect|Remove Protection^=Remove]

                                                          :Protect
                                                          ^!Set %Doc%=^$GetDocIndex$
                                                          ; Choose the list of words to be protected, e.g. PROTECT.TXT
                                                          ^!Set %ProFile%=^?{(T=O;F="Textfiles (*.txt)|*.txt")Choose Protected
                                                          List:}
                                                          ^!Open ^%ProFile%
                                                          ^!Jump Doc_End
                                                          ^!IfFalse ^$IsEmpty(^$GetLine$)$ Next Else Skip
                                                          ^!InsertText ^%NL%
                                                          ^!Set %LineIndex%=^$GetTextLineCount$

                                                          :Loop_1
                                                          ^!Jump ^%LineIndex%
                                                          ^!SetClipboard ^$StrReplace("'";"_apo_";"^$GetLine$";0;0)$
                                                          ^!SetClipboard ^$StrReplace("^%Space%";"_spc_";"^$GetClipboard$";0;0)$
                                                          ^!Jump Line_End
                                                          ^!InsertText "^P^$GetClipboard$"
                                                          ^!If ^%LineIndex%=1 Replace
                                                          ^!Dec %LineIndex%
                                                          ^!Goto Loop_1

                                                          :Replace
                                                          ^!Select All
                                                          ^!SetListDelimiter ^p
                                                          ^!SetArray %Except%=^$GetSelection$
                                                          ^!SetDocIndex ^%Doc%
                                                          ^!Close ^%ProFile% Discard
                                                          ^!Jump 1
                                                          ^!Set %Count%=1

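                                                          ; %Except% holds pairs (entry, then its protected form); replace
                                                          ; each entry in the document with its protected form
                                                          :Loop_2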
                                                          ^!If ^%Count%=^%Except0% End
                                                          ^!Set %Search%="^%Except^%Count%%"
                                                          ^!Inc %Count%
                                                          ^!Set %Repl%="^%Except^%Count%%"
                                                          ^!Replace "^%Search%" >> "^%Repl%" AWRS
                                                          ^!Inc %Count%
                                                          ^!Goto Loop_2

                                                          ; Restore the original spaces and apostrophes
                                                          :Remove
                                                          ^!Replace "_spc_" >> "^%Space%" AWST
                                                          ^!Replace "_apo_" >> "'" AWST

                                                          :End
                                                          ^!Info Finished!


                                                          Regards,
                                                          Flo
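
For readers who want to follow the idea outside NoteTab, here is a minimal Python sketch of the same protect/restore technique. It is not a translation of the clip: the function names are made up for illustration, matching is case-sensitive, and "whole word" is approximated with regex lookarounds, which may differ from NoteTab's W option.

```python
import re

SPACE_TOKEN = "_spc_"       # same placeholders as in the clip
APOSTROPHE_TOKEN = "_apo_"


def protected_form(phrase):
    """Replace apostrophes and spaces with placeholder tokens."""
    return phrase.replace("'", APOSTROPHE_TOKEN).replace(" ", SPACE_TOKEN)


def protect(text, phrases):
    """Replace every whole-word occurrence of each phrase with its protected form."""
    # Longest phrases first, so a short entry cannot split a longer one.
    for phrase in sorted(phrases, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        text = re.sub(pattern, lambda m: protected_form(m.group(0)), text)
    return text


def unprotect(text):
    """Undo the protection by restoring spaces and apostrophes."""
    return text.replace(SPACE_TOKEN, " ").replace(APOSTROPHE_TOKEN, "'")


if __name__ == "__main__":
    sample = "Don't delete New York or York."
    guarded = protect(sample, ["New York", "Don't"])
    print(guarded)
    print(unprotect(guarded))
```

Once protected, a phrase such as "New York" no longer contains a free-standing "York", so a later whole-word deletion of "York" leaves it alone.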
                                                           
                                                        • ebbtidalflats
                                                          Message 28 of 30 , Jun 28, 2007
                                                            Hi Flo,

                                                            I'm curious about a line in your clips, where you replace the text in
                                                            the document with ^$StrSort.

                                                            I see what you're doing, but am wondering why you chose the function,
                                                            rather than the menu command?

                                                            ^!Menu Modify/Lines/Sort/Descending

                                                            to select and sort all in one step, instead of using three different
                                                            functions.

                                                            > ^!Select All
                                                            > ^$StrSort("^$GetSelection$";0;0;1)$

                                                            Also, why sort the short words to the bottom? I know you put a lot of
                                                            effort into this, but didn't the original poster's example (we haven't
                                                            heard from him for some time) call for finding partial words? If so,
                                                            wouldn't finding the partials speed up the search by eliminating a
                                                            lot of lines from the search for the longer words?

                                                            Just curious.


                                                            One more question. Do you have a specific use in mind for this keyword
                                                            manipulation? Is this a comparison of two keyword lists, or what? Or
                                                            was this just a clipcoding exercise?


                                                            Thanks,


                                                            Eb
                                                          • Flo
                                                            Message 29 of 30 , Jun 29, 2007
                                                              --- In ntb-clips@yahoogroups.com, "ebbtidalflats" <ebbtidalflats@...>
                                                              wrote:
                                                              >
                                                              > Hi Flo,
                                                              >
                                                              > I'm curious about a line in your clips,...

                                                              Eb,

                                                              > ...why you chose the function, rather than the menu command?

                                                              The menu command follows the settings in "Options | Tools".
                                                              ^$StrSort$ lets you define the sort order independently of these
                                                              settings.

                                                              > Also, why sort the short words to the bottom?

                                                              This has been described by Sheri before. Sheri also explained why
                                                              this isn't really necessary when running the clip on word lists and
                                                              lines.

                                                              > wouldn't finding the partials speed up the search

                                                              I think it isn't a matter of speed, and the difference would scarcely
                                                              be measurable. What really matters is what you want to achieve.
                                                              That's why you can choose substrings or whole words.
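
The difference between the two modes can be shown with a small Python sketch. It only illustrates the idea and is not how the clip works: the function name and the `whole_words` flag are invented for this example, and matching is case-sensitive.

```python
import re


def matching_lines(lines, keywords, whole_words=False):
    """Return the lines that contain any keyword.

    whole_words=False matches keywords as substrings;
    whole_words=True requires word boundaries on both sides.
    """
    if not keywords:
        return []
    alternation = "|".join(re.escape(k) for k in keywords)
    pattern = rf"\b(?:{alternation})\b" if whole_words else alternation
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


if __name__ == "__main__":
    sample = ["the system is down", "a systematic review", "nothing here"]
    print(matching_lines(sample, ["system"]))
    print(matching_lines(sample, ["system"], whole_words=True))
```

In substring mode "system" also pulls in the "systematic" line; in whole-word mode it does not.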

                                                              > Do you have a specific use in mind for this keyword
                                                              > manipulation? Is this a comparison of two keyword lists, or
                                                              > what?

                                                              One use, I suppose, has been described sufficiently (protecting
                                                              certain terms and word forms from being deleted by substrings). There
                                                              are many more applications I could think of. Why not compare two
                                                              word lists, e.g. by subtracting list A from list B in order to get
                                                              the difference? For me, dealing with word lists is mainly related to
                                                              Text Retrieval and indexing of text databases, and NT has become an
                                                              indispensable tool in this field.
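
As a rough illustration of subtracting list A from list B, here is a short Python sketch. The function name is hypothetical, the comparison is exact and case-sensitive, and this is not taken from any clip in this thread.

```python
def subtract_lists(list_b, list_a):
    """Return the entries of list_b that do not occur in list_a, in original order."""
    exclude = set(list_a)  # set lookup keeps this fast for long lists
    return [word for word in list_b if word not in exclude]


if __name__ == "__main__":
    list_a = ["beta", "delta"]
    list_b = ["alpha", "beta", "gamma", "beta"]
    print(subtract_lists(list_b, list_a))
```

Every occurrence of an excluded word is dropped, and the surviving entries keep their original order.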

                                                              Several members have contributed to this thread. I just tried to find
                                                              out how these proposals could be integrated into this clip. It is no
                                                              more than a box of building blocks. Maybe you could pick out some
                                                              ideas matching your own needs...

                                                              Flo
                                                               
                                                            • ebbtidalflats
                                                              Message 30 of 30 , Jun 30, 2007
                                                                Flo,

                                                                --- In ntb-clips@yahoogroups.com, "Flo" <flo.gehrke@...> wrote:
                                                                >
                                                                > > Also, why sort the short words to the bottom?
                                                                >
                                                                > This has been described by Sheri before. Sheri also explained why
                                                                > this isn't really necessary when running the clip on word lists and
                                                                > lines.

                                                                I asked because that approach runs counter to the original request.
                                                                Not that there was a whole lot of input from the requester.

                                                                However, he did furnish an example that specifically searched for
                                                                partial words. Hence my curiosity.


                                                                > are many more applications I could think of. Why not compare two
                                                                > word lists, e.g. by subtracting list A from list B in order to get
                                                                > the difference?

                                                                Ahh! Good idea.

                                                                > For me, dealing with word lists is mainly related to
                                                                > Text Retrieval and indexing of text databases, and NT has become an
                                                                > indispensable tool in this field.

                                                                Hm, mine is more in the area of glossaries, but NT is just as
                                                                indispensable to me.


                                                                Thanks for your comments.


                                                                Eb