Removing stopwords from word list

  • jonas_ramus
    Message 1 of 30 , Jul 12, 2006
      Hi,

      I tried to develop a clip that would remove all words from a word
      list WL which are contained in a stopword list XL. If we have two
      lists...

      Word List WL    Stopword List XL
      A               B
      B               D
      B               E
      D
      E
      G
      G
      H

      ...the result should be...

      A
      G
      H

      That is, the clip shouldn't just delete duplicates; it should also
      remove every stopword completely from the word list WL.

      I managed to find the following solution which, at first glance,
      seems to work perfectly. First, you have to copy the stopword list
      XL and paste it into the word list WL. After that, you run the
      following clip:


      ^!Menu Edit/Copy All
      ^!SetClipboard ^$StrSort("^$GetClipboard$";0;1;0)$
      ^!Select All
      ^!Toolbar Paste
      ^!ClearVariables
      ^!Jump 1

      :Loop
      ^!If ^$GetRow$=^$GetParaCount$ End Else Next
      ^!Set %A%=^$GetLine$
      ^!Jump +1
      ^!Set %B%=^$GetLine$
      ; Next line avoids an error message in case the last line is empty
      ^!IfTrue ^$IsEmpty(^$GetLine$)$ End Else Next
      ^!IfSame ^%A% ^%B% Next Else Loop
      ^!Jump -1

      :Remove
      ^!DeleteLine
      ^!Set %B%=^$GetLine$
      ^!IfTrue ^$IsEmpty(^$GetLine$)$ End Else Next
      ^!IfSame ^%A% ^%B% Remove Else Loop

      :End
      ^!Info Finished!

      However, there are the following problems:

      1. This clip doesn't work if the stopword list XL is bigger than the
      word list WL. In that case, any stopword pasted into WL that didn't
      already exist there survives, because it has no duplicate to trigger
      its removal. This could be worked around by pasting the stopword
      list into WL twice, but that doesn't seem like a smart solution.

      2. I would prefer a solution that picks up both files, WL and XL,
      from the hard disk and writes the result to a third file.
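
      In other words, the task amounts to de-duplicating WL and then
      subtracting XL. A rough sketch of that in Python (the file names
      below are only placeholders for WL, XL and the output file):

      # De-duplicate the word list and drop every stopword (placeholder names).
      with open("wordlist.txt", encoding="utf-8") as f:
          words = [line.strip() for line in f if line.strip()]
      with open("stopwords.txt", encoding="utf-8") as f:
          stop = {line.strip().lower() for line in f if line.strip()}

      seen, result = set(), []
      for w in sorted(words, key=str.lower):
          key = w.lower()
          if key not in stop and key not in seen:   # skip stopwords and duplicates
              seen.add(key)
              result.append(w)

      with open("result.txt", "w", encoding="utf-8") as f:
          f.write("\n".join(result) + "\n")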

      Does anybody have a solution for that? Thanks for your replies!

      Flo
    • Don - HtmlFixIt.com
      Message 2 of 30 , Jul 12, 2006
        First sort the word list, removing any double or greater instances of a
        word but leaving one instance of it.

        Then append the stop list to the word list and sort it again.

        Now remove all instances of any duplicates.

        When I have a minute I could do it as a clip, but that is how I would do it.
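
        A sketch of that recipe in Python (just the idea as described, not a
        clip). Note that a stop word which never occurs in the word list has
        no duplicate in the combined list, so it survives this pass -- the
        corner case discussed later in the thread.

        from collections import Counter

        def sort_append_dedupe(words, stopwords):
            # 1. Sort the word list, keeping one instance of each word.
            unique_words = sorted({w.lower() for w in words})
            # 2. Append the stop list and sort again.
            combined = sorted(unique_words + [s.lower() for s in stopwords])
            # 3. Remove all instances of any duplicates.
            counts = Counter(combined)
            return [w for w in combined if counts[w] == 1]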
      • Sheri
        Message 3 of 30 , Jul 12, 2006
          Think this will do it for you Flo, but you would need to install the
          new Notetab version 5 beta.

          Without the second clip (which is called from the first clip) it
          wouldn't work if any of the word lines happen to have symbols in them
          which have special meaning in regular expressions.

          Beware of wrapped lines. Every line should begin with a caret, colon
          or semicolon. I'm afraid Yahoo will wrap every line in the second
          clip.

          I've assumed each "word" is on its own line, in each file. Hope
          that's right. It processes whole lines (trimmed of any spaces at the
          end of the line) so there could be spaces within these "words". It is
          case insensitive.

          Let me know how it works out!

          Regards,
          Sheri

          H="Word List"
          ;2006-07-12 Created by Sheri Pierce
          ;Requires NoteTab version 5 beta
          ^!Set %xlfile%=^?{(T=O)Browse to xl list=^%xlfile%}
          ^!Open ^%xlfile%
          ^!SetWordWrap Off
          ^!IfError Error
          ^!Select All
          ^!Menu Modify/Lines/Trim Blanks
          ^!Set %xl%="^$StrTrim(^$GetText$)$"
          ^!Set %xl%="^$StrSort("^%xl%";No;No;Yes)$"
          ^!Set %xl%="^$StrReplace("^%NL%";"|";"^%xl%";No;No)$"
          ^!Set %myvarname%="xl"
          ^!Clip GetRegEscape ^%xl%
          ^!Set %xl%="^$StrReplace("\|";"|";"^%xl%";No;No)$"
          ^!Close Discard
          ^!Toolbar New Document
          ^!Set %wordfile%=^?{(T=O)Browse to word list=^%wordfile%}
          ^!InsertFile ^%wordfile%
          ^!IfError Error
          ^!SetWordWrap Off
          ^!Select All
          ^!Menu Modify/Lines/Trim Blanks
          ^!Replace "(?si)(^%xl%)\s+" >> "" RAWS
          ^!ClearVariable %xl%
          ^!Set %word%="^$StrTrim(^$GetText$)$"
          ^!Set %word%="^$StrSort("^%word%";No;Yes;Yes)$"
          ^!Select All
          ^!InsertText ^%word%
          ^!ClearVariable %word%
          ^!Info Finished List of Unique Words
          ^!Goto End
          :Error
          ^!Prompt Unable to open file
          ;end of clip

          H="GetRegEscape"
          ;2006-07-12 Created by Sheri Pierce
          ;To use this clip, calling clip must set the name of variable to be returned
          ;e.g., Set %myvarname%="keyword"
          ;then ^!Set %keyword%="string to be escaped"
          ;call using ^!Clip GetRegEscape ^%keyword%
          ;the value in %keyword% after ^!Clip will be the escaped string
          ^!IfTrue ^$IsEmpty(^%myvarname%)$ ErrorNoName
          ;A variable name can be made up of any character except those that delimit words (space, tab, punctuation, carriage return, etc.).
          ;^!IfMatch "\W" "^%myvarname" ErrorInvalidName
          ^!Set %^%myvarname%%=^&
          ^!Set %^%myvarname%%=^$StrReplace("\";"\\";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace("^";"\^";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace("!";"\!";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace("$";"\$";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace("?";"\?";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace(".";"\.";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace("*";"\*";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace("<";"\<";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace(">";"\>";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace("+";"\+";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace("(";"\(";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace(")";"\)";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace("[";"\[";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace("]";"\]";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace("{";"\{";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace("}";"\}";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace("=";"\=";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace("|";"\|";"^%^%myvarname%%";False;False)$
          ^!Set %^%myvarname%%=^$StrReplace(":";"\:";"^%^%myvarname%%";False;False)$
          ^!ClearVariable %myvarname%
          ^!Goto End
          :ErrorNoName
          ^!Prompt called clip ^$GetClipName$ failed because variable %myvarname% must be set before calling.
          ^!Goto Exit
          :ErrorInvalidName
          ^!Prompt The called clip ^$GetClipName$ failed because variable %myvarname% contained illegal characters.
          ^!Goto Exit
          ;end of clip
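
          For comparison, the same escape-and-alternate idea sketched in
          Python: re.escape plays the role of GetRegEscape, the joined
          pattern plays the role of %xl%, and the file names are placeholders.

          import re

          # Placeholder file names for the stop list and the word list.
          with open("stopwords.txt", encoding="utf-8") as f:
              stop = [line.strip() for line in f if line.strip()]
          with open("wordlist.txt", encoding="utf-8") as f:
              words = [line.strip() for line in f if line.strip()]

          # One big alternation; re.escape does what the GetRegEscape clip does.
          stop_re = re.compile("(?i)" + "|".join(map(re.escape, stop)))

          seen, result = set(), []
          for w in sorted(words, key=str.lower):
              if stop_re.fullmatch(w):          # whole-line matches only
                  continue
              if w.lower() not in seen:         # drop duplicates, case-insensitively
                  seen.add(w.lower())
                  result.append(w)
          print("\n".join(result))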
          • jonas_ramus
            Message 5 of 30 , Jul 13, 2006
              Hi Sheri,

              Thanks for your clip! I tested it very thoroughly with NoteTab 5.0
              Beta. With just a few lines it worked perfectly. But with a file of
              16,000 words (i.e. lines) it failed.

              For reproducing my test, please download my wordlist.zip file from
              http://flogehrke.homepage.t-online.de/194/ntf-wordlist.zip
              This archive contains three files:

              1. ntf-wordlist.txt: 16,000 words (15,500 unique words plus 500
              duplicates). The 15,500 unique words begin with letters from A to Y.
              All 500 duplicates begin with Z.

              2. ntf-stopwords.txt: This file contains 250 words beginning with Z.
              These words were duplicated and inserted into ntf-wordlist.txt in
              order to get those 16,000 words in ntf-wordlist.txt.

              3. ntf-15304-uniques.txt: The result I get with your clip.

              If the clip properly removed the 250 stopwords from the wordlist,
              we should get exactly 15,500 unique words from A to Y and no words
              beginning with Z.

              Unfortunately, I get 15,304 unique words from A to Y. The stopwords
              with Z have been removed correctly. So the difference of 15,500
              minus 15,304 = 196 words must be words from A to Y that were
              removed by mistake. I tried to identify them by subtracting the
              result from the 15,500 original words, but I couldn't find any
              rule that might explain the loss.

              Surprisingly, the simple clip I posted to the forum outputs exactly
              those 15,500 words from A to Y, without Z, that represent the
              correct result. Since I run that clip with NoteTab Pro v4.95, there
              may be some problem with 5.0 Beta? Perhaps you could find out
              what's up here...

              Flo


              @To Don: Don, thanks for your proposal! That's almost in line with
              my idea, but we'd be better off with something like the clip that
              Sheri posted. I'm still grateful for the help you gave me with
              wordlist clips in 2004!
            • Don - HtmlFixIt.com
              Message 6 of 30 , Jul 13, 2006
                > @To Don: Don, Thanks for your proposal! That's almost according with
                > my idea but we better had something like the clip that has been
                > posted by Sheri. I'm still grateful for the help you gave me with
                > wordlist clips in 2004!

                Mine would not use regex and thus would be much faster and more
                reliable, I suspect. I just have not had time yet.

                I looked at Sheri's, but it wasn't commented enough for me to
                unravel it in the few minutes I would have had.

                Don
              • Sheri
                Message 7 of 30 , Jul 13, 2006
                  Sigh, thanks to Yahoo I lost a long message I had written as reply
                  here.

                  I fixed the clip; the problem was in how I was treating line ends.
                  I think in the result of the original the 196 words are there, just
                  not separated from the next (or previous) word. It would help to
                  know what they are, to figure out if it was really a problem based
                  on how I was doing it.

                  Flo, how much larger are the files you plan to use this with? There
                  does seem to be a limit to how long the list of alternates (stop
                  words) can be (or maybe the more confining limit is how big the
                  search text can be in a ^!Replace command). I tried to use the
                  15,304-word "wrong result" list as stop words against the
                  15,500-word "right result" list. The ^!Replace command fails to
                  make any replacements.

                  Regards,
                  Sheri

                  H="Word List"
                  ;2006-07-13 Revisions
                  ;2006-07-12 Created by Sheri Pierce
                  ;Requires NoteTab version 5 beta
                  ^!Set %xlfile%=^?{(T=O)Browse to xl list=^%xlfile%}
                  ^!Open ^%xlfile%
                  ^!SetWordWrap Off
                  ^!IfError Error
                  ^!Select All
                  ^!Menu Modify/Lines/Trim Blanks
                  ;Next 6 lines make a big pattern of alternates for regexp
                  ^!Set %xl%="^$StrTrim(^$GetText$)$"
                  ^!Set %xl%="^$StrSort("^%xl%";No;No;Yes)$"
                  ^!Set %xl%="^$StrReplace("^%NL%";"|";"^%xl%";No;No)$"
                  ^!Set %myvarname%="xl"
                  ^!Clip GetRegEscape ^%xl%
                  ^!Set %xl%="^$StrReplace("\|";"|";"^%xl%";No;No)$"
                  ^!Close Discard
                  ^!Toolbar New Document
                  ^!Set %wordfile%=^?{(T=O)Browse to word list=^%wordfile%}
                  ^!InsertFile ^%wordfile%
                  ^!IfError Error
                  ^!SetWordWrap Off
                  ^!Select All
                  ^!Menu Modify/Lines/Trim Blanks
                  ^!Replace "(?i)^(^%xl%)$" >> "" RAWS
                  ^!ClearVariable %xl%
                  ^!Set %word%="^$StrTrim(^$GetText$)$"
                  ^!Set %word%="^$StrSort("^%word%";No;Yes;Yes)$"
                  ^!Select All
                  ^!InsertText ^%word%
                  ^!ClearVariable %word%
                  ^!Replace "^\r?\n" >> "" RSAW
                  ^!Info Finished List of Unique Words
                  ^!Goto End
                  :Error
                  ^!Prompt Unable to open file
                  ;end of clip
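
                  The essence of the fix is anchoring the alternation to whole
                  lines. A small Python illustration with made-up words (not
                  taken from the real lists):

                  import re

                  text = "Apparat\nBeirat\nRat\nRathaus\n"
                  alt = "|".join(map(re.escape, ["Rat"]))   # pretend stop list

                  # Unanchored, like the first version: "Rat" also matches inside
                  # other words and swallows the newline, gluing lines together.
                  print(re.sub("(?i)(" + alt + r")\s+", "", text))
                  # -> AppaBeiRathaus

                  # Anchored to whole lines, like the revised ^(...)$ pattern (the
                  # clip deletes the leftover blank lines in a separate pass).
                  print(re.sub("(?im)^(" + alt + r")$\n?", "", text))
                  # -> Apparat / Beirat / Rathaus, each on its own line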
                • Alan_C
                  Message 8 of 30 , Jul 13, 2006
                    On Wednesday 12 July 2006 10:38, jonas_ramus wrote:
                    > Hi,
                    >
                    > I tried to develop a clip that would remove all words from a word
                    > list WL which are contained in a stopword list XL. If we have two
                    > lists...
                    >
                    > Word List WL    Stopword List XL
                    > A               B
                    > B               D
                    > B               E
                    > D
                    > E
                    > G
                    > G
                    > H
                    >
                    > ...the result should be...
                    >
                    > A
                    > G
                    > H
                    >
                    > That is, the clip shouldn't delete duplicates only but it also
                    > should totally remove all stopwords from the word list WL.

                    sort, uniq, and tr came to mind.

                    http://gnuwin32.sourceforge.net/packages/textutils.htm

                    http://www.google.com/search?q=uniq+for+windows&start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org.mozilla:en-US:official
                    --
                  • jonas_ramus
                    Message 9 of 30 , Jul 13, 2006
                      Sheri,

                      Congratulations! The new clip supplies a perfect output of 15.500
                      words from A to Y only.

                      Nevertheless, I have to confirm that error when working with bigger
                      stopword-lists.

                      We also get a wrong result if the stopword list is bigger than the
                      wordlist. I tested this by taking the A-words (1,057) and the
                      B-words (1,176) from ntf-wordlist.txt. Together they produce a
                      stopword list of 2,233 words. Then I took the B-words as a wordlist
                      of 1,176 lines. Under those conditions, the clip outputs exactly
                      the 1,176 B-words, whereas it should end in an empty file.

                      In principle, the size of these lists is not limited since they are
                      derived from big text databases.

                      Flo


                      @To Alan: Thanks for these hints! However, it's not a matter of
                      sorting or deleting duplicate lines only, but of completely removing
                      stopwords...
                    • rpdooling
                      Message 10 of 30 , Jul 13, 2006
                        >> Sigh, thanks to Yahoo I lost a long message
                        >> I had written as reply here.

                        Maybe we should form a band of renegades, run off and start a separate
                        newsgroup on Google Groups.

                        I'm game if anybody else wants to go. I'm tired of this interface.

                        rick
                      • Alan_C
                        Message 11 of 30 , Jul 13, 2006
                          On Thursday 13 July 2006 14:44, jonas_ramus wrote:
                          [ . . ]
                          > @To Alan: Thanks for these hints! However, it's not a matter of
                          > sorting or deleting duplicate lines onnly but of completely removing
                          > stopwords...

                          Oh, I didn't provide sufficient context. You're unaware of the power
                          that these originally Unix/Linux text-manipulation utilities (sort,
                          uniq, tr) have.

                          Yes, I know you need to remove stop words. That's what the tr is for
                          (after using sort and uniq). But there may or may not be a way to
                          read the stop words from a file and feed them to tr, because I'm not
                          that familiar with the Win/DOS command line. (A clip with an array
                          could certainly be used to feed the stop words to tr.)

                          The idea is that the three utilities are powerful, specialized tools
                          with a proven reputation for this kind of task; they can extend
                          NoteTab and cut down on the amount and complexity of clip code.

                          I use Linux and know its command line, bash shell syntax, and a fair
                          amount of Perl. Perl can perhaps be equalled, but not bettered, when
                          it comes to text file manipulation.

                          If you are willing to install Cygwin, then all of these are in it.

                          Significantly less code than NoteTab needs to do the same job (this
                          task).

                          Not that NoteTab can't. NTB is handy for lots of things. But for this
                          particular task, if I were doing it, I'd use some Perl and the other
                          three utilities mentioned, which would be much shorter and quicker
                          for me than limiting myself to the clip-only route.

                          I've been there, done it both ways. For me it's a quicker cut to the
                          chase either not to clip it at all, or to make a clip that merely
                          controls/runs those three utilities.

                          I'm not alone on this. But the few others who know about this are
                          likely busy elsewhere and may not have time to follow this thread
                          and respond.

                          --
                          Alan.
                        • jonas_ramus
                          Message 12 of 30 , Jul 14, 2006
                            Alan,

                            Thanks again. Now I understand your recommendations better. In my
                            view, however, it's too complicated for an average user to use three
                            tools for one simple task. Looking at the Cygwin website, it even
                            appears rather difficult to work out what to download and what to
                            install.

                            Unless we get another solution from Sheri, Don, or anybody else,
                            I'll have to carry on with other tools. There is, for example, an
                            excellent freeware tool called AntConc
                            (http://www.antlab.sci.waseda.ac.jp/software.html). Among other
                            features, it creates word lists, and it also excludes stop words
                            from such a list by matching it against an imported stop word list.
                            So far, I haven't experienced any limitations on the size of those
                            lists.

                            By the way, this task could even be solved with the free-form text
                            database from which these lists are derived (www.askSam.com).
                            However, this needs some additional procedures. So I was dreaming of
                            a simpler solution with NoteTab...

                            Flo

                            Moderator of the German askSam Forum
                            http://tinyurl.com/86nr3
                          • Sheri
                            Message 13 of 30 , Jul 14, 2006
                              Hi Flo,

                              In this revision, it applies the stop list in approximately 10K
                              chunks. Although there may be practical limits depending on your
                              equipment (because even though it applies the stop words in
                              smaller chunks, it initially reads them all into memory), it seems
                              to be reasonably fast and reliable. I added a few unique words to
                              your wordlist and then used the original wordlist as a stop word
                              list. It took about 30 seconds on my PC. I was surprised that one
                              of my added words wasn't there. But surprise, there was another
                              occurrence of that word elsewhere in the file (and it was
                              therefore in the stop list). I have an older Athlon and 1 GB of
                              memory.

                              Regards,
                              Sheri


                              H="Safer Word List"
                              ;2006-07-14 Revision
                              ;2006-07-12 Created by Sheri Pierce
                              ;Requires NoteTab version 5 beta
                              ^!SetScreenUpdate Off
                              ^!Set %xlfile%=^?{(T=O)Browse to xl list=^%xlfile%}
                              ^!Set %wordfile%=^?{(T=O)Browse to word list=^%wordfile%}
                              ^!Set %starttime%=^$GetDate(tt)$
                              ^!Open ^%xlfile%
                              ^!SetWordWrap Off
                              ^!IfError Error
                              ^!Select All
                              ^!Menu Modify/Lines/Trim Blanks
                              ^!Select All
                              ^$StrTrim(^$GetSelection$)$
                              ^!Select All
                              ^$StrSort("^$GetSelection$";No;No;Yes)$
                              ^!Jump Doc_Start
                              ^!Set %index%=0
                              :Loop
                              ^!Inc %index%
                              ^!Set %pos_start%=^$GetRow$:^$GetCol$
                              ^!Select +10000
                              ^!Jump Select_End
                              ^!Jump Line_End
                              ^!Set %pos_end%=^$GetRow$:^$GetCol$
                              ^!SelectTo ^%pos_start%
                              ;Next 6 lines make a big pattern of alternates for regexp
                              ^!Set %xl%="^$StrReplace("^%NL%";"|";"^$GetSelection$";No;No)$"
                              ^!Set %myvarname%="xl"
                              ^!Clip GetRegEscape ^%xl%
                              ^!IfError ClipNotFound
                              ^!Set %xl%="^$StrReplace("\|";"|";"^%xl%";No;No)$"
                              ^!Set %Pat^%index%%="^%xl%"
                              ^!Set %xl%=""
                              ^!Jump Select_End
                              ^!Jump +1
                              ^!If ^$GetRow$=^$GetLineCount$ Last Else Loop
                              :Last
                              ^!Close Discard
                              ^!Toolbar New Document
                              ^!SetScreenUpdate On
                              ^!InsertFile ^%wordfile%
                              ^!IfError Error
                              ^!SetWordWrap Off
                              ^!SetScreenUpdate On
                              ^!Select All
                              ^!SetScreenUpdate Off
                              ^!Menu Modify/Lines/Trim Blanks
                              ^!Set %repindex%=0
                              :Replace
                              ^!Inc %repindex%
                              ^!If ^%repindex%>^%index% Done
                              ^!Replace "^(?i:^%Pat^%repindex%%)$" >> "" RAWS
                              ;^!IfError Next Else Skip_2
                              ;^!Info pattern ^%repindex% failed
                              ;^!AppendToFile "C:\Documents and Settings\Sheri\My Documents\NotetabBetaTest\badregexpreplace.txt" Pattern ^%repindex%: ^%Pat^%repindex%%
                              ^!ClearVariable %^%Pat^%repindex%%
                              ^!Goto Replace
                              :Done
                              ^!Set %word%="^$StrTrim(^$GetText$)$"
                              ^!Set %word%="^$StrSort("^%word%";No;Yes;Yes)$"
                              ^!Select All
                              ^!InsertText ^%word%
                              ^!Replace "^\r\n" >> "" RSAW
                              ^!ClearVariable %word%
                              ^!Set %endtime%=^$GetDate(tt)$
                              ^!Info Start: ^%starttime%^%NL%End: ^%endtime%^%NL%Complete List of Unique Words with Stop Words Removed.
                              ^!Goto End
                              :Error
                              ^!Prompt Unable to open file
                              ^!Goto End
                              :ClipNotFound
                              ^!Prompt The clip GetRegEscape needs to be installed in this library.
                              ;end of clip
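
                              The chunking idea, sketched in Python for comparison (the
                              10000-character limit and the line-anchored alternation just
                              mirror the clip; both are assumptions, not fixed requirements):

                              import re

                              def compile_alt(parts):
                                  # One line-anchored alternation per chunk of stop words.
                                  return re.compile("(?im)^(?:" + "|".join(parts) + r")$\n?")

                              def chunked_patterns(stopwords, max_chars=10000):
                                  patterns, chunk, size = [], [], 0
                                  for esc in map(re.escape, stopwords):
                                      if chunk and size + len(esc) + 1 > max_chars:
                                          patterns.append(compile_alt(chunk))
                                          chunk, size = [], 0
                                      chunk.append(esc)
                                      size += len(esc) + 1
                                  if chunk:
                                      patterns.append(compile_alt(chunk))
                                  return patterns

                              def remove_stopwords(text, stopwords):
                                  # Apply the stop list one chunk at a time, like the :Replace loop.
                                  for pattern in chunked_patterns(stopwords):
                                      text = pattern.sub("", text)
                                  return text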
                            • jonas_ramus
                              Message 14 of 30 , Jul 14, 2006
                                Sheri,

                                I immediately tested this third version. There are some problems
                                with the output as follows:

                                1. When taking as...

                                word list: ntf-wordlist.txt (16,000)
                                stop words: ntf-stopwords (250)

                                the last lines of the output are (line numbers added)...

                                Line
                                15500 Yucca
                                15501 Yumen
                                 15502 Zweifacher|Zwei-Tank-Systeme|Zwei|Zweckverband|....
                                15503 Unique Words with Stop Words Removed.

                                That is, it outputs all the stop words (Z-words). Line 15503 outputs
                                the text of the final message.

                                2. When taking as...

                                word list: B-words (1,176)
                                stop words: A+B-words (2,233)

                                the last lines of the output are (line numbers added)...

                                Line
                                 1057 A-Klasse
                                 1057 A-Klasse-Prototypen
                                1058 Binnennachfrage|Binnenmarkt|Binnenland...
                                1059 Büssem|Bürotechnik|Büros|Büroleitung
                                1060 Documents\NotetabBetaTest\badregexpreplace.txt" Pattern 1:
                                1061 Unique Words with Stop Words Removed.

                                That is, out of 2,233 stop words, it outputs 1,526 stop words with
                                the result plus some additional text.

                                Thanks for your great help and patience in this matter!

                                Flo
                              • Sheri
                                Message 15 of 30 , Jul 14, 2006
                                  Hi Flo,

                                  Yahoo has wrapped a couple of long lines. The first was actually
                                  commented out; it begins with:

                                  ;^!AppendtoFile

                                  The wrapped portion is getting written into your file. It looks like
                                  it actually wrapped twice, once at "Documents" and again at "^%Pat".

                                  Another is nearer the bottom, an ^!Info command; it looks like it
                                  wrapped at the word "Unique".

                                  Please unwrap those long lines and try it again. You could actually
                                  remove the long comment line along with the two comment lines above
                                  it.

                                  Regards,
                                  Sheri

                                  --- In ntb-clips@yahoogroups.com, "jonas_ramus" <jonas_ramus@...>
                                  wrote:
                                  >
                                  > Sheri,
                                  >
                                  > I immediately tested this third version. There are some problems
                                  > with the output as follows:
                                  >
                                  > 1. When taking as...
                                  >
                                  > word list: ntf-wordlist.txt (16,000)
                                  > stop words: ntf-stopwords (250)
                                  >
                                  > the last lines of the output are (lines numbers added)...
                                  >
                                  > Line
                                  > 15500 Yucca
                                  > 15501 Yumen
                                  > 15002 Zweifacher|Zwei-Tank-Systeme|Zwei|Zweckverband|....
                                  > 15503 Unique Words with Stop Words Removed.
                                  >
                                  > That is, it outputs all the stop words (Z-words). Line 15503
                                  outputs
                                  > the text of the final message.
                                  >
                                  > 2. When taking as...
                                  >
                                  > word list: B-words (1,176)
                                  > stop words: A+B-words (2,233)
                                  >
                                  > the last lines of the output are (line numbers added)...
                                  >
                                  > Line
                                  > 1057 A­-Klasse
                                  > 1057 A­-Klasse-Prototypen
                                  > 1058 Binnennachfrage|Binnenmarkt|Binnenland...
                                  > 1059 Büssem|Bürotechnik|Büros|Büroleitung
                                  > 1060 Documents\NotetabBetaTest\badregexpreplace.txt" Pattern 1:
                                  > 1061 Unique Words with Stop Words Removed.
                                  >
                                  > That is, out of 2,233 stop words, it outputs 1,526 stop words with
                                  > the result plus some additional text.
                                  >
                                  > Thanks for your great help and patience in this matter!
                                  >
                                  > Flo
                                  >
                                • abairheart
                                  Message 16 of 30 , Jul 14, 2006
                                    --- In ntb-clips@yahoogroups.com, "rpdooling" <rpdooling@...> wrote:
                                    >
                                    > >> Sigh, thanks to Yahoo I lost a long message
                                    > >> I had written as reply here.
                                    >
                                    > Maybe we should form a band of renegades, run off and start a separate
                                    > newsgroup on Google Groups.
                                    >
                                    > I'm game if anybody else want to go. I'm tired of this interface.
                                    >
                                    > rick
                                    >


                                     Do you get a commission from google for transferring groups to them?
                                  • adrien
                                    Message 17 of 30 , Jul 14, 2006
                                       On Fri, 14-07-2006 at 16:21 +0000, abairheart wrote:
                                       > Do you get a commission from google for transferring groups to them?

                                       Well, first: I do not understand why Fookes Software does not run its
                                       own mailing list. Second: I also lost several messages, because for
                                       some reason my provider (I think; I just asked them about this)
                                       classified the Yahoo messages as spam!

                                      (So, where can I download the beta version?)
                                      --
                                      adrien <adrien.verlee@...>
                                    • jonas_ramus
                                      Message 18 of 30 , Jul 14, 2006
                                        Sheri,

                                        Now I've got it. Thanks again. I'm much obliged to you for your help. I
                                        think that's a perfect solution now.

                                        Flo
                                      • rpdooling
                                        Message 19 of 30 , Jul 14, 2006
                                          >> Do you get a commission from google for
                                           >> transferring groups to them?

                                          This is a joke without the smiley face, right?

                                          rd
                                        • Bob McAllister
                                          Message 20 of 30 , Jul 14, 2006
                                            On 7/14/06, jonas_ramus <jonas_ramus@...> wrote:

                                            >
                                            > Unless we don't get another solution from Sheri, Don, or anybody
                                            > else, I'll have to carry on with other tools.

                                            > By the way, this task could even be solved with the free-form text
                                            > database from which these lists are derived (www.askSam.com).
                                            > However, this needs some additional procedures. So I was dreaming of
                                            > a more simple solution with NoteTab...
                                            >
                                            > Flo
                                            >

                                            Flo

                                            Here is a "solution" that I think is close to your original idea.

                                            H="Remove stop words and duplicates"
                                            ^!Set %wlfile%=^?[(T=O)Select file to be cleaned]
                                            ^!Set %xlfile%=^?[(T=O)Select list of stop words]
                                            ^!SetScreenUpdate Off
                                            ^!StatusShow Setting up ...
                                            ^!Open ^%xlfile%
                                            ^!SetListDelimiter ^p
                                            ^!Select All
                                            ^!SetArray %stopwords%=^$StrSort("^$GetSelection$";False;True;True)$
                                            ^!Close ^%xlfile% Discard
                                            ^!StatusShow Processing file ... may take some time!
                                            ^!SetScreenUpdate Off
                                            ^!Open ^%wlfile%
                                            ^!Set %wlcount%=^$Calc(^$GetParaCount$-1;0)$
                                            ^!Select All
                                            ^$StrSort("^$GetSelection$";False;True;True)$
                                            ^!Set %index%=1
                                            ^!Jump 1
                                            :loop
                                            ^!Replace "^%stopwords^%index%%^p" >> "" CS
                                            ^!Inc %index%
                                            ^!If ^%index%=^%stopwords0% skip
                                            ^!Goto loop
                                            ^!StatusClose
                                            ^!jump 1
                                            ^!Set %nlcount%=^$Calc(^$GetParaCount$-1;0)$
                                            ^!Save As ^$GetPath(^%wlfile%)$newlist.txt
                                            ^!Close Discard
                                             ^!Info Task complete.^%NL%^%NL%Original ^%wlcount% lines in ^%wlfile%^%NL% were reduced to ^%nlcount% in ^$GetPath(^%wlfile%)$newlist.txt

                                             Note that the final line (beginning ^!Info) may break where it wraps
                                             in your email software and need to be rejoined.

                                            The clip asks you to select two files (Word List and StopWord List)
                                            and reads the stop words into an array to be used as the basis for
                                            searching the list to be cleaned. A reduced list is then saved back
                                            into the folder that Word List came from.

                                            The maximum size (unknown to me) of a NoteTab array will put a limit
                                            on the size of the StopWord files handled, but it copes with your
                                            samples.

                                            I have used the final parameter on ^$StrSort()$ to remove the
                                            duplicates from the lists as the sort occurs.

                                            Sorting both lists on the same basis lets me remove the W (whole file)
                                            parameter from ^!Replace for a huge speed increase. Even so, it is not
                                            a super-fast process on my old PII.
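
                                             The same loop, sketched in Python: one exact, whole-line removal per
                                             stop word, so no regex escaping is needed (the sorting and
                                             de-duplication are assumed to have happened already, as the
                                             ^$StrSort()$ calls arrange):

                                             def remove_stopwords(words, stopwords):
                                                 # Both lists are assumed sorted with duplicates removed.
                                                 remaining = list(words)
                                                 for stop in stopwords:
                                                     # Exact whole-line comparison; at most one copy can exist.
                                                     if stop in remaining:
                                                         remaining.remove(stop)
                                                 return remaining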

                                            I hope this helps.

                                            Bob McAllister
                                          • jonas_ramus
                                            Message 21 of 30 , Jul 15, 2006
                                              --- In ntb-clips@yahoogroups.com, "Bob McAllister" <fortiter@...>
                                              wrote:
                                              >
                                              > Flo
                                              >
                                              > Here is a "solution" that I think is close to your original idea...

                                              Bob,

                                              Thanks! This is also an interesting solution. Evidently, it works
                                              with NT 5.0 Beta and NT 4.95 as well.

                                              I tested this clip with the above mentioned files - see
                                              http://flogehrke.homepage.t-online.de/194/ntf-wordlist.zip

                                              It properly reduces the ntf-wordlist.txt from 16,000 to 15,500 words
                                              when using ntf-stopwords.txt (250 stop words).

                                               It runs into some problems, however, if the stop word list is bigger
                                               than the word list. With just a few words this works correctly. But
                                               when the word list (16,000) and the stop word list (250) are swapped,
                                               the output file should be empty, since all of the 250 words occur in
                                               the 16,000-word stop list. In this case, your clip reduces the 250
                                               (stop) words to 247 words for me.

                                              You also have to take care to enter an empty line at the end of the
                                              stop word list. Without that, the last stop word will not be removed.

                                              Flo
                                            • Sheri
                                              Message 22 of 30 , Jul 15, 2006
                                                Hi Flo,

                                                Here is one more approach. It is more along the lines of Don's
                                                original suggestion, but it still needs regular expressions. The
                                                bottleneck in this one is sorting the combined list of stop words and
                                                word list (if they are big). With my equipment, using two lists of
                                                16000 words took about 10 seconds longer than my other clip. But
                                                there could be instances where the size of the lists, speed of your
                                                computer and amount of memory might make this one faster. On my
                                                equipment this clip seems to be about equal in speed to the other one
                                                removing the 250 stop words.

                                                While doing this clip I discovered that the clip function ^$StrSort$
                                                and the menu command for sorting lines do not have the same collating
                                                order for symbols and such. I thought the menu command seemed to
                                                process the combined list faster, but the sequencing was not
                                                compatible with my approach and using it led to wrong results.

                                                Please be sure to unwrap the long line near the end.

                                                Regards,
                                                Sheri


                                                H="New Word List Approach"
                                                 ;2006-07-15 Created by Sheri Pierce
                                                ;requires Notetab 5 beta
                                                ^!SetClipboard ""
                                                ^!SetScreenUpdate Off
                                                ^!Set %xlfile%=^?{(T=O)Browse to stop list=^%xlfile%}
                                                ^!Set %wordfile%=^?{(T=O)Browse to word list=^%wordfile%}
                                                ^!Set %starttime%=^$GetDate(tt)$
                                                ^!Open ^%xlfile%
                                                ^!Select All
                                                ^!Menu Modify/Lines/Trim Blanks
                                                ;remove any empty lines
                                                ^!Replace "^(\r\n)+" >> "" RAWS
                                                ^!Set %word%="^$StrTrim(^$GetText$)$"
                                                ;^$StrSort("Str";CaseSensitive;Ascending;RemoveDuplicates)$
                                                ^!Set %word%="^$StrSort("^%word%";NO;Yes;Yes)$"
                                                ^!Select All
                                                ^!InsertText ^%word%
                                                ;append " " to each line of stop list
                                                ^!Replace ^([^\r\n]+)(?=\r|\n|\z) >> "$0 " RAWS
                                                ^!Replace "(?<= )(?=\r?\n\z)" >> "!" RAWS
                                                ^!Set %word%=^$GetText$
                                                ^!Close Discard
                                                ;Load word list
                                                ^!Toolbar New Document
                                                ^!SetScreenUpdate On
                                                ^!InsertFile ^%wordfile%
                                                ^!IfError Error
                                                ^!SetWordWrap Off
                                                ^!Select All
                                                ^!Menu Modify/Lines/Trim Blanks
                                                ;sort ascending remove duplicates
                                                ^!Select All
                                                ^$StrSort("^$GetSelection$";NO;Yes;Yes)$
                                                ;make sure last line ends with crlf
                                                ^!Jump Doc_End
                                                ^!InsertText ^P
                                                ;remove any empty lines
                                                ^!Replace "^(\r\n)+" >> "" RAWS
                                                ^!Jump Doc_End
                                                ^%word%
                                                ^!Set %word%=""
                                                ^!Jump Doc_Start
                                                ^!StatusShow Sorting...
                                                ;^$StrSort("Str";CaseSensitive;Ascending;RemoveDuplicates)$
                                                ^!Set %word%=^$StrSort("^$GetText$";NO;Yes;NO)$
                                                ^!Select All
                                                ^%word%
                                                ^!Set ^%word%=""
                                                ^!StatusClose
                                                ^!Replace "^([^\r\n]+)\r?\n\1 \!?((\r?\n)|\z)" >> "" RAWS
                                                ^!Replace "^([^\r\n]+) \!?((\r?\n)|\z)" >> "" RAWS
                                                ^!Jump Doc_End
                                                ^!Set %endtime%=^$GetDate(tt)$
                                                ^!Info Start: ^%starttime%^%NL%End: ^%endtime%^%NL%Complete List of Unique Words with Stop Words Removed.
                                                ^!Goto End
                                                ;end of clip
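
                                                If you want to double-check the result outside NoteTab, a rough
                                                Python sketch of the same idea (tag each stop word with a trailing
                                                marker, merge it into the word list, sort, then drop any word whose
                                                tagged twin follows it) would look something like the lines below.
                                                This is just an illustration, not part of the clip, and it ignores
                                                the clip's case-insensitive sorting; the file names are simply the
                                                ones used in this thread.

                                                # Rough Python sketch of the clip's idea (illustration only).
                                                with open("ntf-stopwords.txt") as f:
                                                    tagged = [line.strip() + " " for line in f if line.strip()]

                                                with open("ntf-wordlist.txt") as f:
                                                    words = set(line.strip() for line in f if line.strip())

                                                merged = sorted(list(words) + tagged)
                                                kept = []
                                                for i, entry in enumerate(merged):
                                                    if entry.endswith(" "):
                                                        continue                       # this is a tagged stop word
                                                    if i + 1 < len(merged) and merged[i + 1] == entry + " ":
                                                        continue                       # the word has a stop-word twin
                                                    kept.append(entry)

                                                with open("newlist.txt", "w") as f:
                                                    f.write("\n".join(kept) + "\n")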
                                              • Sheri
                                                Message 23 of 30 , Jul 15, 2006
                                                  Hi,

                                                  Isn't Google Groups just a web interface to non-binary Usenet
                                                  groups?

                                                  Regards,
                                                  Sheri



                                                  --- In ntb-clips@yahoogroups.com, "abairheart" <abairheart@...> wrote:
                                                  >
                                                  > --- In ntb-clips@yahoogroups.com, "rpdooling" <rpdooling@> wrote:
                                                  > >
                                                  > > >> Sigh, thanks to Yahoo I lost a long message
                                                  > > >> I had written as reply here.
                                                  > >
                                                  > > Maybe we should form a band of renegades, run off and start a
                                                  separate
                                                  > > newsgroup on Google Groups.
                                                  > >
                                                  > > I'm game if anybody else want to go. I'm tired of this interface.
                                                  > >
                                                  > > rick
                                                  > >
                                                  >
                                                  >
                                                  > Do you get a commission from google for transfering groups to them?
                                                  >
                                                • jonas_ramus
                                                  Message 24 of 30 , Jul 15, 2006
                                                    Sheri,

                                                    Another perfect solution indeed! It gets correct results with all
                                                    combinations of the test lists. Although each clip takes a
                                                    different approach, there is no noticeable difference in running
                                                    time on my PC. Maybe it would become measurable with much bigger
                                                    lists.

                                                    Thank you so much for all the trouble you have been taking in this
                                                    matter! I hope that even more NT users will profit from these solutions.

                                                    Flo
                                                  • rpdooling
                                                    Message 25 of 30 , Jul 15, 2006
                                                      >> Isn't Google Groups just a web interface
                                                      >> to non-binary Usenet groups?

                                                      Sheri,

                                                      I'm told it can be a web interface to already existing usenet groups.
                                                      Truth be told, I don't know. I access it via the web.

                                                      You can mark threads to follow; if there is new activity on a
                                                      thread, it appears at the upper right.

                                                      Look at the Python group for a sample

                                                      http://groups.google.com/group/comp.lang.python

                                                      rd
                                                    • Bob McAllister
                                                      Message 26 of 30 , Jul 15, 2006
                                                        On 7/15/06, jonas_ramus <jonas_ramus@...> wrote:

                                                        > Bob,
                                                        >
                                                        > Thanks! This is also an interesting solution. Evidently, it works
                                                        > with NT 5.0 Beta and NT 4.95 as well.
                                                        >
                                                        >
                                                        > It runs into some problems, however, if the stop word list is bigger
                                                        > than the word list. This works correctly with just a few words. When
                                                        > replacing the word list (16,000) and the stop word list (250) with
                                                        > each other, the output file should be empty since all the stop words
                                                        > occur in the word list.
                                                        >

                                                        Flo

                                                        Your comment that "all the stop words occur in the word list" tipped
                                                        me off to my error. The speed modification that I made (running
                                                        Replace without a W switch) breaks down if there is a stopword that
                                                        does not occur in the word list being cleaned.

                                                        If you modify your test files by adding a few words to your
                                                        ntf-stopwords.txt file that are NOT contained in ntf-wordlist.txt,
                                                        then you will catch my error when using the files as planned as well
                                                        as backwards.

                                                        The compromise solution is to reset the search to the top of the file
                                                        whenever this situation occurs by adding ^!IfError Jump 1 in the loop
                                                        as shown below.
                                                        :loop
                                                        ^!Replace "^%stopwords^%index%%^p" >> "" CS
                                                        ^!IfError ^!Jump 1
                                                        ^!Inc %index%
                                                        ^!If ^%index%=^%stopwords0% skip
                                                        ^!Goto loop
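
                                                        If it helps to see the failure mode outside clip syntax, here is a
                                                        loose Python analogy of the forward-only behaviour described above.
                                                        It is purely illustrative; it assumes a miss strands the search past
                                                        the remaining matches, which is roughly what happens in the clip:

                                                        # Loose analogy: a miss strands the "cursor", so later stop words
                                                        # are skipped unless the search resets to the top (the ^!IfError fix).
                                                        def clean(text, stops, reset_on_miss):
                                                            pos = 0
                                                            for stop in stops:
                                                                hit = text.find(stop, pos)        # search only from pos onward
                                                                if hit == -1:
                                                                    pos = 0 if reset_on_miss else len(text)
                                                                    continue
                                                                text = text[:hit] + text[hit + len(stop):]
                                                                pos = hit
                                                            return text

                                                        words = "alpha\nbeta\ndelta\nepsilon\n"
                                                        stops = ["beta\n", "charlie\n", "delta\n"]       # "charlie" is absent
                                                        print(clean(words, stops, reset_on_miss=False))  # "delta" survives
                                                        print(clean(words, stops, reset_on_miss=True))   # "delta" is removed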

                                                        Bob McAllister
                                                      • jonas_ramus
                                                        Message 27 of 30 , Jul 16, 2006
                                                          Bob,

                                                          You wrote...

                                                          > If you modify your test files by adding a few words to your
                                                          > ntf-stopwords.txt file that are NOT contained in ntf-wordlist.txt,
                                                          > then you will catch my error...

                                                          So I did, and I repeated the test with a stop word list that is
                                                          bigger than the file to be cleaned (using ntf-wordlist.txt as stop
                                                          words and ntf-stopwords.txt as file to be cleaned). Now the
                                                          newlist.txt contains those three additional words, which is correct.
                                                          But again, it contains 248 of the 250 stop words (the Z-words in
                                                          ntf-stopwords.txt), which is not correct, since this run should
                                                          now output only those three additional words.

                                                          Next, I removed the three additional words from the stop words list
                                                          again and added that "^!IfError ^!Jump 1" line. Again, the result of
                                                          the same test isn't completely correct. When finished, the message
                                                          is...

                                                          Task complete.
                                                          Original 250 lines in [path]\ntf-stopwords.txt
                                                          were reduced to 0 in [path]\newlist.txt

                                                          The newlist.txt, however, contains the following line

                                                          Zucker-Zwei-Tank-

                                                          This appears to be a concatenation of truncated fragments of two
                                                          compounds from the stop word list: probably
                                                          "Zucker-Aktiengesellschaft" or "Zucker-Marktordnung", and
                                                          "Zwei-Tank-Systeme".

                                                          Furthermore, it seems to me that you didn't deal with the "empty
                                                          line problem" in the stop word list I mentioned before. Maybe we
                                                          have to add something like...

                                                          ^!Jump Doc_End
                                                          ^!IfFalse ^$IsEmpty(^$GetLine$)$ Next Else Skip
                                                          ^!Keyboard Enter

                                                          to be executed in the opened stop word list, to make sure that it
                                                          ends with an empty line. Without that empty line, the
                                                          newlist.txt now contains...

                                                          Zucker-Zwei-Tank-Zweifacher

                                                          As mentioned before, we now also get the last stop word "Zweifacher".
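
                                                          If it helps, the same "end the stop list with a line break" step
                                                          can be checked outside NoteTab as well. A small Python sketch
                                                          (the file name is the one used in this thread, and this only
                                                          covers the missing trailing line break, nothing else) would be:

                                                          # Append a final line break when it is missing, so a pattern of
                                                          # the form "word + line break" can still match the last stop word.
                                                          def ensure_trailing_newline(path):
                                                              with open(path, "r+") as f:
                                                                  data = f.read()
                                                                  if data and not data.endswith("\n"):
                                                                      f.write("\n")  # same effect as ^!Keyboard Enter above

                                                          ensure_trailing_newline("ntf-stopwords.txt")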

                                                          Flo
                                                        • Alan_C
                                                          Message 28 of 30 , Jul 17, 2006
                                                            FWIW

                                                            As far as the (sourceforge project) gnuwin32 utilities go

                                                            (though this may or may not be easy: on Win 2K I was missing, and
                                                            needed, MSVCP60.DLL, a C/C++ runtime library that I wasn't told
                                                            about in advance)

                                                            I got as far as getting duplicates removed (I'm short on time). The
                                                            next step would be either 1. a Notetab array of stopwords fed to a
                                                            clip ^!Replace command (replace with nothing), or 2. a Notetab array
                                                            of stopwords fed to the gnuwin32 tr utility so as to substitute each
                                                            stopword with nothing. I haven't yet worked out the Notetab clip
                                                            command line to run the gnuwin32 utilities. So far, what you see
                                                            below is what I did manually at the command line in a console.

                                                            1. DL'd the file: coreutils-5.3.0.exe

                                                            2. DL'd the file: msvcp60.zip (has the MSVCP60.DLL)

                                                            Ran #1 (the installer file for gnuwin32 coreutils)

                                                            copied MSVCP60.DLL to c:\winnt\system32

                                                            added the gnuwin32\bin folder to my PATH statement

                                                            c:\test.txt (sorted ascending in Notetab; gnuwin32 also has a sort)

                                                            ("news" occurs twice); uniq eliminates adjacent duplicates, which is
                                                            why the list is sorted first

                                                            C:\>cat test.txt
                                                            blue
                                                            bravo
                                                            charlie
                                                            favorite
                                                            group
                                                            mountain
                                                            news
                                                            news
                                                            omega
                                                            purple
                                                            river
                                                            valley

                                                            C:\>uniq test.txt > testout.txt

                                                            C:\>cat testout.txt
                                                            blue
                                                            bravo
                                                            charlie
                                                            favorite
                                                            group
                                                            mountain
                                                            news
                                                            omega
                                                            purple
                                                            river
                                                            valley

                                                            C:\>
                                                            ------------------

                                                            That's the extent so far of my foray into the gnuwin32
                                                            Linux/Unix-like utilities on Windows.
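
                                                            For comparison only (since this thread is really about clips), the
                                                            same sort / uniq / strip-the-stopwords chain can be collapsed into
                                                            a few lines of Python with no extra DLLs. The file names below are
                                                            just the ones used in this thread:

                                                            # Sort, de-duplicate, and drop the stop words in one pass.
                                                            with open("ntf-stopwords.txt") as f:
                                                                stops = set(f.read().split())

                                                            with open("test.txt") as f:
                                                                unique_words = sorted(set(f.read().split()))   # sort + uniq

                                                            with open("testout.txt", "w") as f:
                                                                f.writelines(w + "\n" for w in unique_words if w not in stops)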

                                                            --
                                                            Alan.
                                                          • Jody
                                                            Message 29 of 30 , Jul 18, 2006
                                                              Hi Everybody Reading This,

                                                              I copied this message to the OffTopic list so that those who
                                                              want to continue discussing it can do so over there. Please
                                                              reply to the [OT] copy of this and not the [Clips] one. TIA.
                                                              Oh, see my sigline or an [NTB] footer to get on the OffTopic
                                                              list, which is where further discussion needs to go. This is
                                                              basically about "why we use and will continue to use
                                                              Yahoogroups", so you can delete now if you don't care or
                                                              couldn't care less. I'm pretty firmly set on the idea of
                                                              staying with Yahoogroups...

                                                              SHERI WROTE...

                                                              >> Sigh, thanks to Yahoo I lost a long message I had written as
                                                              >> reply here.

                                                              I'm sorry about you losing your long message. I normally have
                                                              my PasteBoard on anytime I am ordering something, filling out
                                                              forms that have message/comment boxes, setting up things like
                                                              updating the Yahoogroups messages that get sent out, and a
                                                              host of other things. That way, whenever a page gets wiped
                                                              out or terminated for taking too long to fill out something,
                                                              I still have the text in NoteTab to paste in again when
                                                              retrying. This disconnect/drop-off, or whatever causes it,
                                                              happens to me from all sorts of places. My point is that
                                                              saying Yahoogroups caused it is an unfair comment. I know
                                                              that those things happen for all kinds of reasons. Having my
                                                              PasteBoard active during the times I don't want to lose 15
                                                              minutes of re-typing saves me all kinds of time and
                                                              aggravation. Just the fact that I need to keep a PasteBoard
                                                              tells me that it happens a lot, and in other places than
                                                              Yahoogroups. In fact, I don't ever remember Yahoogroups
                                                              cutting me off.

                                                              RICK WROTE...

                                                              > Maybe we should form a band of renegades, run off and start a
                                                              > separate newsgroup on Google Groups.

                                                              That's your freedom to speak and freedom to do what you want.
                                                              You are more than welcome to become a renegade. :-)

                                                              > I'm game if anybody else want to go. I'm tired of this interface.

                                                              You won't find me being a renegade in this case. <g>

                                                              > rick

                                                              Thanks! ;) More...

                                                              >On Fri, 14-07-2006 at 16:21 +0000, abairheart wrote:
                                                              >> Do you get a commission from google for transfering groups to them?

                                                              ADRIEN WROTE...

                                                              >Well, first: I do not understand why Fookes Software does not
                                                              >run its own mailing list. Second: I also lost several
                                                              >messages, because for some reason my provider (I think, I
                                                              >just asked them about this) classified the Yahoo messages as
                                                              >spam!

                                                              Building, buying, providing upkeep/maintenance, features, and
                                                              many other things prevent us from making or buying our own
                                                              discussion mailing list. Actually, we have one, but we don't
                                                              use it for our Fookes Software [FS] discussion mailing lists.
                                                              We might, in the future, use it for a very small number of
                                                              beta testers on our FS private beta testing list. Using our
                                                              own list or transferring to another list, for me, is just not
                                                              reasonably worth doing for the larger lists. The pros far
                                                              outweigh the cons of staying as we are right now. We have had
                                                              private lists of our own before and they just didn't work out
                                                              well. Yahoogroups is the least hassle to maintain, for the
                                                              users too, because of multi-list members: just one setting
                                                              has to be changed for it to span all the lists they are
                                                              subscribed to. Yahoogroups has only caused minor
                                                              inconveniences at times and I am very happy that they exist.
                                                              I was with them when they were the little guys: MakeList and
                                                              OneList. I've watched them grow for over 12 years and for the
                                                              most part am extremely happy with them. I've seen, used, and
                                                              posted messages on a host of discussion mailing lists. There
                                                              are certain features that I require, if possible, and I will
                                                              not move to a list that doesn't have them. Yahoogroups hits
                                                              my "most wanted/needed" list the majority of the time, more
                                                              than any other list.

                                                              We'll stay with Yahoogroups, which, on the whole, has been
                                                              very nice for us. I find that this subject falls into the
                                                              "use Linux, not puking Microsoft" category. There's a
                                                              minority of people who like to bash Microsoft, etc., and they
                                                              post about it so often, making a lot of practically
                                                              slanderous remarks, that some would think everybody hates
                                                              Microsoft. Not true at all! I don't bother with those kinds
                                                              of threads. They normally just get deleted after I scan them
                                                              very quickly to see whether they have to go to OffTopic, or
                                                              whether the cussing/bashing means I need to send out personal
                                                              messages, etc.


                                                              Happy Topics,
                                                              Jody

                                                              All Fookes Software discussion mailing lists with brief
                                                              description of lists and easy sign-on/off...
                                                              http://www.fookes.us/maillist.htm

                                                              The NoteTab Off-Topic List
                                                              mailto:ntb-OffTopic-Subscribe@yahoogroups.com (Subscribe)
                                                              mailto:ntb-OffTopic-UnSubscribe@yahoogroups.com (UnSubscribe)
                                                            • abairheart
                                                              Message 30 of 30 , Jul 19, 2006
                                                                --- In ntb-clips@yahoogroups.com, "rpdooling" <rpdooling@...> wrote:
                                                                >
                                                                > >> Do you get a commission from google for
                                                                > >> transfering groups to them?
                                                                >
                                                                > This is a joke without the smiley face, right?
                                                                >
                                                                > rd
                                                                >

                                                                Whaddaya mean WITHOUT smiley face?

                                                                I WAS smiling.