
12910 RE: [Clip] Re: Extracting words from a file

  • Hugo Paulissen
    Dec 4, 2004
      Don,

      You wrote the kind of clip I had in mind but didn't have the time to
      write. It was clear that NoteTab's regex was in the way... ;-). If I
      needed this clip, I would definitely test it!

      Hugo

      > -----Original Message-----
      > From: Don - htmlfixit.com [mailto:don@...]
      > Sent: Saturday, December 4, 2004 3:37
      > To: ntb-clips@yahoogroups.com
      > Subject: Re: [Clip] Re: Extracting words from a file
      >
      > > Hi Franz,
      > >
      > > I just happened across this thread. If I have understood your needs
      > > correctly, why not just reduce the list to a single column of words
      > > and sort it case-sensitively?
      > >
      > > 1. Replace all spaces in the document with "^P" to change the list to
      > > individual words (ignore punctuation, if you like).
      > >
      > > 2. Sort the list CASE SENSITIVE
      > >
      > > 3. Delete the lower case words
      > >
      > >
      > > A 500 KB file should contain about 80,000 words. It shouldn't take
      > > more than a few minutes to do this by hand. If you have a lot of
      > > files, you can always write down the keystrokes you use, then do the
      > > sort with menu commands (^!Menu Modify/...). I think there's a
      > > configuration switch to change the sorting behaviour (remove
      > > duplicates or not; case-sensitive or not).
      > >
      > >
      > > Abair
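Abair's three steps (reduce the text to a single column of words, sort it case-sensitively, delete the lowercase words) can be sketched outside NoteTab. This is a minimal Python sketch, not a clip: the helper name `capitalized_words` and the extra typographic punctuation characters are assumptions, and instead of deleting a range of the sorted list it classifies each word by its first character with `str.isupper()`.

```python
import string

# Assumed punctuation set: ASCII punctuation plus a few typographic marks.
PUNCTUATION = string.punctuation + "“”‘’´•–"

def capitalized_words(text: str) -> list[str]:
    """Hypothetical helper: unique words that start with an uppercase letter."""
    # Step 1: turn punctuation into spaces, then split on any whitespace
    # (spaces, tabs, line breaks) to get a single column of words.
    table = str.maketrans({ch: " " for ch in PUNCTUATION})
    words = text.translate(table).split()
    # Step 2: case-sensitive sort by code point ("Zebra" sorts before "apple");
    # set() drops duplicate words first.
    ordered = sorted(set(words))
    # Step 3: keep words whose first character is an uppercase letter; this
    # drops lowercase words and numbers but keeps capitals such as "Ä".
    return [w for w in ordered if w[0].isupper()]

print(capitalized_words("Der Hund und die Katze. Der Ärger!"))
# ['Der', 'Hund', 'Katze', 'Ärger']
```

Because each word is tested on its own, an umlauted capital such as "Ärger" survives even though it sorts after every lowercase word.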
      >
      > Bingo, Abair, with one exception that applies to German but not to
      > English! It works and doesn't use regex. I tried it on the 500 lines
      > Franz sent and on the 181,000-word file I have been testing all the
      > other approaches with (they always ended in an out-of-memory error
      > until now). I used the clip shown below. There is one problem,
      > however: the German characters with two dots over them (is that an
      > umlaut?) are sorted after the equivalent lowercase letter, so how do
      > we deal with that? As currently written, the clip deletes them as
      > lowercase. Maybe I have to go one line at a time to delete? Does a
      > German version of NoteTab sort these correctly? Is it a bug in the
      > sorting engine, or just good old ASCII ordering? Are only certain
      > letters umlauted in German, or whatever the double dots are called?
      >
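      > ; by don at htmlfixit.com
      > ; Work on a copy in a new document so the original is left untouched
      > ^!Menu Edit/Copy All
      > ^!Toolbar Paste New
      > ; Replace line breaks (^P), tabs (^T) and punctuation with spaces
      > ^!Replace "^P" >> " " ATIWS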
      > ^!Replace ")" >> " " ATIWS
      > ^!Replace "(" >> " " ATIWS
      > ^!Replace """ >> " " ATIWS
      > ^!Replace "^T" >> " " ATIWS
      > ^!Replace "," >> " " ATIWS
      > ^!Replace "[" >> " " ATIWS
      > ^!Replace "]" >> " " ATIWS
      > ^!Replace "<" >> " " ATIWS
      > ^!Replace ">" >> " " ATIWS
      > ^!Replace "~" >> " " ATIWS
      > ^!Replace "!" >> " " ATIWS
      > ^!Replace "@" >> " " ATIWS
      > ^!Replace "#" >> " " ATIWS
      > ^!Replace "$" >> " " ATIWS
      > ^!Replace "%" >> " " ATIWS
      > ^!Replace "^" >> " " ATIWS
      > ^!Replace "&" >> " " ATIWS
      > ^!Replace "*" >> " " ATIWS
      > ^!Replace "_" >> " " ATIWS
      > ^!Replace "+" >> " " ATIWS
      > ^!Replace "=" >> " " ATIWS
      > ^!Replace "|" >> " " ATIWS
      > ^!Replace "{" >> " " ATIWS
      > ^!Replace "}" >> " " ATIWS
      > ^!Replace "\" >> " " ATIWS
      > ^!Replace "/" >> " " ATIWS
      > ^!Replace "?" >> " " ATIWS
      > ^!Replace "." >> " " ATIWS
      > ^!Replace ";" >> " " ATIWS
      > ^!Replace ":" >> " " ATIWS
      > ^!Replace "" >> " " ATIWS
      > ^!Replace "•" >> " " ATIWS
      > ^!Replace "– " >> " " ATIWS
      > ^!Replace "´" >> " " ATIWS
      > ^!Replace "”" >> " " ATIWS
      > ^!Replace "“" >> " " ATIWS
      > ^!Replace "‘" >> " " ATIWS
      > ^!Replace "`" >> " " ATIWS
      >
      >
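      > ; Collapse repeated spaces, put one word per line, and strip a leading
      > ; apostrophe, hyphen or space from each line
      > ^!Menu Modify/Spaces/Single Space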
      > ^!Replace " " >> "^P" ATIWS
      > ^!Replace "^P’" >> "^P" ATIWS
      > ^!Replace "^P-" >> "^P" ATIWS
      > ^!Replace "^P " >> "^P" ATIWS
      > ; Sort the word list with StrSort and paste the result over the document
      > ^!Menu Edit/Copy All
      > ^!SetClipboard ^$StrSort("^$GetClipboard$";1;1;1)$
      > ^!Select All
      > ^!Toolbar Paste
      >
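      > ; Numbers sort ahead of letters: find the last line that starts with a
      > ; digit, then delete everything from the top of the document to there
      > ^!Set %LineN%=0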
      > :DumpNumbers
      > ;^!SetDebug 1
      > ^!Inc %LineN% 10
      > ^!Jump ^%LineN%
      > ^!IfTrue ^$IsEmpty("^$GetLine$")$ DumpNumbers
      > ^!Select +1
      > ^!If "^$IsNumber("^$GetSelection$")$" = "1" DumpNumbers ELSE NotNumber
      > :NotNumber
      > ^!Jump -1
      > ^!Select +1
      > ^!If "^$IsNumber("^$GetSelection$")$" = "0" NotNumber ELSE DeleteNumbers
      >
      > :DeleteNumbers
      > ^!Jump +1
      > ^!SelectTo 1:1
      > ^!Continue Is the correct text highlighted?
      >
      > ^!Keyboard DELETE
      >
      >
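      > ; Lowercase words sort after the capitals: step up from the end to find
      > ; where they start, then delete from there to the end of the document
      > ^!Set %LineN%=^$GetLineCount$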
      > :DumpLowers
      > ^!Inc %LineN% -100
      > ^!Jump ^%LineN%
      > ^!Select +1
      > ^!If "^$IsUppercase("^$GetSelection$")$" = "0" DumpLowers ELSE NotLower
      > :NotLower
      > ^!Jump +1
      > ^!Select +1
      > ^!If "^$IsUppercase("^$GetSelection$")$" = "1" NotLower ELSE DeleteLowers
      >
      > :DeleteLowers
      > ^!Jump Select_Start
      > ^!Set %cursor_row%=^$GetRow$
      > ^!Set %cursor_col%=^$GetCol$
      > ^!Jump Doc_End
      > ^!SelectTo ^%cursor_row%:^%cursor_col%
      > ^!Continue Is the correct text highlighted?
      > ^!Keyboard DELETE
      >
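On the umlaut question in the quoted message: in Latin-1/Windows-1252 and in Unicode, Ä, Ö and Ü sit at code points 196, 214 and 220, above every ASCII letter, so a plain case-sensitive sort by code point places them after lowercase z. Assuming StrSort orders that way (not confirmed here), a step that cuts the sorted list at the first lowercase word loses them. This Python sketch contrasts that with sorting on a diacritic-stripped key; `fold` and `drop_lowercase_tail` are hypothetical helpers, and stripping diacritics only approximates German dictionary order.

```python
import unicodedata

def fold(word: str) -> str:
    """Hypothetical sort key: remove diacritics, e.g. 'Ärger' -> 'Arger'."""
    decomposed = unicodedata.normalize("NFD", word)  # 'Ä' -> 'A' + U+0308
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def drop_lowercase_tail(sorted_words: list[str]) -> list[str]:
    """Cut a sorted list at the first word that starts with a lowercase letter."""
    for i, word in enumerate(sorted_words):
        if word[0].islower():
            return sorted_words[:i]
    return sorted_words

words = ["apfel", "Zebra", "Ärger", "Apfel", "über", "Übel"]

# Plain code-point order: Ä (U+00C4) and Ü (U+00DC) follow every ASCII letter.
by_code_point = sorted(words)
# ['Apfel', 'Zebra', 'apfel', 'Ärger', 'Übel', 'über']
print(drop_lowercase_tail(by_code_point))  # ['Apfel', 'Zebra'] (Ärger, Übel lost)

# Sort on the folded form, keeping the original word as a tie-breaker.
by_folded = sorted(words, key=lambda w: (fold(w), w))
# ['Apfel', 'Ärger', 'Übel', 'Zebra', 'apfel', 'über']
print(drop_lowercase_tail(by_folded))  # ['Apfel', 'Ärger', 'Übel', 'Zebra']
```

With the folded key, all capitals, umlauted or not, form one block ahead of the lowercase words, so cutting at the first lowercase word keeps "Ärger" and "Übel".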