24211 Re: [Clip] Find Common Words Among Two Documents

  • flo.gehrke
    Dec 4, 2013
      Sorry, my first reply contained some nonsense (please delete it!). Here is a corrected version, which drops everything that followed the "P.S." ;-(

      --- In ntb-clips@yahoogroups.com, Ray Shapp <rayshapp@...> wrote:
      > I used NoteTab to separate every word in the second file onto
      > a line of its own then sorted while discarding duplicates.
      > Then the *comm* command gave an incredibly fast result.
      > Problem solved!
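      For readers without NoteTab at hand, here is a rough Python sketch of Ray's preprocessing step (each word on its own line, sorted, duplicates discarded). It is an illustration only, not part of any clip: the function name words_one_per_line and the definition of a "word" as a run of letters, digits, underscores or apostrophes are assumptions.

```python
import re

# Assumption (not from the post): a "word" is a run of letters, digits,
# underscores or apostrophes; everything else separates words.
WORD_PATTERN = re.compile(r"[\w']+")


def words_one_per_line(text: str) -> list[str]:
    """Return the words of `text` sorted by code point, duplicates removed."""
    words = WORD_PATTERN.findall(text)
    # sorted() compares code points (like `sort` under the C locale); comm
    # expects its inputs sorted in the same order it uses for comparing.
    return sorted(set(words))


if __name__ == "__main__":
    sample = "The cat saw the dog.\nThe dog ran."
    print("\n".join(words_one_per_line(sample)))
```

      Note that "The" and "the" stay separate entries here, because the comparison is case-sensitive.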

      Are you sure that Comm will do what you intend in every case?

      For easier testing, I've integrated Comm into the following clip (please adjust '[PATH]' in the '^!SetClipboard...' line so that it points to the folder containing Comm):

      ^!SetWizardWidth 75
      ^!SetWizardLabel Compare two lists
      ^!Set %File1%=^?{(T=O;F="Text files (*.txt)|*.txt")Left file:==^%File1%}; %File2%=^?{(T=O;F="Text files (*.txt)|*.txt")Right file:==^%File2%}; %Opt%=^?{(T=L)Suppress:==_Unique to left file^=-1|Unique to right file^=-2|Common to both files^=-3}
      ^!SetClipboard ^$GetDosOutput([PATH]Comm ^%Opt% ^%File1% ^%File2%)$

      ^!IfSame "^%Opt%" "-1" Next Else Opt2
      ^!Toolbar New Document
      ^!InsertText Suppressing lines unique to left file:^P^P
      ^!Toolbar Paste
      ^!Goto End

      :Opt2
      ^!IfSame "^%Opt%" "-2" Next Else Opt3
      ^!Toolbar New Document
      ^!InsertText Suppressing lines unique to right file:^P^P
      ^!Toolbar Paste
      ^!Goto End

      :Opt3
      ^!Toolbar New Document
      ^!InsertText Suppressing lines common to both files:^P^P
      ^!Toolbar Paste

      Two little files for testing:





      I understand that you are trying to find the entries in the left file that also occur in the right file; in other words, only those (duplicate) entries that occur in both files. None of the single options offered in the clip above provides exactly this result. '-2' gets pretty close, but note that it will also output entries that are unique to the left file, i.e. missing in the right one. Only by combining two options, i.e. 'comm -12', do you suppress both unique columns and keep just the lines common to both files.
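      To make the column logic concrete, here is a small Python model of what Comm does with two sorted lists. It is a simplified sketch, not Comm itself: the names comm_columns and comm_output are hypothetical, and the output merges the kept columns into one sorted list instead of reproducing Comm's tab-indented columns.

```python
def comm_columns(left: list[str], right: list[str]) -> tuple[list[str], list[str], list[str]]:
    """Split two sorted lists into comm's three columns:
    (1) only in left, (2) only in right, (3) in both."""
    only_left: list[str] = []
    only_right: list[str] = []
    both: list[str] = []
    i = j = 0
    # Walk both lists in step; this merge is why comm needs sorted input.
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            only_left.append(left[i])
            i += 1
        elif left[i] > right[j]:
            only_right.append(right[j])
            j += 1
        else:
            both.append(left[i])
            i += 1
            j += 1
    # Whatever remains in one list has no partner in the other.
    only_left.extend(left[i:])
    only_right.extend(right[j:])
    return only_left, only_right, both


def comm_output(left: list[str], right: list[str], suppress: set[int]) -> list[str]:
    """Lines printed when the given column numbers (1, 2, 3) are suppressed.
    Simplification: kept columns are merged into one sorted list, no tabs."""
    columns = comm_columns(left, right)
    kept = [line for number, column in enumerate(columns, start=1)
            if number not in suppress for line in column]
    return sorted(kept)


if __name__ == "__main__":
    left = ["apple", "banana", "cherry"]
    right = ["banana", "cherry", "date"]
    print(comm_output(left, right, {2}))     # like comm -2:  ['apple', 'banana', 'cherry']
    print(comm_output(left, right, {1, 2}))  # like comm -12: ['banana', 'cherry']
```

      With these sample lists, suppressing column 2 still leaves "apple" (unique to the left list), while suppressing columns 1 and 2 together leaves only the common entries.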

      However, this is just a hint which may be irrelevant for you...

      By the way, why not try a NoteTab clip that can be tailored even more closely to your needs? For example, the following clip compares both lists and returns exactly those entries contained in both of them. Unlike Comm, it does not need either file to be sorted:

      ^!SetWizardWidth 75
      ^!SetWizardLabel Compare two lists
      ^!Set %File1%=^?{(T=O;F="Text files (*.txt)|*.txt")First file:==^%File1%}; %File2%=^?{(T=O;F="Text files (*.txt)|*.txt")Second file:==^%File2%}
      ^!SetScreenUpdate Off
      ^!InsertFile ^%File1%
      ; Make sure the first list ends with exactly one line break
      ^!Replace "\R*\Z" >> "\r\n" WRS
      ^!InsertFile ^%File2%
      ^!Select All
      ^!InsertText ^$StrSort("^$GetSelection$";0;1;0)$
      ; Insert if needed: ^!Delay 5
      ; Assign duplicates to clipboard
      ^!SetClipboard ^$GetDocListAll("(^.+)(\r\n\1)+(\r\n|\Z)";"$1\r\n")$
      ^!Close Discard
      ; Output the result in a new document
      ^!Toolbar New Document
      ^!InsertText Duplicate names^P^$StrFill("-";20)$^P
      ^!Toolbar Paste
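      The clip relies on a simple idea: once both lists are merged and sorted, every entry present in both files ends up on adjacent lines, which the regular expression then picks up. A minimal Python sketch of that idea follows; the function names are hypothetical. It also shows one side effect worth checking: an entry repeated inside a single file is reported as well. The set intersection in common_entries is a different technique, swapped in here only for comparison, that avoids this.

```python
from itertools import groupby


def repeated_after_sort(first: list[str], second: list[str]) -> list[str]:
    """Mimic the clip: merge both lists, sort, and keep every non-empty
    line that occurs more than once in a row."""
    merged = sorted(first + second)
    return [line for line, run in groupby(merged)
            if line and sum(1 for _ in run) > 1]


def common_entries(first: list[str], second: list[str]) -> list[str]:
    """Set intersection: non-empty entries present in both lists, once each."""
    return sorted((set(first) & set(second)) - {""})


if __name__ == "__main__":
    first = ["apple", "banana", "cherry"]
    second = ["cherry", "date", "banana"]
    print(repeated_after_sort(first, second))  # ['banana', 'cherry']
    print(common_entries(first, second))       # ['banana', 'cherry']
    # Side effect: "apple" twice in one list counts as a duplicate.
    print(repeated_after_sort(["apple", "apple"], ["date"]))  # ['apple']
    print(common_entries(["apple", "apple"], ["date"]))       # []
```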

      Regarding your first posting: you start with a list as the first (or left) file, whereas the second file contains continuous text mixing file names with all sorts of other data. The next clip deals with that type of file. That is, you don't have to extract a list of entries from the second file, because the clip accesses that file directly and searches it for the (duplicate) entries occurring in both files:

      ^!SetWizardWidth 75
      ^!SetWizardLabel Find duplicate file names
      ^!Set %File1%=^?{(T=O;F="Text files (*.txt)|*.txt")List containing file names:==^%File1%}; %File2%=^?{(T=O;F="Target file (*.txt)|*.txt")Second file:==^%File2%}
      ^!SetScreenUpdate Off
      ^!InsertFile ^%File1%
      ^!Replace "\R{1,}\Z" >> "" WRS
      ; Create alternation from list
      ^!Replace "^P" >> "|" WAS
      ^!Set %Search%=(^$GetText$)
      ^!Close Discard
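      ^!Open ^%File2%
      ^!SetListDelimiter ^%NL%
      ^!SetClipboard ^$GetDocMatchAll("(?i)^%Search%")$
      ; Close the target file unchanged and paste the matches into a new document
      ^!Close Discard
      ^!Toolbar New Document
      ^!Toolbar Paste
      ; Note: Check 'Options|Tools|Sort Removes Duplicates'
      ^!Toolbar Sort Ascending
      ^!Jump Doc_Start
      ^!InsertText File names in ^$GetName(^%File1%)$ also contained in ^$GetName(^%File2%)$:^P^P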
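      The third clip joins the list into one regular-expression alternation and collects all case-insensitive matches from the target file. A rough Python equivalent is sketched below, assuming one file name per list entry; names_found_in_text is a hypothetical helper. Two choices differ from the clip and are labeled in the comments: names are escaped with re.escape (in the clip, a "." in a file name matches any character), and longer names are tried first so that a name which is a prefix of another does not cut the longer match short.

```python
import re


def names_found_in_text(names: list[str], text: str) -> list[str]:
    """Which of `names` occur in `text` (case-insensitive), once each, sorted."""
    # Skip empty entries: an empty alternative would match at every position.
    entries = [name for name in names if name]
    if not entries:
        return []
    # Differs from the clip: longest names first, so "a.txt.bak" wins over "a.txt".
    entries.sort(key=len, reverse=True)
    # Differs from the clip: re.escape makes "." match only a literal dot.
    pattern = re.compile("|".join(re.escape(name) for name in entries), re.IGNORECASE)
    # Matches keep the spelling used in the target text; set() drops repeats.
    return sorted(set(pattern.findall(text)))


if __name__ == "__main__":
    names = ["report.txt", "notes.doc"]
    text = "See REPORT.TXT and reportXtxt; notes.doc was mailed, notes.doc again."
    print(names_found_in_text(names, text))  # ['REPORT.TXT', 'notes.doc']
```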

      Start both clips from an empty document.

      Please note: these clips are only a first draft and will certainly have to be adapted to your conditions.


      P.S. Check for false line breaks inserted by Yahoo!