
24210 Re: [Clip] Find Common Words Among Two Documents

  • Ray Shapp
    Dec 4, 2013
      Amazing work, Flo!

      Many thanks.

      Ray Shapp
      ---


      On Wed, Dec 4, 2013 at 7:35 AM, flo.gehrke <flo.gehrke@...> wrote:
       

      --- In ntb-clips@yahoogroups.com, Ray Shapp <rayshapp@...> wrote:
      >
      > I used NoteTab to separate every word in the second file onto
      > a line of its own then sorted while discarding duplicates.
      > Then the *comm *command gave an incredibly fast result.
      > Problem solved!

      Are you sure that Comm will do what you intend in every case?

      For easier testing, I've integrated Comm in the following clip (please check '[PATH]' after '^!SetClipboard...'):

      ^!SetWizardWidth 75
      ^!SetWizardLabel Compare two lists
      ^!Set %File1%=^?{(T=O;F="Text files (*.txt)|*.txt")Left file:==^%File1%}; %File2%=^?{(T=O;F="Text files (*.txt)|*.txt")Right file:==^%File2%}; %Opt%=^?{(T=L)Suppress:==_Unique to left file^=-1|Unique to right file^=-2|Unique to both files^=-3}
      ^!SetClipboard ^$GetDosOutput([PATH]Comm ^%File1% ^%File2% ^%Opt%)$

      :Opt1
      ^!IfSame "^%Opt%" "-1" Next Else Opt2
      ^!Toolbar New Document
      ^!InsertText Suppressing unique to left file:^P^P
      ^!Paste
      ^!Goto End

      :Opt2
      ^!IfSame "^%Opt%" "-2" Next Else Opt3
      ^!Toolbar New Document
      ^!InsertText Suppressing unique to right file:^P^P
      ^!Paste
      ^!Goto End

      :Opt3
      ^!Toolbar New Document
      ^!InsertText Suppressing unique to both files^P^P
      ^!Paste

      Two little files for testing:

      file_left.txt

      Anthony
      Bertha
      Carla
      Dorothy
      Edward
      Frederic
      LeftOnly

      file_right.txt

      Anthony
      Dorothy
      Edward
      Frederic
      RightOnly

      As I understand it, you want the entries in the left file that also occur in the right file. In other words: only those (duplicate) entries that occur in both files. No single option in Comm provides exactly this result. Maybe '-2' gets pretty close to it. But note that it will also output entries that are unique to the left file, i.e. missing in the right file.

      However, this is just a hint which may be irrelevant for you...
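      To make the three options easier to follow, here is a minimal Python sketch of the column model that POSIX comm uses: column 1 holds lines only in the first file, column 2 lines only in the second file, column 3 lines found in both. It assumes the Windows port of Comm used in this thread follows the same model. The function name comm_columns is made up for illustration, and both inputs must already be sorted.

```python
def comm_columns(left, right):
    """Split two sorted lists into (only_left, only_right, in_both)."""
    only_left, only_right, in_both = [], [], []
    i = j = 0
    # Walk both sorted lists in step, like a merge.
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            in_both.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            only_left.append(left[i])
            i += 1
        else:
            only_right.append(right[j])
            j += 1
    # Whatever remains in one list has no partner in the other.
    only_left.extend(left[i:])
    only_right.extend(right[j:])
    return only_left, only_right, in_both


left = ["Anthony", "Bertha", "Carla", "Dorothy", "Edward", "Frederic", "LeftOnly"]
right = ["Anthony", "Dorothy", "Edward", "Frederic", "RightOnly"]

only_left, only_right, in_both = comm_columns(left, right)
print("column 1 (hidden by -1):", only_left)
print("column 2 (hidden by -2):", only_right)
print("column 3 (hidden by -3):", in_both)
```

      In POSIX comm the options can be combined, so hiding columns 1 and 2 together (comm -12) would leave only the common entries. The wizard in the clip above passes just one option at a time, so it cannot produce that result as written.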

      By the way, why not try a NoteTab clip, which could be adapted even more closely to your needs? For example, the following clip compares both lists and outputs only those entries contained in both. Unlike Comm, it does not need either file to be sorted:

      ^!SetWizardWidth 75
      ^!SetWizardLabel Compare two lists
      ^!Set %File1%=^?{(T=O;F="Text files (*.txt)|*.txt")First file:==^%File1%}; %File2%=^?{(T=O;F="Text files (*.txt)|*.txt")Second file:==^%File2%}
      ^!SetScreenUpdate Off
      ^!InsertFile ^%File1%
      ; Insert empty line at end of list if missing
      ^!Replace "\R*\Z" >> "\r\n" WRS
      ^!InsertFile ^%File2%
      ^!Select All
      ^!InsertText ^$StrSort("^$GetSelection$";0;1;0)$
      ; Insert if needed: ^!Delay 5
      ; Assign duplicates to clipboard
      ^!SetClipboard ^$GetDocListAll("(^.+)(\r\n\1)+(\r\n|\Z)";"$1\r\n")$
      ^!Close Discard
      ^!InsertText Duplicate names^P^$StrFill("-";20)$^P
      ^!Paste
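      The idea behind the clip above (merge both lists, sort them, then pick every entry that occurs more than once in a row) can be sketched in Python as follows. This is only an illustration of the technique, not a translation of the clip; the function name shared_entries is hypothetical.

```python
from itertools import groupby


def shared_entries(first, second):
    """Return entries that occur more than once after merging and sorting."""
    merged = sorted(first + second)
    # groupby collects runs of identical neighbours in the sorted list.
    return [name for name, run in groupby(merged) if len(list(run)) > 1]


first = ["Frederic", "Anthony", "LeftOnly", "Dorothy", "Bertha", "Carla", "Edward"]
second = ["Edward", "RightOnly", "Anthony", "Frederic", "Dorothy"]

print(shared_entries(first, second))
```

      One caveat of this approach: an entry that is repeated inside a single list is also reported, even if it never occurs in the other list. If that matters, remove duplicates from each list before merging.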

      Regarding your first posting, you are starting with a list as the first (or left) file, whereas the second file contains continuous text mixing file names with all sorts of data. The next clip deals with that type of file. That is, you don't have to extract a list of entries from the second file because the clip will access that file directly and search for (duplicate) entries occurring in both files:

      ^!SetWizardWidth 75
      ^!SetWizardLabel Find duplicate file names
      ^!Set %File1%=^?{(T=O;F="Text files (*.txt)|*.txt")List containing file names:==^%File1%}; %File2%=^?{(T=O;F="Target file (*.txt)|*.txt")Second file:==^%File2%}
      ^!SetScreenUpdate Off
      ^!InsertFile ^%File1%
      ^!Replace "\R{1,}\Z" >> "" WRS
      ; Create alternation from list
      ^!Replace "^P" >> "|" WAS
      ^!Set %Search%=(^$GetText$)
      ^!Close Discard
      ^!Open ^%File2%
      ^!SetListDelimiter ^%NL%
      ^!SetClipboard ^$GetDocMatchAll("(?i)^%Search%")$
      ^!Close
      ^!Paste
      ; Note: Check 'Options|Tools|Sort Removes Duplicates'
      ^!Toolbar Sort Ascending
      ^!Jump Doc_Start
      ^!InsertText File names in ^$GetName(^%File1%)$ also contained in ^$GetName(^%File2%)$^P^P
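      For comparison, the same approach (turn the list into one alternation pattern and search the second file case-insensitively) looks roughly like this in Python. The function name names_found_in_text and the sample data are made up for illustration. Unlike the clip, this sketch escapes regex metacharacters, so a dot in a file name matches only a literal dot.

```python
import re


def names_found_in_text(names, text):
    """Return the distinct matches of any listed name inside the text."""
    wanted = [name for name in names if name]
    if not wanted:
        return []
    # Build "name1|name2|..." with metacharacters escaped.
    pattern = re.compile("|".join(re.escape(name) for name in wanted), re.IGNORECASE)
    # A set removes repeated hits; sorting gives a stable order.
    return sorted({match.group(0) for match in pattern.finditer(text)})


names = ["report.txt", "notes.txt"]
text = "2013-12-01 REPORT.TXT 14 KB; backup of notes.txt; reportXtxt"

print(names_found_in_text(names, text))
```

      Matches are returned as they are spelled in the searched text, so the same name in different letter case shows up more than once.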

      Start both clips from an empty document.

      Please note: These clips are only a first draft and will certainly need to be adapted to your conditions.

      Regards,
      Flo

      P.S. Check for false line breaks inserted by Yahoo!

      If I'm not mistaken, the options in this tool are a bit confusing.

      '-1' is quite plausible. It "suppresses" all entries that are "unique to left file". The result displays a set of all entries (left + right) excluding what is "unique to left file".

      '-2' actually doesn't "suppress" anything but displays positively what is "unique to left file" and positively what has been removed from left file. What you see in the right column isn't "unique" at all but occurs in both files. So what is "suppressed" here in the left column are entries occurring in both files being "not unique".

      '-3' also doesn't "suppress" anything that is "unique" in any file but it actually displays positively what is "unique" in left and right file.
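      Part of the confusion may come from the way the output is laid out. In GNU comm, each line is indented by one tab for every column to its left that is still displayed, so hiding a column shifts the remaining ones to the left. The Python sketch below models that layout; it assumes the Comm port used here behaves like GNU comm, and the function name render_comm is hypothetical.

```python
def render_comm(left, right, suppress=()):
    """Lay out comm-style lines; `suppress` holds the column numbers to hide."""
    in_left, in_right = set(left), set(right)
    column_of = {}
    for line in in_left | in_right:
        if line in in_left and line in in_right:
            column_of[line] = 3
        elif line in in_left:
            column_of[line] = 1
        else:
            column_of[line] = 2
    shown = [col for col in (1, 2, 3) if col not in suppress]
    output = []
    for line in sorted(column_of):
        col = column_of[line]
        if col in shown:
            # One tab per displayed column to the left of this one.
            output.append("\t" * shown.index(col) + line)
    return output


left = ["Anthony", "Bertha", "Carla", "Dorothy", "Edward", "Frederic", "LeftOnly"]
right = ["Anthony", "Dorothy", "Edward", "Frederic", "RightOnly"]

for line in render_comm(left, right, suppress=(2,)):
    print(line)
```

      With column 2 hidden, the common entries move from the third position to the second. That is why the right-hand column of the '-2' output contains entries that occur in both files.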

      I tested this with...

      file_left.txt:

      Anthony
      Bertha
      Carla
      Dorothy
      Edward
      Frederic
      111

      file_right.txt:

      Anthony
      Dorothy
      Edward
      Frederic
      999

      My clip (please edit the path to COMM):

      ^!SetWizardWidth 75
      ^!SetWizardLabel Compare two lists
      ^!Set %File1%=^?{(T=O;F="Text files (*.txt)|*.txt")Left file:==^%File1%}; %File2%=^?{(T=O;F="Text files (*.txt)|*.txt")Right file:==^%File2%}; %Opt%=^?{(T=L)Suppress:==_Unique to left file^=-1|Unique to right file^=-2|Unique to both files^=-3}
      ^!SetClipboard ^$GetDosOutput([PATH]Comm ^%File1% ^%File2% ^%Opt%)$

      :Opt1
      ^!IfSame "^%Opt%" "-1" Next Else Opt2
      ^!SetClipboard ^$StrReplace("^T";"";"^$GetClipboard$";A)$
      ^!Toolbar New Document
      ^!InsertText Suppressing unique to left file:^P^P
      ^!Paste
      ^!Goto End

      :Opt2
      ^!IfSame "^%Opt%" "-2" Next Else Opt3
      ^!Toolbar New Document
      ^!Paste
      ^!Jump Doc_Start
      ^!InsertText Suppressing unique to right file:^P^P
      ^!Goto End

      :Opt3
      ^!Toolbar New Document
      ^!Paste
      ^!Replace "^T" >> "^T^T" WAS
      ^!Jump Doc_Start
      ^!InsertText Suppressing unique to both files^PUnique to left^TUnique to right^P^P

      Regarding your first posting, you are starting with a list as the first (or "left") file. The second (or "right") file contains continuous text mixing file names with all sorts of data. The following deals with this situation. That is, you don't have to make a list of entries from the second file because the clip will access that file directly and search it for the entries from file #1.

