
24213 Re: [Clip] Find Common Words Among Two Documents

  • flo.gehrke
    Dec 6, 2013
      In case anyone is still interested in using Comm.exe: here is a clip that simulates it. In my view, the clip has some advantages over Comm:

      1. You can choose 'Case-sensitive Yes|No'. Comm is case-sensitive only.

      2. The output is arranged in one column, which gives a better overview when comparing long lines. Comm's two-column layout doesn't work reliably anyway: quite often, entries end up in the wrong column.

      3. The text in the dialog matches what the clip actually does. Option '-3' in Comm doesn't really "suppress" any unique entry; on the contrary, it outputs what is unique.

      4. You don't have to worry about sorted input. The clip sorts both lists, removes empty lines, etc. Comm works with sorted lists only.

      Give this a try...

      ^!SetWizardWidth 75
      ^!SetWizardLabel Compare two lists
      ^!Set %File1%=^?{(T=O;F="Textfiles (*.txt)|*.txt")First file:=^%File1%}; %File2%=^?{(T=O;F="Textfiles (*.txt)|*.txt")Second file:=^%File2%}; %Opt%=^?{(T=L)Action:==_Suppress unique to first file^=First|Suppress unique to second file^=Second|Show unique to both files^=Both}; %Case%=^?{Case-sensitive:==Yes|_No}
      ^!SetScreenUpdate Off
      ^!InsertFile ^%File1%
      ; Mark entries from file #1 / Remove trailing blanks
      ^!Replace "(\w)\x20*$" >> "$1#1" WARS
      ^!Jump Doc_End
      ^!InsertText ^P^$GetFileText(^%File2%)$
      ^!Select All
      ^!Menu Modify/Lines/Trim Blanks
      ; Remove empty lines
      ^!Replace "^\R|\R{1,}\Z" >> "" WARS
      ^!IfFalse ^%Case% Next Else Skip_2
      ^!Select All
      ^!Toolbar Lower Case
      ; Sort ascending, case-sensitive, removing duplicates
      ^!Select All
      ^$StrSort("^$GetSelection$";1;1;1)$
      ^!IfSame "^%Opt%" "First" Next Else Second

      :First
      ^!Set %1stMit%=^$GetDocListAll("(^.+)(\r\n\1#1)+(\r\n|\Z)";"$1\r\n")$
      ^!Set %1stOhne%=^$GetDocListAll("^.+$(?<!#1)";"$0\r\n")$
      ^!Set %Combi%=^%1stMit%^%1stOhne%
      ^!Set %Combi%=^$StrSort("^%Combi%";0;1;1)$
      ^!Select All
      ^!InsertText Suppress unique to first file:^P^P^%Combi%
      ^!Goto Out

      :Second
      ^!IfSame "^%Opt%" "Second" Next Else Both
      ^!Replace "^.+(?<!#1)(\R|\Z)" >> "" WARS
      ^!Replace "#1$" >> "" WARS
      ^!Jump Doc_Start
      ^!InsertText Suppress unique to second file:^P^P
      ^!Goto Out

      :Both
      ^!Replace "(^[^\r\n]+)\r\n(\1#1)(\r\n)?" >> "" AWRS
      ^!SetListDelimiter ^%NL%
      ^!Set %First%=^$GetDocMatchAll("^[^\r\n]+#1$")$
      ^!Replace "^([^\r\n]+)#1(\r\n)?" >> "" AWRS
      ^!Jump Doc_Start
      ^!InsertText Unique to second file:^P^P
      ^!Jump Doc_Start
      ^!InsertText Unique to first file:^P^P
      ^!InsertText ^%First%^P^P
      ^!Replace "#1$" >> "" AWRS

      :Out
      ;^!ClearVariables


      With '^!ClearVariables' disabled (commented out, as above), the clip will automatically preselect the source files when you restart it. Maybe that's more convenient.

      Tested with simple entries only...

      First file:

      Anthony
      Bertha
      Dorothy
      edward
      Frederic
      LeftOnly
      Carla

      Second file:

      RightOnly
      Anthony
      bobby
      Dorothy
      Edward
      Frederic
      Posemuckel
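
As a cross-check of what the three options should return for these two lists, here is a minimal Python sketch of the same comparison. It is not part of the clip and does not use NoteTab at all. The names `clean` and `compare` are made up for illustration, and the mapping of the options ('First' returns everything in the second file, 'Second' everything in the first, 'Both' the entries unique to each) is an assumption based on reading the clip above.

```python
FIRST = ["Anthony", "Bertha", "Dorothy", "edward", "Frederic", "LeftOnly", "Carla"]
SECOND = ["RightOnly", "Anthony", "bobby", "Dorothy", "Edward", "Frederic", "Posemuckel"]


def clean(lines, case_sensitive):
    """Trim blanks, drop empty lines, optionally fold case, remove duplicates."""
    entries = set()
    for line in lines:
        line = line.strip()  # assumption: trims leading and trailing blanks
        if line:
            entries.add(line if case_sensitive else line.lower())
    return entries


def compare(first, second, option, case_sensitive=False):
    """Hypothetical helper mirroring the clip's three dialog options."""
    a = clean(first, case_sensitive)
    b = clean(second, case_sensitive)
    if option == "First":    # suppress what is unique to the first file
        return sorted(b)
    if option == "Second":   # suppress what is unique to the second file
        return sorted(a)
    if option == "Both":     # show what is unique to each file
        return sorted(a - b), sorted(b - a)
    raise ValueError(f"unknown option: {option!r}")


if __name__ == "__main__":
    only_first, only_second = compare(FIRST, SECOND, "Both")
    print(only_first)    # ['bertha', 'carla', 'leftonly']
    print(only_second)   # ['bobby', 'posemuckel', 'rightonly']
```

With `case_sensitive=True`, 'edward' and 'Edward' no longer match and show up as unique on both sides. Note that Python sorts uppercase before lowercase, which may differ from the clip's sort order.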

      I didn't test it, but with more complicated entries we may have to do something about RegEx metacharacters...
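
One case that may matter for more complicated entries: the marking pattern `(\w)\x20*$` only matches lines that end in a word character. The sketch below runs that same pattern through Python's `re` module, not NoteTab's regex engine, so the clip may behave differently. The function name `mark_first_file` and the sample entries are made up for illustration.

```python
import re

# Same pattern as in the clip's marking step; MULTILINE makes $ match at each line end.
MARK = re.compile(r"(\w)\x20*$", re.MULTILINE)


def mark_first_file(text):
    """Append '#1' to every line that ends in a word character (plus optional blanks)."""
    return MARK.sub(r"\1#1", text)


if __name__ == "__main__":
    sample = "Anthony\nC++\nBertha  \n(draft)"
    print(mark_first_file(sample))
    # Anthony#1
    # C++
    # Bertha#1
    # (draft)
```

In this sketch, 'C++' and '(draft)' are left without the '#1' marker, so later steps would treat them as entries from the second file. A pattern that matches any non-blank last character might be one way around it, but that is untested.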

      Regards,
      Flo