Re: Finding out matching and not matching entries between two files !
Hi Jim,
Thanks for the response.
I was using the approach you described as the fastest one, with nested data
structures. The problems I ran into with that approach were:
1. The code became too complex because of the many references needed to
implement the nested data structures.
2. Finding the matched lines was not much of an issue. However, I also wanted
to find the unmatched records, and that is where I had to scan the files
more than twice. Perhaps my algorithm was not efficient, but as performance
is not a criterion for my script, I can afford that.
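For reference, the hash approach can report matched and unmatched records without rescanning either file. What follows is only a minimal sketch, not the script discussed above. It assumes that a whole line is the comparison key, that the first file fits in memory, and that duplicate lines need no special handling. The name compare_files is hypothetical.

```perl
use strict;
use warnings;

# Compare two files by whole-line content (an assumption; a real script
# might key on one field instead). Returns three array refs:
# lines in both files, lines only in file A, lines only in file B.
sub compare_files {
    my ($file_a, $file_b) = @_;

    # Load file A into a hash: key = line, value = number of hits from B.
    my %seen;
    open my $fh_a, '<', $file_a or die "Cannot open $file_a: $!";
    while (my $line = <$fh_a>) {
        chomp $line;
        $seen{$line} = 0;
    }
    close $fh_a;

    # Single pass over file B: every line is either a hit or a miss.
    my (@matched, @only_b);
    open my $fh_b, '<', $file_b or die "Cannot open $file_b: $!";
    while (my $line = <$fh_b>) {
        chomp $line;
        if (exists $seen{$line}) {
            # Record each matched line once, on its first hit.
            push @matched, $line unless $seen{$line}++;
        }
        else {
            push @only_b, $line;
        }
    }
    close $fh_b;

    # Keys that were never hit exist only in file A.
    my @only_a = sort grep { $seen{$_} == 0 } keys %seen;

    return (\@matched, \@only_a, \@only_b);
}
```

Each file is read exactly once. The unmatched records of the first file come from the hash itself, so no third scan is needed. A flat hash keyed on the line also avoids nested references.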
Is there any CPAN module (preferably a built-in one) that takes a CSV or
character-delimited file as input and automatically generates a nested data
structure containing the entire file contents? Also, is there any module for
comparing two files of similar format?
Thanks & Regards,
On Thu, Jul 16, 2009 at 7:18 AM, Jim Gibson <jimsgibson@...> wrote:
> At 2:12 AM -0700 7/16/09, Amit Saxena wrote:
>> Hi all,
>> I need help regarding the approach to find out matched and unmatched
>> entries between two files using Perl.
>> As the number of lines in the files would be around 10k-50k, I don't want
>> to load the entire file contents into memory.
> The fastest approach is usually to load the shorter of the two files into
> memory, then read the longer of the two files and process each line,
> recording whether the line matches any record in the shorter file. A hash is
> best for this method. 50k files should be no problem.
> If you really don't or can't read one of the files into memory, then a
> method that still requires only one pass over each of the two files is to
> sort the files and save the sorted copies. Then, read one line from each
> file and compare. If they are equal, record this fact and read two more
> lines. If they do not match, record the fact and read a line from the file
> with the lesser of the two lines, alphabetically speaking, then compare
> Jim Gibson
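
The sorted-file method quoted above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the thread. It assumes both files are already sorted in plain string order (the order Perl's cmp gives without "use locale", for example as produced by sort under LC_ALL=C) and that whole lines are compared. The names next_line and merge_compare are hypothetical.

```perl
use strict;
use warnings;

# Read the next line from a handle without its newline; undef at end of file.
sub next_line {
    my ($fh) = @_;
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

# Walk two pre-sorted files in step, holding only one line of each in memory.
# Returns three array refs: lines in both, lines only in A, lines only in B.
sub merge_compare {
    my ($sorted_a, $sorted_b) = @_;
    open my $fh_a, '<', $sorted_a or die "Cannot open $sorted_a: $!";
    open my $fh_b, '<', $sorted_b or die "Cannot open $sorted_b: $!";

    my (@matched, @only_a, @only_b);
    my $line_a = next_line($fh_a);
    my $line_b = next_line($fh_b);

    while (defined $line_a && defined $line_b) {
        my $order = $line_a cmp $line_b;
        if ($order == 0) {
            # Equal: record the match and advance both files.
            push @matched, $line_a;
            $line_a = next_line($fh_a);
            $line_b = next_line($fh_b);
        }
        elsif ($order < 0) {
            # A's line sorts first, so it cannot appear later in B.
            push @only_a, $line_a;
            $line_a = next_line($fh_a);
        }
        else {
            # B's line sorts first, so it cannot appear later in A.
            push @only_b, $line_b;
            $line_b = next_line($fh_b);
        }
    }

    # Whatever remains in either file has no partner in the other.
    while (defined $line_a) { push @only_a, $line_a; $line_a = next_line($fh_a); }
    while (defined $line_b) { push @only_b, $line_b; $line_b = next_line($fh_b); }

    close $fh_a;
    close $fh_b;
    return (\@matched, \@only_a, \@only_b);
}
```

The result arrays are kept in memory here only to keep the sketch short. Printing each line as it is classified would keep memory use constant.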