
Re: [PBML] Matching Words Between 2 Files

  • Nathan E
    Message 1 of 7 , Aug 27, 2003
      This is good and all, but what if I need to put them into a hash?

      I'm sure you see my issue now. Basically, is there any way I can even
      take two arrays, keep them in the order they were inputted in, and
      match them up together within a hash?

      If I can do that, then we're basically set to go.

      -- Nathan

      --- In perl-beginner@yahoogroups.com, "J.E. Cripps" <cycmn@n...>
      wrote:
      >
      > > #OK we need to loop through _both_ files, suggestions?
      >
      > This will _print_ alternating lines (without the
      > split, etc, in the earlier msg.) No doubt
      > there are other ways.
      >
      >
      > #!/usr/bin/perl -w
      >
      > use strict;
      >
      > die "Usage input.file.1 input.file.2 outputfile\n" unless (@ARGV == 3);
      >
      > my $in1 = shift; my $in2 = shift; my $destination = shift;
      >
      > open IN1, "<$in1" or die "can't open $in1 $!\n";
      > open IN2, "<$in2" or die "can't open $in2 $!\n";
      > open OUT, "> $destination" or die "can't write to $destination $!\n";
      >
      > my $line1 = <IN1>;
      > my $line2 = <IN2>;
      >
      > my $done = 0;
      >
      > until ($done) {
      >
      > print OUT $line1; print OUT $line2; # write to the output file, not STDOUT
      > $done = 1 unless defined ($line1 = <IN1>);
      > $done = 2 unless defined ($line2 = <IN2>);
      > }
      >
      >
      > close IN1;
      > close IN2;
      > close OUT;
    • Jeff Eggen
      Message 2 of 7 , Aug 27, 2003
        >>> nathan@... 08/27/03 03:35pm >>>
        >This is good and all, but what if I need to put them into a hash?
        >
        >I'm sure you see my issue now. Basically, is there any way I can even
        >take two arrays, keep them in the order they were inputted in, and
        >match them up together within a hash?
        >
        >If I can do that, then we're basically set to go.

        So, you could say that you want an array of arrays:

        my @list = ( [ $line1fromfile1, $line1fromfile2 ],
                     [ $line2fromfile1, $line2fromfile2 ],
                     ...
                   );

        You could try the following:

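        my @list = ();
        open FILE1, "< file1" or die "can't open file1: $!";
        while (<FILE1>)
        {
            # Start a new pair holding this line from file1
            my $ref2line = [ $_ ];
            push @list, $ref2line;
        }
        close FILE1;

        open FILE2, "< file2" or die "can't open file2: $!";
        my $counter = 0;
        while (<FILE2>)
        {
            # Append the matching line from file2 to the existing pair
            push @{$list[$counter++]}, $_;
        }
        close FILE2;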

        Once this is done, you now have a correlation between line numbers in
        each file. To do something for each line of both files, just step
        through @list.
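
For what it's worth, stepping through @list and filling a hash from it might look something like the sketch below. The sub name `pairs_to_hash` and the sample lines are made up for illustration; the only thing taken from the code above is the shape of @list, where each element is [ line from file1, line from file2 ].

```perl
use strict;
use warnings;

# Turn a list of [ line1, line2 ] pairs into a hash keyed on the first line.
sub pairs_to_hash {
    my @pairs = @_;
    my %pair_for;
    for my $pair (@pairs) {
        my ($first, $second) = @$pair;
        next unless defined $second;   # file2 may be shorter than file1
        chomp($first, $second);        # chomps the copies, not @list itself
        $pair_for{$first} = $second;
    }
    return %pair_for;
}

# Hypothetical sample data standing in for the @list built above.
my @list = ( [ "running\n", "run\n" ], [ "jumped\n", "jump\n" ] );
my %pair_for = pairs_to_hash(@list);

# A hash does not remember insertion order, so walk @list to keep it.
for my $pair (@list) {
    chomp( my $key = $pair->[0] );
    print "$key => $pair_for{$key}\n" if exists $pair_for{$key};
}
```

The point of keeping @list around is that it still records the input order, which the hash on its own would lose.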

        I don't think this is an elegant solution to your question, but I do
        think it'll do the job. Also, that code is just off the top of my head,
        and may need tweaking. Hopefully someone more learned than myself will
        point out any booboos, or point out glaring problems with my logic.

        Hope this helps,

        Jeff Eggen
        IT Programmer Analyst
        Saskatchewan Government Insurance
        Ph (306) 751-1795
        email jeggen@...
      • Nathan E
        Message 3 of 7 , Aug 28, 2003
          I guess the rest of the problem is the main issue. I need to be able
          to aggregate on words. So, all words in the first list will end up
          being unique, with all of the values brought together under that one
          word, as opposed to being associated with 50 copies of the same word
          at different places. I then also have to say which file each word
          from the second set of words came from.

          I tried doing this, but it's not working. It's obviously not that
          pretty, but I feel as though I should be close. Perhaps one or two
          small corrections will set me back on track.

          Thanks guys!

          # Ask for the files used before and after stemming

          $f = 0;
          print 'How Many File Pairs For Input? ';
          $number = <stdin>;

          while ($f <= ($number - 1)){
              print 'Un-Stemmed File ', $f + 1, ': ';
              $unstemmedfile = <stdin>;
              print 'Associated Stemmed File ', $f + 1, ': ';
              $stemmedfile = <stdin>;

              @unstemmedfiles[$f] = $unstemmedfile;
              @stemmedfiles[$f] = $stemmedfile;
              $f++;
          }

          # Open the un-stemmed and stemmed files into handles relative to their names

          foreach $file (@unstemmedfiles) {
              open $file, $file or die "Cannot open '$file': $!";
          }

          foreach $file (@stemmedfiles) {
              open $file, $file or die "Cannot open '$file': $!";
          }

          $f = 0;
          while ( $f <= ($number - 1) ) {
              while ( <$stemmedfiles[$f]> ) {
                  chomp $_;
                  @stemmedwords = split / /, $_; # Split the input from $_ into words
              }
              while ( <$unstemmedfiles[$f]> ) {
                  chomp $_;
                  @unstemmedwords = split / /, $_;
              }
              $length = $#stemmedwords; # Find the last index of @stemmedwords
              $n = 0;
              while ($n <= $length) {
                  # If the stemmed word already exists in %Comparison,
                  # then add the unstemmed word to its array of associated values
                  if (exists $Comparison{$stemmedwords[$n]} ){
                      @Temp = [@{ $Comparison{$stemmedwords[$n]} }];
                      push @Temp, $unstemmedwords[$n];
                      $Comparison{$stemmedwords[$n]} = @Temp;
                      $Comparison{$stemmedwords[$n]}{$unstemmedwords[$n]}[0] = $unstemmedfiles[$f];
                      # ^^ Add the file name
                  }
                  # Otherwise map the stemmed word to the unstemmed word in %Comparison
                  # and add the unstemmed word's file name mapped to it
                  else {
                      %Comparison = ( $stemmedwords[$n] => $unstemmedwords[$n] );
                      $Comparison{$stemmedwords[$n]}{$unstemmedwords[$n]}[0] = $unstemmedfiles[$f];
                  }
                  $n++; # Increment through all of the words currently in @stemmedwords
              }
              $f++; # Increment through all of the files in @stemmedfiles
          }
          foreach $value (keys %Comparison) {
              print "$value @{ $Comparison{$value} }\n\n";
          }
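
A couple of things in the script above look like likely trouble spots: `<$stemmedfiles[$f]>` is parsed by Perl as a glob() pattern rather than a read from a filehandle, and `@stemmedwords = split ...` inside the while loop overwrites the words from every earlier line. The sketch below shows one way around both, assuming lexical filehandles kept in arrays. The sub name `words_from` and the in-memory sample text are made up for illustration so the example runs without real files.

```perl
use strict;
use warnings;

# Collect every whitespace-separated word from an open filehandle.
sub words_from {
    my ($fh) = @_;
    my @words;
    while ( defined( my $line = readline($fh) ) ) {
        chomp $line;
        push @words, split ' ', $line;   # append, so earlier lines are kept
    }
    return @words;
}

# Hypothetical in-memory "files" standing in for one stemmed/unstemmed pair.
my $stemmed_text   = "run jump\nrun\n";
my $unstemmed_text = "running jumped\nruns\n";

# Lexical filehandles can be stored in arrays like any other scalar.
my (@stemmed_fhs, @unstemmed_fhs);
open my $s_fh, '<', \$stemmed_text   or die "Cannot open stemmed text: $!";
open my $u_fh, '<', \$unstemmed_text or die "Cannot open unstemmed text: $!";
push @stemmed_fhs,   $s_fh;
push @unstemmed_fhs, $u_fh;

for my $i ( 0 .. $#stemmed_fhs ) {
    # readline() accepts any expression, including an array element.
    my @stemmed   = words_from( $stemmed_fhs[$i] );
    my @unstemmed = words_from( $unstemmed_fhs[$i] );
    for my $n ( 0 .. $#stemmed ) {
        print "$stemmed[$n] <- $unstemmed[$n]\n";
    }
}
```

With real files, the file names read from STDIN would also need a chomp before being passed to open, since they still carry the trailing newline.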


        • Nathan E
          Message 4 of 7 , Aug 28, 2003
            I actually fixed that script mostly now. The problem was that the filehandle <>'s
            didn't like accessing from an array, so I just set a variable to that array value
            and put that variable in for the filehandle instead. Works fine now. Only problem
            now is that I'm having difficulty adding another word to the array of the hash
            when the initial hash value already exists.

            Currently, if I have a one to one value with a "stemmed" word that is unique within
            the text, it outputs what I want properly. When I try to have two words mapped to
            the same stemmed word, it says that it "cannot coerce array into hash".

            Any help?

            I'm doing this:

            $length = $#stemmedwords; # Find the last index of @stemmedwords
            $n = 0;
            while ($n <= $length) {
                # If the stemmed word already exists in %Comparison,
                # then add the unstemmed word to its array of associated values
                if (exists $Comparison{$stemmedwords[$n]} ){
                    @Temp = [$Comparison{$stemmedwords[$n]}];
                    push @Temp, $unstemmedwords[$n];
                    $Comparison{$stemmedwords[$n]} = [@Temp];
                    $Comparison{$stemmedwords[$n]}{$unstemmedwords[$n]}[0] = $unstemmedfiles[$f];
                    # ^^ Add the file name
                }
                # Otherwise map the stemmed word to the unstemmed word in %Comparison
                # and add the unstemmed word's file name mapped to it
                else {
                    $Comparison{$stemmedwords[$n]} = $unstemmedwords[$n] ;
                    $Comparison{$stemmedwords[$n]}{$unstemmedwords[$n]}[0] = $unstemmedfiles[$f];
                }
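
The "coerce" complaint most likely comes from using the same hash slot two different ways: `$Comparison{$stem}` is first given an array reference (`[@Temp]`) or a plain string, and the very next line dereferences it as a hash with `{$unstemmedwords[$n]}`. One way to avoid that, sketched below, is to pick a single shape (stem => unstemmed word => list of file names) and let autovivification build it. The sub name `add_word` and the sample words and file names are made up for illustration.

```perl
use strict;
use warnings;

# Record that $word (found in $file) reduces to $stem.
# Structure: $comparison->{stem}{unstemmed word} = [ file names ]
sub add_word {
    my ($comparison, $stem, $word, $file) = @_;
    # Autovivification creates the inner hash and array on first use,
    # so no exists() test or temporary array is needed.
    push @{ $comparison->{$stem}{$word} }, $file;
    return;
}

my %Comparison;
add_word( \%Comparison, 'run', 'running', 'a.txt' );
add_word( \%Comparison, 'run', 'runs',    'b.txt' );
add_word( \%Comparison, 'run', 'running', 'b.txt' );

# Each stem appears once, with every unstemmed word and its files under it.
for my $stem ( sort keys %Comparison ) {
    for my $word ( sort keys %{ $Comparison{$stem} } ) {
        print "$stem: $word (@{ $Comparison{$stem}{$word} })\n";
    }
}
```

Because the if and else branches would do the same push, the exists test can be dropped altogether in this layout.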