
Hashes with Multiple Values per Key

  • msutfin2
    Message 1 of 2 , Feb 1, 2002
      I have a file that contains 6 records - variable length.

      1
      22
      333
      4444
      555
      6

      I would like to produce this report...

      # bytes  begin rec #  occurrences of lines this length
      -------- ------------ --------------------------------
      1        1            2
      2        2            1
      3        3            2
      4        4            1

      I was able to do this by creating a temp table in Sybase, and taking
      multiple passes at the input file. But, it sounded like a good
      application of the multi value hash, described in the Perl Cookbook
      pg. 140. I have yet to figure out how to use a hash with anything
      other than a single value for both the key and value, or a value that
      is a concatenation of values... (but then I become the <split King>
      parsing elements out of the hash value...)

      Conceptually, I see the number of bytes being the hash key, and the
      hash value a reference to an array that holds:
      [0] - beginning record number
      [1] - number of occurrences of lines this length
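
A minimal sketch of that layout, assuming a hash named %line_attrib whose values are anonymous arrays. The key 3 and the numbers stored under it are made up purely for illustration; only the two-slot layout comes from the description above.

```perl
use strict;
use warnings;

my %line_attrib;

# Store an anonymous array as the value for key 3 (a 3-byte line):
# slot [0] = beginning record number, slot [1] = number of occurrences.
$line_attrib{3} = [ 3, 1 ];

# A second 3-byte line turns up, so bump the count held in slot [1].
$line_attrib{3}[1]++;

# Read the slots back one at a time, or dereference the whole array.
my $first = $line_attrib{3}[0];
my $count = $line_attrib{3}[1];
my @both  = @{ $line_attrib{3} };

print "first=$first count=$count both=@both\n";
# prints: first=3 count=2 both=3 2
```

The hash still holds exactly one scalar per key; that scalar just happens to be a reference to an array, which is what lets a key carry several values without any splitting or joining.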

      In the following code (attempt), I have trouble:
      1) Printing what's in the hash. Here's the output I get..

      1 2 4 1
      1 2 4 1
      1 2 4 1
      1 2 4 1

      (shows pretty clearly that I also have a logic problem...)

      2) Also, I have yet to try coding the update of the counter that
      reflects the number of occurrences of lines "n" bytes in length. It's
      the top side of the IF statement within the while...

      BIG NOTE: If there's another solution that doesn't involve writing
      to a file or taking multiple passes... I'd give it a go!

      Thanks in advance,
      Mark Sutin

      #!c:\perl\bin\perl.exe
      # Report on the number of unique line lengths in a file. Display rec
      # length, line number of first occurrence, total number of
      # occurrences. Total number of occurrences would be number of records.

      use strict;

      # Init program variables
      my ($infile, $line, %line_attrib, $key, @keys,
          $record_count, $bytes, $count, $beg_rec_number );

      # Init and open input file
      $infile = "c:\\CWRRECV3.txt";
      open(INFILE, "<$infile") || die "Couldn't open $infile for read: $!\n";

      $record_count = 0;

      # While the file handle is open, populate a hash (%line_attrib)
      # Use the record length ($bytes) as the key
      # Populate an array with beginning record number ($beg_rec_number)
      # and number of occurrences (just to count the number of lines this
      # length)
      while (<INFILE>) {
          $line = $_;                  # get the line
          chomp($line);                # nix the cr/lf
          $record_count += 1;          # count the records
          $bytes = (length($line));    # get the line length

          # check for a hash entry with a key = $bytes
          if (exists ($line_attrib{$bytes})) {
              next; # "next;" is just here to keep the damn thing
                    # running while I get help trying to increment
                    # the number of occurrences in $line_attrib{$bytes}
          } else {
              $beg_rec_number = $record_count;
              # rec num of first occurrence of a line this long

              $count = 1;
              # number of occurrences of lines of $bytes length

              push( @{$line_attrib{$bytes}}, $beg_rec_number, $count);
              # I want this to populate the array (hash value)
              # with beginning record number ([0]) and
              # occurrences ([1])

              # How do I print the array to see if the
              # $beg_rec_number and $count are what I expect?
          }
      }

      @keys = sort { $line_attrib{$a} <=> $line_attrib{$b} } keys %line_attrib;

      foreach $key (@keys) {
          print "$bytes: ", scalar( @{ $line_attrib{$bytes}} ),
                " $beg_rec_number $count\n";
      }

      # Close the input file
      close INFILE;
    • b_harnish
      Message 2 of 2 , Feb 1, 2002
        Here are my version (I did it before reading your code), and a modified version of yours:
        --->8--- Cut ---8<---
        #!perl -w
        # My version
        use strict;
        use Data::Dumper;

        my %x;
        while (<DATA>) {
            # Remove cr/lf
            chomp;

            my $key = length($_);

            # $x{length} is an array of line number occurrences
            # The @{...} tells perl I'm using it as an array
            # $. = current record #
            push(@{$x{$key}}, $.);
        }

        print Dumper \%x;

        # Sort the keys numerically; a plain "sort keys" compares them
        # as strings and would put a length of 10 before a length of 2
        foreach (sort { $a <=> $b } keys(%x)) {
            # Copy array for simpler access.
            my @a = @{$x{$_}};

            # Display Length, Line#, Count
            print join(' ', $_, $a[0], scalar(@a)), "\n";
        }

        __DATA__
        1
        22
        333
        4444
        555
        6
        --->8--- Cut ---8<---
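
One thing to watch when the hash keys are line lengths: sort compares strings unless told otherwise, so once a file contains lines of 10 bytes or more the report can come out in a surprising order. A small sketch of the difference, using made-up lengths (the hash contents here are hypothetical, not taken from the sample file):

```perl
use strict;
use warnings;

# Hypothetical line lengths, including some of 10 bytes or more.
my %x = ( 2 => [1], 10 => [2], 9 => [3], 100 => [4] );

my @as_strings = sort keys %x;                  # default: string comparison
my @as_numbers = sort { $a <=> $b } keys %x;    # explicit numeric comparison

print "string order:  @as_strings\n";    # prints: string order:  10 100 2 9
print "numeric order: @as_numbers\n";    # prints: numeric order: 2 9 10 100
```

With the six sample records every length is a single digit, so both orders happen to agree and the problem stays hidden.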

        Re-commented, slightly modified version (based on yours):
        --->8--- Cut ---8<---
        #!c:\perl\bin\perl.exe
        use strict;
        use Data::Dumper;

        # Declare variables at last possible moment

        # Don't double quote what you don't have to
        my $infile = 'c:\CWRRECV3.txt';
        open(INFILE, "<$infile") || die "Couldn't open $infile for read: $!\n";

        my %line_attrib;
        my $record_count = 0;
        while (<INFILE>) {
            my $line = $_;
            chomp($line);
            $record_count += 1;

            my $bytes = (length($line));
            if (exists ($line_attrib{$bytes})) {
                # next;
                # Increase count (2nd element in array)
                $line_attrib{$bytes}->[1]++;
            } else {
                # You don't need to use these variables
                # $beg_rec_number = $record_count;
                # $count = 1;
                push( @{$line_attrib{$bytes}}, $record_count, 1);
            }
        }

        # Dump data
        print Dumper \%line_attrib;

        # No need to sort on the hash values (those are array references);
        # we just want the keys sorted, numerically since they are lengths
        # Also, we only care about the sorted order for the keys
        # for the foreach loop, no need for a variable to hold them
        foreach my $key (sort { $a <=> $b } keys(%line_attrib)) {
            # $bytes does not exist here, $key has what you want
            # print "$bytes: ", scalar( @{ $line_attrib{$bytes}} ),
            # " $beg_rec_number $count\n";

            print "$key $line_attrib{$key}->[0] $line_attrib{$key}->[1]\n";
        }
        close INFILE;
        --->8--- Cut ---8<---
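
If the positional slots [0] and [1] get hard to remember, the same report can be built with a hash of hashes, so each value is named rather than numbered. This is only a sketch of that alternative: the field names first and count are my own choice, and the records are held in an array here instead of being read from the file, to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical in-memory copy of the six sample records.
my @records = qw(1 22 333 4444 555 6);

my %line_attrib;    # length => { first => record number, count => occurrences }
my $record_number = 0;

for my $line (@records) {
    $record_number++;
    my $bytes = length $line;

    # Create the inner hash the first time this length is seen,
    # then count the current line in either case.
    $line_attrib{$bytes} ||= { first => $record_number, count => 0 };
    $line_attrib{$bytes}{count}++;
}

# Build the report lines in numeric order of line length.
my @report;
for my $bytes ( sort { $a <=> $b } keys %line_attrib ) {
    my $entry = $line_attrib{$bytes};
    push @report, "$bytes $entry->{first} $entry->{count}";
}
print "$_\n" for @report;
```

The trade-off is a little more memory per key in exchange for code that says {count} instead of [1]; for a two-field record either layout works.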

        - Brian
