Loading ...
Sorry, an error occurred while loading the content.

Re: [PBML] Alphabetize a word

Expand Messages
  • Ryan J Nauman
    Ah, excellent. However, allow my noobness to kick in. Alright so you want me to create a hash that contains the entire list of words (values) associated with
    Message 1 of 15 , Sep 5, 2007
    • 0 Attachment
      Ah, excellent. However, allow my noobness to kick in. Alright so you
      want me to create a hash that contains the entire list of words (values)
      associated with their alphabetized version (keys)? Should I do this as
      soon as I read in the input file? And should I continue with the line of
      code "my @WORDS = <INFILE>;" and create the bucket directly after?

      Now for the "dumping each bucket" portion I don't follow. Assuming I am
      following correctly on the first part, you're sorting the hash we made in
      the first step correct? I don't understand the print line. And is this
      used solely for sorting our newly created hash and then using my
      conditions against this new word list?

      Once the "bucket" hash is created would it be beneficial to my programs
      memory/execution time to close the INFILE and is there a way to delete the
      original @WORDS array now that the improved hash is in place?

      Thanks,
      Ryan




      merlyn@...
      Sent by: perl-beginner@yahoogroups.com
      09/05/2007 12:36 PM
      Please respond to
      perl-beginner@yahoogroups.com


      To
      Ryan J Nauman <RJNauman@...>
      cc
      perl-beginner@yahoogroups.com
      Subject
      Re: [PBML] Alphabetize a word






      >>>>> "Ryan" == Ryan J Nauman <RJNauman@...> writes:

      Ryan> Ah, thanks. Neat little trick. Well I finished my first complete
      working
      Ryan> version of the code and is a bit sluggish. Though this is expected
      since
      Ryan> the input file contains about 170,000 words stuffed into my @WORDS
      array.
      Ryan> Wondering if you guys know any tricks to speed things up anywhere?
      Here
      Ryan> is a link to my source code: http://pastie.textmate.org/94246

      Ooof, yeah. You're doing an exponential matching O(n squared)
      instead of just stashing everything according to its anagram.

      You need to use a "bucket" approach. As you compute each anagram,
      dump the original word into a bucket keyed by anagram, using a hash
      of arrayrefs...

      my %buckets;
      for my $word (@words) {
      my $alphaword = alphabetize($word);
      push @{$bucket{$alphaword}}, $word;
      }

      Now it's simply a matter of dumping each bucket:

      for my $alphaword (sort keys %bucket) {
      my $aref = $bucket{$alphaword};
      print "@$aref\n";
      }

      That will be infinitely faster when you get above a few hundred words.

      --
      Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777
      0095
      <merlyn@...> <URL:http://www.stonehenge.com/merlyn/>
      Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
      See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl
      training!



      [Non-text portions of this message have been removed]
    • Ryan J Nauman
      Can someone please help me understand what is being explained here? I thought I understood it after looking at it a second time but the part that confuses me
      Message 2 of 15 , Sep 7, 2007
      • 0 Attachment
        Can someone please help me understand what is being explained here? I
        thought I understood it after looking at it a second time but the part
        that confuses me now is the declare of "my %buckets" but we never use it,
        we use a scalar called $bucket. So, now I am completely lost.

        If someone could break it down, it'd be much appreciated.

        ------------------------
        Ryan



        merlyn@... (Randal L. Schwartz)
        09/05/2007 12:36 PM

        To
        Ryan J Nauman <RJNauman@...>
        cc
        perl-beginner@yahoogroups.com
        Subject
        Re: [PBML] Alphabetize a word






        >>>>> "Ryan" == Ryan J Nauman <RJNauman@...> writes:

        Ryan> Ah, thanks. Neat little trick. Well I finished my first complete
        working
        Ryan> version of the code and is a bit sluggish. Though this is expected
        since
        Ryan> the input file contains about 170,000 words stuffed into my @WORDS
        array.
        Ryan> Wondering if you guys know any tricks to speed things up anywhere?
        Here
        Ryan> is a link to my source code: http://pastie.textmate.org/94246

        Ooof, yeah. You're doing an exponential matching O(n squared)
        instead of just stashing everything according to its anagram.

        You need to use a "bucket" approach. As you compute each anagram,
        dump the original word into a bucket keyed by anagram, using a hash
        of arrayrefs...

        my %buckets;
        for my $word (@words) {
        my $alphaword = alphabetize($word);
        push @{$bucket{$alphaword}}, $word;
        }

        Now it's simply a matter of dumping each bucket:

        for my $alphaword (sort keys %bucket) {
        my $aref = $bucket{$alphaword};
        print "@$aref\n";
        }

        That will be infinitely faster when you get above a few hundred words.

        --
        Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777
        0095
        <merlyn@...> <URL:http://www.stonehenge.com/merlyn/>
        Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
        See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl
        training!



        [Non-text portions of this message have been removed]
      • Ryan J Nauman
        I have the following code for pushing a new value onto my hash: my %buckets; push(@{ $buckets{ KEYNAME } }, new value ); my @keys = keys %buckets; my @values
        Message 3 of 15 , Sep 7, 2007
        • 0 Attachment
          I have the following code for pushing a new value onto my hash:

          my %buckets;
          push(@{ $buckets{"KEYNAME"} }, "new value");
          my @keys = keys %buckets;
          my @values = values %buckets;
          print %buckets;
          print "\n@keys[0] => @values[0]";

          This code yields the following output...

          KEYNAMEARRAY(0x15d702c)
          KEYNAME => ARRAY(0x15d702c)

          Why won't it print "new value" ???

          I'm running perl, v5.8.8 build 822 from ActiveState.

          [Non-text portions of this message have been removed]
        • bike2ride
          ... With use strict; use warnings; you get the hint that you are not using Perl properly. Scalar value @keys[0] better written as $keys[0] at t.pl line 10.
          Message 4 of 15 , Sep 7, 2007
          • 0 Attachment
            --- In perl-beginner@yahoogroups.com, Ryan J Nauman <RJNauman@...> wrote:
            >
            > I have the following code for pushing a new value onto my hash:
            >
            > my %buckets;
            > push(@{ $buckets{"KEYNAME"} }, "new value");
            > my @keys = keys %buckets;
            > my @values = values %buckets;
            > print %buckets;
            > print "\n@keys[0] => @values[0]";
            >
            > This code yields the following output...
            >
            > KEYNAMEARRAY(0x15d702c)
            > KEYNAME => ARRAY(0x15d702c)
            >
            > Why won't it print "new value" ???
            >
            > I'm running perl, v5.8.8 build 822 from ActiveState.
            >
            > [Non-text portions of this message have been removed]
            >

            With

            use strict;
            use warnings;

            you get the hint that you are not using Perl properly.

            Scalar value @keys[0] better written as $keys[0] at t.pl line 10.
            Scalar value @values[0] better written as $values[0] at t.pl line 10.


            my %bucket = (
            KEYNAME => 'new value',
            ANOTHER => 'another value',
            );

            foreach my $key (keys %bucket) {
            print "key = $key, value = $bucket{$key}\n";
            }

            for more information -> web search "perl hash howto"

            gl
          • a_z0_9_blah
            ... Because $values[0] is an array reference. Try: print n$keys[0] = $values[0][0] ; If you want a better explanation, see:
            Message 5 of 15 , Sep 7, 2007
            • 0 Attachment
              --- In perl-beginner@yahoogroups.com, Ryan J Nauman <RJNauman@...>
              wrote:
              >
              > I have the following code for pushing a new value onto my hash:
              >
              > my %buckets;
              > push(@{ $buckets{"KEYNAME"} }, "new value");
              > my @keys = keys %buckets;
              > my @values = values %buckets;
              > print %buckets;
              > print "\n@keys[0] => @values[0]";
              >
              > This code yields the following output...
              >
              > KEYNAMEARRAY(0x15d702c)
              > KEYNAME => ARRAY(0x15d702c)
              >
              > Why won't it print "new value" ???

              Because $values[0] is an array reference.

              Try:
              print "\n$keys[0] => $values[0][0]";

              If you want a better explanation, see:
              http://perldoc.perl.org/perldsc.html

              (which wiil also lead you to several other manual pages to
              understand Perls datastructures)

              Chris
            • Jenda Krynicky
              - OK then. - Because then it s all backwards. - Why is that? - Please don t top-post! ... Declare the %bucket hash. It s not initialized to anything, but will
              Message 6 of 15 , Sep 8, 2007
              • 0 Attachment
                - OK then.
                - Because then it's all backwards.
                - Why is that?
                - Please don't top-post!


                Ryan J Nauman <RJNauman@...> wrote:
                > Ah, excellent. However, allow my noobness to kick in. Alright so you
                > want me to create a hash that contains the entire list of words (values)
                > associated with their alphabetized version (keys)? Should I do this as
                > soon as I read in the input file? And should I continue with the line of
                > code "my @WORDS = <INFILE>;" and create the bucket directly after?
                >
                > Now for the "dumping each bucket" portion I don't follow. Assuming I am
                > following correctly on the first part, you're sorting the hash we made in
                > the first step correct? I don't understand the print line. And is this
                > used solely for sorting our newly created hash and then using my
                > conditions against this new word list?
                >
                > Once the "bucket" hash is created would it be beneficial to my programs
                > memory/execution time to close the INFILE and is there a way to delete the
                > original @WORDS array now that the improved hash is in place?
                >
                > Thanks,
                > Ryan

                OK. Let me try to dissect the merlyn's code for you:

                > merlyn@... wrote:
                > Ooof, yeah. You're doing an exponential matching O(n squared)
                > instead of just stashing everything according to its anagram.
                >
                > You need to use a "bucket" approach. As you compute each anagram,
                > dump the original word into a bucket keyed by anagram, using a hash
                > of arrayrefs...
                >
                > my %bucket;

                Declare the %bucket hash. It's not initialized to anything, but will
                be used as a HashOfArrays.

                > for my $word (@words) {
                > my $alphaword = alphabetize($word);

                Let's sort the letters in the word

                > push @{$bucket{$alphaword}}, $word;

                And now let's the word at the the end of the array referenced by the
                value of the hash %bucket using the key $alphaword.

                When storing the first word containing a set of letters the
                $bucket{$alphaword} doesn't exist in the hash, but that doesn't
                matter. Perl will create it for us and as we are aparently expecting
                the value to be a reference to an array ( @$ref is the array pointed
                to by the reference stored in $ref) Perl even creates a brand new
                array for us and stores its reference in $bucket{$alphaword}.

                > }

                Once this loop completes all words in the @words array have been
                processed and are stored in the HashOfArrays named %bucket. They keys
                are the sorted sets of letters and the values of the hash are
                references to arrays containing the words.

                You may clear the @words array now.

                If your @words was created by
                my @WORDS = <INFILE>;

                You might actually get rid of the array completely and save memory.
                Just remove that statement and change the loop to

                while (my $word = <INFILE>) {
                chomp($word);
                my $alphaword = alphabetize($word);
                push @{$bucket{$alphaword}}, $word;
                }


                > Now it's simply a matter of dumping each bucket:
                >
                > for my $alphaword (sort keys %bucket) {
                > my $aref = $bucket{$alphaword};
                > print "@$aref\n";
                > }

                As I said, %bucket is a hash of arrayrefs so in the loop above we get
                the list of keys (the sorted sets of letters)
                keys %bucket
                sort it alphabetically
                sort keys %bucket
                and then loop through the sorted list, get the reference contained in
                the hash and print the values in the referenced array.

                HTH, Jenda

                ===== Jenda@... === http://Jenda.Krynicky.cz =====
                When it comes to wine, women and song, wizards are allowed
                to get drunk and croon as much as they like.
                -- Terry Pratchett in Sourcery
              Your message has been successfully submitted and would be delivered to recipients shortly.