numbered slice from a hash.

  • Franki
    Message 1 of 9 , Jan 18, 2005
      Hi guys,

      I'm not chasing any code from you, just ideas and pointers.

      I have a script that collects unique URLs as the keys to a hash, and
      the values are the number of times those URLs have been used.

      sort of like this I guess:

      my %addresses = (
      'http://mypage.com' => '9999',
      'http://mypage2.com' => '8888',
      'http://mypage3.com' => '8852',
      'http://mypage4.com' => '344',
      'http://mypage5.com' => '233',
      'http://mypage6.com' => '115',
      'http://mypage7.com' => '15'
      );

      (In the script, the hash is filled dynamically of course, but for the
      purposes of demonstration, this will do.)

      Anyway, that was fine at first, then I discovered that I was ending up
      with up to 15,000 URLs in the hash, and since I display the results of
      this in a web page, I wanted to add next/previous links and display only
      500 at a time (because 15,000 makes the display really, really slow :-).

      Anyway, in another part of this script, I had the same issue, only it
      was an array, so I just used numbered slices.
      How can I achieve the same thing with the hash? I know I could do it
      using a loop, but I'm wondering if there isn't a "Perl" way to do it.
      So basically I want to be able to grab the first 500 URLs, then the
      next batch and so on. The next/prev code I already have; I just need a
      way to grab a numbered slice. If there isn't a "Perl" way, I'll just use
      a loop, but I wanted to check with the gurus first.


      Any tips would be much appreciated.

      rgds

      Frank
    • merlyn@stonehenge.com
      Message 2 of 9 , Jan 18, 2005
        >>>>> "Franki" == Franki <franki@...> writes:

        Franki> So basically I want to be able to grab the first 500 URL's, then the
        Franki> next batch and so on.. the next prev code I already have, I just need a
        Franki> way to grab a numbered slice, if there isn't a "Perl" way, I'll just use
        Franki> a loop, but I wanted to check with the guru's first.

        What does "first" mean for an unordered hash?

        When you can answer that, your problem will be nearly solved.

        --
        Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
        <merlyn@...> <URL:http://www.stonehenge.com/merlyn/>
        Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
        See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
      • Franki
        Message 3 of 9 , Jan 18, 2005
          merlyn@... wrote:
          >>>>>>"Franki" == Franki <franki@...> writes:
          >
          >
          > Franki> So basically I want to be able to grab the first 500 URL's, then the
          > Franki> next batch and so on.. the next prev code I already have, I just need a
          > Franki> way to grab a numbered slice, if there isn't a "Perl" way, I'll just use
          > Franki> a loop, but I wanted to check with the guru's first.
          >
          > What does "first" mean for an unordered hash?
          >
          > When you can answer that, your problem will be nearly solved.
          >

          Doh!
          My bad, I forgot to explain in enough detail.
          The hash is a list of referrers from google, and the number is the
          number of times that a specific search term shows up.

          I'm currently displaying them in order of popularity.

          foreach my $key (reverse sort {$google_terms{$a} <=> $google_terms{$b} }
          keys %google_terms)

          Dumb of me not to mention that as it complicates things somewhat.

          (It prints out the URLs with their percentages, and the actual number
          of occurrences, in alternating color rows.)
          Looking at this, it seems I am going to have to change to a for loop and
          set start and finish numbers.

          For what it's worth, here is the current block of code in all its messiness.

          foreach my $key (reverse sort {$google_terms{$a} <=> $google_terms{$b} }
          keys %google_terms)
          {
          $tr_color = $odd;
          my $percentage = (($google_terms{$key} / $total_searches) * 100);
          $percentage = sprintf("%.2f", $percentage);
          # prints out the returned values in tr/td tags here.
          }
        • merlyn@stonehenge.com
          Message 4 of 9 , Jan 18, 2005
            >>>>> "Franki" == Franki <franki@...> writes:

            Franki> I'm currently displaying them in order of popularity.

            Franki> foreach my $key (reverse sort {$google_terms{$a} <=> $google_terms{$b} }
            Franki> keys %google_terms)

            Well, then, you have your list, and you can slice the list:

            @the_whole_list =
            sort { $google_terms{$b} <=> $google_terms{$a} } # why the reverse???
            keys %google_terms;

            @first_100 = @the_whole_list[0..99];

            Etc.
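
The slice idea above extends naturally to next/previous paging. Here is a minimal sketch, not code from the thread: the sample data and the helper name `page_of` are hypothetical, and the end index is clamped so that a short last page does not pick up undef entries.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real %google_terms.
my %google_terms = (
    'perl hash slice'    => 218,
    'sort hash by value' => 182,
    'perl pagination'    => 73,
    'cgi param'          => 40,
    'file cache'         => 12,
);

# Sort once, highest count first.
my @the_whole_list =
    sort { $google_terms{$b} <=> $google_terms{$a} } keys %google_terms;

# Return the keys for one zero-based page (hypothetical helper).
# The end index is clamped to the last valid index of the list.
sub page_of {
    my ($list, $page, $per_page) = @_;
    my $start = $page * $per_page;
    return () if $start > $#$list;
    my $end = $start + $per_page - 1;
    $end = $#$list if $end > $#$list;
    return @{$list}[$start .. $end];
}

my @page0 = page_of(\@the_whole_list, 0, 2);
my @page2 = page_of(\@the_whole_list, 2, 2);
print "page 0: ", join(', ', @page0), "\n";
print "page 2: ", join(', ', @page2), "\n";
```

With 500 entries per page, the first batch would be page 0 and the next link would simply ask for page 1.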

          • Franki
            Message 5 of 9 , Jan 19, 2005
              merlyn@... wrote:
              >>>>>>"Franki" == Franki <franki@...> writes:
              >
              >
              > Franki> I'm currently displaying them in order of popularity.
              >
              > Franki> foreach my $key (reverse sort {$google_terms{$a} <=> $google_terms{$b} }
              > Franki> keys %google_terms)
              >
              > Well, then, you have your list, and you can slice the list:
              >
              > @the_whole_list =
              > sort { $google_terms{$b} <=> $google_terms{$a} } # why the reverse???
              > keys %google_terms;
              >
              > @first_100 = @the_whole_list[0..99];
              >
              > Etc.
              >


              The reverse is there to display the highest to the lowest. Without it
              the display starts with a couple of thousand google referrers with only
              one usage; add the reverse to it and it starts with terms that have had
              thousands of uses.

              Rather than put the whole lot into an array to use slices, I realised
              last night I can just put an if and a next in there.
              Something like this:

              # Set some test numbers for start and end.
              my $start_num = 500;
              my $end_num = 1000;

              my $count = 1;
              foreach my $key (reverse sort {$google_terms{$a} <=> $google_terms{$b} }
              keys %google_terms)
              {
              if ($count < $start_num)
              {
              $count +=1;
              next;
              }

              # Calculate the percentage and display the HTML if the count
              # is within the required end num.
              if ($count <= $end_num)
              {
              $tr_color = $odd;
              my $percentage = (($google_terms{$key} / $total_searches) * 100);
              $percentage = sprintf("%.2f", $percentage);
              # Print stuff here.
              }
              $count++;
              }

              It seems to do the job nicely and doesn't require creating another big
              variable. It's still pretty slow in the browser though, which makes me
              think it's as much to do with the sort as it is the display HTML.

              Still, it works, and the sort has to be there anyway, so I guess there
              isn't much I can do to speed it up more.
              Thanks for the help Randal, just reading your post made me realise that
              the problem wasn't as difficult as it seemed.


              Rgds

              Frank
            • Paul Archer
              Message 6 of 9 , Jan 19, 2005
                8:20pm, Franki wrote:

                >
                > merlyn@... wrote:
                >>>>>>> "Franki" == Franki <franki@...> writes:
                >>
                >>
                >> Franki> I'm currently displaying them in order of popularity.
                >>
                >> Franki> foreach my $key (reverse sort {$google_terms{$a} <=> $google_terms{$b} }
                >> Franki> keys %google_terms)
                >>
                >> Well, then, you have your list, and you can slice the list:
                >>
                >> @the_whole_list =
                >> sort { $google_terms{$b} <=> $google_terms{$a} } # why the reverse???
                >> keys %google_terms;
                >>
                >> @first_100 = @the_whole_list[0..99];
                >>
                >> Etc.
                >>
                >
                >
                > The reverse is there to display the highest to the lowest, without it
                > the display starts with a couple of thousand google referers with only
                > one useage, add the reverse to it and it starts with terms that have had
                > thousands of uses.
                >
                You missed the point: why do a *forward* sort and then *reverse* it?
                Look again at what Randal wrote. That's a reverse sort.

                Paul
              • merlyn@stonehenge.com
                Message 7 of 9 , Jan 19, 2005
                  >>>>> "Franki" == Franki <franki@...> writes:

                  Franki> The reverse is there to display the highest to the lowest, without it
                  Franki> the display starts with a couple of thousand google referers with only
                  Franki> one useage, add the reverse to it and it starts with terms that have had
                  Franki> thousands of uses.

                  Right, but notice I swapped $b and $a to get the same effect.
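
To make the equivalence concrete, here is a small sketch with hypothetical data (three terms with distinct counts). Both forms give the same highest-first order; the swapped comparator just avoids the extra pass that `reverse` makes over the list.

```perl
use strict;
use warnings;

# Hypothetical sample data with distinct counts.
my %google_terms = ( alpha => 218, beta => 73, gamma => 182 );

# Ascending sort, then reversed.
my @via_reverse =
    reverse sort { $google_terms{$a} <=> $google_terms{$b} } keys %google_terms;

# Descending sort directly, by swapping $a and $b.
my @via_swap =
    sort { $google_terms{$b} <=> $google_terms{$a} } keys %google_terms;

print "reverse: @via_reverse\n";
print "swapped: @via_swap\n";
```

One caveat: when several terms share the same count, the two forms can list those tied terms in a different relative order.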

                  Franki> I seems to do the job nicely and doesn't require creating another big
                  Franki> variable.

                  You're creating it anyway (an anonymous value for the foreach to
                  walk), and your way is now slower.

                  Franki> It's still pretty slow in the browser though, which makes me
                  Franki> think it's as much to do with the sort as it is the display HTML.

                  Right... you want to cache that result between hits. One of my
                  very first magazine articles for WebTechniques was about this very
                  problem: <http://www.stonehenge.com/merlyn/WebTechniques/col02.html>,
                  although today I'd use something with Cache::FileCache instead of files
                  in /tmp.
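
As a rough illustration of caching the result between hits, here is a sketch that uses the core Storable module rather than Cache::FileCache, purely so that it runs without any CPAN installs. The cache location, the one-hour lifetime, and the sub names are all assumptions, and the temporary directory is only there for demonstration; a real script would use a fixed path.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Hypothetical cache location and lifetime.
my $cache_file = tempdir(CLEANUP => 1) . '/sorted_terms.sto';
my $max_age    = 3600;    # seconds

my $computations = 0;

# Stand-in for the expensive step (parsing referrers, counting, sorting).
sub compute_sorted_terms {
    $computations++;
    my %google_terms = ( alpha => 218, beta => 73, gamma => 182 );
    my @sorted =
        sort { $google_terms{$b} <=> $google_terms{$a} } keys %google_terms;
    return { counts => \%google_terms, order => \@sorted };
}

# Return the cached result if the file is fresh enough, otherwise rebuild it.
sub sorted_terms {
    if (-e $cache_file && time() - (stat $cache_file)[9] < $max_age) {
        return retrieve($cache_file);
    }
    my $data = compute_sorted_terms();
    store($data, $cache_file);
    return $data;
}

my $first  = sorted_terms();    # computes and stores
my $second = sorted_terms();    # read back from the cache file
print "computed $computations time(s); top term: $second->{order}[0]\n";
```

If several requests could rebuild the cache at once, Storable's `lock_store` and `lock_retrieve` are the locking variants of the same calls.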

                • Franki
                  Message 8 of 9 , Jan 20, 2005
                    merlyn@... wrote:
                    >>>>>>"Franki" == Franki <franki@...> writes:
                    >
                    >
                    > Franki> The reverse is there to display the highest to the lowest, without it
                    > Franki> the display starts with a couple of thousand google referers with only
                    > Franki> one useage, add the reverse to it and it starts with terms that have had
                    > Franki> thousands of uses.
                    >
                    > Right, but notice I swapped $b and $a to get the same effect.

                    LOL, it was 2am and I was out of it, so no, I didn't notice, but yes I
                    understand now.
                    Thank you, can't believe I did that. When it first happened, the output
                    was round the wrong way and I added the reverse without thinking.

                    > Franki> I seems to do the job nicely and doesn't require creating another big
                    > Franki> variable.
                    >
                    > You're creating it anyway (an anonymous value for the foreach to
                    > walk), and your way is now slower.

                    I'm not sure I understand how your way would display both the URL in
                    question and the number of times it's been hit from the array.

                    @the_whole_list =
                    sort { $google_terms{$b} <=> $google_terms{$a} } # why the reverse???
                    keys %google_terms;
                    @first_100 = @the_whole_list[0..99];

                    Or are you saying that I should use the sorted list of keys in
                    @the_whole_list to get the values in question from the original hash
                    (for which I'd need another loop on the array slice, I think)?

                    The output as it is now, looks like this:

                    1 search term 1
                    Percentage: (3.51%) Actual: 218
                    2 search term 2
                    Percentage: (2.93%) Actual: 182
                    3 search term 3
                    Percentage: (1.17%) Actual: 73

                    Looking at the code, I suspect I know what is causing the major slowdown.
                    I get the search terms by feeding the referer into the CGI module like so:
                    my $qstring = new CGI($string);
                    my $searchq = $qstring->param('q');

                    That runs in a loop, and then I put the result into the hash. (q is the
                    param that google uses to designate the search term.)
                    Doing that up to 15,000 times (the loop) is probably the greatest reason
                    for the slowdown, but I can't think of a more reliable method to get the
                    terms out. I could use a regex, but as you guys keep telling us, we
                    should use the provided tools as they are better than anything we could
                    code ourselves. Would CGI::Simple be any faster for this? Or is there a
                    better module I should be looking at?
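
For comparison, here is a minimal core-only sketch of pulling one parameter out of a referrer URL without building a CGI object per URL. It is an illustration under assumptions, not a replacement for a proper URL module: the sub name `query_param` and the sample referrer are hypothetical, and it handles only `+` and `%XX` escapes.

```perl
use strict;
use warnings;

# Return the decoded value of one query parameter from a URL,
# or undef if the parameter is absent (hypothetical helper).
# Handles '+' and %XX escapes only; it does not decode UTF-8
# byte sequences into characters.
sub query_param {
    my ($url, $name) = @_;
    my ($query) = $url =~ /\?([^#]*)/ or return;
    for my $pair (split /[&;]/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && $key eq $name;
        $value = '' unless defined $value;
        $value =~ tr/+/ /;
        $value =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
        return $value;
    }
    return;
}

# Hypothetical referrer string.
my $referer = 'http://www.google.com/search?hl=en&q=perl+hash+slice%21&btnG=Search';
my $term    = query_param($referer, 'q');
my $missing = query_param('http://www.google.com/', 'q');

my %google_terms;
$google_terms{$term}++ if defined $term;
print "term: $term\n";
```

Whether this is actually faster than the CGI approach for 15,000 referrers would need measuring, for example with the core Benchmark module.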


                    >
                    > Franki> It's still pretty slow in the browser though, which makes me
                    > Franki> think it's as much to do with the sort as it is the display HTML.
                    >
                    > Right... you want to cache that result between hits. One of my
                    > very first magazine articles for WebTechniques was about this very
                    > problem: <http://www.stonehenge.com/merlyn/WebTechniques/col02.html>,
                    > although today I'd use something with Cache::FileCache instead of files
                    > in /tmp.

                    That's certainly an interesting read, thanks. I have looked through your
                    site many times and it's one of my "resource" sites.
                    I'm not sure caching is necessary at this point, because I'm convinced
                    it's the loop using the CGI module that is causing the slowdown.
                    This code is only run when I want to see the search terms, which is once
                    a month, so I'm prepared to wait for the 5-8 seconds it takes, but if I
                    can fix the CGI issue somehow, that would more than cut that in half,
                    I'll bet.

                    rgds

                    Franki
                  • merlyn@stonehenge.com
                    Message 9 of 9 , Jan 20, 2005
                      >>>>> "Franki" == Franki <franki@...> writes:

                      Franki> @the_whole_list =
                      Franki> sort { $google_terms{$b} <=> $google_terms{$a} } # why the reverse???
                      Franki> keys %google_terms;
                      Franki> @first_100 = @the_whole_list[0..99];

                      Franki> Or are you saying that using the sorted list of keys in @the_whole_list
                      Franki> and then using those keys to get the values in question from the
                      Franki> original hash (for which I'd need another loop on the array slice I think.)

                      No, you can get the corresponding values with a hash slice. Learn
                      the power of slices!

                      @first_100_values = @google_terms{@first_100};
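
A hash slice takes the `@` sigil with curly braces: a list of keys goes in and the matching list of values comes out, in the same order. The following sketch uses hypothetical data and a two-item "page" to print each term with its percentage and count, in the spirit of the output shown earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my %google_terms = ( alpha => 218, beta => 73, gamma => 182, delta => 40 );

my @the_whole_list =
    sort { $google_terms{$b} <=> $google_terms{$a} } keys %google_terms;
my @first_2 = @the_whole_list[0 .. 1];

# Hash slice: values for the listed keys, in the same order as the keys.
my @first_2_values = @google_terms{@first_2};

my $total_searches = 0;
$total_searches += $_ for values %google_terms;

for my $i (0 .. $#first_2) {
    printf "%d %s\nPercentage: (%.2f%%) Actual: %d\n",
        $i + 1, $first_2[$i],
        100 * $first_2_values[$i] / $total_searches,
        $first_2_values[$i];
}
```

Square brackets, as in `@name[...]`, would index an array called `@name` instead, which is a different variable from the hash `%name`.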

                      Franki> Looking at the code, I suspect I know what is causing the major slowdown.
                      Franki> I get the search terms by feeding the referer into the CGI module like so:
                      Franki> my $qstring = new CGI($string);
                      Franki> my $searchq = $qstring->param('q');

                      Most sensible people do the following:

                      use CGI qw(param);
                      my $searchq = param('q');

                      Do *not* use the "object" form of CGI invocation... it just
                      complicates the usage, and people wonder why we need objects for
                      something as simple as CGI param parsing!

                      >> Right... you want to cache that result between hits. One of my
                      >> very first magazine articles for WebTechniques was about this very
                      >> problem: <http://www.stonehenge.com/merlyn/WebTechniques/col02.html>,
                      >> although today I'd use something with Cache::FileCache instead of files
                      >> in /tmp.

                      Franki> Thats certainly an interesting read.. thanks, I have looked though your
                      Franki> site many times and its one of my "resource" sites.
                      Franki> I'm not sure caching is necessary at this point, because I'm convinced
                      Franki> its the loop using the CGI module that is causing the slowdown.

                      No... I'm not convinced. Every CGI hit is *recomputing* the search.
                      If you cache that, it'll be *very* fast.
