
strange Benchmark results (was [PBML] regular expression help)

  • Dave Gray
    Message 1 of 7 , Aug 5, 2004
      > I know that I could use split, but since perl has a regexp engine why don't
      > use it?

      For those of us who are Benchmark nerds:

      Benchmark::cmpthese(100000,{
      regex => sub{ my$e="a.b.c.d.e"; $e=~s/^(.*?)\..*$/$1/; },
      split => sub{ my$e="a.b.c.d.e"; $e=(split ".",$e)[0]; }
      });
      --output--
               Rate regex split
      regex 82237/s    --   -2%
      split 84175/s    2%    --

      The split is actually a tiny bit faster in this case. UNLESS you move
      the string declaration out of the subroutines being benchmarked:

      $e="a.b.c.d.e";
      Benchmark::cmpthese(2000000,{
      regex => sub{ $e=~s/^(.*?)\..*$/$1/; },
      split => sub{ $e=(split ".",$e)[0]; }
      });
      --output--
                 Rate split regex
      split  790514/s    --  -73%
      regex 2941176/s  272%    --

      In which case the regular expression is sooo much faster compared to
      the split than before... Why exactly is that?

      Confused,
      Dave
    • Jenda Krynicky
      Message 2 of 7 , Aug 5, 2004
        From: Dave Gray <yargevad@...>
        > > I know that I could use split, but since perl has a regexp engine
        > > why don't use it?
        >
        > For those of us who are Benchmark nerds:
        >
        > Benchmark::cmpthese(100000,{
        > regex => sub{ my$e="a.b.c.d.e"; $e=~s/^(.*?)\..*$/$1/; },
        > split => sub{ my$e="a.b.c.d.e"; $e=(split ".",$e)[0]; }
        > });
        > --output--
        > Rate regex split
        > regex 82237/s -- -2%
        > split 84175/s 2% --
        >
        > The split is actually a tiny bit faster in this case. UNLESS you move
        > the string declaration out of the subroutines being benchmarked:
        >
        > $e="a.b.c.d.e";
        > Benchmark::cmpthese(2000000,{
        > regex => sub{ $e=~s/^(.*?)\..*$/$1/; },
        > split => sub{ $e=(split ".",$e)[0]; }
        > });
        > --output--
        > Rate split regex
        > split 790514/s -- -73%
        > regex 2941176/s 272% --
        >
        > In which case the regular expression is sooo much faster compared to
        > the split than before... Why exactly is that?

        For one thing, because after the very first iteration of one of the
        tested subroutines $e gets set to "a". This means that in every
        iteration except the first, the regexp just tries to find a dot,
        fails and doesn't do anything with the string, while the split
        doesn't find the dot either, so it constructs a list with one item,
        takes the first element of the list and assigns it to the variable.
        Much more work to do, isn't it?
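
A minimal sketch of that effect, assuming the same shared package variable
$e as in the second benchmark. The loop, the $regex code ref and the @seen
array are hypothetical names used only for illustration; they record what
the substitution sees on entry to each call.

```perl
use strict;
use warnings;

# Shared package variable, as in the benchmark that declares $e outside the subs.
our $e = "a.b.c.d.e";

# The same substitution that the "regex" benchmark entry runs.
my $regex = sub { $e =~ s/^(.*?)\..*$/$1/; };

my @seen;
for my $i (1 .. 3) {
    push @seen, $e;    # value of $e on entry to this call
    $regex->();        # first call rewrites $e; later calls find no dot
}

print "call $_ sees '$seen[$_ - 1]'\n" for 1 .. 3;
```

Only the first call sees the full string; every later call starts from the
already shortened value, so the benchmark mostly times a failed match.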

        Jenda
        ===== Jenda@... === http://Jenda.Krynicky.cz =====
        When it comes to wine, women and song, wizards are allowed
        to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery
      • Mark Reed
        Message 3 of 7 , Aug 5, 2004
          --- Jenda Krynicky <Jenda@...> wrote:

          > From: Dave Gray <yargevad@...>
          > > > I know that I could use split, but since perl has a regexp engine
          > > > why don't use it?
          > >
          > > For those of us who are Benchmark nerds:
          > >
          > > Benchmark::cmpthese(100000,{
          > > regex => sub{ my$e="a.b.c.d.e"; $e=~s/^(.*?)\..*$/$1/; },
          > > split => sub{ my$e="a.b.c.d.e"; $e=(split ".",$e)[0]; }
          > > });
          > > --output--
          > > Rate regex split
          > > regex 82237/s -- -2%
          > > split 84175/s 2% --
          > >
          > > The split is actually a tiny bit faster in this case. UNLESS you move
          > > the string declaration out of the subroutines being benchmarked:
          > >
          > > $e="a.b.c.d.e";
          > > Benchmark::cmpthese(2000000,{
          > > regex => sub{ $e=~s/^(.*?)\..*$/$1/; },
          > > split => sub{ $e=(split ".",$e)[0]; }
          > > });
          > > --output--
          > > Rate split regex
          > > split 790514/s -- -73%
          > > regex 2941176/s 272% --
          > >
          > > In which case the regular expression is sooo much faster compared to
          > > the split than before... Why exactly is that?
          >
          > For example because after the very first iteration of one of the
          > tested subroutines $e gets set to "a". This means that except in the
          > first iteration the regexp just tries to find a dot, fails and
          > doesn't do anything with the string while the split doesn't find the
          > dot so it constructs a list with one item, takes the first element of
          > the list and assigns it to the variable. Much more work to do isn't
          > it?

          But, this is true whether the string declaration is
          inside or outside of the benchmark subroutine. He's
          not asking for help comparing the split with the
          regexp. He's asking why the results are so
          dramatically different by simply moving a string
          outside the benchmark subroutine. Right?

          I'm interested in an answer myself, as much of my job
          lately has been tuning regexp.

          >
          > Jenda
          > ===== Jenda@... === http://Jenda.Krynicky.cz
          > =====
          > When it comes to wine, women and song, wizards are
          > allowed
          > to get drunk and croon as much as they like.
          > -- Terry Pratchett in Sourcery
          >
          >



        • Dave Gray
          Message 4 of 7 , Aug 5, 2004
            > I'm interested in an answer myself, as much of my job
            > lately has been tuning regexp.

            I think, unfortunately, there is a stupid explanation for some of
            this... After the first call to either function, $e eq 'a', and the
            regex apparently short-circuits faster than the split.

            But even after "fixing" that:

            use strict;
            use Benchmark;
            our $e="a.b.c.d.e";
            Benchmark::cmpthese(2000000,{
            regex => sub{ local$e; $e=~s/^(.*?)\..*$/$1/; },
            split => sub{ local$e; $e=(split ".",$e)[0]; }
            });
            --output--
                       Rate split regex
            split  621118/s    --  -53%
            regex 1315789/s  112%    --

            That's still a big difference from declaring the strings in each sub...

            (I'd been running these as perl -e one-liners; this is the first
            time I actually pasted the code into a file and ran it with
            warnings and under strictures.)

            I think the answer has to do with all the "uninitialized value"
            warnings you get if that code is run with warnings turned on. I don't
            see how $e could lose its value, but it appears to somehow.

            Anybody see something else I've missed?

            Still Confused,
            Dave

            PS - Moral of the story, always use strict and warnings when you're
            playing with something.
          • Jenda Krynicky
            Message 5 of 7 , Aug 5, 2004
              To: perl-beginner@yahoogroups.com
              From: Dave Gray <yargevad@...>
              Date sent: Thu, 5 Aug 2004 17:50:20 -0400
              Subject: Re: strange Benchmark results (was [PBML] regular expression help)
              Send reply to: perl-beginner@yahoogroups.com

              > > I'm interested in an answer myself, as much of my job
              > > lately has been tuning regexp.
              >
              > I think, unfortunately, there is a stupid explanation for some of
              > this... After the first call to either function, $e == 'a'; and the
              > regex apparently short-circuits faster than the split.

              Right. That's what I said.

              > But even after "fixing" that:
              >
              > use strict;
              > use Benchmark;
              > our $e="a.b.c.d.e";
              > Benchmark::cmpthese(2000000,{
              > regex => sub{ local$e; $e=~s/^(.*?)\..*$/$1/; },
              > split => sub{ local$e; $e=(split ".",$e)[0]; }
              > });
              > --output--
              > Rate split regex
              > split 621118/s -- -53%
              > regex 1315789/s 112% --
              >
              > That's still a big difference from declaring the strings in each
              > sub...

              What about testing your "fix"? :-)

              Try this:
              our $x = "Global";
              sub foo {
              local $x;
              print "The value of \$x is '$x'\n";
              }
              foo();
              print "The value of \$x is '$x'\n";

              A real fix would be

              use strict;
              use Benchmark;
              our $e="a.b.c.d.e";
              Benchmark::cmpthese(2000000,{
              regex => sub{ local $e = $e; $e=~s/^(.*?)\..*$/$1/; },
              split => sub{ local $e = $e; $e=(split ".",$e)[0]; }
              });

              The results are closer to the version with the lexicals.

              > (I've been running these as perl -e one-liners, and here's where I
              > actually pasted the code into a file and ran it with warnings, and to
              > make sure it ran under strictures.)
              >
              > I think the answer has to do with all the "uninitialized value"
              > warnings you get if that code is run with warnings turned on. I don't
              > see how $e could lose its value, but it appears to somehow.
              >
              > Anybody see something else I've missed?

              The reason for the uninitialized warnings is that
              local $var;
              doesn't mean just
              "localize my changes to this variable"
              but
              "give me a brand new local variable, I don't care what was in it
              originally".

              After
              local $var;
              the $var is always uninitialized, no matter what value it had
              before. Therefore if you do want to keep the old value you need to
              assign it to the local variable:

              local $var = $var;

              It looks strange, but it's logical. The variable is localized only
              AFTER this statement. So within the right-hand side of the statement
              you still have access to the old value.
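
The difference can be sketched in a few lines. The sub names bare_local and
copied_local are hypothetical, chosen only for this illustration; the
substitution is the one from the benchmark.

```perl
use strict;
use warnings;

our $var = "a.b.c.d.e";

# Plain "local $var;" hands back a fresh, undefined value.
sub bare_local {
    local $var;
    return defined $var ? $var : 'undef';
}

# "local $var = $var;" copies the old value in before localizing,
# so the substitution works on a private copy.
sub copied_local {
    local $var = $var;
    $var =~ s/^(.*?)\..*$/$1/;
    return $var;
}

my $bare   = bare_local();
my $copied = copied_local();

print "bare local:   $bare\n";
print "copied local: $copied\n";
print "global after: $var\n";    # restored when each sub returned
```

With the copying form, each call starts from the original string and the
global is untouched afterwards, which is what a benchmark of the full
substitution needs.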

              > Still Confused,
              > Dave
              >
              > PS - Moral of the story, always use strict and warnings when you're
              > playing with something.

              And the other moral: if something behaves funny, create a test
              script, watch the variables and read the docs :-)

              HTH, Jenda
              ===== Jenda@... === http://Jenda.Krynicky.cz =====
              When it comes to wine, women and song, wizards are allowed
              to get drunk and croon as much as they like.
              -- Terry Pratchett in Sourcery
            • Dave Gray
              Message 6 of 7 , Aug 5, 2004
                > > I think, unfortunately, there is a stupid explanation for some of
                > > this... After the first call to either function, $e == 'a'; and the
                > > regex apparently short-circuits faster than the split.
                >
                > Right. That's what I said.

                You're right, I completely mis-read your previous message.

                > > PS - Moral of the story, always use strict and warnings when you're
                > > playing with something.
                >
                > And the other moral, if something behaves funny create a test script,
                > watch the variables and read the docs :-)

                =P Also, don't submit code to a mailing list while you're as
                distracted as I was today. Thanks for the 'local', um, reminder.

                Cheers,
                Dave
              • Brad Lhotsky
                Message 7 of 7 , Aug 6, 2004
                  I threw together a comparison as well, here are my results:

                              Rate switching splitting  matching
                  switching 2146/s        --      -55%      -61%
                  splitting 4717/s      120%        --      -14%
                  matching  5495/s      156%       16%        --

                  Code follows:

                  #!/usr/bin/perl

                  use strict;
                  use Benchmark qw( cmpthese );

                  my @strings = qw(
                  aasfklaashasfafhsfahsfa.agasgl;asgjl;gasas.fa.ag
                  1.12afas.ag.asgasga.1.af.aga
                  .asgaagasga.sga.sg.asgasgasg.
                  afs.asfa.sf.a.q.2.asfa
                  );

                  cmpthese(5000,
                  {
                  splitting => \&splitting,
                  switching => \&switching,
                  matching => \&matching
                  }
                  );

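                  sub splitting {
                  my @t = @strings;
                  # NB: split treats the string '.' as the regex /./ (any character),
                  # not as a literal dot; a literal dot would be /\./
                  (split '.',$_,2)[0] for @t;
                  }

                  sub switching {
                  my @t = @strings;
                  # substitute: keep only the text before the first dot
                  s/^(.*?)\..*$/$1/ for @t;
                  }

                  sub matching {
                  my @t = @strings;
                  # match only: capture everything up to the first dot
                  /^([^\.]*)/ for @t;
                  }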
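
One thing worth checking in the split variants in this thread: split takes
its first argument as a pattern, so a quoted "." or '.' behaves like the
regex /./ rather than a literal dot. A small sketch of the difference; the
variable names are hypothetical, and it uses the sample string from the
earlier messages.

```perl
use strict;
use warnings;

my $s = "a.b.c.d.e";

# '.' is compiled as /./, so every character counts as a separator.
# All resulting fields are empty and get stripped as trailing empties.
my @any_char = split '.', $s;

# An escaped dot splits on literal dots only.
my @literal = split /\./, $s;

# With a limit of 2, splitting stops after the first dot.
my ($first) = split /\./, $s, 2;

print scalar(@any_char), " fields from split '.'\n";
print scalar(@literal),  " fields from split /\\./\n";
print "first field: $first\n";
```

If the intent is "everything before the first dot", the /\./ form with a
limit of 2 is probably the comparison to benchmark against the regex
versions; the timings quoted in this thread were taken with the '.' form.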

                  On Thu, Aug 05, 2004 at 07:41:32PM -0400, Dave Gray wrote:
                  > > > I think, unfortunately, there is a stupid explanation for some of
                  > > > this... After the first call to either function, $e == 'a'; and the
                  > > > regex apparently short-circuits faster than the split.
                  > >
                  > > Right. That's what I said.
                  >
                  > You're right, I completely mis-read your previous message.
                  >
                  > > > PS - Moral of the story, always use strict and warnings when you're
                  > > > playing with something.
                  > >
                  > > And the other moral, if something behaves funny create a test script,
                  > > watch the variables and read the docs :-)
                  >
                  > =P Also, don't submit code to a mailing list while you're as
                  > distracted as I was today. Thanks for the 'local', um, reminder.
                  >
                  > Cheers,
                  > Dave

                  --
                  Brad Lhotsky <brad@...>