Loading ...
Sorry, an error occurred while loading the content.

Re: PATCH perl_reference.pod "Remedies for Inner Subroutines"

Expand Messages
  • Stas Bekman
    ... Thanks Brian. A few comments: [...] ... But that won t work with perl
    Message 1 of 5 , Nov 2, 2003
    • 0 Attachment
      Brian McCauley wrote:

      > Here's a _very_ rough first cut at perl_reference.pod. I haven't even
      > proof-read it yet so it's probably got spelling a and grammar errors
      > but I just want to be sure I'm going in the right direction.

      Thanks Brian. A few comments:

      [...]
      > - #!/usr/bin/perl -w
      > + #!/usr/bin/perl
      >
      > use strict;
      > + use warnings;

      But that won't work with perl < 5.6.

      [...]
      > +The simplest of the workarounds is to use package-scoped variables,
      > +declared using C<our> or, on older versions of Perl, the C<vars>
      > +pragma. Note that whereas using C<my> declaration also implicitly
      > +initializes variables to undefined the C<our> declaration does not,
      > +and so you may need to add explicit initialisation.
      >
      > multirun1.pl
      > - -----------
      > - #!/usr/bin/perl -w
      > + ------------
      > + #!/usr/bin/perl
      >
      > use strict;
      > - use vars qw($counter);
      > -
      > + use warnings;
      > +
      > for (1..3){
      > print "run: [time $_]\n";
      > run();
      > @@ -946,7 +953,7 @@
      >
      > sub run {
      >
      > - $counter = 0;
      > + our $counter = 0;

      I think it would be more clear if all are declared at the top of the file,
      just where the 'use vars' declaration was.

      [...]
      > multirun2.pl
      > - -----------
      > + ------------
      > #!/usr/bin/perl -w
      >
      > use strict;
      > @@ -993,14 +1023,14 @@
      >
      > sub run {
      >
      > - $main::counter = 0;
      > + $::counter = 0;

      what's the gain in doing this? The two are identical and for unexperienced
      perl users $::counter will look totally alien.

      The rest looks good, sans spelling ;)

      __________________________________________________________________
      Stas Bekman JAm_pH ------> Just Another mod_perl Hacker
      http://stason.org/ mod_perl Guide ---> http://perl.apache.org
      mailto:stas@... http://use.perl.org http://apacheweek.com
      http://modperlbook.org http://apache.org http://ticketmaster.com


      --
      Reporting bugs: http://perl.apache.org/bugs/
      Mail list info: http://perl.apache.org/maillist/modperl.html
    • Brian McCauley
      ... This is example code being used to illustrate a point. It is more appropriate therefore (IMHO) that it reflect best current practice than that it maintain
      Message 2 of 5 , Nov 14, 2003
      • 0 Attachment
        Stas Bekman <stas@...> writes:

        > Brian McCauley wrote:
        >
        > > Here's a _very_ rough first cut at perl_reference.pod. I haven't even
        > > proof-read it yet so it's probably got spelling a and grammar errors
        > > but I just want to be sure I'm going in the right direction.
        >
        > Thanks Brian. A few comments:
        >
        > [...]
        > > - #!/usr/bin/perl -w
        > > + #!/usr/bin/perl
        > > use strict;
        > > + use warnings;
        >
        > But that won't work with perl < 5.6.

        This is example code being used to illustrate a point. It is more
        appropriate therefore (IMHO) that it reflect best current practice
        than that it maintain backward compatability. Anyone still using Perl
        < 5.6 must be used to seeing "use warnings" all over the place and
        know that they need to remove it on their system. Unless, of course,
        they are living in total isolation - in which case they'll never see
        this document so the point is moot anyhow.

        > [...]
        > > +The simplest of the workarounds is to use package-scoped variables,
        > > +declared using C<our> or, on older versions of Perl, the C<vars>
        > > +pragma. Note that whereas using C<my> declaration also implicitly
        > > +initializes variables to undefined the C<our> declaration does not,
        > > +and so you may need to add explicit initialisation.
        > > multirun1.pl
        > > - -----------
        > > - #!/usr/bin/perl -w
        > > + ------------
        > > + #!/usr/bin/perl
        > > use strict;
        > > - use vars qw($counter);
        > > -
        > > + use warnings; +
        > > for (1..3){
        > > print "run: [time $_]\n";
        > > run();
        > > @@ -946,7 +953,7 @@
        > > sub run {
        > > - $counter = 0;
        > > + our $counter = 0;
        >
        > I think it would be more clear if all are declared at the top of the
        > file,

        Declaring variables at the top of the file is, IMNSHO, a bad
        programming habit that should be discouraged. Variables' declaration
        and initialisation should be kept together wherever possible. It
        seems more natural. It aids readibility. It aids maintainability. I
        have collegues who program in the "declarations go at the top" style.
        Their programs almost invariably declare variables that are never then
        used.

        > just where the 'use vars' declaration was.

        Remember that you are looking at the diff file so it seems to you that
        I've moved the declaration. The end-user reader of perl_reference.pod
        is not looking at the diff file. They are comparing multirun.pl and
        multirun1.pl. So from their point of view, by simply replacing "my"
        with "our" I'm keeping the declaration in the same place. If we were
        to move the declaration away from the initialisation then I think we'd
        have to explain why we did it. And since I think it's a bad idea I
        couldn't do that.

        > [...]
        > > multirun2.pl
        > > - -----------
        > > + ------------
        > > #!/usr/bin/perl -w
        > > use strict;
        > > @@ -993,14 +1023,14 @@
        > > sub run {
        > > - $main::counter = 0;
        > > + $::counter = 0;
        >
        > what's the gain in doing this? The two are identical and for
        > unexperienced perl users $::counter will look totally alien.

        That, of course, would depend on the nature of their (in)experience to
        date :-). If they have always seen explicit access to variables in the
        root namepace written as $::counter then it is the use of the alias
        $main::counter that will look totally alien.

        However, given that I go on to refer to root namespace as 'main::'
        later on, I suppose it makes sense to be consistant and revert to
        using the alias throughout.

        > The rest looks good, sans spelling ;)

        I'll try to fix that now.

        Combined patch follows.

        --- perl_reference.pod.orig Thu Aug 14 18:11:11 2003
        +++ perl_reference.pod Fri Nov 14 17:39:20 2003
        @@ -863,16 +863,17 @@
        problem, Perl will always alert you.

        Given that you have a script that has this problem, what are the ways
        -to solve it? There are many of them and we will discuss some of them
        -here.
        +to solve it? There have been many suggested in the past, and we
        +discuss some of them here.

        We will use the following code to show the different solutions.

        multirun.pl
        -----------
        - #!/usr/bin/perl -w
        + #!/usr/bin/perl

        use strict;
        + use warnings;

        for (1..3){
        print "run: [time $_]\n";
        @@ -925,20 +926,27 @@
        Counter is equal to 5 !
        Counter is equal to 6 !

        -Obviously, the C<$counter> variable is not reinitialized on each
        -execution of run(). It retains its value from the previous execution,
        -and sub increment_counter() increments that.
        -
        -One of the workarounds is to use globally declared variables, with the
        -C<vars> pragma.
        +Apparently, the C<$counter> variable is not reinitialized on each
        +execution of run(), it retains its value from the previous execution,
        +and increment_counter() increments that. Actually that is not quite
        +what happens. On each execution of run() a new C<$counter> variable
        +is initialized to zero but increment_counter() remains bound to the
        +C<$counter> variable from the first call to run().
        +
        +The simplest of the work-rounds is to use package-scoped variables.
        +These can be declared using C<our> or, on older versions of Perl, the
        +C<vars> pragma. Note that whereas using C<my> declaration also
        +implicitly initializes variables to undefined the C<our> declaration
        +does not, and so you will probably need to add explicit initialisation
        +for variables that lacked it.

        multirun1.pl
        - -----------
        - #!/usr/bin/perl -w
        + ------------
        + #!/usr/bin/perl

        use strict;
        - use vars qw($counter);
        -
        + use warnings;
        +
        for (1..3){
        print "run: [time $_]\n";
        run();
        @@ -946,7 +954,7 @@

        sub run {

        - $counter = 0;
        + our $counter = 0;

        increment_counter();
        increment_counter();
        @@ -977,11 +985,37 @@
        problem, since there is no C<my()> (lexically defined) variable used
        in the nested subroutine.

        -Another approach is to use fully qualified variables. This is better,
        -since less memory will be used, but it adds a typing overhead:
        +In the above example we know C<$counter> is just a simple small
        +scalar. In the general case variables could reference external
        +resource handles or large data structures. In that situation the fact
        +that the variable would not be released immediately when run()
        +completes could be a problem. To avoid this you can put C<local> in
        +front of the C<our> declaration of all variables other than simple
        +scalars. This has the effect of restoring the variable to its
        +previous value (usually undefined) upon exit from the current scope.
        +As a side-effect C<local> also initializes the variables to C<undef>.
        +So, if you recall that thing I said about adding explicit
        +initialization when you replace C<my> by C<our>, well, you can forget
        +it again if you replace C<my> with C<local our>.
        +
        +Be warned that C<local> will not release circular data structures. If
        +the original CGI script relied upon process termination to clean up
        +after it then it will leak memory as a registry script.
        +
        +A varient of the package variable approach is not to declare your
        +variables, but instead to use explicit package qualifiers. This has
        +the advantage on old versions of Perl that there is no need to load
        +the C<vars> module, but it adds a significant typing overhead.
        +Another downside is that you become dependant on the "used only once"
        +warning to detect typos in variable names. The explicit package name
        +approach is not really suitable for registry scripts because it
        +pollutes the C<main::> namespace rather than staying properly within
        +the namespace that has been allocated. Finally, note that the
        +overhead of loading the C<vars> module only has to be paid once per
        +Perl interpreter.

        multirun2.pl
        - -----------
        + ------------
        #!/usr/bin/perl -w

        use strict;
        @@ -1019,10 +1053,11 @@
        and then submit it for your script to process.

        multirun3.pl
        - -----------
        - #!/usr/bin/perl -w
        + ------------
        + #!/usr/bin/perl

        use strict;
        + use warnings;

        for (1..3){
        print "run: [time $_]\n";
        @@ -1056,10 +1091,11 @@
        variables in a calling function.

        multirun4.pl
        - -----------
        - #!/usr/bin/perl -w
        + ------------
        + #!/usr/bin/perl

        use strict;
        + use warnings;

        for (1..3){
        print "run: [time $_]\n";
        @@ -1092,10 +1128,11 @@
        a literal, e.g. I<increment_counter(5)>).

        multirun5.pl
        - -----------
        - #!/usr/bin/perl -w
        + ------------
        + #!/usr/bin/perl

        use strict;
        + use warnings;

        for (1..3){
        print "run: [time $_]\n";
        @@ -1120,14 +1157,27 @@

        Here is a solution that avoids the problem entirely by splitting the
        code into two files; the first is really just a wrapper and loader,
        -the second file contains the heart of the code.
        +the second file contains the heart of the code. This second file must
        +go into a directory in your C<@INC>. Some people like to put the
        +library in the same directory as the script but this assumes that the
        +current working directory will be equal to the directory where the
        +script is located and also that C<@INC> will contain C<'.'>, neither
        +of which are assumptions you should expect to hold in all cases.
        +
        +Note that the name chosen for the library must be unique throughout
        +the entire server and indeed every server on which you many ever
        +install the script. This solution is probably more trouble than it is
        +worth - it is only oncluded because it was mentioned in previous
        +versions of this guide.

        multirun6.pl
        - -----------
        - #!/usr/bin/perl -w
        + ------------
        + #!/usr/bin/perl

        use strict;
        - require 'multirun6-lib.pl' ;
        + use warnings;
        +
        + require 'multirun6-lib.pl';

        for (1..3){
        print "run: [time $_]\n";
        @@ -1138,8 +1188,54 @@

        multirun6-lib.pl
        ----------------
        - use strict ;
        + use strict;
        + use warnings;
        +
        + my $counter;
        +
        + sub run {
        + $counter = 0;
        +
        + increment_counter();
        + increment_counter();
        + }
        +
        + sub increment_counter{
        + $counter++;
        + print "Counter is equal to $counter !\n";
        + }
        +
        + 1 ;
        +
        +An alternative verion of the above, that mitigates some of the
        +disadvantages, is to use a Perl5-style Exporter module rather than a
        +Perl4-style library. The global uniqueness requirement still applies
        +to the module name, but at least this is a problem Perl programmers
        +should already be familiar with when creating modules.
        +
        + multirun7.pl
        + ------------
        + #!/usr/bin/perl

        + use strict;
        + use warnings;
        + use My::Multirun7;
        +
        + for (1..3){
        + print "run: [time $_]\n";
        + run();
        + }
        +
        +Separate file:
        +
        + My/Multirun7.pm
        + ---------------
        + package My::Multirun7;
        + use strict;
        + use warnings;
        + use base qw( Exporter );
        + our @EXPORT = qw( run );
        +
        my $counter;

        sub run {
        @@ -1156,7 +1252,8 @@

        1 ;

        -Now you have at least six workarounds to choose from.
        +Now you have at least five workarounds to choose from (not counting
        +numbers 2 and 6).

        For more information please refer to perlref and perlsub manpages.

        --- porting.pod.orig Fri Oct 10 18:58:48 2003
        +++ porting.pod Mon Oct 27 09:57:15 2003
        @@ -88,7 +88,7 @@

        print "Content-type: text/plain\r\n\r\n";

        - my $counter = 0;
        + my $counter = 0; # Explicit initialization technically redundant

        for (1..5) {
        increment_counter();
        @@ -195,8 +195,8 @@

        print "Content-type: text/plain\r\n\r\n";

        - my $counter = 0;
        -
        + my $counter = 0; # Explicit initialization technically redundant
        +
        for (1..5) {
        increment_counter();
        }
        @@ -228,51 +228,65 @@

        It's important to understand that the I<inner subroutine> effect
        happens only with code that C<Apache::Registry> wraps with a
        -declaration of the C<handler> subroutine. If you put your code into a
        -library or module, which the main script require()'s or use()'s, this
        -effect doesn't occur.
        -
        -For example if we move the code from the script into the subroutine
        -I<run>, place the subroutines into the I<mylib.pl> file, save it in
        -the same directory as the script itself and require() it, there will
        -be no problem at all. (Don't forget the C<1;> at the end of the
        -library or the require() might fail.)
        -
        - mylib.pl:
        - ---------
        - my $counter;
        - sub run{
        - print "Content-type: text/plain\r\n\r\n";
        - $counter = 0;
        - for (1..5) {
        - increment_counter();
        - }
        +declaration of the C<handler> subroutine. If you put all your code
        +into modules, which the main script C<use()>s, this effect doesn't
        +occur.
        +
        +Do not use Perl4-style libraries. Subroutines in such libraries will
        +only be available to the first script in any given interpreter thread
        +to C<require()> a library of any given name. This can lead to
        +confusing sporadic failures.
        +
        +The easiest and the fastest way to solve the nested subroutines
        +problem is to switch every lexically scoped variable foe which you get
        +the warning for to a package variable. The C<handler> subroutines are
        +never called re-entrantly and each resides in a package to itself.
        +Most of the usual disadvantates of package scoped variables are,
        +therefore, not a concern. Note, however, that whereas explicit
        +initialization is not always necessary for lexical variables it is
        +usually necessary for these package variables as they persist in
        +subsequent executions of the handler and unlike lexical variables,
        +don't get automatically destroyed at the end of each handler.
        +
        +
        + counter.pl:
        + ----------
        + #!/usr/bin/perl -w
        + use strict;
        +
        + print "Content-type: text/plain\r\n\r\n";
        +
        + # In Perl <5.6 our() did not exist, so:
        + # use vars qw($counter);
        + our $counter = 0; # Explicit initialization now necessary
        +
        + for (1..5) {
        + increment_counter();
        }
        +
        sub increment_counter{
        $counter++;
        print "Counter is equal to $counter !\r\n";
        }
        - 1;
        -
        - counter.pl:
        - ----------
        - use strict;
        - require "./mylib.pl";
        - run();

        -This solution provides the easiest and the fastest way to solve the
        -nested subroutines problem, since all you have to do is to move the
        -code into a separate file, by first wrapping the initial code into
        -some function that you later will call from the script and keeping the
        -lexically scoped variables that could cause the problem out of this
        -function.
        -
        -But as a general rule of thumb, unless the script is very short, I
        -tend to write all the code in external libraries, and to have only a
        -few lines in the main script. Generally the main script simply calls
        -the main function of my library. Usually I call it C<init()> or
        -C<run()>. I don't worry about nested subroutine effects anymore
        -(unless I create them myself :).
        +If the variable contains a reference it may hold onto lots of
        +unecessary memory (or worse) if the reference is left to hang about
        +until the next call to the same handler. For such variables you
        +should use C<local> so that the value is removed when the C<handler>
        +subroutine exits.
        +
        + my $query = CGI->new;
        +
        +becomes:
        +
        + local our $query = CGI->new;
        +
        +All this is very interesting but as a general rule of thumb, unless
        +the script is very short, I tend to write all the code in external
        +libraries, and to have only a few lines in the main script. Generally
        +the main script simply calls the main function of my library. Usually
        +I call it C<init()> or C<run()>. I don't worry about nested
        +subroutine effects anymore (unless I create them myself :).

        The section 'L<Remedies for Inner
        Subroutines|general::perl_reference::perl_reference/Remedies_for_Inner_Subroutines>' discusses

        --
        Reporting bugs: http://perl.apache.org/bugs/
        Mail list info: http://perl.apache.org/maillist/modperl.html
      • Stas Bekman
        ... Agreed. ... Sorry I ve missed the word globals while typing, so it should have read: I think it would be more clear if all globals are declared at the
        Message 3 of 5 , Nov 14, 2003
        • 0 Attachment
          Brian McCauley wrote:
          > Stas Bekman <stas@...> writes:
          >
          >
          >>Brian McCauley wrote:
          >>
          >>
          >>>Here's a _very_ rough first cut at perl_reference.pod. I haven't even
          >>>proof-read it yet so it's probably got spelling a and grammar errors
          >>>but I just want to be sure I'm going in the right direction.
          >>
          >>Thanks Brian. A few comments:
          >>
          >>[...]
          >>
          >>>- #!/usr/bin/perl -w
          >>>+ #!/usr/bin/perl
          >>> use strict;
          >>>+ use warnings;
          >>
          >>But that won't work with perl < 5.6.
          >
          >
          > This is example code being used to illustrate a point. It is more
          > appropriate therefore (IMHO) that it reflect best current practice
          > than that it maintain backward compatability. Anyone still using Perl
          > < 5.6 must be used to seeing "use warnings" all over the place and
          > know that they need to remove it on their system. Unless, of course,
          > they are living in total isolation - in which case they'll never see
          > this document so the point is moot anyhow.

          Agreed.

          >>>+The simplest of the workarounds is to use package-scoped variables,
          >>>+declared using C<our> or, on older versions of Perl, the C<vars>
          >>>+pragma. Note that whereas using C<my> declaration also implicitly
          >>>+initializes variables to undefined the C<our> declaration does not,
          >>>+and so you may need to add explicit initialisation.
          >>> multirun1.pl
          >>>- -----------
          >>>- #!/usr/bin/perl -w
          >>>+ ------------
          >>>+ #!/usr/bin/perl
          >>> use strict;
          >>>- use vars qw($counter);
          >>>-
          >>>+ use warnings; +
          >>> for (1..3){
          >>> print "run: [time $_]\n";
          >>> run();
          >>>@@ -946,7 +953,7 @@
          >>> sub run {
          >>> - $counter = 0;
          >>>+ our $counter = 0;
          >>
          >>I think it would be more clear if all are declared at the top of the
          >>file,
          >
          >
          > Declaring variables at the top of the file is, IMNSHO, a bad
          > programming habit that should be discouraged. Variables' declaration
          > and initialisation should be kept together wherever possible. It
          > seems more natural. It aids readibility. It aids maintainability. I
          > have collegues who program in the "declarations go at the top" style.
          > Their programs almost invariably declare variables that are never then
          > used.

          Sorry I've missed the word "globals" while typing, so it should have read:

          I think it would be more clear if all globals are declared at the top of the
          file,

          You are talking about the locality rule, which I perfectly agree with when it
          comes to scoped variables.

          However when it comes to globals I tend to disagree with you. globals should
          be all declared together at the top of the file/package so that it's easy to
          learn what variables are globals and not scan the code (perhaps thousands of
          lines trying to spot those globals definition in the code where they are used.

          Consider:

          % perl -le ' {our $x = 5;} print $x'
          5

          Now move that our definition hundreds of lines into the code, are going to
          spot it?

          Granted, using 'use strict':

          % perl -le ' use strict; use warnings; {our $x = 5;} print $x'
          Variable "$x" is not imported at -e line 1.
          Global symbol "$x" requires explicit package name at -e line 1.
          Execution of -e aborted due to compilation errors.

          this potential error of our() scoped variable affecting some other global with
          the same name, won't creep in. But unfortunately many don't use 'use strict'.

          Of course you will say that the example does use 'use strict' and therefore
          this global definition is scoped well and defined where it should be. And
          you'd be right. In which case I've no objections.

          /me is still trying to wrap his head around our()'s subtleties.

          >>just where the 'use vars' declaration was.
          >
          >
          > Remember that you are looking at the diff file so it seems to you that
          > I've moved the declaration. The end-user reader of perl_reference.pod
          > is not looking at the diff file. They are comparing multirun.pl and
          > multirun1.pl. So from their point of view, by simply replacing "my"
          > with "our" I'm keeping the declaration in the same place. If we were
          > to move the declaration away from the initialisation then I think we'd
          > have to explain why we did it. And since I think it's a bad idea I
          > couldn't do that.

          Agreed.

          >>[...]
          >>
          >>> multirun2.pl
          >>>- -----------
          >>>+ ------------
          >>> #!/usr/bin/perl -w
          >>> use strict;
          >>>@@ -993,14 +1023,14 @@
          >>> sub run {
          >>> - $main::counter = 0;
          >>>+ $::counter = 0;
          >>
          >>what's the gain in doing this? The two are identical and for
          >>unexperienced perl users $::counter will look totally alien.
          >
          >
          > That, of course, would depend on the nature of their (in)experience to
          > date :-). If they have always seen explicit access to variables in the
          > root namepace written as $::counter then it is the use of the alias
          > $main::counter that will look totally alien.

          I guess I have hardly ever seen code that uses $::, and still strikes me odd.
          But that's just my personal (in)experience as you put it.

          > However, given that I go on to refer to root namespace as 'main::'
          > later on, I suppose it makes sense to be consistant and revert to
          > using the alias throughout.

          Perfect.

          >>The rest looks good, sans spelling ;)
          >
          >
          > I'll try to fix that now.
          >
          > Combined patch follows.

          Perfect. Both are now committed and will appear online within the next 6 hours.

          Thanks a lot, Brian. Please keep those patches coming (and I promise to be
          build less walls next time).

          __________________________________________________________________
          Stas Bekman JAm_pH ------> Just Another mod_perl Hacker
          http://stason.org/ mod_perl Guide ---> http://perl.apache.org
          mailto:stas@... http://use.perl.org http://apacheweek.com
          http://modperlbook.org http://apache.org http://ticketmaster.com


          --
          Reporting bugs: http://perl.apache.org/bugs/
          Mail list info: http://perl.apache.org/maillist/modperl.html
        • Brian McCauley
          ... [ snip - Stas correctly anticipates all my arguments in the debate ] ... I used to think I was the only person I knew who could loose a debate without
          Message 4 of 5 , Nov 17, 2003
          • 0 Attachment
            Stas Bekman <stas@...> writes:

            > Brian McCauley wrote:
            > >
            > > Stas Bekman <stas@...> writes:
            > >>
            > >>I think it would be more clear if all are declared at the top of the
            > >>file,
            > >
            > > Declaring variables at the top of the file is, IMNSHO, a bad
            > > programming habit that should be discouraged.
            >
            > Sorry I've missed the word "globals" while typing [...]
            >
            > You are talking about the locality rule, which I perfectly agree with
            > when it comes to scoped variables.
            >
            > However when it comes to globals I tend to disagree with you.

            [ snip - Stas correctly anticipates all my arguments in the debate ]

            > Agreed.

            I used to think I was the only person I knew who could loose a debate
            without letting the other guy get a word in. :-)

            > Both are now committed and will appear online within the next 6 hours.

            Phew!

            > Thanks a lot, Brian. Please keep those patches coming (and I promise
            > to be build less walls next time).

            Will do - but right now I have other fish to fry.

            --
            Reporting bugs: http://perl.apache.org/bugs/
            Mail list info: http://perl.apache.org/maillist/modperl.html
          Your message has been successfully submitted and would be delivered to recipients shortly.