regex capture \1 versus $1

  • acummingsus
    Message 1 of 4 , Aug 1, 2005
      Hi,

      Why the error? Tips, suggestions, etc.

      (I thought I was using it correctly.) But:

      \1 better written as $1 at /home/al/bin/wthr_pbml line 9.
      \2 better written as $2 at /home/al/bin/wthr_pbml line 9.

      That's the message I get when I run the following script on
      Slackware 10.1 Linux with Perl 5.8.6:

      #!/usr/bin/perl -w
      use strict;

      while (<DATA>) {
          if ( /\/ \d{4}\.\d{2}\.\d{2} \d{4} UTC/ ) {
              my $utc_line = $_;
              $utc_line =~ s/^.+\/ \d{4}\.//;
              # the culprit is the next line
              $utc_line =~ s/(\d{2})\.(\d{2}) (\d{2})(\d{2}) UTC/\1 \2 \3 \4/;
              chomp($utc_line);
              # (substitution) what happens if the string begins with 1 instead of 0
              $utc_line =~ s/^0//;
              my @parsed_utc_fields = split(/ /,$utc_line);
              print "\$utc_line: $utc_line\n";
              print "yes matched\n";
              print "\@parsed_utc_fields: @parsed_utc_fields\n";
          }
      }

      __DATA__
      Sacramento, Sacramento Executive Airport, CA, United States (KSAC)
      38-30-25N 121-29-42W 11M
      Jul 31, 2005 - 11:53 PM EDT / 2005.08.01 0353 UTC
      Wind: from the SSW (200 degrees) at 6 MPH (5 KT):0
      Visibility: 10 mile(s):0
      Sky conditions: clear
      Temperature: 80.1 F (26.7 C)
      Dew Point: 54.0 F (12.2 C)
      Relative Humidity: 40%
      Pressure (altimeter): 29.86 in. Hg (1011 hPa)
      ob: KSAC 010353Z 20005KT 10SM CLR 27/12 A2986 RMK AO2 SLP110 T02670122
      cycle: 4
      # end_of_DATA

      (The script finds the date line, the third line of the data, and
      parses it.) I will soon feed the parsed fields to the DateTime
      module to translate UTC into Los Angeles (US Pacific Coast) time.
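
For the time zone step mentioned above, here is a minimal sketch. It does not use the DateTime module named in the post; as an assumption for illustration it swaps in the core modules Time::Local and POSIX so that it runs without CPAN. The helper name utc_fields_to_epoch is hypothetical, and the zone name America/Los_Angeles is assumed to be available in the system's zone database.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime tzset);
use Time::Local qw(timegm);

# Hypothetical helper: turn UTC date/time fields into epoch seconds.
sub utc_fields_to_epoch {
    my ($year, $month, $day, $hour, $minute) = @_;
    # timegm() expects a zero-based month, so August (8) becomes 7.
    return timegm(0, $minute, $hour, $day, $month - 1, $year);
}

# Fields taken from the sample line "2005.08.01 0353 UTC".
my $epoch = utc_fields_to_epoch(2005, 8, 1, 3, 53);

# Assumption: the system knows the zone America/Los_Angeles.
$ENV{TZ} = 'America/Los_Angeles';
tzset();

print strftime('%Y-%m-%d %H:%M %Z', localtime($epoch)), "\n";
```

On a system with zone data installed, the printed local time should fall on the evening of Jul 31, consistent with the "11:53 PM EDT" shown in the sample data. With DateTime installed, the same conversion would normally be done with its own time zone support instead.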

      Thanks. Alan.
    • acummingsus
      Message 2 of 4 , Aug 1, 2005
        --- In perl-beginner@yahoogroups.com, "acummingsus" <acelists@g...> wrote:
        > \1 better written as $1 at /home/al/bin/wthr_pbml line 9.
        # line 9 is the next line
        > $utc_line =~ s/(\d{2})\.(\d{2}) (\d{2})(\d{2}) UTC/\1 \2 \3 \4/;

        al@p2bs103:~$ perldoc perlre > plre.txt

        (I then opened the text file plre.txt in the Kate editor, KDE
        desktop, Slackware 10.1.)

        <quote>The bracketing construct "( ... )" creates capture
        buffers.</quote>

        <quote>Examples:

        s/^([^ ]*) *([^ ]*)/$2 $1/; # swap first two words

        if (/(.)\1/) { # find first doubled char
        print "'$1' is the first doubled character\n";
        }</quote>

        My new understanding (if I dare) is that \1 is intended as a
        backreference and so is only to be used within the match pattern,
        *not* in the replacement part of a substitution.

        I guess that's right?

        Oops, except *maybe* this next one is OK too:

        s/(.)\1//;

        But, there, the \1 is used in the *matching* portion of the
        substitution regex.

        Evidently \1 should never leave the matching portion of a regex,
        if I have it right.
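
The distinction drawn above can be sketched as follows, reusing the two perlre examples quoted earlier in this message. The sub names are hypothetical and exist only for illustration: \1 appears inside the pattern, where it refers back to what group 1 already matched, while $1 and $2 appear in the replacement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \1 inside the pattern: match any character followed by itself.
# Returns the first doubled character, or undef if there is none.
sub first_doubled_char {
    my ($text) = @_;
    return $text =~ /(.)\1/ ? $1 : undef;
}

# $1 and $2 in the replacement: swap the first two words.
sub swap_first_two_words {
    my ($text) = @_;
    (my $swapped = $text) =~ s/^([^ ]*) *([^ ]*)/$2 $1/;
    return $swapped;
}

print first_doubled_char('bookkeeper'), "\n";            # prints "o"
print swap_first_two_words('hello world again'), "\n";   # prints "world hello again"
```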
        --

        On another list in the past, perhaps someone told me wrong, or
        else didn't give enough specifics for me to understand it
        correctly. (Based on that, I formerly thought \1 "is to be used
        within the regex itself".) More specifically, it now appears to me
        that \1 must not leave the matching portion of a regex, as I
        demonstrated above.

        Hopefully, if I got part of it wrong, someone will correct me.

        Thanks. Alan.
      • Jeff 'japhy' Pinyan
        Message 3 of 4 , Aug 1, 2005
          On Aug 1, acummingsus said:

          > Why the error? Tips, suggestions, etc.

          It's not an error, it's a warning.

          > \1 better written as $1 at /home/al/bin/wthr_pbml line 9.
          > \2 better written as $2 at /home/al/bin/wthr_pbml line 9.

          Because Perl prefers you use $<DIGIT> outside of the *actual* regular
          expression. The right-hand side of an s/// is not parsed as a regex, but
          as a double-quoted string. You should use *variables* there, not the
          regex escape sequence denoting a capturing group.

          Whoever taught you that \1 should be used on the right-hand side is guilty
          of at least one of the following: using a very old version of Perl, not
          turning warnings on, or ignoring the warnings.

          --
          Jeff "japhy" Pinyan % How can we ever be the sold short or
          RPI Acacia Brother #734 % the cheated, we who for every service
          http://japhy.perlmonk.org/ % have long ago been overpaid?
          http://www.perlmonks.org/ % -- Meister Eckhart
        • acummingsus
          Message 4 of 4 , Aug 1, 2005
            --- In perl-beginner@yahoogroups.com, Jeff 'japhy' Pinyan <japhy@p...> wrote:
            > On Aug 1, acummingsus said:
            [ why the error ]
            > It's not an error, it's a warning.
            >
            > > \1 better written as $1 at /home/al/bin/wthr_pbml line 9.
            > > \2 better written as $2 at /home/al/bin/wthr_pbml line 9.
            >
            > Because Perl prefers you use $<DIGIT> outside of the *actual* regular
            > expression. The right-hand side of an s/// is not parsed as a regex, but
            > as a double-quoted string. You should use *variables* there, not the
            > regex escape sequence denoting a capturing group.

            \1 is a backreference to a capture group. And a capture group is
            part of the match, a.k.a. the "regex": m/(.)\1/; or the left-hand
            side of s/(.)\1//;

            The right hand portion of a substitution, however, is *not* defined as
            "regex". And, said right hand portion differs from the left hand
            portion in that the left hand portion *is* "regex".

            So now I know what I didn't know back then: the right hand
            portion of a substitution is *not* "regex".

            As a beginner, I didn't have a sufficient definition of what
            "regex" is when, on a different beginner group, I was told
            "\1 must be used within the regex itself".

            Super! Thank you! And, HTH someone, some how, in some way.

            Alan.