Re: [PBML] Regular Expression matches but captures are undefined

  • Jenda Krynicky
    Message 1 of 7 , Jan 20, 2013
      From: "warrengallin" <wgallin@...>
      > I am having a problem with some lines of text matching my regular expression, but the captured parts of the match are not defined.
      >
      > Attached is a minimal example of my problem. The match condition for
      > the if condition is met, the match value is printed, but the four
      > individual captures are undefined and are not printed. Perl 5.12
      > running on OSX Mountain Lion. Note, this is one line from a large
      > file, most of which are handled as expected, but several of which fail
      > in the same way.
      >
      > #!/usr/bin/perl
      > use strict;
      > use warnings;
      >
      > my $temp_in = "WJG2983 jShaw2 ORF Forward Redesigned for longer overlap, Tm=60 from oligoCalc Web site, note, one C shorter than WJG2981, avoid in frame ORF w/ beta gal promoter IF you are not cutting for ligation into pXT7 CAA CTT TGG CAG ATC GGT ACC GAA TTCTCGAGCCACCatgtcggcagcaagaaatct ";
      >
      > if ($temp_in =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/){
      > print "Match is:\n $&\n";
      > my $sequence = $4;
      > $sequence =~ tr/[a-z]/[A-Z]/;
      > $sequence =~ s/\s//;

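      The line above is the problem. $1 and friends hold the captures
      from the last successful regexp match, and s/.../.../ is itself a
      regexp match (followed by a replace). Since that s/\s// succeeded
      and has no capture groups, $1 to $4 are undefined afterwards. Lines
      whose fourth field contains no whitespace are unaffected, because a
      failed match leaves the old values in place.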

      > my $title = $1;
      > my $comment1 = $2;
      > my $comment2 =$3;
      > print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";

      You should copy the data from $1, $2, ... to ordinary variables as
      soon as possible, before something overwrites them.
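
One way to follow this advice is to take the captures straight from the match in list context, so nothing can run between the match and the copy. This is only a sketch: the sample record is hypothetical and assumes literal tabs between fields, and the /g on the substitution is an assumption about the intent (the original s/\s// removes only the first whitespace character).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tab-separated record; the real input is assumed to use literal tabs.
my $line = "WJG2983\tjShaw2 ORF Forward\tsome comment\tCAA CTT atgtcg\t";

my @result;
# A match in list context returns the captures, so they are saved
# before any later regex operation can reset $1..$4.
if (my ($title, $comment1, $comment2, $sequence) =
        $line =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/) {
    $sequence =~ tr/a-z/A-Z/;   # tr/// is transliteration, not a regex match
    $sequence =~ s/\s//g;       # assumption: all whitespace should go, hence /g
    @result = ($title, $comment1, $comment2, $sequence);
    print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";
}
```

If the match fails, the list assignment yields zero elements and the if block is skipped, so the variables are never used with stale values.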

      Jenda
      ===== Jenda@... === http://Jenda.Krynicky.cz =====
      When it comes to wine, women and song, wizards are allowed
      to get drunk and croon as much as they like.
      -- Terry Pratchett in Sourcery
    • Warren Gallin
      Message 2 of 7 , Jan 20, 2013
        Thanks, that explains it - I'll keep this in mind in the future.

        Warren Gallin

      • afbach1
        Message 3 of 7 , Jan 21, 2013
          if ($temp_in =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/){
          print "Match is:\n $&\n";
          my $sequence = $4;
          $sequence =~ tr/[a-z]/[A-Z]/;
          $sequence =~ s/\s//;
          my $title = $1;
          my $comment1 = $2;
          my $comment2 =$3;
          print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";
          }

          Your match against "s/\s//" resets the capture vars (the "tr" does not:
          tr/// is a transliteration, not a regex match). Splitting on tabs is an
          alternative to the RE; which is better depends upon how confident you
          are in the data formatting.
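
For comparison, the split approach might look roughly like the sketch below. The sample record is hypothetical and assumes literal tabs; the field-count check is one possible way to stay as strict as the original pattern, which requires a tab after the fourth field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical record with literal tabs between the fields.
my $line = "WJG2983\tjShaw2 ORF Forward\tsome comment\tCAA CTT atgtcg\t";

# Split on tabs; the -1 limit keeps trailing empty fields, so a tab
# after the fourth field shows up as a fifth (possibly empty) element.
my @fields = split /\t/, $line, -1;

my @record;
# Accept the line only if the ID is well formed and the fourth field
# is followed by a tab, mirroring the original regular expression.
if (@fields >= 5 && $fields[0] =~ /^WJG\d{4}$/) {
    my ($title, $comment1, $comment2, $sequence) = @fields[0 .. 3];
    $sequence =~ tr/a-z/A-Z/;
    $sequence =~ s/\s//g;   # assumption: remove all whitespace, not just the first
    @record = ($title, $comment1, $comment2, $sequence);
    print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";
}
```

Because the fields are already in ordinary variables, the later substitution cannot disturb them.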

          else{

          print "Did not match.\n";

          }

          Worth adding input line number ("$.") and input (and maybe "warn" instead
          of "print") to the error msg for ease of tracking down any data problems.
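
A minimal sketch of that suggestion follows. The two-line sample data and the in-memory filehandle are assumptions made to keep the example self-contained; in the real script the loop would read the actual file. The __WARN__ handler is there only so the warnings can be inspected afterwards.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: the second line lacks the tab-separated layout.
my $data = "WJG2983\tname\tcomment\tCAA CTT\t\n"
         . "WJG2984 name comment CAA CTT\n";

# Collect warnings as well as emitting them, to make the example easy to check.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0]; print STDERR $_[0] };

# Read the sample data through an in-memory filehandle.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/) {
        # Copy the captures immediately, before any other regex runs.
        my ($title, $comment1, $comment2, $sequence) = ($1, $2, $3, $4);
        print "Found a Match\n$title\n";
    }
    else {
        # $. holds the current line number of the last-read filehandle.
        warn "Did not match at line $.: $line\n";
    }
}
close $fh;
```

Since warn writes to STDERR, the diagnostics stay separate from the normal output and can be redirected to a log on their own.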

          a
          ----------------------
          Andy Bach
          Systems Mangler
          Internet: andy_bach@...
          Voice: (608) 261-5738, Cell: (608) 658-1890

          "If Java had true garbage collection, most programs would delete
          themselves upon execution."
          Robert Sewell.