
Re: [PBML] Regular Expression matches but captures are undefined

  • Oral Akkan
    Message 1 of 7 , Jan 19, 2013
      Hi

      Just use this one and write again if it does not work. I cannot test it for you, because the original text must contain tab characters (\t) and I see only spaces here. Keep in mind that $1, $2, $3, ... are all short-lived: save them in variables immediately after the match.

      ...
      ...
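      if ($temp_in =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/){
          # Copy the captures at once, before any other regex operation runs.
          my ($title,$comment1,$comment2,$sequence) = ($1,$2,$3,$4);
          print "Match is:\n $&\n";
          $sequence =~ tr/a-z/A-Z/;   # tr takes plain ranges; brackets are not needed
          $sequence =~ s/\s//;        # removes only the first whitespace character
          print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";
      }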
      else{
          print "Did not match.\n";
      }





      ________________________________
      From: warrengallin <wgallin@...>
      To: perl-beginner@yahoogroups.com
      Sent: Sunday, 20 January 2013, 0:45
      Subject: [PBML] Regular Expression matches but captures are undefined


       
      I am having a problem with some lines of text matching my regular expression, but the captured parts of the match are not defined.

      Attached is a minimal example of my problem. The if condition is met and the matched text is printed, but the four individual captures are undefined and are not printed. This is Perl 5.12 running on OS X Mountain Lion. Note that this is one line from a large file; most lines are handled as expected, but several fail in the same way.

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $temp_in = "WJG2983 jShaw2 ORF Forward Redesigned for longer overlap, Tm=60 from oligoCalc Web site, note, one C shorter than WJG2981, avoid in frame ORF w/ beta gal promoter IF you are not cutting for ligation into pXT7 CAA CTT TGG CAG ATC GGT ACC GAA TTCTCGAGCCACCatgtcggcagcaagaaatct ";

      if ($temp_in =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/){
      print "Match is:\n $&\n";
      my $sequence = $4;
      $sequence =~ tr/[a-z]/[A-Z]/;
      $sequence =~ s/\s//;
      my $title = $1;
      my $comment1 = $2;
      my $comment2 =$3;
      print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";
      }
      else{

      print "Did not match.\n";

      }

      exit;




    • Warren Gallin
      Message 2 of 7 , Jan 19, 2013
        Tim,

        That worked perfectly. Although I am curious about why the regex approach failed, your suggestion makes my script work, which is the most important thing.

        Thanks,

        Warren

        On 2013-01-19, at 5:45 PM, timothy adigun <2teezperl@...> wrote:

        > Hi warrengallin,
        >
        > Please check my comments below:
        > On Sun, Jan 20, 2013 at 12:45 AM, warrengallin <wgallin@...> wrote:
        >
        >> I am having a problem with some lines of text matching my regular
        >> expression, but the captured parts of the match are not defined.
        >>
        >> Attached is a minimal example of my problem. The match condition for the
        >> if condition is met, the match value is printed, but the four individual
        >> captures are undefined and are not printed. Perl 5.12 running on OSX
        >> Mountain Lion. Note, this is one line from a large file, most of which are
        >> handled as expected, but several of which fail in the same way.
        >>
        >> #!/usr/bin/perl
        >> use strict;
        >> use warnings;
        >>
        >> my $temp_in = "WJG2983 jShaw2 ORF Forward Redesigned for longer overlap,
        >> Tm=60 from oligoCalc Web site, note, one C shorter than WJG2981, avoid in
        >> frame ORF w/ beta gal promoter IF you are not cutting for ligation into
        >> pXT7 CAA CTT TGG CAG ATC GGT ACC GAA TTCTCGAGCCACCatgtcggcagcaagaaatct ";
        >>
        >> if ($temp_in =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/){
        >> print "Match is:\n $&\n";
        >> my $sequence = $4;
        >> $sequence =~ tr/[a-z]/[A-Z]/;
        >> $sequence =~ s/\s//;
        >> my $title = $1;
        >> my $comment1 = $2;
        >> my $comment2 =$3;
        >> print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";
        >> }
        >> else{
        >>
        >> print "Did not match.\n";
        >>
        >> }
        >>
        > Instead of using a regex to match each desired substring, I would rather
        > suggest using the Perl function split, like so, to solve this:
        >
        > #!/usr/bin/perl
        > use strict;
        > use warnings;
        >
        > my $temp_in =
        > "WJG2983 jShaw2 ORF Forward Redesigned for longer overlap, Tm=60 from
        > oligoCalc Web site, note, one C shorter than WJG2981, avoid in frame ORF w/
        > beta gal promoter IF you are not cutting for ligation into pXT7 CAA CTT TGG
        > CAG ATC GGT ACC GAA TTCTCGAGCCACCatgtcggcagcaagaaatct ";
        >
        > my @string_array = split /\t/, $temp_in, 4;
        >
        > if ( $temp_in =~ m/^WJG\d{4}/ and ( @string_array == 4 ) ) {
        > my ( $title, $comment1, $comment2, $sequence ) = @string_array;
        > $sequence =~ tr/[a-z]/[A-Z]/;
        > print "Found a Match\n\nTitle: ", $title,
        > "\n\nComment 1: ", $comment1, "\n\nComment 2: ", $comment2,
        > "\n\nSequence: ", $sequence, "\n\n";
        > }
        > else {
        > print "Did not match.\n";
        > }
        > __END__
        >
        > OUTPUT:
        > Found a Match
        >
        > Title: WJG2983
        >
        > Comment 1: jShaw2 ORF Forward
        >
        > Comment 2: Redesigned for longer overlap, Tm=60 from oligoCalc Web site,
        > note, one C shorter than WJG2981, avoid in frame ORF w/ beta gal promoter
        > IF you are not cutting for ligation into pXT7
        >
        > Sequence: CAA CTT TGG CAG ATC GGT ACC GAA TTCTCGAGCCACCATGTCGGCAGCAAGAAATCT
        >
        > NOTE:
        > If you still want to print the matched string, you can do that by printing
        > the original "$temp_in", but only when the "if" condition is met.
        >
        > For more on split, run: perldoc -f split
        >
        >
        >> exit;
        >>
        >>
        >>
        >
        >
        >
        > --
        > Tim
        >
      • Jenda Krynicky
        Message 3 of 7 , Jan 20, 2013
          From: "warrengallin" <wgallin@...>
          > I am having a problem with some lines of text matching my regular expression, but the captured parts of the match are not defined.
          >
          > Attached is a minimal example of my problem. The match condition for
          > the if condition is met, the match value is printed, but the four
          > individual captures are undefined and are not printed. Perl 5.12
          > running on OSX Mountain Lion. Note, this is one line from a large
          > file, most of which are handled as expected, but several of which fail
          > in the same way.
          >
          > #!/usr/bin/perl
          > use strict;
          > use warnings;
          >
          > my $temp_in = "WJG2983 jShaw2 ORF Forward Redesigned for longer overlap, Tm=60 from oligoCalc Web site, note, one C shorter than WJG2981, avoid in frame ORF w/ beta gal promoter IF you are not cutting for ligation into pXT7 CAA CTT TGG CAG ATC GGT ACC GAA TTCTCGAGCCACCatgtcggcagcaagaaatct ";
          >
          > if ($temp_in =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/){
          > print "Match is:\n $&\n";
          > my $sequence = $4;
          > $sequence =~ tr/[a-z]/[A-Z]/;
          > $sequence =~ s/\s//;

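          The line above is the problem. $1 and friends hold the data from
          the last successful regexp match, and s/.../.../ is itself a regex
          match and replace. Once it succeeds, $1, $2 and $3 refer to its
          capture groups, and since it has none they are undefined.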

          > my $title = $1;
          > my $comment1 = $2;
          > my $comment2 =$3;
          > print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";

          You should copy the data from $1, $2, ... to ordinary variables as
          soon as possible, before something overwrites them.
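
A minimal sketch of the effect described above, using two made-up tab-separated lines rather than the original data. The helper name title_after_ops is hypothetical. It records $1 once after tr/// and once after s///. On the assumption that the large input file mixes sequences with and without whitespace, this would also explain why only some lines fail: a s/\s// that finds nothing to replace is not a successful match and leaves $1 alone.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [$1 after tr///, $1 after s///] for one line.
sub title_after_ops {
    my ($line) = @_;
    my @seen;
    if ($line =~ m/^(WJG\d{4})\t([^\t]*)\t/) {
        my $sequence = $2;
        $sequence =~ tr/a-z/A-Z/;   # transliteration, not a regex match
        push @seen, $1;             # still the title
        $sequence =~ s/\s//;        # a regex match; if it succeeds, $1 is reset
        push @seen, $1;             # undef when the s/// above matched
    }
    return \@seen;
}

# First sample has whitespace in the sequence field, second has none.
for my $line ("WJG0001\tcaa ctt\t", "WJG0002\tcaactt\t") {
    my ($after_tr, $after_s) = @{ title_after_ops($line) };
    printf "after tr: %s, after s///: %s\n",
        $after_tr, defined $after_s ? $after_s : 'undef';
}
```

The first line prints "after tr: WJG0001, after s///: undef" and the second prints "after tr: WJG0002, after s///: WJG0002".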

          Jenda
          ===== Jenda@... === http://Jenda.Krynicky.cz =====
          When it comes to wine, women and song, wizards are allowed
          to get drunk and croon as much as they like.
          -- Terry Pratchett in Sourcery
        • Warren Gallin
          Message 4 of 7 , Jan 20, 2013
            Thanks, that explains it - I'll keep this in mind in the future.

            Warren Gallin

            On 2013-01-20, at 6:02 PM, "Jenda Krynicky" <Jenda@...> wrote:

            > > $sequence =~ s/\s//;
            >
            > The line above is the problem. The $1 and friends contain the data
            > from the last successful regexp match and s/.../.../ is a regex match
            > and replace.
            >
            > You should copy the data from $1, $2, ... to ordinary variables as
            > soon as possible, before something overwrites them.
          • afbach1
            Message 5 of 7 , Jan 21, 2013
              if ($temp_in =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/){
              print "Match is:\n $&\n";
              my $sequence = $4;
              $sequence =~ tr/[a-z]/[A-Z]/;
              $sequence =~ s/\s//;
              my $title = $1;
              my $comment1 = $2;
              my $comment2 =$3;
              print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";
              }

              The successful match in "s/\s//" resets the capture vars (I don't think
              "tr" does, since it is not a regex operation). Whether splitting is
              better than an RE depends on how confident you are in the data formatting.
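
One way to hedge against uncertain formatting when splitting is to validate the field count and the first field before trusting the rest. The sketch below assumes a record has at least four tab-separated fields and that all whitespace should be stripped from the sequence; neither assumption is confirmed by the thread. The helper parse_record and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split one tab-separated record and validate it.
# Returns a hash reference, or nothing if the line does not look right.
sub parse_record {
    my ($line) = @_;
    my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    return unless @fields >= 4 && $fields[0] =~ /^WJG\d{4}$/;

    my ($title, $comment1, $comment2, $sequence) = @fields;
    $sequence =~ tr/a-z/A-Z/;   # upper-case the sequence
    $sequence =~ s/\s+//g;      # assumption: remove every whitespace run
    return {
        title    => $title,
        comment1 => $comment1,
        comment2 => $comment2,
        sequence => $sequence,
    };
}

# Made-up sample line, not the original poster's data.
my $rec = parse_record("WJG0001\tname\tnote\tcaa ctt\t");
print "$rec->{title}: $rec->{sequence}\n" if $rec;
```

Because nothing is read from $1 and friends, later substitutions cannot disturb the extracted fields.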

              else{

              print "Did not match.\n";

              }

              It's worth adding the input line number ("$.") and the input itself to
              the error msg (and maybe using "warn" instead of "print"), to make any
              data problems easier to track down.
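
A rough sketch of that suggestion, reading from an in-memory filehandle that stands in for the real input file. The helper bad_line_message, the message wording, and the sample data are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build the diagnostic text for a rejected line.
sub bad_line_message {
    my ($line_number, $line) = @_;
    return "Did not match at input line $line_number: '$line'\n";
}

# In-memory data standing in for the real file; the second line is malformed.
my $data = "WJG0001\tname\tnote\tcaactt\t\nnot a record\n";
open my $in, '<', \$data or die "Cannot open in-memory data: $!";

while (my $line = <$in>) {
    chomp $line;
    next if $line =~ m/^WJG\d{4}\t/;     # looks like a record, skip it here
    warn bad_line_message($., $line);    # $. is the current input line number
}
close $in;
```

warn writes to STDERR, so the diagnostics stay separate from the normal output on STDOUT.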

              a
              ----------------------
              Andy Bach
              Systems Mangler
              Internet: andy_bach@...
              Voice: (608) 261-5738, Cell: (608) 658-1890

              "If Java had true garbage collection, most programs would delete
              themselves upon execution."
              Robert Sewell.