Re: [PBML] Regular Expression matches but captures are undefined

  • Warren Gallin
    Message 1 of 7 , Jan 19, 2013
      Tim,

      That worked perfectly. Although I am curious about why the regex approach failed, your suggestion makes my script work, which is the most important thing.

      Thanks,

      Warren

      On 2013-01-19, at 5:45 PM, timothy adigun <2teezperl@...> wrote:

      > Hi warrengallin,
      >
      > Please check my comments below:
      > On Sun, Jan 20, 2013 at 12:45 AM, warrengallin <wgallin@...> wrote:
      >
      >> I am having a problem with some lines of text matching my regular
      >> expression, but the captured parts of the match are not defined.
      >>
      >> Attached is a minimal example of my problem. The match condition for the
      >> if condition is met, the match value is printed, but the four individual
      >> captures are undefined and are not printed. Perl 5.12 running on OSX
      >> Mountain Lion. Note, this is one line from a large file, most of which are
      >> handled as expected, but several of which fail in the same way.
      >>
      >> #!/usr/bin/perl
      >> use strict;
      >> use warnings;
      >>
      >> my $temp_in = "WJG2983\tjShaw2 ORF Forward\tRedesigned for longer overlap, Tm=60 from oligoCalc Web site, note, one C shorter than WJG2981, avoid in frame ORF w/ beta gal promoter IF you are not cutting for ligation into pXT7\tCAA CTT TGG CAG ATC GGT ACC GAA TTCTCGAGCCACCatgtcggcagcaagaaatct\t";
      >>
      >> if ($temp_in =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/){
      >> print "Match is:\n $&\n";
      >> my $sequence = $4;
      >> $sequence =~ tr/[a-z]/[A-Z]/;
      >> $sequence =~ s/\s//;
      >> my $title = $1;
      >> my $comment1 = $2;
      >> my $comment2 =$3;
      >> print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";
      >> }
      >> else{
      >>
      >> print "Did not match.\n";
      >>
      >> }
      >>
      > Instead of using a regex to match each desired substring, I would
      > suggest using Perl's *split* function, like so:
      >
      > #!/usr/bin/perl
      > use strict;
      > use warnings;
      >
      > my $temp_in =
      > "WJG2983\tjShaw2 ORF Forward\tRedesigned for longer overlap, Tm=60 from oligoCalc Web site, note, one C shorter than WJG2981, avoid in frame ORF w/ beta gal promoter IF you are not cutting for ligation into pXT7\tCAA CTT TGG CAG ATC GGT ACC GAA TTCTCGAGCCACCatgtcggcagcaagaaatct\t";
      >
      > my @string_array = split /\t/, $temp_in, 4;
      >
      > if ( $temp_in =~ m/^WJG\d{4}/ and ( @string_array == 4 ) ) {
      > my ( $title, $comment1, $comment2, $sequence ) = @string_array;
      > $sequence =~ tr/[a-z]/[A-Z]/;
      > print "Found a Match\n\nTitle: ", $title,
      > "\n\nComment 1: ", $comment1, "\n\nComment 2: ", $comment2,
      > "\n\nSequence: ", $sequence, "\n\n";
      > }
      > else {
      > print "Did not match.\n";
      > }
      > __END__
      >
      > OUTPUT:
      > Found a Match
      >
      > Title: WJG2983
      >
      > Comment 1: jShaw2 ORF Forward
      >
      > Comment 2: Redesigned for longer overlap, Tm=60 from oligoCalc Web site,
      > note, one C shorter than WJG2981, avoid in frame ORF w/ beta gal promoter
      > IF you are not cutting for ligation into pXT7
      >
      > Sequence: CAA CTT TGG CAG ATC GGT ACC GAA TTCTCGAGCCACCATGTCGGCAGCAAGAAATCT
      >
      > NOTE:
      > If you still want to print the matched string, you can print the
      > original "$temp_in" inside the "if" block, once the condition has been met.
      >
      > For more on split, see *perldoc -f split*.
      >
      >
      >> exit;
      >>
      >>
      >>
      >
      >
      >
      > --
      > Tim
      >
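
A side note on the split call quoted above: with a limit of 4, the last field keeps everything after the third tab, including any trailing tab. The sketch below illustrates this, assuming a shortened, hypothetical record whose fields are separated by tabs (written as \t escapes) and which ends in a tab.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, shortened record; real data would come from the input file
my $record = "WJG2983\tjShaw2 ORF Forward\tcomment\tCAA CTT atgt\t";

# With a limit, the 4th field holds the rest of the string, trailing tab included
my @limited = split /\t/, $record, 4;

# Without a limit, split drops trailing empty fields, so the tab disappears
my @plain = split /\t/, $record;

printf "limited: %d fields, last field ends with a tab: %s\n",
    scalar @limited, ( $limited[3] =~ /\t\z/ ? 'yes' : 'no' );
printf "plain:   %d fields, last field ends with a tab: %s\n",
    scalar @plain, ( $plain[3] =~ /\t\z/ ? 'yes' : 'no' );
```

If the trailing tab matters, removing whitespace from the sequence field afterwards (for example with s/\s+//g) would take care of it.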
    • Jenda Krynicky
      Message 2 of 7 , Jan 20, 2013
        From: "warrengallin" <wgallin@...>
        > I am having a problem with some lines of text matching my regular expression, but the captured parts of the match are not defined.
        >
        > Attached is a minimal example of my problem. The match condition for
        > the if condition is met, the match value is printed, but the four
        > individual captures are undefined and are not printed. Perl 5.12
        > running on OSX Mountain Lion. Note, this is one line from a large
        > file, most of which are handled as expected, but several of which fail
        > in the same way.
        >
        > #!/usr/bin/perl
        > use strict;
        > use warnings;
        >
        > my $temp_in = "WJG2983 jShaw2 ORF Forward Redesigned for longer overlap, Tm=60 from oligoCalc Web site, note, one C shorter than WJG2981, avoid in frame ORF w/ beta gal promoter IF you are not cutting for ligation into pXT7 CAA CTT TGG CAG ATC GGT ACC GAA TTCTCGAGCCACCatgtcggcagcaagaaatct ";
        >
        > if ($temp_in =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/){
        > print "Match is:\n $&\n";
        > my $sequence = $4;
        > $sequence =~ tr/[a-z]/[A-Z]/;
        > $sequence =~ s/\s//;

        The line above is the problem. The $1 and friends contain the data
        from the last successful regexp match and s/.../.../ is a regex match
        and replace.

        > my $title = $1;
        > my $comment1 = $2;
        > my $comment2 =$3;
        > print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";

        You should copy the data from $1, $2, ... to ordinary variables as
        soon as possible, before something overwrites them.
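
A minimal sketch of both the effect and one way around it. The sample record and the helper name parse_record are hypothetical, and the /g flag on the substitution is an assumption about the intent (the original s/\s// removes only the first whitespace character). Note that a failed match leaves $1 and friends untouched, which would be consistent with only some input lines misbehaving: s/\s// succeeds only when the sequence actually contains whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample record; the tabs are written as \t escapes
my $line = "WJG2983\tjShaw2 ORF Forward\tsome comment\tCAA CTT atgtcg\t";

# 1. The clobbering effect: a later successful match resets $1
if ( $line =~ m/^(WJG\d{4})\t/ ) {
    my $scratch = "a b";
    $scratch =~ s/\s//;    # succeeds, and has no capture groups
    print defined $1 ? "\$1 survived\n" : "\$1 is now undef\n";
}

# 2. One way to avoid it: match in list context so the captures are
#    copied into ordinary variables before any other regex runs.
#    Returns the four fields, or an empty list if the line does not match.
sub parse_record {
    my ($record) = @_;

    my ( $title, $comment1, $comment2, $sequence ) =
        $record =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/
        or return;

    $sequence =~ tr/a-z/A-Z/;    # tr/// takes plain ranges, no brackets needed
    $sequence =~ s/\s+//g;       # safe now: the captures were already copied
    return ( $title, $comment1, $comment2, $sequence );
}

my @fields = parse_record($line);
print join( "|", @fields ), "\n" if @fields;
```

The same result can be had by keeping the original if block and simply moving the three "my ... = $1/$2/$3" assignments above the s/\s// line.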

        Jenda
        ===== Jenda@... === http://Jenda.Krynicky.cz =====
        When it comes to wine, women and song, wizards are allowed
        to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery
      • Warren Gallin
        Message 3 of 7 , Jan 20, 2013
          Thanks, that explains it - I'll keep this in mind in the future.

          Warren Gallin

          On 2013-01-20, at 6:02 PM, "Jenda Krynicky" <Jenda@...> wrote:

          > From: "warrengallin" <wgallin@...>
          > > I am having a problem with some lines of text matching my regular expression, but the captured parts of the match are not defined.
          > >
          > > Attached is a minimal example of my problem. The match condition for
          > > the if condition is met, the match value is printed, but the four
          > > individual captures are undefined and are not printed. Perl 5.12
          > > running on OSX Mountain Lion. Note, this is one line from a large
          > > file, most of which are handled as expected, but several of which fail
          > > in the same way.
          > >
          > > #!/usr/bin/perl
          > > use strict;
          > > use warnings;
          > >
          > > my $temp_in = "WJG2983 jShaw2 ORF Forward Redesigned for longer overlap, Tm=60 from oligoCalc Web site, note, one C shorter than WJG2981, avoid in frame ORF w/ beta gal promoter IF you are not cutting for ligation into pXT7 CAA CTT TGG CAG ATC GGT ACC GAA TTCTCGAGCCACCatgtcggcagcaagaaatct ";
          > >
          > > if ($temp_in =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/){
          > > print "Match is:\n $&\n";
          > > my $sequence = $4;
          > > $sequence =~ tr/[a-z]/[A-Z]/;
          > > $sequence =~ s/\s//;
          >
          > The line above is the problem. The $1 and friends contain the data
          > from the last successful regexp match and s/.../.../ is a regex match
          > and replace.
          >
          > > my $title = $1;
          > > my $comment1 = $2;
          > > my $comment2 =$3;
          > > print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";
          >
          > You should copy the data from $1, $2, ... to ordinary variables as
          > soon as possible, before something overwrites them.
          >
          > Jenda
          > ===== Jenda@... === http://Jenda.Krynicky.cz =====
          > When it comes to wine, women and song, wizards are allowed
          > to get drunk and croon as much as they like.
          > -- Terry Pratchett in Sourcery
          >
          >
        • afbach1
          Message 4 of 7 , Jan 21, 2013
            if ($temp_in =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/){
            print "Match is:\n $&\n";
            my $sequence = $4;
            $sequence =~ tr/[a-z]/[A-Z]/;
            $sequence =~ s/\s//;
            my $title = $1;
            my $comment1 = $2;
            my $comment2 =$3;
            print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";
            }

            Your successful match against "s/\s//" resets the capture vars. The
            "tr" line does not, since tr/// is a transliteration rather than a
            regex match. The advantage of splitting over an RE depends upon how
            confident you are in the data formatting.

            else{

            print "Did not match.\n";

            }

            It is worth adding the input line number ("$.") and the input line
            itself to the error msg (and maybe using "warn" instead of "print")
            to make any data problems easier to track down.
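
A rough sketch of that suggestion, assuming the records are read line by line from a filehandle. The sample data, the in-memory filehandle and the helper name scan_records are all hypothetical stand-ins for the real input file and script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real input file
my $data = join '',
    "WJG0001\tname one\tcomment\tACGT acgt\t\n",
    "malformed line without tabs\n",
    "WJG0002\tname two\tcomment\tTTGA\t\n";

# Reads records from a filehandle; returns a ref to the parsed records
# and a ref to the list of line numbers that failed to match.
sub scan_records {
    my ($fh) = @_;
    my ( @ok, @bad );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( my @fields =
            $line =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/ )
        {
            push @ok, \@fields;
        }
        else {
            # $. is the current line number of the last-read filehandle;
            # warn writes to STDERR, keeping diagnostics out of the output
            warn sprintf "Line %d did not match: '%s'\n", $., $line;
            push @bad, $.;
        }
    }
    return ( \@ok, \@bad );
}

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my ( $ok, $bad ) = scan_records($fh);
close $fh;

printf "%d matched, %d failed\n", scalar @$ok, scalar @$bad;
```

With a real file, the in-memory handle would be replaced by an ordinary open on the file name.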

            a
            ----------------------
            Andy Bach
            Systems Mangler
            Internet: andy_bach@...
            Voice: (608) 261-5738, Cell: (608) 658-1890

            "If Java had true garbage collection, most programs would delete
            themselves upon execution."
            Robert Sewell.