
Re: [PBML] Regular Expression matches but captures are undefined

  • timothy adigun
    Message 1 of 7 , Jan 19, 2013
      Hi warrengallin,

      Please check my comments below:
      On Sun, Jan 20, 2013 at 12:45 AM, warrengallin <wgallin@...> wrote:

      > I am having a problem with some lines of text matching my regular
      > expression, but the captured parts of the match are not defined.
      >
      > Attached is a minimal example of my problem. The match condition for the
      > if condition is met, the match value is printed, but the four individual
      > captures are undefined and are not printed. Perl 5.12 running on OSX
      > Mountain Lion. Note, this is one line from a large file, most of which are
      > handled as expected, but several of which fail in the same way.
      >
      > #!/usr/bin/perl
      > use strict;
      > use warnings;
      >
      > my $temp_in = "WJG2983 jShaw2 ORF Forward Redesigned for longer overlap,
      > Tm=60 from oligoCalc Web site, note, one C shorter than WJG2981, avoid in
      > frame ORF w/ beta gal promoter IF you are not cutting for ligation into
      > pXT7 CAA CTT TGG CAG ATC GGT ACC GAA TTCTCGAGCCACCatgtcggcagcaagaaatct ";
      >
      > if ($temp_in =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/){
      > print "Match is:\n $&\n";
      > my $sequence = $4;
      > $sequence =~ tr/[a-z]/[A-Z]/;
      > $sequence =~ s/\s//;
      > my $title = $1;
      > my $comment1 = $2;
      > my $comment2 =$3;
      > print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";
      > }
      > else{
      >
      > print "Did not match.\n";
      >
      > }
      >
      Instead of using a regex to match each desired substring, I would rather
      suggest using Perl's split function, like so:

      #!/usr/bin/perl
      use strict;
      use warnings;

      # The fields are tab-separated. The tabs are written as \t here because
      # the list archive shows the original tab characters as spaces.
      my $temp_in =
          "WJG2983\tjShaw2 ORF Forward\tRedesigned for longer overlap, Tm=60 from "
        . "oligoCalc Web site, note, one C shorter than WJG2981, avoid in frame ORF w/ "
        . "beta gal promoter IF you are not cutting for ligation into pXT7\tCAA CTT TGG "
        . "CAG ATC GGT ACC GAA TTCTCGAGCCACCatgtcggcagcaagaaatct\t";

      # Split into at most four tab-separated fields.
      my @string_array = split /\t/, $temp_in, 4;

      if ( $temp_in =~ m/^WJG\d{4}/ and ( @string_array == 4 ) ) {
          my ( $title, $comment1, $comment2, $sequence ) = @string_array;
          $sequence =~ tr/a-z/A-Z/;    # upper-case the sequence
          print "Found a Match\n\nTitle: ", $title,
              "\n\nComment 1: ", $comment1, "\n\nComment 2: ", $comment2,
              "\n\nSequence: ", $sequence, "\n\n";
      }
      else {
          print "Did not match.\n";
      }
      __END__

      OUTPUT:
      Found a Match

      Title: WJG2983

      Comment 1: jShaw2 ORF Forward

      Comment 2: Redesigned for longer overlap, Tm=60 from oligoCalc Web site,
      note, one C shorter than WJG2981, avoid in frame ORF w/ beta gal promoter
      IF you are not cutting for ligation into pXT7

      Sequence: CAA CTT TGG CAG ATC GGT ACC GAA TTCTCGAGCCACCATGTCGGCAGCAAGAAATCT

      NOTE:
      If you still want to print the matched string, you can print the original
      $temp_in inside the "if" block, that is, only when the condition is met.

      For more on split, see perldoc -f split.
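
As a rough illustration of how the third (LIMIT) argument to split behaves, here is a minimal sketch. The sample line and variable names are made up for this example and are not from the original data.

```perl
use strict;
use warnings;

# Made-up sample line: five non-empty fields followed by two empty ones.
my $line = "id\tcomment one\tcomment two\tAAA\tCCC\t\t";

my @limited  = split /\t/, $line, 4;     # at most 4 fields; the rest stays in the last one
my @default  = split /\t/, $line;        # no limit: trailing empty fields are dropped
my @keep_all = split /\t/, $line, -1;    # negative limit: trailing empty fields are kept

printf "limit 4: %d fields\n",  scalar @limited;
printf "no limit: %d fields\n", scalar @default;
printf "limit -1: %d fields\n", scalar @keep_all;
# prints 4, 5 and 7 fields respectively
```

With a limit of 4, any tabs after the third one remain inside the fourth field, which is why the sequence comes back in one piece even if the line ends with a tab.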


      > exit;
      >
      >
      >



      --
      Tim


    • Oral Akkan
      Message 2 of 7 , Jan 19, 2013
        Hi

        Just use this one and write again if it does not work. I cannot test it for you, because the original text must contain tab characters (\t) and I see only spaces here. Keep in mind that $1, $2, $3, ... are short-lived, so you must save them in variables immediately after the match.

        ...
        ...
        if ($temp_in =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/){
            my ($title,$comment1,$comment2,$sequence) = ($1,$2,$3,$4);
            print "Match is:\n $&\n";
            $sequence =~ tr/[a-z]/[A-Z]/;
            $sequence =~ s/\s//;
            print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";
        }
        else{
            print "Did not match.\n";
        }
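
Another way to avoid reading $1..$4 later is to run the match in list context, so the captures are assigned in the same statement. This is only a sketch: the helper name parse_record and the shortened sample line are invented for illustration, and the /g on the whitespace removal is an assumption about what the original s/\s// was meant to do.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the four fields, or an empty list if the
# line does not match. The list assignment copies the captures itself,
# so $1..$4 are never read afterwards.
sub parse_record {
    my ($line) = @_;
    my ( $title, $comment1, $comment2, $sequence ) =
        $line =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/
        or return;

    $sequence =~ tr/a-z/A-Z/;    # upper-case the sequence
    $sequence =~ s/\s//g;        # assumed intent: remove every whitespace character
    return ( $title, $comment1, $comment2, $sequence );
}

my @fields = parse_record("WJG2983\tjShaw2 ORF Forward\tsome comment\tCAA CTT atgtcg\t");
print @fields ? join( "\n", @fields ) . "\n" : "Did not match.\n";
```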





      • Warren Gallin
        Message 3 of 7 , Jan 19, 2013
          Tim,

          That worked perfectly. Although I am curious about the reason for the regex approach failing, your suggestion makes my script work, which is the most important thing.

          Thanks,

          Warren

        • Jenda Krynicky
          Message 4 of 7 , Jan 20, 2013
            From: "warrengallin" <wgallin@...>
            > I am having a problem with some lines of text matching my regular expression, but the captured parts of the match are not defined.
            >
            > Attached is a minimal example of my problem. The match condition for
            > the if condition is met, the match value is printed, but the four
            > individual captures are undefined and are not printed. Perl 5.12
            > running on OSX Mountain Lion. Note, this is one line from a large
            > file, most of which are handled as expected, but several of which fail
            > in the same way.
            >
            > #!/usr/bin/perl
            > use strict;
            > use warnings;
            >
            > my $temp_in = "WJG2983 jShaw2 ORF Forward Redesigned for longer overlap, Tm=60 from oligoCalc Web site, note, one C shorter than WJG2981, avoid in frame ORF w/ beta gal promoter IF you are not cutting for ligation into pXT7 CAA CTT TGG CAG ATC GGT ACC GAA TTCTCGAGCCACCatgtcggcagcaagaaatct ";
            >
            > if ($temp_in =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/){
            > print "Match is:\n $&\n";
            > my $sequence = $4;
            > $sequence =~ tr/[a-z]/[A-Z]/;
            > $sequence =~ s/\s//;

            The line above is the problem. $1 and friends hold the captures
            from the last successful regexp match, and s/.../.../ is itself a
            regex match (and replace). When it succeeds, it resets them.

            > my $title = $1;
            > my $comment1 = $2;
            > my $comment2 =$3;
            > print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";

            You should copy the data from $1, $2, ... to ordinary variables as
            soon as possible, before something overwrites them.
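
To make the effect concrete, here is a small sketch (not from the original post). The helper name title_after_cleanup and the two-field sample lines are made up for illustration; the regexes follow the ones quoted above. It is also consistent with only some lines of the file failing: a failed s/\s// leaves the captures alone, so only sequences that actually contain whitespace would lose $1.

```perl
use strict;
use warnings;

# Hypothetical helper: reads $1 only AFTER cleaning up the sequence,
# the same order of operations as in the original script.
# Returns undef if the line does not match at all.
sub title_after_cleanup {
    my ($line) = @_;
    return unless $line =~ m/^(WJG\d{4})\t([^\t]*)/;

    my $sequence = $2;
    $sequence =~ tr/a-z/A-Z/;    # tr/// is not a regex, captures are untouched
    $sequence =~ s/\s//;         # a successful match here resets $1, $2, ...

    my $title = $1;              # undef if the substitution above matched
    return $title;
}

for my $line ( "WJG2983\tCAA ctt", "WJG2981\tCAActt" ) {
    my $title = title_after_cleanup($line);
    print defined $title ? "title: $title\n" : "title: undefined\n";
}
# prints "title: undefined", then "title: WJG2981"
```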

            Jenda
            ===== Jenda@... === http://Jenda.Krynicky.cz =====
            When it comes to wine, women and song, wizards are allowed
            to get drunk and croon as much as they like.
            -- Terry Pratchett in Sourcery
          • Warren Gallin
            Message 5 of 7 , Jan 20, 2013
              Thanks, that explains it - I'll keep this in mind in the future.

              Warren Gallin

            • afbach1
              Message 6 of 7 , Jan 21, 2013
                if ($temp_in =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/){
                print "Match is:\n $&\n";
                my $sequence = $4;
                $sequence =~ tr/[a-z]/[A-Z]/;
                $sequence =~ s/\s//;
                my $title = $1;
                my $comment1 = $2;
                my $comment2 =$3;
                print "Found a Match\n$title\n$comment1\n$comment2\n$sequence\n";
                }

                Your substitution "s/\s//" resets the capture vars (I don't think
                "tr" does). Whether splitting is better than an RE depends upon
                how confident you are in the data formatting.

                else{

                print "Did not match.\n";

                }

                Worth adding the input line number ("$.") and the input line itself to the
                error msg (and maybe using "warn" instead of "print") for ease of tracking
                down any data problems.
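
A sketch of what that might look like when reading records from a filehandle. The helper name parse_records and the sample data are hypothetical, an in-memory filehandle stands in for the real file, and removing all whitespace with /g is an assumption about the intended cleanup.

```perl
use strict;
use warnings;

# Hypothetical helper: parse tab-separated records from a filehandle and
# warn (on STDERR) about lines that do not fit the expected layout.
sub parse_records {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ m/^(WJG\d{4})\t([^\t]*)\t([^\t]*)\t([^\t]*)\t/ ) {
            # Copy the captures before any other regex can run.
            my ( $title, $comment1, $comment2, $sequence ) = ( $1, $2, $3, $4 );
            $sequence =~ tr/a-z/A-Z/;
            $sequence =~ s/\s//g;    # assumed intent: strip all whitespace
            push @records, [ $title, $comment1, $comment2, $sequence ];
        }
        else {
            # $. is the current line number of the filehandle last read.
            warn "Line $.: did not match: $line\n";
        }
    }
    return @records;
}

# Made-up sample data: one valid record, one malformed line.
my $data = "WJG2983\tjShaw2 ORF Forward\tnote\tCAA ctt\t\n"
         . "not a valid record\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @records = parse_records($fh);
close $fh;

print join( '|', @$_ ), "\n" for @records;
```

Using warn keeps the diagnostics on STDERR, so they stay out of any output that is redirected to a file.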

                a
                ----------------------
                Andy Bach
                Systems Mangler
                Internet: andy_bach@...
                Voice: (608) 261-5738, Cell: (608) 658-1890

                "If Java had true garbage collection, most programs would delete
                themselves upon execution."
                Robert Sewell.