Re: [PBML] Extracting from <body> to </body>

  • Jeff 'japhy' Pinyan
    Message 1 of 8 , May 1, 2002
      On May 1, Greg Krieser said:

      >Thanks for the code. Tried both, but did not get them to work. I'll
      >read more about match and see if I can figure out why it didn't. I'll
      >let you know what I find.

      His regex fails because . doesn't match \n, and I'm SURE your HTML text
      has newlines in it. If you wanted to be quick about it (and possibly
      inaccurate) you could use

      ($contents) = $HTML =~ m{<body>(.*)</body>}si;
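To illustrate the point about /s, here is a minimal sketch. The sample text in $HTML is made up for the demonstration; only the two regexes differ, and only by the /s modifier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample input; note the newlines inside the body.
my $HTML = "<html><head><title>t</title></head>\n"
         . "<BODY>\nline one\nline two\n</BODY></html>\n";

# Without /s, . cannot cross a newline, so the match fails
# and the list assignment leaves the variable undefined.
my ($without_s) = $HTML =~ m{<body>(.*)</body>}i;

# With /s, . also matches \n, so the capture can span several lines.
my ($with_s) = $HTML =~ m{<body>(.*)</body>}si;

print defined $without_s ? "without /s: matched\n" : "without /s: no match\n";
print "with /s: [$with_s]\n";
```

The "possibly inaccurate" caveat still applies: a literal <body> will not match an opening tag that carries attributes, and the greedy .* runs to the last </body> in the text.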

      --
      Jeff "japhy" Pinyan japhy@... http://www.pobox.com/~japhy/
      RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
      ** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
      <stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.
      [ I'm looking for programming work. If you like my work, let me know. ]
    • Greg Krieser
      Message 2 of 8 , May 1, 2002
        Thanks for everyone's help. I've got a sample working, but not the way I'd like. Could I get a little more help?

        Tried this parser sample code I found at: http://www.gellyfish.com/htexamples.

        #!/usr/bin/perl -w
        package Example;
        use strict;
        require HTML::Parser;
        @Example::ISA = qw(HTML::Parser);
        my $parser = Example->new;
        $parser->parse_file('test.html');
        print $parser->{TEXT};
        sub text
        {
            my ($self,$text) = @_;
            $self->{TEXT} .= $text;
        }

        This produces some javascript that is in the test.html file, but not everything between <body> and </body>. How can I modify the code to specify this requirement?

        Thanks A Lot,

        Greg
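One way the parser above might be adapted is sketched below. It assumes HTML::Parser is installed and that the subclass is created with no constructor arguments, so the start, end, and text methods are called as in the sample. The package name BodyOnly, the IN_BODY flag, and the body_of helper are names invented for this sketch.

```perl
#!/usr/bin/perl
package BodyOnly;    # hypothetical subclass name
use strict;
use warnings;
use HTML::Parser;
our @ISA = ('HTML::Parser');

# Start tags: switch collection on at <body>, otherwise keep the tag text.
sub start {
    my ($self, $tag, $attr, $attrseq, $origtext) = @_;
    if ($tag eq 'body') { $self->{IN_BODY} = 1; return; }
    $self->{TEXT} .= $origtext if $self->{IN_BODY};
}

# End tags: switch collection off at </body>, otherwise keep the tag text.
sub end {
    my ($self, $tag, $origtext) = @_;
    if ($tag eq 'body') { $self->{IN_BODY} = 0; return; }
    $self->{TEXT} .= $origtext if $self->{IN_BODY};
}

# Plain text (including script content) is kept only inside the body.
sub text {
    my ($self, $text) = @_;
    $self->{TEXT} .= $text if $self->{IN_BODY};
}

package main;

# Hypothetical helper: parse a string and return what was collected.
sub body_of {
    my ($html) = @_;
    my $parser = BodyOnly->new;
    $parser->{TEXT} = '';
    $parser->parse($html);
    $parser->eof;
    return $parser->{TEXT};
}

print body_of("<html><head><title>T</title></head><body>\n<p>Hi</p>\n</body></html>");
```

To read a file instead, as in the original sample, call $parser->parse_file('test.html') in place of parse and eof. Comments and declarations inside the body are not collected by this sketch, since only the start, end, and text methods are overridden.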
      • daymobrew
        Message 3 of 8 , May 2, 2002
          I modified Jeff's (working) regexp and got the modified version
          working. Here is the full code:

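          #!/usr/local/bin/perl -w

          use strict;

          if ( open( FH, 'body.html' ) )
          {
              # Read the whole file into one string so the regexp sees every line.
              my $whole_file = join( '', <FH> );
              close( FH );

              # /s lets . match newlines; /i ignores the case of the tags.
              # Everything outside <body>...</body> is replaced by what is inside.
              $whole_file =~ s@.*<body>(.*)</body>.*@$1@si;
              print $whole_file;
          }
          else
          {
              warn "Cannot open body.html: $!\n";
          }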
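A slightly more tolerant variant is sketched below. It is not part of the code above: the extract_body helper is a name made up for this example, and the pattern assumes the page has a single <body> element. It accepts attributes on the opening tag, stops at the first </body>, and reports when nothing matches instead of printing the whole file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the text inside the first <body>...</body>
# pair, or undef when no such pair is found.
sub extract_body {
    my ($html) = @_;
    # [^>]* tolerates attributes such as <body bgcolor="white">;
    # .*? stops at the first </body>; \s* allows "</body >".
    return $html =~ m{<body\b[^>]*>(.*?)</body\s*>}si ? $1 : undef;
}

my $file = shift(@ARGV) || 'body.html';    # same default file name as above
if ( open( my $fh, '<', $file ) ) {
    local $/;                              # slurp mode: read the file in one go
    my $html = <$fh>;
    close($fh);
    $html = '' unless defined $html;       # an empty file reads as undef

    my $body = extract_body($html);
    print defined $body ? $body : "No <body> element found in $file\n";
}
else {
    warn "Cannot open $file: $!\n";
}
```

The three-argument open with a lexical filehandle avoids the global FH bareword. A regexp remains a shortcut here; it can still be fooled by a </body> inside a script or comment.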
        • Greg Krieser
          Message 4 of 8 , May 2, 2002
            VERY IMPRESSIVE! Works like a champ! Thanks!

            This list is showing me the importance of regexps. Thanks for the help. Can't wait to implement this everywhere.

            Thanks!

            (In reply to the message sent by "daymobrew" <daymobrew@...> on Thu, 02 May 2002 13:30:53 -0000.)