RE: [PBML] problems fetching a webpage from internet

  • Charles K. Clarkson
    Message 1 of 2 , Mar 30, 2006
      Debaprasad Mukherjee wrote:

      : I have written a perl program which basically fetches web-pages
      : from the internet and does processing of the pages to extract
      : specific data. I have used "LWP::UserAgent" in my program for
      : fetching web pages over the internet. Now I have moved to
      : another institution (place of work) and have added the proxy
      : address to my program (entire subroutine code appended below).
      : But when I run the program, it generates the following messages.

      You have a lot of problems here which you are not asking about.
      The first problem is that your subroutine is working with a hash
      which has not been passed into it. Except under unusual
      circumstances it is best to pass all arguments needed by a
      subroutine into it.

      Next, your use of "use LWP::UserAgent;" inside a subroutine
      tells me that you do not understand when "use" gets run. Whether
      you run this subroutine or not, LWP::UserAgent is loaded because
      perl looks for those statements during compilation. Traditionally,
      we place all modules which are loaded at the top of a script.
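
A minimal sketch of the compile-time behaviour described above. It uses the core module File::Basename purely as a stand-in for LWP::UserAgent, and the subroutine name never_called is made up for this illustration.

```perl
use strict;
use warnings;

# This subroutine is never called, yet the "use" inside it still runs,
# because perl processes "use" statements while compiling the file.
sub never_called {
    use File::Basename;    # core module, chosen only for illustration
    return basename('/tmp/example.txt');
}

# %INC records every module file that has been loaded so far.
my $loaded = exists $INC{'File/Basename.pm'} ? 'yes' : 'no';
print "File::Basename loaded without calling the sub: $loaded\n";
```

If a module really should be loaded only when a subroutine runs, "require" is the run-time counterpart of "use".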


      : Use of uninitialized value in hash element at
      : C:/Perl/site/lib/LWP/Protocol.pm line 55.
      : Use of uninitialized value in pattern match (m//) at
      : C:/Perl/site/lib/LWP/Protocol.pm line 58.
      : Use of uninitialized value in concatenation (.) or string at
      : C:/Perl/site/lib/LWP/Protocol.pm line 38.
      : Use of uninitialized value in string eq at
      : C:/Perl/site/lib/LWP/UserAgent.pm line 195.

      Since this worked before, I might guess that you are using a
      different version of LWP::UserAgent than you used to, or that
      the "http://" prefix missing from one of your proxy strings is
      more necessary than you think.


      : sub getCONTENTS{
      : my $url= shift @_;


      : use LWP::UserAgent;

      Loaded at compile time. Move this out of your subroutine.


      : my $proxy_string;
      :
      : if($parameters{-ip} eq "IICB"){
      :     $proxy_string= "http://10.0.0.1:8080";
      : }
      : elsif($parameters{-ip} eq "PDSIT"){
      :     $proxy_string= "192.168.35.200:3128";

      Why doesn't that one contain the "http://" part?


      : }
      : else{
      :     $proxy_string= "http://192.168.100.1:3128";
      : }
      :
      : my $ua= new LWP::UserAgent;
      : $ua->proxy(http => "$proxy_string");

      Don't quote variables unnecessarily. Use $proxy_string not
      "$proxy_string". Of course, we don't really need $proxy_string.

          # $ip holds the value of $parameters{-ip}, passed in as an argument.
          my $ua = LWP::UserAgent->new;

          if ( $ip eq 'IICB' ) {
              $ua->proxy( http => 'http://10.0.0.1:8080' );

          } elsif ( $ip eq 'PDSIT' ) {
              $ua->proxy( http => 'http://192.168.35.200:3128' );

          } else {
              $ua->proxy( http => 'http://192.168.100.1:3128' );
          }
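
The if/elsif chain above could also be written as a lookup table, which keeps the addresses in one place and makes a missing "http://" easy to catch. This is only a sketch: %PROXY_FOR, $DEFAULT_PROXY and proxy_for() are names invented here, and the scheme check is my own addition, not something LWP requires you to write.

```perl
use strict;
use warnings;

# Proxy addresses from the post, keyed by site name.
my %PROXY_FOR = (
    IICB  => 'http://10.0.0.1:8080',
    PDSIT => 'http://192.168.35.200:3128',
);
my $DEFAULT_PROXY = 'http://192.168.100.1:3128';

# Return the proxy URL for a site, falling back to the default.
sub proxy_for {
    my ($site) = @_;

    my $proxy = ( defined $site && exists $PROXY_FOR{$site} )
        ? $PROXY_FOR{$site}
        : $DEFAULT_PROXY;

    # Catch a missing scheme here rather than deep inside LWP.
    die "Proxy '$proxy' lacks a scheme such as http://\n"
        unless $proxy =~ m{^[a-z][a-z0-9+.-]*://}i;

    return $proxy;
}

print proxy_for('PDSIT'), "\n";
```

With that in place the user agent setup would shrink to a single call such as $ua->proxy( http => proxy_for($ip) ).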


      : LABEL1: my $req= new HTTP::Request GET => $url;

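      There used to be an author who wrote about poor programming
      practices. He said some practices raised red flags. When I first
      saw this line those red flags came to mind. A label on a statement
      usually means a "goto" is waiting further down, and a loop says
      the same thing more clearly.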

      : my $res= $ua->request($req);
      :
      : my $contents;
      :
      : if ($res->is_success){
      :     $contents= $res->content;
      :     #print "DBG Success\n";
      :
      :     sleep 5;
      : }
      : else{
      :     print "x_net\t";
      :
      :     sleep 3;
      :     goto LABEL1;
      : }
      :
      : return $contents;
      : }


          while (1) {
              my $req = HTTP::Request->new( GET => $url );
              my $res = $ua->request($req);

              if ( $res->is_success ) {
                  sleep 5;
                  return $res->content;
              }

              print "x_net\t";
              sleep 3;
          }

      I still don't like this. It may go on endlessly and I don't
      see a reason for the "sleep 5". This might be better.

          use LWP::UserAgent;    # loaded once, at the top of the script

          getCONTENTS( $url, $parameters{-ip} );
          .
          .
          .

          sub getCONTENTS {

              my( $url, $ip ) = @_;

              my $ua = LWP::UserAgent->new;

              if ( $ip eq 'IICB' ) {
                  $ua->proxy( http => 'http://10.0.0.1:8080' );

              } elsif ( $ip eq 'PDSIT' ) {
                  $ua->proxy( http => 'http://192.168.35.200:3128' );

              } else {
                  $ua->proxy( http => 'http://192.168.100.1:3128' );
              }

              # Try at most 10 times instead of looping forever.
              foreach ( 1 .. 10 ) {
                  my $req = HTTP::Request->new( GET => $url );
                  my $res = $ua->request( $req );

                  return $res->content if $res->is_success;

                  print "x_net\t";
                  sleep 3;
              }

              #
              # Something is wrong.
              #
              return;    # all 10 attempts failed

          }
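
One way to fill in the "Something is wrong" part is to have the retry loop return nothing when every attempt fails and let the caller decide what to do. The sketch below separates the retrying from the fetching so it can be tried without a network. fetch_with_retries() and the fake fetcher are names I made up for this illustration.

```perl
use strict;
use warnings;

# Call $fetch up to $max_tries times, pausing $delay seconds between
# attempts. Returns the first defined result, or undef if every
# attempt fails.
sub fetch_with_retries {
    my ( $fetch, $max_tries, $delay ) = @_;

    for my $attempt ( 1 .. $max_tries ) {
        my $content = $fetch->();
        return $content if defined $content;

        warn "attempt $attempt of $max_tries failed\n";
        sleep $delay if $delay && $attempt < $max_tries;
    }

    return;    # every attempt failed; the caller decides what to do
}

# Stand-in fetcher that fails twice and then succeeds (no network needed).
my $calls      = 0;
my $fake_fetch = sub {
    $calls++;
    return $calls < 3 ? undef : '<html>ok</html>';
};

my $page = fetch_with_retries( $fake_fetch, 10, 0 );
print defined $page ? "got page after $calls calls\n" : "gave up\n";
```

In the real program the code reference would wrap the LWP request and return $res->content on success or undef otherwise, and the caller would check for undef before processing the page.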


      HTH,

      Charles K. Clarkson
      --
      Mobile Homes Specialist
      254 968-8328