Scraping from webpages

  • Agyeya Gupta
    Message 1 of 5 , May 1, 2004
      Hi,
      I am new to Perl and would like your help with the following.
      I want to write a script that polls a web page indefinitely, say
      every two minutes, and records the changes on that page in a local
      database or file. For example, I could be checking the scores of a
      match, or share or stock prices.
      The OS that I am using is Red Hat Linux 9.2 and the database is MySQL.

      Later on I may want to compare the changes stored in the database.

      Thanks in advance.
      Regards,
      Agyeya Gupta
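
The polling loop described above could be sketched roughly as follows. This is a minimal illustration, not a tested production script: the helper names (page_changed, record_change, poll_forever) and the log file name changes.log are made up for this example, it assumes the LWP::Simple module from libwww-perl is installed, and it writes to a plain file rather than MySQL. It compares whole page snapshots, so any change to the page (including ads or timestamps) counts as a change.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return true when the page text differs from the previous snapshot.
# An undefined previous snapshot (the first poll) counts as a change.
sub page_changed {
    my ($previous, $current) = @_;
    return 1 if !defined $previous;
    return $previous ne $current ? 1 : 0;
}

# Append one timestamped snapshot to a log file (hypothetical format).
sub record_change {
    my ($logfile, $content) = @_;
    open my $fh, '>>', $logfile or die "Cannot open $logfile: $!";
    binmode $fh, ':encoding(UTF-8)';
    print {$fh} "=== ", scalar(localtime), " ===\n", $content, "\n";
    close $fh or die "Cannot close $logfile: $!";
}

# Poll $url every $interval seconds, forever.
sub poll_forever {
    my ($url, $logfile, $interval) = @_;
    require LWP::Simple;    # loaded lazily; libwww-perl is not core Perl
    my $previous;
    while (1) {
        my $content = LWP::Simple::get($url);    # undef on failure
        if (!defined $content) {
            warn "Fetch of $url failed; will retry\n";
        }
        elsif (page_changed($previous, $content)) {
            record_change($logfile, $content);
            $previous = $content;
        }
        sleep $interval;
    }
}

# Runs only when a URL is given on the command line.
poll_forever($ARGV[0], 'changes.log', 120) if @ARGV;
```

Saved under a name of your choosing and run with the page URL as its only argument, it would poll every 120 seconds until interrupted. Storing the snapshots in MySQL instead would mean replacing record_change with a database insert.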
    • Allan Dystrup
      Message 2 of 5 , May 1, 2004
        Hi,
        A question about regexp (in perl) :

        I have two regexp'es :
        1) ^[\x2D_%a-z\x91\x9B\x86A-Z\x92\x9D\x8F0-9]+$
        2) ^unknown$

        I want (for good reasons) to combine these into *one* regexp $Filter
        stating :
        1) MATCH on the character class, AND (if so)
        2) MATCH on NOT "unknown"

        in order to accept only data records that comply with the given :
        1) the lexical AND
        2) the semantic constraints, like :
        next if ( $Field !~ /$Filter/ ); # Filter out bad record


        Can I do that (and how)?

        best regards
        Allan Dystrup
      • Allan Dystrup
        Message 3 of 5 , May 1, 2004
          Well, I hacked up a workaround using :

          $Filter = qr/^([^\x2D_%a-z\x91\x9B\x86A-Z\x92\x9D\x8F0-9]+)|(UnKnown)$/;

          and reversing the test as:

          next if ( $Field =~ /$Filter/ );

          It works, if you turn your head 180 degrees and do a headstand.

          allan



          --- In perl-beginner@yahoogroups.com, "Allan Dystrup"
          <allan_dystrup@y...> wrote:
          >
          > Hi,
          > A question about regexp (in perl) :
          >
          > I have two regexp'es :
          > 1) ^[\x2D_%a-z\x91\x9B\x86A-Z\x92\x9D\x8F0-9]+$
          > 2) ^unknown$
          >
          > I want (for good reasons) to combine these into *one* regexp $Filter
          > stating :
          > 1) MATCH on the character class, AND (if so)
          > 2) MATCH on NOT "unknown"
          >
          > in order to accept only data records that comply with the given :
          > 1) the lexical AND
          > 2) the semantic constraints, like :
          > next if ( $Field !~ /$Filter/ ); # Filter out bad record
          >
          >
          > Can I do that (and how)?
          >
          > best regards
          > Allan Dystrup
        • Jeff 'japhy' Pinyan
          Message 4 of 5 , May 1, 2004
            On May 1, Allan Dystrup said:

            >1) ^[\x2D_%a-z\x91\x9B\x86A-Z\x92\x9D\x8F0-9]+$
            >2) ^unknown$
            >
            >I want (for good reasons) to combine these into *one* regexp $Filter
            >stating :
            >1) MATCH on the character class, AND (if so)
            >2) MATCH on NOT "unknown"

            Ok:

            $rx1 = qr/[\x2D_%a-z\x91\x9B\x86A-Z\x92\x9D\x8F0-9]+$/;
            $rx2 = qr/unknown$/;

            if ($str =~ /^(?=$rx1)(?!$rx2)/) {
                # matches rx1 and not rx2
            }

            That's how to do it.
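
For readers who want to try the combined pattern, here is a small self-contained sketch built on the two qr// patterns above. The helper name accept_field and the sample field values are assumptions added for illustration; only $rx1, $rx2 and the lookahead construct come from the post. The positive lookahead (?=...) requires the whole field to consist of allowed characters, and the negative lookahead (?!...) then rejects the field if it is exactly "unknown".

```perl
use strict;
use warnings;

# The two patterns from the post, unchanged.
my $rx1 = qr/[\x2D_%a-z\x91\x9B\x86A-Z\x92\x9D\x8F0-9]+$/;
my $rx2 = qr/unknown$/;

# Combined filter: anchored at the start, must match rx1 and must not match rx2.
my $Filter = qr/^(?=$rx1)(?!$rx2)/;

# Hypothetical helper: true when the field passes both constraints.
sub accept_field {
    my ($field) = @_;
    return $field =~ $Filter ? 1 : 0;
}

# Sample values chosen for illustration only.
for my $field ('abc_123', 'unknown', 'unknown2', 'bad value!') {
    printf "%-12s %s\n", $field, accept_field($field) ? 'accepted' : 'rejected';
}
```

With these samples, 'abc_123' and 'unknown2' are accepted, while 'unknown' (the excluded word) and 'bad value!' (characters outside the class) are rejected. In the original loop this would read: next if ( $Field !~ /$Filter/ );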

            --
            Jeff "japhy" Pinyan japhy@... http://www.pobox.com/~japhy/
            RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
            CPAN ID: PINYAN [Need a programmer? If you like my work, let me know.]
            <stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.
          • Allan Dystrup
            Message 5 of 5 , May 2, 2004
              Hi Jeff,

              This was just the solution I was looking for, thanks a bucket!

              The problem with reversing a logically straightforward regexp (as
              you have to, when not using the extended constructs) is that it, even
              after years of yoga training, seems to stare back at you like a line
              straight out of Gödel's theorem. You wouldn't like to run into that
              in a dark alley in someone else's code...

              allan


              --- In perl-beginner@yahoogroups.com, Jeff 'japhy' Pinyan
              <japhy@p...> wrote:
              > On May 1, Allan Dystrup said:
              >
              > >1) ^[\x2D_%a-z\x91\x9B\x86A-Z\x92\x9D\x8F0-9]+$
              > >2) ^unknown$
              > >
              > >I want (for good reasons) to combine these into *one* regexp $Filter
              > >stating :
              > >1) MATCH on the character class, AND (if so)
              > >2) MATCH on NOT "unknown"
              >
              > Ok:
              >
              > $rx1 = qr/[\x2D_%a-z\x91\x9B\x86A-Z\x92\x9D\x8F0-9]+$/;
              > $rx2 = qr/unknown$/;
              >
              > if ($str =~ /^(?=$rx1)(?!$rx2)/) {
              > # matches rx1 and not rx2
              > }
              >
              > That's how to do it.
              >
              > --
              > Jeff "japhy" Pinyan japhy@p... http://www.pobox.com/~japhy/
              > RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
              > CPAN ID: PINYAN [Need a programmer? If you like my work, let me know.]
              > <stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.