 

scrapers was Re: GovTrack Goes Open Source

  • Neil Rest
    Message 1 of 5 , Apr 3, 2008
      I've just begun the learning curve to start building some screen
      scrapers of my own.

      How many people here are using or building scrapers? And how many
      might be interested in sharing experience/work/code?


      At 12:20 PM 4/3/2008, you wrote:
      >Very exciting news!
      >
      >Currently I'm running my own scrappers, but love the idea of putting
      >my efforts into an open
      >source scrapper.


      Neil
      --
      NeilRest@...

      Anyhow here's a proof for the philosophy of Solipsism. I stopped
      reading the newspaper, and there stopped being news. Even on CNN for
      these past two weeks they are doing only reruns! The Democrats'
      so-called debates certainly aren't "news." The White House's
      stonewalling in the last two weeks is like their stonewalling for the
      last two years. Starlets drive drunk and are sent to rehab and then
      they drive drunk again.
      So, here's my advice, if you don't like what's happening in the news
      just twist your lips in a disdainful smile and pay no attention. Now,
      at last, you can turn to Spinoza.
      -- Thomas Disch
    • Gabe Hamilton
      Message 2 of 5 , Apr 3, 2008
        I've built a few web crawlers/scrapers, in the past in perl but
        recently in java. I've found the webharvest project to be a very
        useful tool for standalone use or from java.
        http://web-harvest.sourceforge.net/

        -Gabe

        On Thu, Apr 3, 2008 at 4:32 PM, Neil Rest <NeilRest@...> wrote:
        >
        > I've just begun the learning curve to start building some screen
        > scrapers of my own.
        >
        > How many people here are using or building scrapers? And how many
        > might be interested in sharing experience/work/code?
        >
        > At 12:20 PM 4/3/2008, you wrote:
        > >Very exciting news!
        > >
        > >Currently I'm running my own scrappers, but love the idea of putting
        > >my efforts into an open
        > >source scrapper.
        >
        > Neil
      • Aron Pilhofer
        Message 3 of 5 , Apr 3, 2008
          I would second, or third, or fourth that. Putting govt. scrapers in the wild would be something I'd be very interested in contributing to. As for tools, Perl/Mechanize is pretty formidable, but I've really come to love Ruby/Hpricot of late. DOM-based parsing is so slick when it works.
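For anyone new to the idea, here is a minimal sketch of what DOM-based parsing looks like, in Python rather than Ruby/Hpricot. It is not taken from any scraper discussed in this thread. The sample markup, the `bills` table, the `num`/`title` class names, and the `extract_bills` function are all made up for illustration, and it assumes the page is well-formed XHTML. Real government pages often are not, in which case a more tolerant parser would be needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup; not taken from any real site.
SAMPLE = """
<html>
  <body>
    <table id="bills">
      <tr><td class="num">H.R. 1</td><td class="title">Example Act</td></tr>
      <tr><td class="num">S. 2</td><td class="title">Another Example Act</td></tr>
    </table>
  </body>
</html>
"""


def extract_bills(markup):
    """Parse markup into a tree and collect (number, title) pairs from table rows."""
    # Build the whole document tree first; this is the "DOM" step.
    root = ET.fromstring(markup.strip())
    bills = []
    # Walk every <tr> in the tree and look up cells by their class attribute.
    for row in root.iter("tr"):
        num = row.find("td[@class='num']")
        title = row.find("td[@class='title']")
        # Skip rows (e.g. headers) that lack either cell.
        if num is not None and title is not None:
            bills.append((num.text.strip(), title.text.strip()))
    return bills


if __name__ == "__main__":
    for number, title in extract_bills(SAMPLE):
        print(number, "-", title)
```

The appeal is that the code addresses elements by structure (rows, cells, attributes) instead of by regular expressions over raw text, so small layout changes are less likely to break it.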

          On Apr 3, 2008, at 8:05 PM, Gabe Hamilton wrote:

          I've built a few web crawlers/scrapers, in the past in perl but
          recently in java. I've found the webharvest project to be a very
          useful tool for standalone use or from java.
          http://web-harvest.sourceforge.net/

          -Gabe

          On Thu, Apr 3, 2008 at 4:32 PM, Neil Rest <NeilRest@rcn.com> wrote:
          >
          > I've just begun the learning curve to start building some screen
          > scrapers of my own.
          >
          > How many people here are using or building scrapers? And how many
          > might be interested in sharing experience/work/code?
          >
          > At 12:20 PM 4/3/2008, you wrote:
          > >Very exciting news!
          > >
          > >Currently I'm running my own scrappers, but love the idea of putting
          > >my efforts into an open
          > >source scrapper.
          >
          > Neil


          -- 

          ~~~~~~~~~~~~~
          Aron Pilhofer
          Editor, Interactive News Technology,
          The New York Times
          Phone: 212-556-5849
          Email: aron@...
