
Re: Starting from scratch

  • markcsis@yahoo.com
    Message 1 of 4, Oct 27, 2001
      Is the scraping a purely manual function or is there something
      automated about it? Thanks.


      --- In rss-dev@y..., "Bill Kearney" <wkearney99@h...> wrote:
      > > I am in the early stages of building a website that will
      > > collect and display headlines from political policy
      > > institution websites. I have been reading everything I can
      > > about RSS and trying to understand its possible uses and
      > > limitations for the project I am undertaking and I have some
      > > questions. These might seem overly basic to those of you who
      > > breathe this stuff so I apologize in advance for my lack of
      > > RSS intelligence.
      >
      > You won't learn if you don't ask. Ask away.
      >
      > > 1. Because the sources currently do not publish to RSS is it
      > > feasible to think I can convince an entire group to begin
      > > doing this?
      >
      > This is one of the intentions of the www.syndic8.com website.
      > To 'evangelize' websites and have them create a feed that can be
      > listed in the syndic8 database. There's a mailing list at:
      > http://groups.yahoo.com/group/syndic8
      >
      > > 2. Moreover, 10 am., and others harvest headlines and then
      > > redistribute by offering code to put their headlines on
      > > others' sites. Do website owners need to use 10 am. to
      > > display the headlines from 10 am sources, or could they
      > > harvest those same headlines without 10 am? If they could
      > > harvest them without 10 am (or Moreover), then what is the
      > > value-added that sites like Moreover and 10 am provide?
      >
      > The advantage to the scraping services is they do a lot of the work
      > for you. They do have some categories of their own. That and other
      > features like creating personalized channels, searching, forwarding
      > to e-mail and more. Take a look at www.newsisfree.com.
      >
      > > 3. I would want to harvest headlines and categorize them and
      > > offer a service to other websites to choose a category to
      > > display and then download the code from me to display the
      > > headlines very much like 10 am. Also, like Moreover, I would
      > > expect to have the headlines module on other sites all
      > > display the name of my service in the module. Is this
      > > reality?
      >
      > Sure. The trick being to get your site working and keep it
      > working. There are lots of ways for folks to get feeds
      > displayed. Some of them have been kept up to date. That's
      > usually one of the biggest hassles for any new service,
      > besides just staying up.
      >
      > > 4. Or is the reality that I would convince a couple dozen
      > > think tanks to distribute daily via RSS so I can harvest, and
      > > then their content would be so readily available that
      > > everyone would circumvent my service?
      >
      > Well, that's a good question. I'm of the opinion it's best to have
      > the sources of content creating their own feeds. You're, in theory,
      > guaranteed a greater degree of 'freshness' than aggregated
      > materials. If you choose to scan a feed hourly you might catch
      > things faster than the daily schedule many aggregating sites use.
      >
      > But when a source doesn't have its own feed, a scraping of it
      > is better than nothing. Mike Krus at Newsisfree does a really
      > great job of creating new scraped feeds. 'Scraped' usually
      > means obtaining the text from either a website or other
      > (non-RSS) formatted information. Scraping services usually put
      > the material into an RSS file, a webpage, or both.
      >
      > Better (any) categorization of feed items would be a good thing.
      >
      > -Bill Kearney
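
      For concreteness, here is a minimal sketch (in Python) of that
      scrape-and-republish step. It is illustrative only: the URL and the
      headline markup it looks for are made-up assumptions, not how
      Newsisfree or any other service actually works.

        # Fetch a page, pull out headline links, and republish them as a
        # simple RSS 0.91 file.  Python standard library only.
        import urllib.request
        from html.parser import HTMLParser
        from xml.sax.saxutils import escape

        class HeadlineParser(HTMLParser):
            """Collect (title, link) pairs from <h3 class="headline"> links."""
            def __init__(self):
                super().__init__()
                self.in_headline = False
                self.current_href = None
                self.items = []

            def handle_starttag(self, tag, attrs):
                attrs = dict(attrs)
                if tag == "h3" and attrs.get("class") == "headline":
                    self.in_headline = True
                elif tag == "a" and self.in_headline:
                    self.current_href = attrs.get("href")

            def handle_data(self, data):
                if self.in_headline and self.current_href and data.strip():
                    self.items.append((data.strip(), self.current_href))

            def handle_endtag(self, tag):
                if tag == "a":
                    self.current_href = None
                elif tag == "h3":
                    self.in_headline = False

        def scrape_to_rss(url, channel_title):
            page = urllib.request.urlopen(url).read().decode("utf-8", "replace")
            parser = HeadlineParser()
            parser.feed(page)
            items = "\n".join(
                "    <item><title>%s</title><link>%s</link></item>"
                % (escape(title), escape(link))
                for title, link in parser.items)
            return ('<?xml version="1.0"?>\n'
                    '<rss version="0.91">\n'
                    '  <channel>\n'
                    '    <title>%s</title>\n'
                    '    <link>%s</link>\n'
                    '    <description>%s</description>\n'
                    '%s\n'
                    '  </channel>\n'
                    '</rss>' % (escape(channel_title), escape(url),
                                escape(channel_title), items))

        if __name__ == "__main__":
            # Hypothetical source page; any page with matching markup works.
            print(scrape_to_rss("http://www.example.org/news", "Example headlines"))
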
    • Bill Kearney
      Message 2 of 4, Oct 27, 2001
        > Is the scraping a purely manual function or is there something
        > automated about it? Thanks.

        From what I understand it's an automatic process. There are several
        tools and services that exist. Newsisfree and blogspace are two
        sites. Stapler is a tool for Radio Userland. I'm sure there are
        others.

        Several CMS (content management system) packages already support
        the ability to spit items out into various formats. RSS is but
        one. Some of the scraped RSS feeds are coming from files that are
        exported, but not into RSS format. The scraped feed 'simply'
        transforms it
        from the native format into a flavor of RSS.
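
        As a minimal sketch of that transform step, assume a hypothetical
        tab-delimited export with one title/link pair per line; the file
        name and layout are invented examples, not any particular CMS's
        format. Turning it into a flavor of RSS is then just a rewrite:

          # Rewrite a hypothetical tab-delimited headline export as RSS 0.91.
          import sys
          import xml.etree.ElementTree as ET

          def export_to_rss(lines, channel_title, channel_link):
              rss = ET.Element("rss", version="0.91")
              channel = ET.SubElement(rss, "channel")
              ET.SubElement(channel, "title").text = channel_title
              ET.SubElement(channel, "link").text = channel_link
              ET.SubElement(channel, "description").text = channel_title
              for line in lines:
                  line = line.strip()
                  if not line:
                      continue
                  title, link = line.split("\t", 1)
                  item = ET.SubElement(channel, "item")
                  ET.SubElement(item, "title").text = title
                  ET.SubElement(item, "link").text = link
              return ET.tostring(rss, encoding="unicode")

          if __name__ == "__main__":
              # usage: python export2rss.py headlines.txt > headlines.rss
              with open(sys.argv[1]) as f:
                  print(export_to_rss(f, "Exported headlines",
                                      "http://www.example.org/"))
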

        I suspect these questions (and this thread) might be worth having on
        the syndication mailing list. Many folks here are on both lists.

        http://groups.yahoo.com/group/syndication

        -Bill Kearney