Re: Starting from scratch
- Is the scraping a purely manual function or is there something
automated about it? Thanks.
--- In rss-dev@y..., "Bill Kearney" <wkearney99@h...> wrote:
> > I am in the early stages of building a website that will collect and
> > display headlines from political policy institution websites. I have
> > been reading everything I can about RSS and trying to understand the
> > possible uses and limitations for the project I am undertaking, and I
> > have some questions. These might seem overly basic to those of you
> > who breathe this stuff, so I apologize in advance for my lack of
> > intelligence.
> You won't learn if you don't ask. Ask away.
> > 1. Because the sources currently do not publish to RSS, is it realistic
> > to think I can convince an entire group to begin doing this?
> This is one of the intentions of the www.syndic8.com website.
> To 'evangelize' websites and have them create a feed that can be
> listed in the syndic8 database. There's a mailing list at:
> > 2. Moreover, 10 am., and others harvest headlines and then
> > redistribute by offering code to put their headlines on others'
> > sites. Do website owners need to use 10 am. to display the headlines
> > from 10 am sources, or could they harvest those same headlines without
> > 10 am? If they could harvest them without 10 am (or moreover),
> > what is the value-added that sites like moreover and 10 am provide?
> The advantage to the scraping services is they do a lot of the work
> for you. They do have some categories of their own. That and
> features like creating personalized channels, searching, forwarding
> to e-mail and more. Take a look at www.newsisfree.com.
> > 3. I would want to harvest headlines and categorize them and offer a
> > service to other websites to choose a category to display and
> > download the code from me to display the headlines very much like 10
> > am. Also, like moreover, I would expect to have the headlines displayed
> > on other sites to all display the name of my service in the
> > Is this reality?
> Sure. The trick being to get your site working and keep it running.
> There are lots of ways for folks to get feeds displayed. Some of
> them have been kept up to date. That's usually one of the biggest
> hassles for any new service, besides just staying up.
> > 4. Or is reality that I would convince a couple dozen think tanks
> > distribute daily via RSS so I can harvest and then their content
> > would be so readily available that everyone would circumvent my
> > service?
> Well, that's a good question. I'm of the opinion it's best to have
> the sources of content creating their own feeds. You're, in theory,
> guaranteed a greater degree of 'freshness' than aggregated
> materials. If you choose to scan a feed hourly you might catch
> things faster than the daily schedule many aggregating sites use.
> But when a source doesn't have its own feed, a scraping of it is
> better than nothing. Mike Krus at Newsisfree does a really great job
> of creating new scraped feeds. 'Scraped' usually means obtaining the
> text from either a website or other (non-RSS) formatted source.
> Scraping services usually put the material into an RSS file, to a
> webpage or both.
> Better (any) categorization of feed items would be a good thing.
> -Bill Kearney
> Is the scraping a purely manual function or is there something
> automated about it? Thanks.
From what I understand it's an automatic process. There are several
tools and services that exist. Newsisfree and blogspace are two
sites. Stapler is a tool for Radio Userland. I'm sure there are others.
Several of the CMS (content management systems) already support the
ability to spit items out into various formats. RSS is but one.
Some of the scraped RSS feeds are coming from files that are exported,
but not into RSS format. The scraped feed 'simply' transforms it
from the native format into a flavor of RSS.
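As a rough illustration of that kind of transformation, here is a minimal
sketch in Python. It assumes the native export is a tab-separated file with
a headline and a link on each row, and it writes an RSS 2.0-style document.
The export layout, the function name export_to_rss, and the example.org
addresses are all made up for the example; a real export would need its own
parsing step.

```python
import csv
import io
import xml.etree.ElementTree as ET


def export_to_rss(export_text, channel_title, channel_link):
    """Turn a tab-separated export (headline, link per row) into RSS text.

    The two-column layout is an assumption for this sketch, not a format
    used by any particular site or tool.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Headlines from " + channel_title

    reader = csv.reader(
        io.StringIO(export_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = row[0].strip()
        ET.SubElement(item, "link").text = row[1].strip()

    # encoding="unicode" makes tostring return a str rather than bytes
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "Budget report released\thttp://example.org/budget\n"
        "New trade brief\thttp://example.org/trade\n"
    )
    print(export_to_rss(sample, "Example Institute", "http://example.org/"))
```

The element names (channel, item, title, link, description) are the common
core shared by most RSS flavors, so the same loop could be pointed at a
different flavor by changing the wrapper elements.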
I suspect these questions (and this thread) might be worth having on
the syndication mailing list. Many folks here are on both lists.
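For anyone wondering what the automated part of scraping looks like, here
is a minimal sketch in Python using only the standard library. It assumes
the source page marks its headline links with class="headline", which is a
made-up convention for the example; real pages each need their own rule,
and that is most of the work the scraping services do. The names
HeadlineScraper and scrape_headlines are hypothetical.

```python
from html.parser import HTMLParser


class HeadlineScraper(HTMLParser):
    """Collect (text, href) pairs for <a> tags carrying a given class."""

    def __init__(self, css_class="headline"):
        super().__init__()
        self.css_class = css_class
        self.headlines = []
        self._capturing = False
        self._href = ""
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._capturing = True
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._capturing:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._capturing:
            title = "".join(self._text).strip()
            self.headlines.append((title, self._href))
            self._capturing = False


def scrape_headlines(html, css_class="headline"):
    """Return a list of (headline text, link) tuples found in the page."""
    scraper = HeadlineScraper(css_class)
    scraper.feed(html)
    scraper.close()
    return scraper.headlines


if __name__ == "__main__":
    page = (
        '<ul><li><a class="headline" href="/a">First story</a></li>'
        '<li><a href="/about">About us</a></li></ul>'
    )
    for title, link in scrape_headlines(page):
        print(title, link)
```

A scheduler (cron, for instance) would fetch the page and run something
like this every hour or every day, then write the results out as RSS, to a
webpage, or both.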