... Mike, I have thought of implementing a similar idea before. Having the scraper keep track of whether or not the page has changed since...
Cross-posted. I've been keeping track of "teach a man how to fish" RSS sites for quite a while at the AmphetaDesk site. I've recently had the chance to update...
Hi all, is there a canonical list of scraping tools for RSS? I'd like to present a live working sample of all of them. If you know of a scraper, send me a URL...
Ah, perhaps I should clarify: I'm looking for sites running HTML-page-to-RSS-feed scrapers, like the ones at voidstar, blogspace and the like. Ones that will...
... Oh. Well, there's myrss.com. How specific do you want? "You must encode your HTML this way", or "Give me the start and end tags, and I'll figure it out". ...
The scrapers that have code available. I'm interested in setting a few of them up on a machine to see how they stack up during validation. Then to see what...
... I'm with James Linden: the ones I run that fit your criteria are one-off Perl scripts (admittedly with a similar structure). Anyone who knows syndication ...
Ah. That's what I was thinking of next, although myrss does a fine job. I was thinking of doing one that was more tunable... Also, IMHO all of these scrapers...
... I like this but ... Is there some way we could generalize it so that it scales? I'd like not to have to put up a different button for Radio and a...
Well, there will always be different buttons, depending on the software on the other end; I never bought into the "XML" button concept; standardizing UI for...
... With that kind of thinking we never would have gotten the general-purpose UI that goes under the name Web Browsers. ... Well, I'm a bit beyond my depth...
... on ... standardizing ... HTML is a fine standard. However, it doesn't tell you to use a particular colour, icon or word when linking to a text file vs. an...
... The registry does not need to be centralized. foo can be wherever, it can vary from blog to blog. A centralized registry is a natural thing here, but is...
... Hey, I'm pretty sure this could be made to work, and work very well. I don't have my cgi script tools together or I'd do it myself. If there is anyone...
... So, site A has a feed that I'm interested in, and they use service B to dispatch subscriptions. B knows about aggregators 1,2 and 3, but how does it know...
... Your arse-kicking aggregator should appear in the new ~aggregator's~ RSS feed just one time .... hopefully all the B's will pick it up at that time....
I'm left wondering when reinventing Passport/.Net will start here.... ... You want to use B's service? Then tell B about your aggregator Q. I'm wondering when...
... Have had the same thoughts myself... :) ... What I proposed earlier, for a start... ...
... How is the address of the service (fooB) known by the site? If I read you correctly, it's carried in a cookie. How does that cookie get set, and what's its...
... What's wrong with tapping into the "Radio Users Subscribe Here" thing that most Radio blogs have? I've added a handler for that to my personal aggregator...
... Don't we have enough problems to deal with already? I see no reason why we need to bring even MORE proprietary/useless "standards" into the picture. James...
... It's specific to Radio Userland. People with other aggregators are out of luck. ... Of course. What's being debated is the...
... Hee hee, good to see you got the point of my joke. ... The client can only react to what it's being told by the server. If the filename...
... Manipulating metadata in Web servers is a big problem, yes. However, most *do*...