how to download entire feed with wget

  • illesfarkas
    Message 1 of 10, Mar 5, 2009
      Hi,

      I'd like to download with wget an entire feed, not just the most
      recent part of it. As I scroll down in a feed with Google Reader, it
      can add older parts of the feed again and again, and I can go as far
      as the first item of the feed that appeared years ago. Do you happen
      to know how Google Reader or other readers do this?

      I've just looked at some feeds downloaded with wget. Is there a
      "continue" or "next page" URL at the bottom of the feed?

      Thanks,
      Illes
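[Editor's note: the "next page" idea Illes asks about does exist in some feeds as paging link relations. A minimal sketch of checking a feed for them, assuming an Atom feed; the sample document and URLs are invented for illustration.]

```python
# Look for "next page" style links (rel="next", rel="prev-archive", etc.)
# in an Atom feed document. Sample feed and URLs are made up.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def paging_links(feed_xml):
    """Return hrefs of atom:link elements whose rel suggests paging."""
    root = ET.fromstring(feed_xml)
    rels = ("next", "previous", "prev-archive", "next-archive")
    return [link.get("href")
            for link in root.iter(ATOM + "link")
            if link.get("rel") in rels]

sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <link rel="self" href="http://example.com/feed.atom"/>
  <link rel="prev-archive" href="http://example.com/2009/02/feed.atom"/>
</feed>"""

print(paging_links(sample))  # ['http://example.com/2009/02/feed.atom']
```

Most feeds, as the replies below note, carry no such links at all, in which case this returns an empty list.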
    • Rogers Cadenhead
      Message 2 of 10, Mar 6, 2009
        Most feeds don't provide an archive of a site's content; they just
        include the most recent content.

        To go back further, you'd have to find an aggregator that saved older
        feed entries, as Google Reader appears to do from your post. I think
        Bloglines does this, and you can request that data with the Bloglines
        API:

        http://www.bloglines.com/services/api/

        The sync API has methods for retrieving blog items.
      • James Holderness
        Message 3 of 10, Mar 6, 2009
          Rogers Cadenhead wrote:
          > To go back further, you'd have to find an aggregator that saved older
          > feed entries, as Google Reader appears to do from your post. I think
          > Bloglines does this, and you can request that data with the Bloglines
          > API:

          You can do the same kind of thing through Google's API. However, bear in
          mind that what you're getting back isn't the original feed content - it will
          have been processed and altered by whichever of the services you use.

          Now there's also a "standard" way to get back archived feed documents -
          RFC5005 - but it's very rarely implemented. A couple of the big blogging
          sites support something similar, but the reality of how it's implemented in
          practice is different enough to make the spec essentially useless. At least
          in my experience.

          One last warning: If you do try and get something like this working,
          remember that some feeds can be huge. When they post multiple messages per
          day and have been going for several years, that adds up to a lot of data.
          You can find yourself downloading hundreds of pages and thousands of feed
          entries.

          Regards
          James
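[Editor's note: for feeds that do implement RFC 5005, walking back through the archive is a matter of repeatedly following rel="prev-archive" links until none remains. A sketch, assuming Atom documents; `fetch` stands in for an HTTP GET (e.g. `urllib.request.urlopen(...).read()`), and the two-page archive here is invented. The `limit` guard reflects James's warning that archives can run to hundreds of pages.]

```python
# Walk an RFC 5005 archived feed from newest to oldest by following
# rel="prev-archive" links. fetch(url) must return the feed bytes/str.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def walk_archive(url, fetch, limit=100):
    """Yield parsed feed documents, newest first, following prev-archive."""
    seen = set()
    while url and url not in seen and limit > 0:
        seen.add(url)          # guard against link loops
        limit -= 1             # guard against huge archives
        doc = ET.fromstring(fetch(url))
        yield doc
        url = next((lnk.get("href") for lnk in doc.iter(ATOM + "link")
                    if lnk.get("rel") == "prev-archive"), None)

# Two-page fake archive for demonstration.
pages = {
    "http://example.com/feed.atom":
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        '<link rel="prev-archive" href="http://example.com/2009-02.atom"/>'
        '</feed>',
    "http://example.com/2009-02.atom":
        '<feed xmlns="http://www.w3.org/2005/Atom"></feed>',
}
docs = list(walk_archive("http://example.com/feed.atom", pages.__getitem__))
print(len(docs))  # 2
```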
        • Randy Morin
            Message 4 of 10, Mar 17, 2009
            I don't really like the outcome of this thread. Basically, we have proprietary APIs and an Atom solution that isn't adopted or scalable (it requires dynamic responses). I'll add this to our TODO list.

            Randy


            --- In rss-public@yahoogroups.com, "illesfarkas" <illes.farkas@...> wrote:
            >
            > Hi,
            >
            > I'd like to download with wget an entire feed, not just the most
            > recent part of it. As I scroll down in a feed with Google Reader, it
            > can add older parts of the feed again and again, and I can go as far
            > as the first item of the feed that appeared years ago. Do you happen
            > to know how Google Reader or other readers do this?
            >
            > I've just looked at some feeds downloaded with wget. Is there a
            > "continue" or "next page" URL at the bottom of the feed?
            >
            > Thanks,
            > Illes
            >
          • James Holderness
              Message 5 of 10, Mar 22, 2009
              Randy Morin wrote:
              > I don't really like the outcome of this thread. Basically, we have
              > proprietary APIs and an Atom solution that isn't adopted or
              > scalable (it requires dynamic responses). I'll add this to our
              > TODO list.

              FWIW the "Atom" solution (assuming you're referring to RFC5005) can just as
              easily be used in an RSS feed (see appendix B). As for the issue of
              scalability, I don't believe dynamic responses are necessarily a
              requirement.

              The real problem though is adoption. Inventing yet another solution to this
              problem, no matter how well designed, seems pointless to me if nobody is
              interested in implementing it.

              Regards
              James
            • Sam Ruby
                Message 6 of 10, Mar 22, 2009
                James Holderness wrote:
                >
                > Randy Morin wrote:
                > > I don't really like the outcome of this thread. Basically, we have
                > > proprietary APIs and an Atom solution that isn't adopted or
                > > scalable (it requires dynamic responses). I'll add this to our
                > > TODO list.
                >
                > FWIW the "Atom" solution (assuming you're referring to RFC5005) can just as
                > easily be used in an RSS feed (see appendix B). As for the issue of
                > scalability, I don't believe dynamic responses are necessarily a
                > requirement.

                In addition to appendix B, see reference 9, which gives attribution to
                both D. Winer and a work product of this group.

                Here is a feed that is served statically that contains RFC 5005 defined
                elements:

                http://intertwingly.net/blog/archives/2009/03/index.atom

                It's not perfect (the "current" feed duplicates entries that may be
                found in "previous" ones), but mostly workable.

                > The real problem though is adoption. Inventing yet another solution to this
                > problem, no matter how well designed, seems pointless to me if nobody is
                > interested in implementing it.

                s/nobody is/only a few are/

                > Regards
                > James

                - Sam Ruby
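[Editor's note: the duplication Sam mentions, where the "current" feed repeats entries already present in archive pages, means a consumer merging pages must deduplicate. atom:id is the natural key. A minimal sketch; the entry data is invented.]

```python
# Merge entries from several Atom feed documents, keeping only the
# first occurrence of each atom:id. Sample entries are made up.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def merge_entries(feed_docs):
    """Return entries from feed_docs, deduplicated by atom:id."""
    seen, merged = set(), []
    for doc in feed_docs:
        for entry in doc.iter(ATOM + "entry"):
            eid = entry.findtext(ATOM + "id")
            if eid not in seen:
                seen.add(eid)
                merged.append(entry)
    return merged

current = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<entry><id>tag:example.com,2009:2</id></entry>'
    '<entry><id>tag:example.com,2009:1</id></entry></feed>')
archive = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<entry><id>tag:example.com,2009:1</id></entry></feed>')

print(len(merge_entries([current, archive])))  # 2
```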
              • Randy Charles Morin
                  Message 7 of 10, Mar 22, 2009
                  No API is required for export, just common sense. Here's what I do.
                  http://www.therssweblog.com/archive.xml
                  I think a simple solution like this might actually get adoption.
                  MHO,

                  Randy

                  On 3/22/09, James Holderness <j4_james@...> wrote:
                  > Randy Morin wrote:
                  >> I don't really like the outcome of this thread. Basically, we have
                  >> proprietary APIs and an Atom solution that isn't adopted or
                  >> scalable (it requires dynamic responses). I'll add this to our
                  >> TODO list.
                  >
                  > FWIW the "Atom" solution (assuming you're referring to RFC5005) can just as
                  > easily be used in an RSS feed (see appendix B). As for the issue of
                  > scalability, I don't believe dynamic responses are necessarily a
                  > requirement.
                  >
                  > The real problem though is adoption. Inventing yet another solution to this
                  > problem, no matter how well designed, seems pointless to me if nobody is
                  > interested in implementing it.
                  >
                  > Regards
                  > James
                  >
                  >


                  --
                  Randy Charles Morin
                  http://www.talk-sports.net
                  http://www.kbcafe.com
                • James Holderness
                    Message 8 of 10, Mar 23, 2009
                    Randy Charles Morin wrote:
                    > No API is required for export, just common sense. Here's what I do.
                    > http://www.therssweblog.com/archive.xml
                    > I think a simple solution like this might actually get adoption.

                    I wouldn't consider this particular solution any better than those already
                    in use, but it's probably workable (assuming you get a proper namespace for
                    the rar:archive element). If this is what it takes to get widespread
                    adoption from server tools, so be it, but I would have preferred not having
                    to implement multiple solutions to the same problem.

                    Sam Ruby wrote:
                    > James Holderness wrote:
                    >> The real problem though is adoption. Inventing yet another solution to
                    >> this
                    >> problem, no matter how well designed, seems pointless to me if nobody is
                    >> interested in implementing it.
                    >
                    > s/nobody is/only a few are/

                    Actually, even one implementation could be considered worthwhile if that
                    one were a service like, say, LiveJournal. Dozens of implementations by
                    individual bloggers, though, aren't going to impress me much.

                    Regards
                    James
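[Editor's note: consuming an archive index like the archive.xml Randy describes would amount to reading namespaced elements out of an RSS document. A hypothetical sketch: the rar namespace URI, the element content, and the archive URLs are all invented here, which is exactly the "proper namespace" gap James points out.]

```python
# Extract archive-document URLs from a hypothetical rar:archive
# element in an RSS channel. Namespace URI and structure are invented.
import xml.etree.ElementTree as ET

RAR = "{http://example.com/ns/rar}"  # placeholder namespace URI

sample = (
    '<rss version="2.0" xmlns:rar="http://example.com/ns/rar"><channel>'
    '<rar:archive>http://www.therssweblog.com/2009/02.xml</rar:archive>'
    '<rar:archive>http://www.therssweblog.com/2009/01.xml</rar:archive>'
    '</channel></rss>')

root = ET.fromstring(sample)
archives = [el.text for el in root.iter(RAR + "archive")]
print(archives)
```

A consumer could then fetch each listed document and merge entries, deduplicating by guid as with the Atom case above.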
                  • Rogers Cadenhead
                      Message 9 of 10, Mar 23, 2009
                      On Sun, Mar 22, 2009 at 3:25 PM, Randy Charles Morin <randy@...> wrote:
                      > No API is required for export, just common sense. Here's what I do.
                      > http://www.therssweblog.com/archive.xml
                      > I think a simple solution like this might actually get adoption.

                      Seems workable to me. Where do you put comments, trackbacks and pingbacks?
                    • Randy Morin
                        Message 10 of 10, Mar 23, 2009
                        All existing RSS extensions are in play. I used to implement CommentAPI, Trackbacks and Pingbacks, but trackback and pingback spam caused me to abandon each. I think I still do CommentAPI, but I'm mobile right now and cannot confirm that.
                        Thanks,

                        Randy

                        --- In rss-public@yahoogroups.com, Rogers Cadenhead <cadenhead@...> wrote:
                        > Seems workable to me. Where do you put comments, trackbacks and pingbacks?
                        >