Joel has an RSS scalability problem

  • Garth Kidd
    Message 1 of 14, Oct 20, 2002
      Following Dave's comment [1] on Phil Ringnalda's comment-fest [2] about
      Joel's RSS bandwidth problem, I've come up with a [maybe slightly too]
      simple XML-RPC based protocol for scaling RSS polling by using Radio
      Community Servers (or anything else implementing the protocol) to
      centralise the polling.

      The trick is to keep the protocol simple. Rather than trying to
      centralise article distribution, or come up with a full article
      distribution fabric, I'm just trying to have nodes poll feeds only when
      they know there are changes to be had.

      As I said in my blog posting [3] about it, UserLand are in an ideal and
      probably unique position to Do Something about this one. One code
      update, and they could centralise polling and reduce the load on
      everyone's RSS feeds by a good order of magnitude or two. I think.

      That said, more brains will undoubtedly help a great deal on this one,
      hence my message. I'm in no state time-wise to implement anything at the
      moment (just check my blog posting rate for the last two months). Anyone
      want to wade in?

      1:
      http://scriptingnews.userland.com/backissues/2002/10/20#When:8:36:08AM
      2: http://philringnalda.com/archives/002359.php
      3: http://www.deadlybloodyserious.com/2002/10/21.html#a990

      Regards,
      Garth.
    • Charles Miller
      Message 2 of 14, Oct 20, 2002
        Garth Kidd propagated the following meme:
        > Following Dave's comment [1] on Phil Ringnalda's comment-fest [2] about
        > Joel's RSS bandwidth problem, I've come up with a [maybe slightly too]
        > simple XML-RPC based protocol for scaling RSS polling by using Radio
        > Community Servers (or anything else implementing the protocol) to
        > centralise the polling.

        I still don't see what the problem is with using simple HTTP conditional
        GET. All it requires is that the server remember the last time the RSS
        file was updated, which really shouldn't be hard for any weblogging
        software to do, whether it's static or dynamic.

        Conditional GET reduces the cost of querying an RSS file to a few hundred
        bytes each way. It's simple, part of a standard that has been around for
        over a decade, and puts the minimum effort on both producers and
        consumers of RSS feeds.

        (Note, using a conditional GET will save you the two-hits cost you'd
        incur if you used HEAD then GET)

        If anyone needs an explanation of how to implement conditional get either
        on the client or server side, I'd be quite happy to write one.

        Charles Miller
      • Jeremiah Rogers
        Message 3 of 14, Oct 20, 2002
          Charles:

          The problem with conditional-get is that asking for every server to
          support that is a long shot. I wish every server would have a
          Last-Modified header, and I wish every dynamic RSS script would honor
          that, but they simply don't. What we're trying to do is work around
          what we have.

          I'd like to see an example of how conditional-get works, it might
          explain your stance better.

          Now to Garth:

          I really like that interface but I'm wondering if making the server
          handle subscriptions really is the right thing to do. Another thing
          that worries me is that you appear to want to create a new user every
          time the script runs (which is why you use clock.now()). I don't really
          like that idea because it will fill the server with dead users.

          I think a nicer way to do it would be to just use your email address as
          your unique identifier and then maybe email a password to that email
          address. I'd rather just send the password back via XML-RPC, though. I'm
          not too worried about people faking email addresses; call me insecure.
          But I'm digressing...

          I like your interface; I'm wondering if we should just return a
          last-modified timestamp when the aggregator gives us the URI, or if
          actually handling the subscriptions is the right thing to do.

          Either way, I should hopefully be able to support it in a few days, and
          maybe Userland will support it too. That'd be neat. - J

        • Charles Miller
          Message 4 of 14, Oct 20, 2002
            Jeremiah Rogers propagated the following meme:
            > The problem with conditional-get is that asking for every server to
            > support that is a long shot.

            HTTP 1.1 is 13 years old. I think we can assume that those servers
            relying on static content and a mainstream webserver (Radio Userland
            itself, Joel Spolsky, MT users, etc) already support conditional get.

            That's a big, immediate win without getting even the slightest bit
            ambitious.

            > I wish every server would have a
            > last-modified header, and i wish every dynamic RSS script would honor
            > that, but they simply don't.

            Yes, it places a burden on servers that do not implement the standard.
            But the standard is Really Simple, and its adoption makes things easier
            for users of the servers (remember, this came up as an issue because
            a server administrator complained). If RSS aggregators play by the rules
            in a predictable fashion, then server admins will know exactly what
            to do to cut their bandwidth bills.

            > I'd like to see an example of how conditional-get works, it might
            > explain your stance better.

            HTTP Conditional Get for RSS Hackers:
            http://fishbowl.pastiche.org/archives/001132.html#001132

            Also,

            Here's how it works on my vanilla Apache server. Previously, I've
            done a regular request and remembered the value of the Last-Modified
            header, which I now send back as If-Modified-Since. The conditional
            request takes less than 200 bytes both ways. That's one percent of
            the size of my 20,000-byte RSS file.

            GET /index.xml HTTP/1.1
            Host: fishbowl.pastiche.org
            If-Modified-Since: Mon, 21 Oct 2002 03:26:41 GMT

            HTTP/1.1 304 Not Modified
            Date: Mon, 21 Oct 2002 03:27:26 GMT
            Server: Apache/1.3.20 (Unix) ApacheJServ/1.1.2 PHP/4.2.1 FrontPage/5.0.2.2510 Rewrit/1.1a
            ETag: "154060-1aa4-3db373f1"


            ... and that's the whole query, both ways.
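            The same exchange, as a client-side sketch in Python (purely
            illustrative; the host, path and date are simply the ones from the
            raw exchange above):

            import http.client

            conn = http.client.HTTPConnection("fishbowl.pastiche.org")
            # Send back the Last-Modified value remembered from the last fetch.
            conn.request("GET", "/index.xml", headers={
                "If-Modified-Since": "Mon, 21 Oct 2002 03:26:41 GMT",
            })
            resp = conn.getresponse()
            if resp.status == 304:
                pass                  # not modified: ~200 bytes each way, no body
            else:
                feed = resp.read()    # changed: read the feed...
                last_modified = resp.getheader("Last-Modified")  # ...remember this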

            Charles Miller
          • Charles Miller
            Message 5 of 14, Oct 20, 2002
              Charles Miller propagated the following meme:
              > HTTP 1.1 is 13 years old.

              Heh. Charles can't count. Three years. HTTP itself is 12 years old,
              though.

              Charles Miller
            • Jeremiah Rogers
              Message 6 of 14, Oct 20, 2002
                This still won't work with scripts whose output changes when the
                actual script file doesn't. I suppose we can lock those out, though,
                because there aren't too many of them and they should be easy to change.

                Will the server kill a script after it's done sending headers? If it
                doesn't, then we'd still have a problem with scripts that would be
                accessing databases way too much.

              • Charles Miller
                Message 7 of 14, Oct 20, 2002
                  Jeremiah Rogers propagated the following meme:
                  > This still won't work with script's who's output changes when the
                  > actual script's file doesn't. I suppose we can lock those out though
                  > because there aren't too many of them and they should be easy to change.
                  >
                  > Will the server kill a script after it's done giving off headers? If it
                  > doesn't than we'd still have a problem with scripts that would be
                  > accessing databases way too much.

                  I'm sorry, I guess my explanation wasn't quite as clear as I thought it
                  was when I wrote it. (happens to me a lot)

                  For weblogs running on static files, the webserver already does what we
                  need it to do.

                  For weblogs generating RSS feeds dynamically on the fly, we don't trust
                  the server; we have the script do the header-matching and decide
                  whether to send back a 200 reply plus the RSS file, or an empty 304
                  reply.
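
                  A sketch of that decision, as a Python CGI script (an assumption
                  purely for illustration; the real scripts here would be PHP, Perl,
                  UserTalk, whatever, and FEED_PATH is made up):

                  import os
                  import sys
                  from email.utils import formatdate, parsedate_to_datetime

                  FEED_PATH = "rss.xml"  # hypothetical: wherever the generated feed lives

                  mtime = os.path.getmtime(FEED_PATH)
                  ims = os.environ.get("HTTP_IF_MODIFIED_SINCE")
                  try:
                      unchanged = ims and parsedate_to_datetime(ims).timestamp() >= mtime
                  except (TypeError, ValueError):
                      unchanged = False  # unparseable date: just send the feed

                  if unchanged:
                      # Nothing new: an empty 304, a few hundred bytes instead of the file.
                      sys.stdout.write("Status: 304 Not Modified\r\n\r\n")
                  else:
                      sys.stdout.write("Status: 200 OK\r\n")
                      sys.stdout.write("Content-Type: application/xml\r\n")
                      sys.stdout.write("Last-Modified: %s\r\n\r\n" % formatdate(mtime, usegmt=True))
                      sys.stdout.write(open(FEED_PATH).read())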

                  Charles Miller
                • Jeremiah Rogers
                  Message 8 of 14, Oct 20, 2002
                    cool :)

                    Sorta sucks though b/c I was looking forward to writing that XML-RPC
                    interface. I need something to show to people when they say "oh, so
                    you're a programmer, eh?". Oh well.

                    This is what we call the "oh shit I really need something to show to
                    colleges so they'll let me in" routine.

                    -Jeremiah
                  • Garth Kidd
                    Message 9 of 14, Oct 20, 2002
                      Mikel Maron also has his RCS Aggregator Cache:

                      http://radio.weblogs.com/0100875/outlines/rcsAggregatorCache/

                      It's much more bandwidth and storage intensive for the central node,
                      which now has to serve the XML itself, but there's some actual code
                      there, which is handy. :)

                      Mikel, what does your protocol look like?

                      > I really like that interface but I'm wondering if making the server
                      > handle subscriptions really is the right thing to do. Another thing
                      > that worries me is that you appear to want to create a new
                      > user every time the script runs (which is why you use clock.now()).

                      Aah. I forgot to mention something:

                      You register only once. You poll as long as you're up and running. If
                      you remember your registration details, you can keep polling next time
                      you're up, but there's no requirement that the server make any effort
                      to remember you. That said, it has fair incentive: if it forgets,
                      you're going to try to re-register.

                      > I don't really like that idea because it will fill the server with
                      > dead users.

                      The "attempt re-register if you have any trouble" behaviour lets the
                      server cull the list whenever necessary.

                      > I like your interface, I'm wondering if we should just return a
                      > last-modified timestamp when the aggregator gives us the URI or if
                      > actually handling the subscriptions is the right thing to do.

                      Sending up a list of 300 URLs each time you poll is a little nasty.

                      Speaking of which, we'd also want a batchedRegisterFeed(myGuid,
                      mySessionPassword, urlList) call.

                      ---- overcomplication threshold is here ----

                      Later, we might want to add additional information in a

                      changedFeedDetails(myGuid, mySessionPassword [, excludeStructMembers
                      [, includeStructMembers]])

                      call returning a list of structs (.url, .pollTime, .xml, .whatever), but
                      for the first rollout a simple list of strings is fine. Then
                      batchedRegisterFeed2(myGuid, mySessionPassword, feedStructList) would
                      make sense.

                      Note that changedFeedDetails could be used to return the full XML, but
                      having the full XML for 300 feeds could well break a lot of XML-RPC
                      stacks, hence the exclude/include filters. A separate
                      feedContents(myGuid, mySessionPassword, feedurl) call would probably
                      make more sense.

                      Again, everything beyond the overcomplication threshold should only be
                      tackled if the first version of the protocol gets off the ground. There
                      are only incremental gains to be had (e.g. avoiding a redundant poll of
                      everything after registration), as opposed to the pretty massive gain to
                      be had from the initial centralisation. Sure, it's fun, but let's leave
                      it to the next version. :)

                      Back to authentication, though:

                      One change to hack into the first version might be an optional (does
                      XML-RPC do optional?) additionalInformation struct argument to the
                      registerMe call. That'd permit specific client-server authentication,
                      for which .username and .password should be reserved.

                      <think/>

                      Should we just define sessionid = registerMe(username, password), with a
                      convention that open servers accept a URI as a username with whatever
                      password is supplied, and then respond only to that password for that
                      username until it's re-registered? That gives us all the flexibility of
                      the initial protocol, plus easier authentication for the people that'd
                      want to do that (e.g. UserLand, who rely on people buying Radio to pay
                      for those server resources).
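
                      To make the shape of the thing concrete, here's a rough client-side
                      sketch in Python, using only call names floated in this thread; the
                      server URL, the changedFeeds name for the basic list-of-changed-URLs
                      call, and the exact signatures are guesses, not the spec in [3]:

                      import xmlrpc.client

                      server = xmlrpc.client.ServerProxy("http://rcs.example.com/RPC2")

                      # Register once, per the convention above: URI as username, any password.
                      session_id = server.registerMe("http://my.weblog.example.com/", "s3kr1t")

                      # Batch-register subscriptions rather than making 300 separate calls.
                      server.batchedRegisterFeed(session_id, "s3kr1t", [
                          "http://scriptingnews.userland.com/xml/scriptingNews2.xml",
                          "http://www.joelonsoftware.com/rss.xml",
                      ])

                      # Each poll is one cheap call; only fetch feeds the server saw change.
                      # (If any call faults, fall back to re-registering, as above.)
                      for url in server.changedFeeds(session_id, "s3kr1t"):
                          print("changed, fetch directly from origin:", url)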

                      > Either way, I should hopefully be able to support it in a few
                      > days, and maybe Userland will support it too. That'd be neat. - J

                      Damn, that's quick! :)

                      Regards,
                      Garth.
                    • Charles Miller
                      Message 10 of 14, Oct 20, 2002
                        Jeremiah Rogers propagated the following meme:
                        > Sorta sucks though b/c I was looking forward to writing that XMLRPC
                        > interface. I need something to show to people when they say "oh, so
                        > you're a programmer eh?". Oh well.

                        Could be worse. This week I had to back two days' solid work out of
                        CVS. Beautiful code, too. Elegant, simple, useful code that formed
                        the centre of a decision engine that allowed any part of the application
                        to find the answer to the question "How do I do X to Y?"

                        After an hour of "discussion", a co-worker convinced me that we don't
                        need an elegant general solution, we just need two config files.

                        Bleh. There's a lesson in humility there somewhere, but damnit, I
                        don't _want_ to be more humble.

                        > This is what we call the "oh shit I really need something to show to
                        > colleges so they'll let me in" routine.

                        That's why Open Source projects exist. :)

                        Charles Miller
                      • Jeremy Bowers
                        Message 11 of 14, Oct 21, 2002
                          Charles Miller wrote:
                          > I still don't see what the problem is with using simple HTTP conditional
                          > GET.... Conditional GET reduces the cost of querying an RSS file to a few hundred
                          > bytes each way.

                          The problem, as Dave pointed out on the thread, is that this only
                          creates a linear savings over time. According to my copy of RU,
                          Scripting News has 5000+ subscriptions, and AFAIK that's *just* RU
                          subscribers on the main UserLand RCS. It could be several times higher;
                          I don't know.

                          5000 * 500 * 24 = 60(decimal)MB a day, 420MB a week, about 1.5(real)GB a
                          month. The whole (long-term) goal of this is to scale the system in a
                          rapidly-approaching era where 5000 subscribers is at *best* a "medium"
                          sized site, and we're already hitting monthly limits on many ISP
                          accounts with *just* the conditional get.

                          Imagine a world where a college student's site such as Hack The Planet
                          (before he graduated and got a job) gets 100,000 subscriptions. He
                          updates too often for exponential backdown to be of any serious use (you
                          can eke out a factor of two or so with aggressive settings, but not much
                          more), and 100,000 * 500 * 24 is a gig a day _just_ for the conditional
                          gets; send out 10K on average every fourth hourly request for the actual
                          content and you're into 100,000 * 10,000 * 6 = 6GB/day. That's an
                          expensive site; my ISP became annoyed and charged me extra for much less
                          than that.
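
                          (Spelling that arithmetic out, as Python and decimal bytes; 500 and
                          10,000 are the assumed per-hit costs from above:)

                          subs = 100_000
                          polls_per_day = 24                  # hourly scans
                          cond_get_cost = 500                 # bytes both ways per conditional GET
                          feed_size = 10_000                  # average bytes when content is new
                          content_hits = polls_per_day // 4   # "every fourth hourly request"

                          overhead = subs * cond_get_cost * polls_per_day  # 1.2e9: "a gig a day"
                          content = subs * feed_size * content_hits        # 6.0e9: "6GB/day"
                          print(overhead / 1e9, content / 1e9)             # -> 1.2 6.0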

                          Now, imagine that you're not imagining... because in the very near
                          future, that sort of thing is going to become a reality. I think we've
                          seen the first couple of 'mainstream' news articles about aggregators...
                          as that continues, the number of aggregator users is going to grow
                          extremely quickly. It probably won't affect any given person, but the
                          promise of personal publishing is made significantly weaker if the
                          popular people are charged stiffly for their popularity.

                          If the aggregator market was going to stay the same as it is now, we'd
                          just tell Joel to suck it up and move on with life. ;-) We really need
                          to take this opportunity to attack the problem and solve it for the day
                          when Joel's feed would be considered "nothing special".

                          This is why in that thread I phrased my suggestion in terms of "quick
                          relief". Even an exponential backdown + conditional GET is still going
                          to be a lot of bytes doing redundant work. (Depending on how you look at
                          it, exponential backdown can be considered only a linear gain bounded by
                          the slowest you'll ever scan the site, and that only applies to slow
                          sites anyhow; we want to work with quickly-moving ones too.) That's
                          still only "quick relief"; it doesn't solve the fundamental problems.

                          (If applying such a fix makes people complacent about the real issues,
                          then it may even have negative value over the long haul.)
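
                          (A toy simulation of that bound, under assumed numbers: hourly base
                          poll, interval doubling on no-change, capped at 24 hours:)

                          def polls_per_day(update_every_h, base_h=1.0, cap_h=24.0):
                              # Crude one-day simulation of exponential backdown.
                              t, last_change, interval, polls = 0.0, 0.0, base_h, 0
                              while t < 24.0:
                                  polls += 1
                                  if t - last_change >= update_every_h:
                                      last_change, interval = t, base_h  # changed: speed back up
                                  else:
                                      interval = min(interval * 2, cap_h)  # quiet: back down
                                  t += interval
                              return polls

                          print(polls_per_day(1))     # fast-moving site: ~24/day, no saving
                          print(polls_per_day(1000))  # dead site: a handful/day, bounded by the cap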

                          Going with Morpheus would be good because they have that swarming thing
                          which is nifty; assuming they've implemented it such that all users of
                          the file can share it, and some good algorithm for finding good
                          swarmers, that's probably the rough mathematical and practical ideal.

                          Having a cloud of centralized servers that can do the scanning and
                          report the results back to some number of subscribers is the other
                          architecture that can work, as it directly attacks the linearity
                          problem. Basically you're putting a log factor on the number of useless
                          scans, especially if you have little networks of servers sharing the
                          results with each other over IM or something. (Doesn't have to be one
                          ubernetwork... in fact, it probably shouldn't be.) Then the last
                          question is whether such a server sends the client the actual RSS
                          without hitting the main site (as the RCS cache does), or just sends
                          the client to the main site. At that point, one way or another, the
                          data must be sent.

                          Take the 100,000 subscribers scenario above, and consider the RCS cache
                          approach. Give the original RSS-providing site enough intelligence to
                          *require* people to subscribe through a collation server (which, now that
                          I think about it, probably will happen eventually), and assume the Zipf
                          server size distribution (which is the normal Internet distribution, and
                          I think the forces that produce the Zipf distribution will largely hold
                          for aggregator collation servers (or clusters) too, although perhaps not
                          quite as strongly), and the scan rate on the site itself might only be
                          10 or 20 hits an hour. Even serving the whole RSS is irrelevant at that
                          point.

                          Extra bonus points if the central servers do a conditional GET or
                          exponential backdown. ;-)

                          Incidentally, one consequence of this analysis that I had not considered
                          before emerges. The central servers model is useless if the central
                          servers aren't caching the RSS and serving it directly to their clients.
                          If the original site still has to serve the RSS file to every
                          subscriber, gigabytes-plus a day is still inevitable. (Clearly, the
                          better and/or bigger collating servers are going to have to be for-pay
                          services; little ones for 5-100 people might be able to run
                          out-of-pocket, but larger ones will themselves need a lot of bandwidth
                          to serve out the RSS.) Obvious in hindsight, but I hadn't thought about
                          it clearly.

                          The P2P solutions become more attractive, but have their own obvious
                          issues, in particular "Will somebody who owns a network let me use it
                          this way, or do I have to build my own?" ;-)

                          Also, to wrap one other point into this email that isn't quite worth a
                          separate mailing:

                          Charles Miller wrote:
                          > HTTP 1.1 is [3]* years old. I think we can assume that those servers
                          > relying on static content and a mainstream webserver (Radio Userland
                          > itself, Joel Spolsky, MT users, etc) already support conditional get.

                          (*: originally 13, later corrected by Charles himself)

                          I would not think we can assume. If you produce tests, I would believe
                          it, but I would not assume it.
                        • Dave Winer
                          Message 12 of 14, Oct 21, 2002
                            Jeremy it's a linear saving but it's still worth doing.

                            The next thing on my todo list is ETag support in Radio's aggregator.

                            And Morpheus or more generally, Gnutella, may be the longterm answer.

                            And longterm isn't that far away.

                            Dave
                          • Jeremy Bowers
                            Message 13 of 14, Oct 21, 2002
                              Dave Winer wrote:
                              > Jeremy it's a linear saving but it's still worth doing.

                              *chuckle* Absolutely. I think I mentioned on the thread on Phil's site
                              that these are still The Right Thing and should be included in any
                              system architecture anyhow.

                              > And Morpheus or more generally, Gnutella, may be the longterm answer.

                              Not Gnutella; that system is already unreliable through sheer
                              congestion. It would need serious architectural improvement before it
                              could do anything but *amplify* the problem. I think you'd still need to
                              move towards a partnership with Morpheus or another commercial P2P
                              provider longterm...

                              > And longterm isn't that far away.

                              If you want open source for the independence, Freenet may actually be
                              the way to go on this. It's not a perfect match but it may be workable,
                              and it has other attributes that may be desirable long-term. I think
                              I'll look more closely at Freenet in some of my spare time this week,
                              and see if it might be useful in this context.

                              Page summary on Freenet:

                              http://freenetproject.org/cgi-bin/twiki/view/Main/WhatIs

                              Summary on publishing (esp. below the <hr>):

                              http://freenetproject.org/cgi-bin/twiki/view/Main/Publishing
                            • mikel_maron
                              Message 14 of 14, Oct 21, 2002
                                --- In radio-dev@y..., "Garth Kidd" <yahoo-spam@d...> wrote:
                                > Mikel Maron also has his RCS Aggregator Cache:
                                >
                                > http://radio.weblogs.com/0100875/outlines/rcsAggregatorCache/
                                >
                                > It's much more bandwidth and storage intensive for the central node,
                                > which now has to serve the XML itself, but there's some actual code
                                > there, which is handy. :)
                                >
                                > Mikel, what does your protocol look like?

                                Here's the lowdown on the RCS Cache. If it's helpful, please feel free
                                to use the code.

                                This tool is piggybacked on the RCS's built-in 100 Most Popular
                                Subscriptions feature. Both the client and server code are packed into
                                the same tool.

                                The RCS maintains a list of the 100 most popular subscriptions of its
                                users. When RSS Caching is enabled, periodically (presently 4 times an
                                hour) the RCS requests and stores these feeds. It only stores the raw
                                XML, and does not process them. A GET interface is provided to clients
                                to request the cached feeds.
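
                                The server half, sketched in Python for illustration (the real tool
                                is UserTalk running inside the RCS; the names here are made up):

                                import urllib.request

                                cache = {}  # feed URL -> raw XML bytes, stored unprocessed

                                def refresh_cache(hotlist):
                                    # Run from a scheduler, presently four times an hour.
                                    for url in hotlist:
                                        try:
                                            cache[url] = urllib.request.urlopen(url).read()
                                        except OSError:
                                            pass  # origin unreachable: keep serving the stale copy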

                                For the Radio client, when the tool is enabled, a modified version of
                                xml.aggregator.everyMinute is placed in the scheduler. It also checks
                                that the RCS has caching enabled, via an XML-RPC call. This modified
                                version of everyMinute will request and process the RSS Hotlist from
                                the RCS, then run as a normal aggregator, except that if a user's
                                subscription is included in the Hotlist, the RCS Cache is queried for
                                the feed. If that fails, the request is made directly from the provider.
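
                                That client-side fallback, again as an illustrative Python sketch
                                (the real code is the modified everyMinute; the cache's GET
                                interface shape and URL are assumed):

                                import urllib.parse
                                import urllib.request

                                RCS_CACHE = "http://rcs.example.com/rssCache?url="  # hypothetical

                                def fetch(url, hotlist):
                                    if url in hotlist:
                                        try:  # popular feed: ask the RCS Cache first
                                            cached = RCS_CACHE + urllib.parse.quote(url)
                                            return urllib.request.urlopen(cached).read()
                                        except OSError:
                                            pass  # cache failed: fall back to the provider
                                    return urllib.request.urlopen(url).read()  # direct, as usual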

                                (Note, that the aggregator code was copied from Radio.root, and
                                modified. It's probably now a bit out of sync with the original.)

                                Mikel