notes on swarmcast

  • Lucas Gonze
    Message 1 of 8 , Feb 1, 2001
      Random notes taken while reading the swarmcast documentation. Many assumptions made,
      probably many mistakes in understanding etc.

      The first question people ask about swarmcast is what the selfish incentive is to be
      an uploader. At first glance the value of Swarmcast is seller-centric rather than
      buyer-centric. What I mean is that reducing the cost of uploading is, in the short term,
      the seller's problem. The answer, my guess anyway, is that it is to a downloader's
      advantage to use Swarmcast because they get faster downloads.

      I still have a hard time understanding why a downloader should have to give up half
      their bandwidth for uploads during the download. The system would be twice as fast
      if a downloader never had to contribute bandwidth until a download was done. I think
      the reason that download and upload are interleaved within a node is to force tit for
      tat on the spot. It seems to me that this is a fairly flimsy social contract. For
      one thing, anyone can make a minor alteration to the source to disable uploading.
      ...I'd look for a more compelling way to enforce tit for tat. One possibility is
      just to say that tit for tat only has to be really good at the beginning, when the
      community is small. Later, when the community is large, there can be more
      freeloading.

      Packets are broadcast regardless of whether they are needed, so there is much
      flooding. The flooding is not a waste from the perspective of either the uploader or
      downloader, but it is from the perspective of the net as a whole, because it
      contributes to congestion. Social costs of this technology!

      Justin: how do you plan to bootstrap the network? Who are your initial target users?
      (Emacs and XEmacs, which are giant downloads, and where there is nobody making a
      profit who ought to be covering the bill, might be a good place to start).

      The main insight is that you can gang up to make virtual bandwidth. What an elegant
      piece of thinking.

      If you gang up modem users to act as virtual fat pipes, i.e. as Borg-like
      collectives, then you can apply swarmcast thinking to any bandwidth problem. For
      example, the Gnutella dialup barrier, as enumerated by Clip2. Have very low
      bandwidth nodes join collectives where they share messaging i/o. ...a cool thing is
      that this avoids the need for artificial constructs like the Clip2 Reflector, which
      is a fixed point in the topology.

      Using the word "closeness" to signify "high bandwidth" or "fast response" bugs me.
      Closeness implies an internal characteristic, an aspect of a node's self that other
      nodes really can't know anything about. This is like assuming that other people
      feel like you do, even though really all you know is that they are also human and
      probably do work by similar mechanisms. "fast response", on the other hand, focuses
      on externalized, measurable behaviors.

      I think that the way FEC is used is that it allows a node that has one packet to get
      the rest. The doc uses the phrase "maximized and equal utility", which implies to me
      that every packet has some extra qualities which allow for reconstructing lost
      packets. Can you say which it is, Justin?

      - Lucas
    • Wesley M. Felter
      Message 2 of 8 , Feb 1, 2001
        On Thu, 1 Feb 2001, Lucas Gonze wrote:

        > I still have a hard time understanding why a downloader should have to give up half
        > their bandwidth for uploads during the download.

        Most bandwidth-constrained links on the Net are full-duplex, so sending
        data doesn't necessarily require giving up incoming bandwidth. Downloading
        big files is a great example: While I'm Napstering the outgoing part of my
        link is essentially unused. Why *not* relay some packets? Unless you
        meant something else that I didn't understand.

        Wesley Felter - wesf@... - http://www.cs.utexas.edu/users/wesf/
      • Lucas Gonze
        Message 3 of 8 , Feb 1, 2001
          I never knew that!



          > > I still have a hard time understanding why a downloader should have to
          > give up half
          > > their bandwidth for uploads during the download.
          >
          > Most bandwidth-constrained links on the Net are full-duplex, so sending
          > data doesn't necessarily require giving up incoming bandwidth. Dowloading
          > big files is a great example: While I'm Napstering the outgoing part of my
          > link is essentially unused. Why *not* relay some packets? Unless you
          > meant something else that I didn't understand.
          >
          > Wesley Felter - wesf@... - http://www.cs.utexas.edu/users/wesf/
        • Justin Chapweske
          Message 4 of 8 , Feb 1, 2001
            >
            > The first question people ask about swarmcast is what the selfish incentive is to be
            > an uploader. At first glance the value Swarmcast is seller centric rather than buyer
            > centric. What I mean is that reducing the cost of uploading is, in the short term,
            > the seller's problem. The answer, my guess anyway, is that it is to a downloader's
            > advantage to use Swarmcast because they get faster downloads.
            >

            Yes, less bandwidth for provider, faster download for consumer. It also completely
            flips the slashdot-effect on its head: instead of it being impossible to get the
            content, it can be retrieved quite quickly and reliably.

            > I still have a hard time understanding why a downloader should have to give up half
            > their bandwidth for uploads during the download. The system would be twice as fast
            > if a downloader never had to contribute bandwidth until a download was done. I think
            > the reason that download and upload are interleaved within a node is to force tit for
            > tat on the spot.

            Yes, the model that we aim for in Swarmcast is as close to real-time as possible, so
            each node should not have to stay up much longer than it took them to download. We
            want to be able to have a large percentage of the users shut off their computers
            immediately after downloading the content and have the system still operate.

            I'm curious as to what sort of network you're talking about where uploading will equally
            cut into the download speed. I realize that some DSLs do this but I don't see this as
            an issue in general. If I am mistaken please point me to some relevant information and
            we can quite easily throttle the uploads in favor of increased download speed.

            > It seems to me that this is a fairly flimsy social contract. for

            Why? It works brilliantly well for Napster to use simple psychological mechanisms, and
            we even have the advantage that each node can do useful work from the moment it gets its
            first packet. For those that choose to exit the application early before they've
            contributed as much as they've leeched we pop up a simple window that states that
            your karmic debt has yet to be repaid and tells you that it will exit automatically
            once that debt is repaid. If you really wish to exit you can of course do that.
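
The karmic-debt check described above might be modeled roughly as follows. This is a minimal sketch, not Swarmcast's code: the `KarmaLedger` class and its method names are hypothetical, and it assumes the debt is simply bytes downloaded minus bytes uploaded.

```python
from dataclasses import dataclass


@dataclass
class KarmaLedger:
    """Hypothetical per-download tally of bytes received versus bytes relayed."""
    downloaded: int = 0
    uploaded: int = 0

    def record_download(self, nbytes: int) -> None:
        self.downloaded += nbytes

    def record_upload(self, nbytes: int) -> None:
        self.uploaded += nbytes

    @property
    def debt(self) -> int:
        # Bytes still owed to the mesh; never negative.
        return max(0, self.downloaded - self.uploaded)

    def exit_message(self) -> str:
        # The user may always quit; the message only informs them of the debt.
        if self.debt == 0:
            return "Debt repaid; exiting."
        return (f"{self.debt} bytes of karmic debt remain; "
                "the client will exit automatically once repaid.")
```

A node that has relayed at least as much as it has received reports a debt of zero, which is the point at which the client described above would exit on its own.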

            > one thing, anyone can make a minor alteration to the source to disable uploading.

            Well, to save you the trouble I just went ahead and implemented this feature for
            you. It can be done by setting the property swarmcast.client.contribute=false. Have
            fun with it :)

            I have absolutely no concerns about this because you need to understand the distribution
            model of Swarmcast. It is installed as a browser plugin that is distributed through the
            content provider, and they certainly won't distribute a client that doesn't contribute
            because it doesn't help them at all.

            > ...I'd look for a more compelling way to enforce tit for tat. One possibility is
            > just to say that tit for tat only has to be really good at the beginning, when the
            > community is small. Later, when the community is large, there can be more
            > freeloading.

            While the consumers are getting faster downloads, it is really the content providers who
            are winning the most because of the massive bandwidth savings for them. I certainly
            think it's reasonable for the content provider to subsidize 20-30% of the bandwidth for
            the system considering it is pure savings off the top for them.

            Another point that everyone seems to be forgetting is that TCP/IP itself only works
            because of altruism. It is trivial for me to modify my TCP/IP stack to add pre-emptive
            ACKs and ignore congestion control mechanisms for massively faster downloads. Read
            http://arstechnica.com/reviews/2q00/networking/networking-1.html for more details.

            Those that are using Swarmcast in a more purely distributed model may want to add
            more enforceable tit-for-tat mechanisms to their applications, and I would fully support
            any effort to do so. At this time I don't see myself endorsing a specific method, but I
            think this is a very interesting area of research and am interested in all sorts of
            solutions.

            > Packets are broadcast regardless of whether they are needed, so there is much
            > flooding. The flooding is not a waste from the perspective of either the uploader or
            > downloader, but it is from the perspective of the net as a whole; because it
            > contributes to congestion. Social costs of this technology!

            They said the same thing about that world wide web thing with its crazy images and
            whatnot. And to think that the web didn't even have the benefit of being topologically/ISP
            friendly.

            >
            > Justin: how do you plan to bootstrap the network?

            That's the wonderful thing: this sucker scales very quickly, so no bootstrapping is
            required. While there are few users or the content is
            unpopular the content provider will be providing most of the bandwidth, but as the content
            increases in popularity their bandwidth requirements level off.

            > Who are your initial target users?
            > (Emacs and XEmacs, which are giant downloads, and where there is nobody making a
            > profit who ought to be covering the bill, might be a good place to start).
            >

            For the application, anyone with large, popular content that wants to serve it out
            faster and cut their bandwidth costs. I'm just a developer, so for more info on this
            contact info@....

            For the source code, anyone who has a project where they want fast, scalable data
            distribution in a P2P fashion. For instance this stream broadcasting idea that was
            floating around on here would be a great application to build on top of the Swarmcast
            Core Library (SCL). There are also the low-level Forward Error
            Correction and Reliable Multicast libraries that are released under the LGPL
            and we encourage people to use them in any project either open source or commercial.

            > The main insight is that you can gang up to make virtual bandwidth. What an elegant
            > piece of thinking.

            *shrug* no insight, I just figured it'd be cool to do RAID across the internet :P

            >
            > If you gang up modem users to act as virtual fat pipes, i.e. as Borg-like
            > collectives, then you can apply swarmcast thinking to any bandwidth problem. For
            > example, the Gnutella dialup barrier, as enumerated by Clip2. Have very low
            > bandwidth nodes join collectives where they share messaging i/o. ...a cool thing is
            > that this avoids the need for artificial constructs like the Clip2 Reflector, which
            > is a fixed point in the topology.
            >

            Yes, this is why we deliberately avoid as much overhead as possible in order to fully
            aggregate low-bandwidth nodes.

            > Using the word "closeness" to signify "high bandwidth" or "fast response" bugs me.
            > Closeness implies an internal characteristic, an aspect of a node's self that other
            > nodes really can't know anything about it. This is like assuming that other people
            > feel like you do, even though really all you know is that they are also human and
            > probably do work by similar mechanisms. "fast response", on the other hand, focuses
            > on externalized, measurable behaviors.

            Yeah, Swarmcast doesn't use closeness because it relates to "high bandwidth", we simply
            use it so that the system is topologically/ISP friendly.

            >
            > I think that the way FEC is used is that it allows a node that has one packet to get
            > the rest. The doc uses the phrase "maximized and equal utility", which implies to me
            > that every packet has some extra qualities which allow for reconstructing lost
            > packets. Can you say which it is, Justin?
            >

            "maximized and equal utility" means that no matter how much of the file you have, there
            is a very high probability that ANY random packet you receive will be useful in
            reconstructing the file. I think the big misinterpretation of how FEC is used in
            Swarmcast is that people think that there is such a thing as a "lost packet", which would
            imply that the "lost packet" was more important than other packets.
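
One way to put numbers on "any random packet is useful" is a simple model, sketched below. It assumes (these are illustration-only assumptions, not Swarmcast's actual parameters) that the file is expanded into `n_encoded` packets, that any `k_needed` distinct ones rebuild it, and that each arriving packet is drawn uniformly at random. The function names are hypothetical.

```python
def p_useful(held: int, n_encoded: int) -> float:
    """Chance that a uniformly random encoded packet is one the node lacks."""
    return (n_encoded - held) / n_encoded


def expected_receives(k_needed: int, n_encoded: int) -> float:
    """Expected packets received before k_needed distinct ones are held."""
    return sum(1.0 / p_useful(j, n_encoded) for j in range(k_needed))
```

With the assumed values k = 1000 and n = 8000, even the very last packet needed is useful with probability 7001/8000 (about 0.875), and the expected total is roughly 1,070 receives. Without redundancy (n = k = 1000) the same model becomes the coupon-collector problem and the expected total rises to about 7,500, because the last few specific packets are slow to turn up.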

            I would highly suggest reading Luigi Rizzo's papers on FEC and multicast. He originally
            wrote the FEC codes that we enhanced for use in Swarmcast and has done some amazing
            things with reliable multicast including an innovative scalable multicast congestion
            control algorithm.

            http://www.iet.unipi.it/~luigi/research.html

            --
            Justin Chapweske, Lead Swarmcast Developer, openCOLA Inc.
            http://www.sourceforge.net/projects/swarmcast/
          • Justin Chapweske
            Message 5 of 8 , Feb 1, 2001
              >
              > Another point that everyone seems to be forgetting is that TCP/IP itself only works
              > because of altruism. It is trivial for me to modify my TCP/IP stack to add pre-emptive
              > ACKs and ignore congestion control mechanisms for massively faster downloads. Read
              > http://arstechnica.com/reviews/2q00/networking/networking-1.html for more details.
              >

              "Altruism" is not the word I meant to use here, it's more like beneficial laziness. Is
              there a more concise English word for this?

              --
              Justin Chapweske, Lead Swarmcast Developer, openCOLA Inc.
              http://www.sourceforge.net/projects/swarmcast/
            • Lucas Gonze
              Message 6 of 8 , Feb 3, 2001
                > Yes, the model that we aim for in Swarmcast is as close to real-time as
                > possible, so
                > each node should not have to stay up much longer than it took them to download. We
                > want to be able to have a large percentage of the users shut off their computers
                > immediately after downloading the content and have the system still operate.

                Hmm. Since the first requests to re-serve bytes will arrive considerably after the
                first bytes are consumed, few nodes will pay off their karmic debt. How is it possible
                to not have to stay up longer than it took to download?

                > Well, to save you the trouble I just went ahead and implemented this feature for
                > you. It can be done by setting the property
                > swarmcast.client.contribute=false. Have
                > fun with it :)

                It was the right thing to do. They shoot horses, why not software?

                :)


                > I have absolutely no concerns about this because you need to understand
                > the distribution
                > model of Swarmcast. It is installed as a browser plugin that is
                > distributed through the
                > content provider, and they certainly won't distribute a client that
                > doesn't contribute
                > because it doesn't help them at all.

                Um. But the plugin is only distributed if the user doesn't have it already, right?
                If the user has it already, then their instance is persistent, and they may have
                turned off uploading.

                It seems to me that being invisible is the best way to keep people from turning off
                uploading. If they never really think about Swarmcast then they are less likely to
                mess around with their settings. What that suggests is that you should reconsider
                the nagware approach to enforcing tit for tat, idea being that the only real cost for
                most uploading is user attention.

                A useful attribute would be screensaver mode for upstream bandwidth. That is, if a
                user is not using upstream bandwidth for some period of time then Swarmcast can start
                up, and if some other app requests upstream bandwidth then Swarmcast shuts down.
                This would do good things for most client+server software I can think of.

                I think this would have to be implemented in the TCP driver. Anybody know what would
                be involved?
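
The policy side of this "screensaver mode" can be sketched independently of where the traffic readings come from. The function below is a hypothetical illustration: it assumes some platform-specific monitor (not shown) supplies periodic readings of other applications' upstream traffic in bytes per second, and the threshold and sample count are arbitrary defaults.

```python
from typing import Sequence


def upstream_is_idle(other_app_bytes_per_sec: Sequence[float],
                     idle_threshold: float = 1024.0,
                     quiet_samples: int = 5) -> bool:
    """True when the last `quiet_samples` readings of other applications'
    upstream traffic are all below `idle_threshold` bytes per second."""
    if len(other_app_bytes_per_sec) < quiet_samples:
        return False  # not enough history to call the link idle
    recent = other_app_bytes_per_sec[-quiet_samples:]
    return all(rate < idle_threshold for rate in recent)
```

An uploader following this policy would start relaying only while the function returns True and would stop as soon as a single busy reading appears.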

                > While the consumers are getting faster downloads, it is really the content
                > providers who
                > are winning the most because of the massive bandwidth savings for them.

                That is true, but it is not an inducement for me as a consumer to re-serve bytes.

                > I certainly
                > think its reasonable for the content provider to subsidize 20-30% of the
                > bandwidth for
                > the system considering it is pure savings off the top for them.

                Maybe a way to sell this to providers is as a way to handle demand spikes. Providers
                have to buy much more capacity than they need on average in order to handle spikes,
                and even then the spike can exceed their capacity. I believe that the cost of
                bandwidth as spike insurance is high.

                A simple marketing message for Swarmcast would be for buyers to compare the cost of
                it to the cost of excess bandwidth. This message is not the same as the message that
                Swarmcast is there for everyday use, even though that is also true.

                > > Packets are broadcast regardless of whether they are needed, so there is much
                > > flooding. The flooding is not a waste from the perspective of either
                > the uploader or
                > > downloader, but it is from the perspective of the net as a whole; because it
                > > contributes to congestion. Social costs of this technology!
                >
                > They said the same thing about that world wide web thing with its crazy images and
                > whatnot. And to think that the web didn't even have the benefit if being
                > topologically/ISP
                > friendly.

                I think it is the internet's job to support users, not the other way around. But if
                I did feel that I owed The Internet some moral debt, I would feel that Swarmcast was
                a fairly gross abuse in a way that is different from fat content. This is because
                Swarmcast is fundamentally about flooding.

                > > Using the word "closeness" to signify "high bandwidth" or "fast
                > response" bugs me.
                > > Closeness implies an internal characteristic, an aspect of a node's self
                > that other
                > > nodes really can't know anything about it. This is like assuming that
                > other people
                > > feel like you do, even though really all you know is that they are also human and
                > > probably do work by similar mechanisms. "fast response", on the other
                > hand, focuses
                > > on externalized, measurable behaviors.
                >
                > Yeah, Swarmcast doesn't use closeness because it relates to "high
                > bandwidth", we simply
                > use it so that the system is topologically/ISP friendly.

                Hmm. I asked you this question before and what you said was that closeness is really
                another word for high bandwidth/excellent pings/etc. So I stick to my main point.
                Closeness is the wrong concept and word here. The Freenet people abuse the term
                particularly badly, and my guess is that you are pretty close to Freenet.

                > "maximized and equal utility" means that no matter how much of the file
                > you have, there
                > is a very high probability that ANY random packet you recieve will be useful in
                > reconstructing the file. I think the big misinterpretation of how FEC is used in
                > Swarmcast is that people think that there is such thing as a "lost
                > packet", which would
                > imply that the "lost packet" was more important than other packet.

                Can you explain how it is that any random packet will be useful in reconstruction?

                - Lucas
              • Tony Kimball
                Message 7 of 8 , Feb 4, 2001
                  I'm not Justin, but I can play him on TV. He seems to be underground,
                  and I feel like testing my understanding of Swarmcast by pretending to
                  know enough to intelligently reply.

                  Quoth Lucas Gonze <lucas@...>:

                  > hmm. since the first requests to re-serve bytes will arrive considerably after the
                  > first bytes consumed few nodes will pay off their karmic debt. How is it possible
                  > to not have to stay up longer than it took to download?

                  That is an asymptotic ideal. As the finer points of mesh management
                  are optimized, the ideal is approached more closely. Also, when one
                  first enters a Swarmmesh, one's incoming bandwidth is not initially
                  maximized, because it takes a little while for your participation to
                  be propagated to enough nodes to saturate your pipe. During that
                  time, you can push out more data than you take in, leaving you with a
                  positive karmic balance.

                  > Um. But the plugin is only distributed if the user doesn't have it already, right?
                  > If the user has it already, then their instance is persistent, and they may have
                  > turned off uploading.

                  My suggestion: It can be turned back on by the download metadata.

                  > ... if a user is not using upstream bandwidth for some period of time then Swarmcast can start
                  > up, and if some other app requests upstream bandwidth then Swarmcast shuts down.
                  > ... I think this would have to be implemented in the TCP driver. Anybody know what would
                  > be involved?

                  Amen. There are network activity monitors for every common platform,
                  which demonstrate that this can be done in the OS API, and does not
                  require a driver. It *will* come in time, as Swarmcast becomes more
                  feature-complete.

                  > That is true, but it is not an inducement for me as a consumer to re-serve bytes.

                  Your inducement is that you get a faster download and pay no
                  perceptible penalty. Admittedly the current demo makes the cost
                  more poignantly perceptible than it should be in mass deployment.

                  > > > Packets are broadcast regardless of whether they are needed, so there is much
                  > > > flooding. The flooding is not a waste from the perspective of either
                  > > the uploader or
                  > > > downloader, but it is from the perspective of the net as a whole; because it
                  > > > contributes to congestion. Social costs of this technology!
                  > >
                  > > They said the same thing about that world wide web thing with its crazy images and
                  > > whatnot. And to think that the web didn't even have the benefit if being
                  > > topologically/ISP
                  > > friendly.
                  >
                  > I think it is the internet's job to support users, not the other way around. But if
                  > I did feel that I owed The Internet some moral debt, I would feel that Swarmcast was
                  > a fairly gross abuse in a way that is different from fat content. This is because
                  > Swarmcast is fundamentally about flooding.

                  I think Justin did not address this point clearly: You seem to
                  think that Swarmcast increases network load; in fact, it decreases
                  it.

                  The heaviest network load lies along the backbone, where the de facto
                  star topology of the Internet concentrates cross-sectional traffic.
                  In Swarmcast, you are much more likely to receive a packet from a
                  nearby neighbor. As a result, that data only has to cross the
                  backbone once in order to serve many clients, thus dramatically
                  reducing the load on the most heavily burdened links. The result is
                  that Swarmcast is not socially costly, but socially profitable.
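
The backbone argument above can be made concrete with a toy count. The sketch below is an idealization under stated assumptions: load is measured in whole file copies crossing the backbone, every client sits behind exactly one ISP, and in the swarm case a single copy per ISP is relayed locally to everyone else there. A real mesh would fall somewhere between the two figures. The function names are hypothetical.

```python
from typing import Sequence


def backbone_crossings_client_server(clients_per_isp: Sequence[int]) -> int:
    """Every client fetches its own copy across the backbone."""
    return sum(clients_per_isp)


def backbone_crossings_swarm(clients_per_isp: Sequence[int]) -> int:
    """Idealized swarm: one copy crosses the backbone per ISP that has
    at least one client, and nearby peers relay it locally."""
    return sum(1 for n in clients_per_isp if n > 0)
```

For example, with 50, 30, 0 and 20 clients behind four ISPs, the client/server count is 100 copies while the idealized swarm count is 3.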

                  > > Yeah, Swarmcast doesn't use closeness because it relates to "high
                  > > bandwidth", we simply
                  > > use it so that the system is topologically/ISP friendly.
                  >
                  > Hmm. I asked you this question before and what you said was that closeness is really
                  > another word for high bandwidth/excellent pings/etc. So I stick to my main point.
                  > Closeness is the wrong concept and word here. The Freenet people abuse the term
                  > particularly badly, and my guess is that you are pretty close to Freenet.

                  "Closeness" really means "closeness" for Swarmcast. The traffic is
                  supposed to be as local as possible. In the short term, closeness
                  can be approximated by ping times. In my opinion (not necessarily
                  Justin's) the long-term goal for Swarmcast should be to extract routing
                  information and use actual hop counts to compute closeness.
                  (But I think it's silly to argue about the semantics of 'closeness'.
                  It's not a technical term.)
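
Approximating closeness by ping time, as described above, amounts to ranking candidate peers by measured round-trip time. The sketch below is hypothetical: `probe_rtt` stands in for whatever measurement is available (an ICMP ping, a timed handshake, or later a hop count), and taking the median of a few probes is just one assumed way to damp jitter.

```python
from statistics import median
from typing import Callable, Dict, Iterable, List


def rank_by_closeness(peers: Iterable[str],
                      probe_rtt: Callable[[str], float],
                      probes: int = 3) -> List[str]:
    """Order peers from nearest to farthest by median round-trip time."""
    rtt: Dict[str, float] = {
        peer: median(probe_rtt(peer) for _ in range(probes)) for peer in peers
    }
    return sorted(rtt, key=rtt.get)
```

Because the measurement is passed in as a function, swapping ping times for hop counts later would not change the ranking code.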

                  > Can you explain how it is that any random packet will be useful in reconstruction?

                  Just like ECC memory. In a 12-bit Hamming code for an 8-bit word, you
                  only need 9 bits of actual data to reconstruct the original 8-bit
                  word. The selection of those bits can be random. In this case,
                  selecting 9 bits at random with equal probability will probably not
                  produce 9 distinct bits, and thus will be inadequate to reproduce
                  the original word. But in Swarmcast the numbers are much larger, so
                  that the probability of a collision is vanishingly small. Hence,
                  in effect, *any* random packet suffices.
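
The "any sufficiently large set of packets will do" property can be demonstrated with a toy erasure code. The sketch below is not the Swarmcast or Rizzo implementation; it is an assumed, minimal illustration that treats the k data bytes as points on a polynomial over the prime field of size 257 and emits n evaluations of that polynomial as packets. Any k distinct packets then determine the polynomial, and hence the data. All names are hypothetical, and n is limited to 257 in this toy version.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate_at(points, x):
    """Lagrange-evaluate at x the unique polynomial through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data: bytes, n: int):
    """Turn k = len(data) symbols into n packets (index, value), k <= n <= P."""
    if not len(data) <= n <= P:
        raise ValueError("need len(data) <= n <= 257")
    base = list(enumerate(data))  # packets 0..k-1 carry the data itself
    return [(x, _interpolate_at(base, x)) for x in range(n)]


def decode(packets, k: int) -> bytes:
    """Rebuild the k data symbols from any k distinct packets."""
    chosen = list(packets)[:k]
    if len(chosen) < k:
        raise ValueError("need at least k packets")
    return bytes(_interpolate_at(chosen, x) for x in range(k))
```

Encoding the five bytes of `b"swarm"` into twelve packets and decoding from any five of them returns the original bytes, so no individual packet is more important than another.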
                • Justin Chapweske
                  Message 8 of 8 , Feb 5, 2001
                    >
                    > > I have absolutely no concerns about this because you need to understand
                    > > the distribution
                    > > model of Swarmcast. It is installed as a browser plugin that is
                    > > distributed through the
                    > > content provider, and they certainly won't distribute a client that
                    > > doesn't contribute
                    > > because it doesn't help them at all.
                    >
                    > Um. But the plugin is only distributed if the user doesn't have it already, right?
                    > If the user has it already, then their instance is persistent, and they may have
                    > turned off uploading.
                    >
                    > It seems to me that being invisible is the best way to keep people from turning off
                    > uploading. If they never really think about Swarmcast then they are less likely to
                    > mess around with their settings. What that suggests is that you should reconsider
                    > the nagware approach to enforcing tit for tat, idea being that the only real cost for
                    > most uploading is user attention.
                    >

                    I'm not quite understanding what your suggestion is here, which is probably
                    just an issue of your not having seen the software yet. Basically it
                    minimizes to a system tray icon like in Napster, but instead of running
                    indefinitely like Napster does we try to be nicer and exit the
                    application automatically once the user's karmic debt is repaid. The only time
                    the user sees a nag is when they double click on the system tray icon and
                    select file->exit...then we just tell them that their karmic debt hasn't
                    been repaid to make them aware of it.

                    > >
                    > > They said the same thing about that world wide web thing with its crazy images and
                    > > whatnot. And to think that the web didn't even have the benefit if being
                    > > topologically/ISP
                    > > friendly.
                    >
                    > I think it is the internet's job to support users, not the other way around. But if
                    > I did feel that I owed The Internet some moral debt, I would feel that Swarmcast was
                    > a fairly gross abuse in a way that is different from fat content. This is because
                    > Swarmcast is fundamentally about flooding.
                    >

                    Swarmcast is not fundamentally about flooding, we do not send any more
                    data across the network than the same number of FTP downloads would. In
                    fact we SAVE bandwidth because we will often push these packets across
                    many fewer hops than FTP or HTTP would.

                    There is a valid argument that since Swarmcast makes content distribution
                    a lot cheaper/faster that it will drive more content to be distributed,
                    but if this is the case then it is purely the user's will that increases
                    the amount of content transferred and not because of some fundamental
                    "evil" of swarmcast. If there is any argument in this direction it should
                    be instead pointed at Napster which does not make ANY attempts to be ISP
                    friendly.

                    >
                    > Hmm. I asked you this question before and what you said was that closeness is really
                    > another word for high bandwidth/excellent pings/etc. So I stick to my main point.
                    > Closeness is the wrong concept and word here. The Freenet people abuse the term
                    > particularly badly, and my guess is that you are pretty close to Freenet.
                    >

                    I am open to any efficient and effective measure of determining
                    topological closeness, for now pings seem to be the most logical
                    solution.

                    >
                    > Can you explain how it is that any random packet will be useful in reconstruction?
                    >

                    Please see my previous conversation with Ben Houston regarding this. I
                    would also suggest you read Luigi's works if you are interested in
                    how the FEC works in this space.

                    --
                    Justin Chapweske, Lead Swarmcast Developer, openCOLA Inc.
                    http://www.sourceforge.net/projects/swarmcast/