
nondeterministic acceptance tests

  • David Carlton
    Message 1 of 8, Nov 6, 2005
      I've been having a very helpful discussion with Phlip on
      comp.software.extreme-programming on nondeterministic acceptance
      tests, where failures can't be solved by reverting code; I was hoping
      that the members of this mailing list might help me with a specific
      problem I'm having in this area.

      Here's the basic setup; I apologize for its length. (But if it were
      simpler, it would be easier to avoid the nondeterminism. Which may
      well be a design lesson that we should have taken to heart some time
      ago, but that's a bit beyond the scope of this particular question.)

      The system has multiple interacting programs; call them A, B, and C.
      A sends a command to B. B starts processing data, and sends the
      processed data to C in chunks. In fact, there are two copies of C,
      call them C1 and C2, both storing the same data, for redundancy and/or
      load balancing purposes. And B should report back to A whether this
      has successfully finished, whether there's been an error, and, in the
      latter case, what sort of error this is.

      The story is: verify that we can kill program C1 without unduly
      affecting the rest of the system. So what that concretely means is
      that B should continue sending the data to C2, that B should report
      back to A at the end that it's finished sending the data to C2 and
      that there was an error with C1. And B should free resources
      appropriately, including resources associated with its communications
      with C1.

      That all translates pretty directly into an acceptance test: A sends
      the command, we kill program C1, we let things run until the expected
      completion time, and verify that the data is at C2, that A has
      received the expected response, and that resources haven't been
      leaked. We have code to do all of this; no problem.
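
      Just to make the shape concrete, the whole thing boils down to
      something like the sketch below. Every helper is a made-up stand-in
      for our real harness, stubbed out so the sketch is self-contained:

          // Rough shape of the acceptance test; all helpers are hypothetical
          // stand-ins for the real harness, stubbed so this compiles alone.
          #include <cassert>
          #include <iostream>
          #include <string>

          void send_command_from_A()                 { std::cout << "A -> B: start\n"; }
          void kill_process(const std::string& name) { std::cout << "kill " << name << "\n"; }
          void wait_for_expected_completion()        { /* sleep until the deadline */ }
          bool data_complete_at(const std::string&)  { return true; }
          bool A_saw_done_with_C1_error()            { return true; }
          bool B_leaked_no_resources()               { return true; }

          int main() {
              send_command_from_A();              // A tells B to start processing
              kill_process("C1");                 // take C1 down somewhere mid-run
              wait_for_expected_completion();     // let B finish with C2

              assert(data_complete_at("C2"));     // all the data reached C2
              assert(A_saw_done_with_C1_error()); // A got "finished, but C1 failed"
              assert(B_leaked_no_resources());    // B freed everything tied to C1
              return 0;
          }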

      The problem is that, depending on what's going on when C1 gets killed,
      the details differ, making this single test in
      effect a family of related tests. Some possible scenarios:

      * B is about to tell C1 to expect some data.
      * B has told C1 to expect data, but hasn't gotten an ack.
      * B has gotten the ack, but hasn't sent the data.
      * B is in the middle of sending data.
      * B has sent the data, hasn't gotten an ack.
      * B has sent the data and gotten an ack.

      And, while all of these are going on in the B<->C1 communication, the
      B<->C2 communication can be in any of the same states. So there are a
      lot of cases. And it's pretty hard (for me, at least) to design an
      acceptance test that can hit all of these cases; even if I could, I'm
      not convinced that doing so would be wise (or, for that matter, that
      it wouldn't be wise).
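
      (Concretely, if each side of the transfer can be in any of those six
      states when the kill lands, that's 6 x 6 = 36 timing combinations for
      this one story, each of which would need precise control over exactly
      when C1 dies.)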

      And I'm not sure my list of scenarios above is exhaustive; at the time
      that we tried to implement this story, we didn't have a lot of
      experience with what could happen if one of the programs went down.

      The result was that we tried to implement the story, ran the
      acceptance test several times, and it passed. We checked it in, and
      over the next few months, it failed occasionally; each failure we
      managed to reproduce in a unit test and fix. The number of failures
      in this (and related stories) went down over time; I suspect that
      we're in pretty good shape but that defects remain.


      That's where we are; I apologize for the length. And the reason why
      I'm posting is that I'm having a hard time viewing this through XP
      lenses to see where (if anywhere) we went wrong. Some possibilities:

      1) The scenario is fine as long as the Customer agrees; in this case,
      the initial story might be that killing a program works at least X%
      of the time, and (if the Customer isn't happy releasing a product
      for that value of X) there should be followup stories where the
      value of X increases.

      2) We screwed up in our unit testing: the initial unit tests should
      have covered all of the possible failure conditions.

      3) We should have had more acceptance tests covering a wider range of
      what's going on when the program gets killed.

      My current feeling is that a combination of 1 and 2 makes sense, but
      I'd love to hear other people's reactions.

      David Carlton
      carlton@...
    • acockburn@aol.com
      Message 2 of 8, Nov 6, 2005
        I would consider two avenues:

        First, use unit rather than acceptance tests, and replace C1 and C2 with
        mocks that deliberately abend at those key moments. Thus you test B.

        Second, consider using random numbers to decide when to kill C1, and
        instrument the system to know when that is ... then print to a log and scan for
        those conditions. I.e., let long random runs generate the situations you need
        rather than instrumenting them directly.
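
        As a sketch of the first avenue (I'm inventing the interface B talks
        to a C through; substitute whatever it really is), the point is that
        the mock can be told exactly when to die:

            // Hypothetical interface and mock; only the idea is the point.
            #include <cstddef>
            #include <iostream>
            #include <stdexcept>
            #include <vector>

            // What B needs from a C (invented for illustration).
            struct ChunkSink {
                virtual void announce(std::size_t nChunks) = 0;        // "expect data"
                virtual void send(const std::vector<char>& chunk) = 0; // one chunk
                virtual void finish() = 0;                             // final ack
                virtual ~ChunkSink() = default;
            };

            // A C that abends after a chosen number of successful calls.
            class DyingSink : public ChunkSink {
            public:
                explicit DyingSink(int callsBeforeDeath) : remaining_(callsBeforeDeath) {}
                void announce(std::size_t) override { step(); }
                void send(const std::vector<char>&) override { step(); }
                void finish() override { step(); }
            private:
                void step() {
                    if (remaining_-- <= 0)
                        throw std::runtime_error("connection to C lost"); // the "kill"
                }
                int remaining_;
            };

            int main() {
                DyingSink c1(2);                      // dies on the third call
                try {
                    c1.announce(1);
                    c1.send({'x'});
                    c1.send({'y'});                   // this one throws
                } catch (const std::runtime_error& e) {
                    std::cout << "B should now fail over to C2: " << e.what() << "\n";
                }
                return 0;
            }

        Construct B against a DyingSink(k) for each k of interest (before the
        announce, after it, mid-send, after the final ack) and assert on B's
        report to A and on its cleanup, with no real C1/C2 processes and no
        timing luck involved.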

        Alistair


        In a message dated 11/6/2005 7:29:36 P.M. Mountain Standard Time,
        extremeprogramming@yahoogroups.com writes:



        The problem is that, depending on what's going on when C1 gets killed,
        the details differ, making this single test in
        effect a family of related tests. Some possible scenarios:

        * B is about to tell C1 to expect some data.
        * B has told C1 to expect data, but hasn't gotten an ack.
        * B has gotten the ack, but hasn't sent the data.
        * B is in the middle of sending data.
        * B has sent the data, hasn't gotten an ack.
        * B has sent the data and gotten an ack.

        And, while all of these are going on in the B<->C1 communication, the
        B<->C2 communication can be in any of the same states. So there are a
        lot of cases. And it's pretty hard (for me, at least) to design an
        acceptance test that can hit all of these cases; even if I could, I'm
        not convinced that doing so would be wise (or, for that matter, that
        it wouldn't be wise).






        ==============================================

        Alistair Cockburn
        Humans and Technology

        801.582.3162 | 1814 Ft Douglas Cir | Salt Lake City, UT 84103
        http://alistair.cockburn.us/ | acockburn@...

        ==============================================

        "La perfection est atteinte non quand il ne reste rien a ajouter,
        mais quand il ne reste rien a enlever." (Saint-Exupery)

        "The first thing to build is trust." -- Brad Appleton

        ==============================================




      • William Pietri
        Message 3 of 8, Nov 7, 2005
          David Carlton wrote:

          >The result was that we tried to implement the story, ran the
          >acceptance test several times, and it passed. We checked it in, and
          >over the next few months, it failed occasionally; each failure we
          >managed to reproduce in a unit test and fix. The number of failures
          >in this (and related stories) went down over time; I suspect that
          >we're in pretty good shape but that defects remain.
          >
          >
          >That's where we are; I apologize for the length. And the reason why
          >I'm posting is that I'm having a hard time viewing this through XP
          >lenses to see where (if anywhere) we went wrong.
          >

          I wouldn't say you went wrong. Part of the process is adapting the
          approach to local circumstances, and it sounds like you're doing that
          just fine.

          That said, one of my markers for a team that's doing well is that they
          don't fear that monsters lurk in the code. So if there are any tests you
          worry might still have nondeterministic bugs lurking, I'd suggest you
          get a spare machine and have it run those tests (or perhaps all tests)
          continuously until you have no more fear about this. Things that seem
          nondeterministic at the individual level can often be managed
          statistically.
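
          To put a rough number on it: if a given bug shows up on any one
          run with probability p, then N clean runs in a row happen with
          probability (1-p)^N, so around 300 consecutive clean runs gives
          you roughly 95% confidence that p is under 1%. The exact numbers
          matter less than watching the failure rate trend toward zero.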

          William
        • David Carlton
          Message 4 of 8, Nov 7, 2005
            On Sun, 6 Nov 2005 21:59:45 EST, acockburn@... said:

            > First, use unit rather than acceptance tests, and replace C1 and C2
            > with mocks that deliberately abend at those key moments. Thus you
            > test B.

            That's a possibility I hadn't considered. The programs in question
            are pretty big to unit test as an entity, but we could give it a try.

            > Second, consider using random numbers to decide when to kill C1, and
            > instrument the system to know when that is ... then print to a log
            > and scan for those conditions. I.e., let long random runs generate
            > the situations you need rather than instrumenting them directly.

            I hadn't thought of that in this context, either. Though we've used a
            similar idea in other circumstances, running nightly tests on input
            with a random component, where the seed varies from night to night but
            where the test is reproducible given the seed. I'm not sure it will
            work in this context, because I'm not sure we could control the timing
            of the kills precisely enough, but I'll think about it.
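
            For what it's worth, the reproducibility trick in our nightly
            tests is just to log the seed and allow it to be overridden, so
            a failing night can be replayed. A made-up sketch, not our
            actual harness, applied to the kill-timing idea:

                // Pick a seed (or take one from the environment to replay a
                // failure), log it, and derive the kill timing from it.
                #include <cstdlib>
                #include <iostream>
                #include <random>

                int main() {
                    unsigned seed;
                    if (const char* s = std::getenv("KILL_TEST_SEED"))
                        seed = static_cast<unsigned>(std::strtoul(s, nullptr, 10));
                    else
                        seed = std::random_device{}();
                    std::cout << "kill-test seed: " << seed << "\n"; // goes in the log

                    std::mt19937 rng(seed);
                    std::uniform_int_distribution<int> killAfterMs(0, 5000);
                    std::cout << "kill C1 after " << killAfterMs(rng) << " ms\n";
                    // ... start A/B/C1/C2, wait that long, kill C1, then run
                    //     the usual acceptance checks ...
                    return 0;
                }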

            Thanks for the suggestions!

            David Carlton
            carlton@...
          • David Carlton
            Message 5 of 8, Nov 7, 2005
              On Mon, 07 Nov 2005 09:45:02 -0800, William Pietri <william@...> said:

              > I wouldn't say you went wrong. Part of the process is adapting the
              > approach to local circumstances, and it sounds like you're doing
              > that just fine.

              Thanks for the reassurance. :-)

              > That said, one of my markers for a team that's doing well is that
              > they don't fear that monsters lurk in the code. So if there are any
              > tests you worry might still have nondeterministic bugs lurking, I'd
              > suggest you get a spare machine and have it run those tests (or
              > perhaps all tests) continuously until you have no more fear about
              > this. Things that seem nondeterministic at the individual level
              > can often be managed statistically.

              That's a good idea. Fortunately, the existing tests shouldn't be
              biased towards any particular one of the failure possibilities, so
              statistical evidence really should help.

              David Carlton
              carlton@...
            • Ian Collins
              Message 6 of 8, Nov 7, 2005
                David Carlton wrote:

                >The system has multiple interacting programs; call them A, B, and C.
                >A sends a command to B. B starts processing data, and sends the
                >processed data to C in chunks. In fact, there are two copies of C,
                >call them C1 and C2, both storing the same data, for redundancy and/or
                >load balancing purposes. And B should report back to A whether this
                >has successfully finished, whether there's been an error, and, in the
                >latter case, what sort of error this is.
                >
                >The story is: verify that we can kill program C1 without unduly
                >affecting the rest of the system. So what that concretely means is
                >that B should continue sending the data to C2, that B should report
                >back to A at the end that it's finished sending the data to C2 and
                >that there was an error with C1. And B should free resources
                >appropriately, including resources associated with its communications
                >with C1.
                >
                You should be able to test most, if not all, of this with unit tests in
                the communications stack in B. Bugs found in system acceptance tests
                are much harder to nail than those found with unit tests.

                Test that the layer that should detect the loss of communications
                with a C does so, and that the error propagates correctly.

                Test that the layer that should clean up does so when it receives
                an error.

                Test that the reporting code generates the correct report.

                I've built a lot of systems like this and it's always better to unit
                test all of the scenarios in the communication code rather than try and
                test the system as a whole.
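
                For instance, the shape of a test for the cleanup layer, with
                invented names (the real session object will look nothing like
                this, but the assertions are the point):

                    // Feed the layer a "C1 died" error and check it frees C1's
                    // resources, leaves C2 alone, and records the right status.
                    #include <cassert>
                    #include <map>
                    #include <string>

                    class TransferSession {            // invented for illustration
                    public:
                        void open(const std::string& c)  { resources_[c] = 1; }
                        void onPeerLost(const std::string& c) {
                            resources_.erase(c);       // free that C's resources
                            status_ = "finished with error on " + c;
                        }
                        bool holdsResourcesFor(const std::string& c) const {
                            return resources_.count(c) != 0;
                        }
                        const std::string& status() const { return status_; }
                    private:
                        std::map<std::string, int> resources_;
                        std::string status_ = "in progress";
                    };

                    int main() {
                        TransferSession s;
                        s.open("C1");
                        s.open("C2");
                        s.onPeerLost("C1");                   // the error arrives
                        assert(!s.holdsResourcesFor("C1"));   // C1 cleaned up
                        assert(s.holdsResourcesFor("C2"));    // C2 untouched
                        assert(s.status() == "finished with error on C1"); // right report
                        return 0;
                    }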

                Back this up with some random acceptance tests, but treat this as a
                basic sanity test to flush out conditions that were missed in the unit
                tests (there will be some!). The fun part is adding unit tests for one
                of these when it is found!

                Ian
              • David Carlton
                Message 7 of 8, Nov 7, 2005
                  On Tue, 08 Nov 2005 17:22:10 +1300, Ian Collins <ian@...> said:

                  > You should be able to test most, if not all, of this with unit tests in
                  > the communications stack in B.

                  I was going to say that the communications stack hasn't been the
                  problem: we didn't miss bugs there. (Or rather, the system testing
                  hasn't yet turned up bugs there.) But that stack has gotten a lot
                  more scrutiny over the years than the rest of the system, so I'm
                  sure that's why it didn't cause problems in this scenario.

                  > Test that the layer that should detect the loss of communications
                  > with a C does so, and that the error propagates correctly.

                  > Test that the layer that should clean up does so when it receives
                  > an error.

                  > Test that the reporting code generates the correct report.

                  Yeah, that sounds like a good plan.

                  > Back this up with some random acceptance tests, but treat this as a
                  > basic sanity test to flush out conditions that were missed in the
                  > unit tests (there will be some!). The fun part is adding unit tests
                  > for one of these when it is found!

                  Indeed, there were some, but it wasn't too hard to find them and write
                  unit tests for them. Though I was impressed at how quickly my team
                  members found the memory leak that system testing turned up (we're
                  using C++) - I was prepared for that one to take weeks to find.
                  Either a sign that we're doing something right, or that my team is
                  really good, or both.

                  David Carlton
                  carlton@...
                • Ian Collins
                  Message 8 of 8, Nov 7, 2005
                    David Carlton wrote:

                    >>Back this up with some random acceptance tests, but treat this as a
                    >>basic sanity test to flush out conditions that were missed in the
                    >>unit tests (there will be some!). The fun part is adding unit tests
                    >>for one of these when it is found!
                    >>
                    >>
                    >
                    >Indeed, there were some, but it wasn't too hard to find them and write
                    >unit tests for them. Though I was impressed at how quickly my team
                    >members found the memory leak that system testing turned up (we're
                    >using C++) - I was prepared for that one to take weeks to find.
                    >Either a sign that we're doing something right, or that my team is
                    >really good, or both.
                    >
                    >
                    >
                    Or they were using decent tools with memory leak detection :)

                    Ian