Fwd: RE: [swtest-discuss] Test Driven Development: By Example...

  • Cem Kaner
    Message 1 of 3 , Feb 1, 2003
      I'm cross-posting this from another list. The cited paper appears to
      provide negative evidence as to the value of test-first development. I have
      some methodological questions about the paper, maybe you can see some
      others. Odds are that this will be published in one of the traditional
      software engineering journals. Maybe someone would like to get a rebuttal
      ready in advance.

      -- cem kaner

      >From: "Gerold Keefer" <gkeefer@...>
      >To: <swtest-discuss@...>
      >Subject: RE: [swtest-discuss] Test Driven Development: By Example...
      >Date: Sat, 1 Feb 2003 21:26:50 +0100
      >
      >hello,
      >
      >judging from my experience from kent beck's "extreme programming
      >explained" and some comments from him in newsgroups, i would
      >not expect too much. kent is excellent in fostering ideas, but
      >very poor in refining them.
      >anyway, writing tests before implementation is certainly a
      >worthwhile approach, although very preliminary research does not
      >show clear benefits (from
      >http://www.ipd.uka.de/~muellerm/publications/ease02.pdf ) :
      >
      >"1. If a developer switches from traditional development to test-first
      >programming, he does not program necessarily faster. That is,
      >he does not arrive at a solution more quickly.
      >
      >2. Test-first pays off only slightly in terms of increased
      >reliability. In fact, there were five programs developed with test-first
      >with a reliability over 96% compared to one program in the
      >control group. But this result is blurred by the large variance of
      >the data-points. Concentrating on the program versions after the
      >implementation-phase, the result just turns around. The test-first group
      >has significantly less reliable programs than the
      >control group. So far, we do not know, if this effect is caused by
      >a false sense of security, less importance of the acceptance-test
      >for the test-first group, or if it is quite simply a result of too
      >little testing.
      >
      >3. Test-first programmers reuse existing methods correctly more quickly.
      >This is caused by the ongoing testing strategy of test-first.
      >Once a failure is found, it is indicated by a test-case and, while fixing
      >the fault, the developer learns how to use the method or interface
      >correctly."
      >
      >what i really don't like is "specification by example". i don't
      >think that serious systems can or should be specified or developed
      >"by example". just as serious math is not based on examples.
      >
      >regards,
      >
      >gerold
      >
      >
      >"At what point shall we expect the approach of danger? By what
      >means shall we fortify against it? Shall we expect some transatlantic
      >military giant, to step the Ocean, and crush us at a blow? Never!
      >All the armies of Europe, Asia and Africa combined, with all the
      >treasure of the earth (our own excepted) in their military chest;
      >with a Buonaparte for a commander, could not by force, take a
      >drink from the Ohio, or make a track on the Blue Ridge, in a trial
      >of a thousand years. At what point, then, is the approach of danger
      >to be expected? I answer, if it ever reach us it must spring up
      >amongst us. It cannot come from abroad. If destruction be our lot,
      >we must ourselves be its author and finisher. As a nation of freemen,
      >we must live through all time, or die by suicide."
      >
      >-- Abraham Lincoln, President of the USA, 1861-1865.

      ______________________________________________________________________
      Cem Kaner, J.D., Ph.D.
      Professor, Department of Computer Sciences, Florida Institute of Technology

      http://www.kaner.com http://www.badsoftware.com

      Author (with Bach & Pettichord) LESSONS LEARNED IN SOFTWARE TESTING (Wiley,
      2001)
      Author (with Falk & Nguyen) TESTING COMPUTER SOFTWARE (2nd Ed, Wiley)
      Author (with David Pels) of BAD SOFTWARE (Wiley, 1998)
    • Phlip
      Message 2 of 3 , Feb 1, 2003
        Cem Kaner sez:

        > From: "Gerold Keefer" <gkeefer@...>

        This is one of the trolls who have helped make
        news:comp.software.extreme-programming useless. Look him up with
        http://groups.google.com to read tedious whining that rarely reveals any
        insights into software engineering.

        > http://www.ipd.uka.de/~muellerm/publications/ease02.pdf

        It would be nice if the experiment addressed TDD instead of Test-First. TDD is
        TF + Simple Design plus Refactoring. But the paper dismisses Simple Design as
        "another XP practice" and hence rejects it to maintain purity.
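        For readers unfamiliar with the distinction, a minimal sketch of the
        full cycle (test-first, then simple design, then refactoring) follows.
        All names here are illustrative, not taken from the paper:

```python
# A sketch of the TDD cycle as described above: test-first plus
# simple design plus refactoring. FizzBuzz is just a stand-in task.

# RED: a failing test states the next requirement before any code exists.
def test_fizzbuzz():
    assert fizzbuzz(3) == "Fizz"
    assert fizzbuzz(5) == "Buzz"
    assert fizzbuzz(15) == "FizzBuzz"
    assert fizzbuzz(7) == "7"

# GREEN, then REFACTOR: the simplest code that passes, with the
# duplicated multiple-of checks collapsed while the test stays green.
def fizzbuzz(n):
    out = ("Fizz" if n % 3 == 0 else "") + ("Buzz" if n % 5 == 0 else "")
    return out or str(n)

test_fizzbuzz()
```

        The test both drives the design and remains as a regression check, which
        is what plain Test-First alone, without the refactoring step, omits.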

        Next, the "without test-first" team runs without unit tests period. But both
        teams must pass an acceptance test. This is not incremental (a real team
        would >start< with the acceptance test). But of course a "real team" wouldn't
        have been experimental.

        The paper concludes that the test-first team was less reliable (with a
        significance p = 0.03) because they failed the acceptance test more often.

        The test-first group implemented, and then passed the acceptance tests a
        little faster.

        The test-first group implemented much faster, in almost half the time.

        The no-first group passed the acceptance tests faster.

        Of course a troll can decide that 1 data point out of 3 "against" TDD means we
        should throw away TDD. The experiment itself is very interesting, and one can
        easily surmise how the slower team spent more time thinking about what the AT
        might do instead of writing their own tests. And, of course, the test-first
        group came out with many more tests at the end ;-)

        "The test-first group had less errors when reusing an existing method more
        than once." That's >the point< here. The tested method was already re-used
        once, in the test.
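        The mechanism can be sketched concretely. In this hypothetical example
        (none of these names come from the paper), a test written first encodes
        the caller's assumption about an existing method, so a wrong assumption
        fails immediately instead of surviving into later reuse:

```python
# Hypothetical sketch of the paper's finding #3: a test written
# before the new code catches incorrect reuse of an existing method.

def parse_price(text):
    """Existing method being reused: returns the price in CENTS (int)."""
    dollars, _, cents = text.strip("$").partition(".")
    return int(dollars) * 100 + int(cents or 0)

# Step 1 (test-first): the test states the caller's assumption that
# the total comes back in dollars, before order_total is written.
def test_order_total():
    assert order_total(["$1.50", "$2.25"]) == 3.75

# Step 2: implement until the test passes. A draft that forgot that
# parse_price returns cents would fail the assertion at once, teaching
# the developer the existing method's real contract.
def order_total(prices):
    return sum(parse_price(p) for p in prices) / 100

test_order_total()
```

        By the time production code reuses parse_price again, it has already
        been exercised correctly once, in the test.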

        The paper leaves open the idea that this simple effect compounds itself
        over and over again as the project gets larger, constraining the design
        space more and more tightly. So a larger experiment should show this
        contraction at work.

        --
        Phlip
        http://www.greencheese.org/MakeItSo
        -- Friends don't let friends use Closed Source software --
      • Ron Jeffries
        Message 3 of 3 , Feb 2, 2003
          On Sunday, February 2, 2003, at 1:15:03 AM, Cem Kaner wrote:

          > http://www.ipd.uka.de/~muellerm/publications/ease02.pdf
          > Experiment about Test-first programming
          > Matthias M. Müller and Oliver Hagner
          > Computer Science Department
          > University of Karlsruhe

          It's certainly an interesting paper, and the authors made some interesting
          decisions to try to control the variables. If it were sent to me for review
          I'd ask for expansion and clarification in some places. It appears that
          many of the differences in the results are explained by variance in the
          test-first team's performance. Whether that variance is a result of
          test-first, or of the individuals, isn't clear, but I would expect
          individual differences to mask most everything else in any experiment of
          this kind.

          It's good that people are experimenting, and submitting their work to peer
          review. The first experiments are always rough. They'll get better.

          A few notes ...

          1. The entire paper is apparently based on a wiki note from me. It might
          have been better to have considered, e.g., Beck's Test-Driven Development
          book (which unfortunately postdates the experiment), or even the
          material on TDD in /Extreme Programming Installed/ or on my web site. The
          details of the instructions and experience of the subjects are not
          provided.

          2. The paper quotes none of the primary or near-primary XP source books. To
          what extent are the authors familiar with the material?

          3. I would usually expect that acceptance tests would be used as part of
          the TDD process, not as a second phase, but of course in the interest of an
          experiment, one might well want a final independent testing phase. An
          experiment that provides acceptance tests into the programming process, and
          only then subjects the programs to additional testing, might be more
          interesting. Requiring both teams to produce code to the same level of
          reliability might be interesting.

          4. The paper would be improved by provision of more information on the XP
          course that the students had taken in the previous semester, in particular
          the test-driven aspects.

          5. The process appears to have been to provide a class and its method
          names, and for the programmers to fill it in, using existing tests written
          by someone else, back when the code worked.

          6. I don't understand the section about Tester Quality very well. They
          examined the tests for coverage, against the reference code rather than the
          real code. I'd rather see how well the tests tested their own code, and
          would have no particular prediction about how well Joe's tests would test
          someone else's code.

          7. The most important thing to measure, in my opinion, wasn't addressed at
          all. That is the question of whether individual and team performance
          improves with skilled addition of the TDD practice to their existing
          practice.

          Still, it's good to see people out there working. We'll see what happens
          over time.

          Ron Jeffries
          www.XProgramming.com
          Logic is overrated as a system of thought.