Re: [XP] Test-Code Duplication

  • William Pietri
    Message 1 of 13, Sep 1, 2005
      On Wed, 2005-08-31 at 23:44 -0700, Steven Gordon wrote:
      > There is an obvious danger in not having a "chinese wall" between test code
      > and production code, and then doing deep refactorings:
      > A test that used to be:
      > - apply algorithm A (implemented independently in the test code)
      > - apply algorithm B (implemented in the production code)
      > - see if the results are the same
      > becomes:
      > - apply algorithm A (refactored to have become algorithm B itself if you
      > look deep enough)
      > - apply algorithm B
      > - see if the results are the same


      I grant the theoretical possibility, but I can't think of a case where
      I've come close to having this happen. I think that's mainly because I
      don't test an algorithm against a reimplementation; I generally test
      algorithms against literal data.

      Thinking about it, it seems weird to me that I can't remember a single
      case of testing a complex algorithm against a reimplementation. But I
      guess it makes sense: if I'm doing this in a TDD fashion, I'd be
      developing A and B in parallel. I'd worry that some error in my thinking
      would lead me to do the same thing wrong in both code bases, so parallel
      duplicate development wouldn't feel satisfying.
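
      To make "literal data" concrete, a test in that style might look
      something like the JUnit sketch below; Rot13 and its encode method are
      made-up stand-ins for whatever algorithm is actually under test:

      import junit.framework.TestCase;

      // The algorithm is pinned down by literal input/output pairs,
      // not by a second implementation running alongside it.
      public class Rot13Test extends TestCase {
          public void testKnownValues() {
              assertEquals("uryyb", Rot13.encode("hello"));
              assertEquals("NOP", Rot13.encode("ABC"));
              assertEquals("", Rot13.encode(""));
          }
      }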

      William


      --
      William Pietri <william@...>
    • Brian Slesinsky
      Message 2 of 13, Sep 1, 2005
        On Aug 31, 2005, at 7:38 PM, Tim King wrote:
        >>
        >> You have a class that includes a method for writing a complex record
        >> structure to disk, another for reading it back, and then several that
        >> perform operations on those records.
        >>

        This might be an issue if you're using a binary format, but usually I
        try to use human-readable formats (XML, CSV, and so on) so I can just
        create the test data by hand in a text editor. Also, I try to make the
        test data as small and easy to understand as possible. There should
        be just enough data to exercise the code.

        So part of the answer is "don't do that if you can avoid it".
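
        For instance, a parser test might carry its data inline, as in this
        sketch (CsvParser here is just a made-up class under test):

        import junit.framework.TestCase;
        import java.io.StringReader;

        public class CsvParserTest extends TestCase {
            // Hand-written, human-readable data: just enough rows
            // to exercise the code.
            public void testParsesTwoRows() throws Exception {
                String data =
                      "name,quantity\n"
                    + "apples,3\n"
                    + "pears,0\n";
                assertEquals(2,
                    new CsvParser().parse(new StringReader(data)).size());
            }
        }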

        If you really do need to read something complex (someone gives you a
        file and says "parse this"), another approach is to check in your
        sample files under a directory somewhere. My favorite directory
        structure looks like this:

        test/
          src/
            com/example/mypackage/MyTest.java
          data/
            MyTest/
              inputFile.bin
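
        A test can then locate its data with a small helper along these lines
        (the "test.data.dir" property name is an assumption about how the
        build is configured, not part of the layout itself):

        import java.io.File;

        // Resolves test/data/<TestClassName>/<fileName>, assuming the build
        // sets -Dtest.data.dir or the tests run from the project root.
        public class TestData {
            public static File forTest(Class testClass, String fileName) {
                File base = new File(
                    System.getProperty("test.data.dir", "test/data"));
                String name = testClass.getName();
                String shortName = name.substring(name.lastIndexOf('.') + 1);
                return new File(new File(base, shortName), fileName);
            }
        }

        With that, MyTest asks for TestData.forTest(MyTest.class,
        "inputFile.bin").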

        Keeping your test data as data means that you know right away when you
        do something that breaks backward compatibility (assuming that
        matters). A round-trip test is also a good thing to have but it's a
        different test.

        For checking code that writes a file, most of the time you don't parse
        the output because the output is deterministic. A string comparison is
        sufficient, or a binary comparison if it's binary. If the output
        isn't entirely predictable (which is something to avoid), it's usually
        possible to get away with regular expressions and the like.
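
        For instance (CsvWriter here is a made-up example of a writer with
        deterministic output):

        import junit.framework.TestCase;
        import java.io.StringWriter;

        public class CsvWriterTest extends TestCase {
            // Deterministic output: a plain string comparison is enough.
            public void testWritesOneRow() throws Exception {
                StringWriter out = new StringWriter();
                new CsvWriter().writeRow(out, new String[] { "a", "b" });
                assertEquals("a,b\n", out.toString());
            }

            // When one field is not predictable (a timestamp, say), a
            // regular expression for just that part usually suffices.
            public void testHeaderLineMatchesPattern() {
                String line = "generated: 2005-09-01 12:34:56";
                assertTrue(line.matches("generated: \\d{4}-\\d{2}-\\d{2} .*"));
            }
        }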

        - Brian
      • Steven Gordon
        Message 3 of 13, Sep 1, 2005
          I have seen unit tests that do verify one implementation with
          another, but they were definitely not produced by doing TDD
          correctly.

          If TDD is done correctly in small steps using specific literal data,
          I just do not see how we could end up with much duplication at all
          between your test support code and your production code. If there
          was enough duplication to consider merging some test support code
          with some production code via refactoring, I would be wary that some
          tests were indeed verifying a production code implementation via a
          parallel test code implementation.

          Explicitly not allowing any post-TDD merging of the production code
          and test support code should not pose much of a problem (if any) on
          properly done unit tests, but would prevent the "short circuiting"
          of improperly done unit tests.

          Steven Gordon
          Steven Gordon

          On 9/1/05, William Pietri <william@...> wrote:
          >
          > On Wed, 2005-08-31 at 23:44 -0700, Steven Gordon wrote:
          > > There is an obvious danger in not having a "chinese wall" between test
          > code
          > > and production code, and then doing deep refactorings:
          > > A test that used to be:
          > > - apply algorithm A (implemented independently in the test code)
          > > - apply algorithm B (implemented in the production code)
          > > - see if the results are the same
          > > becomes:
          > > - apply algorithm A (refactored to have become algorithm B itself if you
          > > look deep enough)
          > > - apply algorithm B
          > > - see if the results are the same
          >
          >
          > I grant the theoretical possibility, but I can't think of a case where
          > I've come close to having this happen. I think that's mainly because I
          > don't test an algorithm against a reimplementation; I generally test
          > algorithms against literal data.
          >
          > Thinking about it, it seems weird to me that I can't remember a single
          > case of testing a complex algorithm against a reimplementation. But I
          > guess it makes sense: if I'm doing this in a TDD fashion, I'd be
          > developing A and B in parallel. I'd worry that some error in my thinking
          > would lead me to do the same thing wrong in both code bases, so parallel
          > duplicate development wouldn't feel satisfying.
          >
          > William
          >
          >
          > --
          > William Pietri <william@...>
        • Ken Boucher
          Message 4 of 13, Sep 1, 2005
            > If TDD is done correctly in small steps using specific literal data, I just
            > do not see how we could end up with much duplication at all between your
            > test support code and your production code. If there was enough duplication
            > to consider merging some test support code with some production code via
            > refactoring, I would be wary that some tests were indeed verifying a
            > production code implementation via a parallel test code implementation.
            > Explicitly not allowing any post-TDD merging of the production code and
            > test support code should not pose much of a problem (if any) on properly
            > done unit tests, but would prevent the "short circuiting" of improperly done
            > unit tests.
            > Steven Gordon

            Wow. "correctly" and "properly done", both in one post. Maybe I can learn
            something by having the errors of my ways pointed out to me.

            I like to use the code that reads in a file to check that the
            code that writes the file is right. I tend to do that a lot. I
            suppose I could just check the bytes on the drive, but I like to
            know the two are in sync.
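
            In sketch form, with a hypothetical Record class, that round trip
            looks like:

            import junit.framework.TestCase;
            import java.io.*;

            public class RecordRoundTripTest extends TestCase {
                // The reader checks the writer and vice versa, which is
                // what keeps the two in sync.
                public void testWriteThenReadGivesEqualRecord()
                        throws IOException {
                    Record original = new Record("id-42", 17);
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    original.writeTo(out);
                    Record reread = Record.readFrom(
                            new ByteArrayInputStream(out.toByteArray()));
                    assertEquals(original, reread);
                }
            }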

            On the other hand, I like to use a simple, easy-to-understand
            process to check the complicated, hard-to-understand process that
            is needed because it's extremely efficient. So for those tests, I
            have a complete duplication of functionality.
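
            Concretely, something like this, where FastSort stands in for the
            efficient implementation and java.util.Arrays.sort plays the
            simple, trusted one:

            import junit.framework.TestCase;
            import java.util.Arrays;
            import java.util.Random;

            public class FastSortTest extends TestCase {
                public void testAgreesWithSimpleImplementation() {
                    Random random = new Random(12345); // fixed seed, repeatable
                    for (int run = 0; run < 100; run++) {
                        int[] input = new int[random.nextInt(50)];
                        for (int i = 0; i < input.length; i++) {
                            input[i] = random.nextInt();
                        }
                        int[] expected = (int[]) input.clone();
                        Arrays.sort(expected); // the simple process
                        int[] actual = (int[]) input.clone();
                        FastSort.sort(actual); // the efficient process
                        assertTrue(Arrays.equals(expected, actual));
                    }
                }
            }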

            Which of these is wrong, so I can stop doing it? Or am I misunderstanding
            what you've said entirely?
          • Steven Gordon
            Message 5 of 13, Sep 1, 2005
              I was generalizing. Now I will try to get specific:

              Are you writing OS-level code for reading files, or utilizing
              the code that the OS provides for reading files? If the latter,
              then there is no need to TDD code you are not writing.

              If you still feel you must verify that the OS call is reading
              the file correctly, then a unit test should read the file into a
              buffer and then assert that the expected contents are there
              (byte by byte if you are really paranoid, otherwise just a few
              data points along with asserting that the amount of data is as
              expected).

              Then all your other tests can create whatever objects you are
              testing directly from a buffer, and you can trust any production
              code that creates a tested object from a buffer that was read
              from a file.

              Rereading the file a second time in the test code would only
              establish that both ways of reading the file give the same
              result. But this does not prove much, because both ways of
              reading the file could well be executing virtually identical OS
              code under the covers. If one way of reading the file gave you
              the wrong data, the other way could too, so why not just assert
              that reading the file gives you the exact data you expect?

              A test that just directly asserts that the expected data is
              present after reading a known file is also much less complex;
              there is no longer any test code so similar to the production
              code that refactoring to reduce duplication might seem in order.
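
              In other words, something like this sketch (the file path,
              expected size, and byte values are invented for illustration):

              import junit.framework.TestCase;
              import java.io.*;

              public class KnownFileTest extends TestCase {
                  public void testKnownFileHasExpectedContents()
                          throws IOException {
                      File file = new File("test/data/KnownFileTest/known.bin");
                      byte[] buffer = new byte[(int) file.length()];
                      DataInputStream in =
                          new DataInputStream(new FileInputStream(file));
                      try {
                          in.readFully(buffer);
                      } finally {
                          in.close();
                      }
                      // Assert directly on the data: total size plus a few
                      // data points, rather than a second read path.
                      assertEquals(1024, buffer.length);
                      assertEquals((byte) 0x4D, buffer[0]);
                      assertEquals((byte) 0x5A, buffer[1]);
                  }
              }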
              On 9/1/05, Ken Boucher <yahoo@...> wrote:
              >
              > > If TDD is done correctly in small steps using specific literal data, I
              > just
              > > do not see how we could end up with much duplication at all between your
              > > test support code and your production code. If there was enough
              > duplication
              > > to consider merging some test support code with some production code via
              > > refactoring, I would be wary that some tests were indeed verifying a
              > > production code implementation via a parallel test code implementation.
              > > Explicitly not allowing any post-TDD merging of the production code and
              > > test support code should not pose much of a problem (if any) on properly
              > > done unit tests, but would prevent the "short circuiting" of improperly
              > done
              > > unit tests.
              > > Steven Gordon
              >
              > Wow. "correctly" and "properly done", both in one post. Maybe I can learn
              > something by having the errors of my ways pointed out to me.
              >
              > I like to use the code that reads in a file to check that the
              > code that writes the file is right. I tend to do that a lot. I
              > suppose I could just check the bytes on the drive, but I like
              > to know the two are in sync.
              >
              > On the other hand, I like to use a simple, easy-to-understand
              > process to check the complicated, hard-to-understand process
              > that is needed because it's extremely efficient. So for those
              > tests, I have a complete duplication of functionality.
              >
              > Which of these is wrong, so I can stop doing it? Or am I misunderstanding
              > what you've said entirely?
            • Willem Bogaerts
              Message 6 of 13, Sep 2, 2005
                Ken Boucher wrote:
                >> If TDD is done correctly in small steps using specific literal data, I just
                >>do not see how we could end up with much duplication at all between your
                >>test support code and your production code.

                My opinion exactly. At least, it _used_ to be. I once worked
                on an application that stored measurements from sinusoid
                signals. Storing a measurement was one of the first things I
                did (test-first). When this was shown to the customer, he
                said: "Great, but can you ensure that any measured wave has at
                least 5 measurements? I'd like this business rule because 5
                measurements can give me a quality notion of the measurements
                themselves."

                This is a pattern that I encountered more and more afterwards.
                Test-driving a data structure is easy and involves only small
                sets of data, but developing business rules is quite another
                story. I used to put storage and business rules in one
                application layer because I did not see a reason to break them
                apart. Now I do. I found out the hard way.

                Another pattern I have encountered is the emergence of a
                "storage creator". Assuming a sequential database, my first
                class test just has some ad-hoc code that creates a database
                and one table (and removes them upon exit). My second class
                test creates the same database and another table, with similar
                code.

                This ad-hoc code gets refactored into one class that can
                create a whole database with every table and reference. This
                may seem like a lot of work, but it is a useful class to have:
                for business rules (there will be another class that fills the
                tables with some predefined test data) and for deployment of
                the application.
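
                A sketch of such a class, assuming JDBC with an in-memory
                HSQLDB purely for illustration (the measurement table is made
                up):

                import java.sql.Connection;
                import java.sql.DriverManager;
                import java.sql.Statement;

                // One class that can create the whole database, shared by
                // the tests and by deployment of the application.
                public class StorageCreator {
                    public Connection createDatabase() throws Exception {
                        Class.forName("org.hsqldb.jdbcDriver");
                        Connection connection = DriverManager.getConnection(
                                "jdbc:hsqldb:mem:testdb", "sa", "");
                        Statement statement = connection.createStatement();
                        try {
                            statement.execute("CREATE TABLE measurement ("
                                    + "id INTEGER PRIMARY KEY, "
                                    + "wave_id INTEGER NOT NULL, "
                                    + "value DOUBLE NOT NULL)");
                        } finally {
                            statement.close();
                        }
                        return connection;
                    }

                    public void removeDatabase(Connection connection)
                            throws Exception {
                        Statement statement = connection.createStatement();
                        try {
                            statement.execute("DROP TABLE measurement");
                        } finally {
                            statement.close();
                        }
                        connection.close();
                    }
                }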

                So, yes, TDD can be a lot of work. But that is because testing
                can be a lot of work, especially for business rules. Nobody
                said TDD would be fast, just that it would be faster than
                doing similar tests afterwards AND making more initial
                mistakes.

                Best regards,
                Willem Bogaerts
              • Kevin Wheatley
                Message 7 of 13, Sep 2, 2005
                  Willem Bogaerts wrote:
                  > Another pattern that I have found out is the emerging of a "storage
                  > creator". Assuming a sequential database, my first class test just has
                  > some ad-hoc code that creates a database and one table (and removes them
                  > upon exit). My second class test creates the same database and another
                  > class with similar code.
                  > This ad-hoc codes get refactored into one class that can create a whole
                  > database with every table and reference. This may seem a lot of work,
                  > but it is a useful class to have. For business rules (there will be
                  > another class that fills the tables with some predefined test data) and
                  > for deployment of the application.

                  I've got a very similar class that creates test files that are
                  supposed to be read by a file reader. I use it to inject all kinds of
                  faults into the system to test the error paths in the code. I started
                  with a simple class that creates a file and then removes it on
                  teardown of the test, then I slowly developed ways to add content to
                  the file procedurally, as I built up the file reader.
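
                  The shape of it, roughly (the header layout and the
                  FileReaderUnderTest class are hypothetical):

                  import junit.framework.TestCase;
                  import java.io.*;

                  public class ReaderErrorPathTest extends TestCase {
                      private File testFile;

                      protected void setUp() throws Exception {
                          testFile = File.createTempFile("reader", ".dat");
                      }

                      protected void tearDown() {
                          testFile.delete(); // remove the file after each test
                      }

                      // Procedurally builds a file with a deliberately
                      // truncated header, then checks the error path.
                      public void testRejectsTruncatedHeader()
                              throws IOException {
                          DataOutputStream out = new DataOutputStream(
                                  new FileOutputStream(testFile));
                          try {
                              out.writeInt(0x0BADF00D); // magic number only
                          } finally {
                              out.close();
                          }
                          try {
                              FileReaderUnderTest.read(testFile);
                              fail("expected a corrupt-file error");
                          } catch (IOException expected) {
                              // error path exercised as intended
                          }
                      }
                  }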

                  I found it useful because you have the test data and the
                  tests all in one place - in the test source code. The only
                  downside is that when I get an externally generated faulty
                  file with a new kind of failure, I may end up with just a
                  "blob of data" test.

                  So far, of course, I've not had any files that
                  false-positived through the main code ... so maybe that's
                  theoretical anyway.

                  Kevin

                  --
                  | Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this |
                  | Senior Technology | My employer for certain |
                  | And Network Systems Architect | Not even myself |