
Re: [XP] Test-Code Duplication

  • Ron Jeffries
    Message 1 of 13 , Sep 1, 2005
      Around Thursday, September 1, 2005, 2:44:56 AM, Steven Gordon wrote:

> There is an obvious danger in not having a "Chinese wall" between test code
      > and production code, and then doing deep refactorings:
      > A test that used to be:
      > - apply algorithm A (implemented independently in the test code)
      > - apply algorithm B (implemented in the production code)
      > - see if the results are the same
      > becomes:
      > - apply algorithm A (refactored to have become algorithm B itself if you
      > look deep enough)
      > - apply algorithm B
      > - see if the results are the same
      > The refactored version of the test can never fail even if algorithm B has a
      > flaw, but you never expect it to fail when refactoring from an unfailing
      > original version to the refactored version.

      Yes, there is that danger. I don't recall ever actually falling into
      that trap, but probably in this long life, I have.

      I lean more toward simpler tests, rather than duplicating some
      complex algorithm. Following the above style all the time would
      result in producing the entire application twice. That's ... too
      much.

      So I'd say my tests are more like this:

      > Of course, we could avoid doing any parallel work in our test code and
      > just write tests in the form:
> - apply production code (e.g., algorithm B above) to known inputs
      > - see if the results are the expected results
      > But, if we always write tests in the above form, how would we end up with
      > any common code between the tests and the production system to refactor?

      We might not. Would that be a bad thing? Is common code between
      tests and production something to seek?

      I've not sought to create that duplication, but if I do, I try to
      remove it, just like any other kind. It usually leads somewhere
      interesting.

      Ron Jeffries
      www.XProgramming.com
      Only the hand that erases can write the true thing. -- Meister Eckhart
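[For concreteness, the pattern Steven describes might look roughly like the
following minimal sketch (Java/JUnit; the Statistics class and its mean()
method are hypothetical). The test carries its own implementation of the
algorithm and compares it against the production one, which is exactly the
kind of test that can no longer fail once a refactoring merges the two
implementations:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class MeanComparisonTest {

    // "Algorithm A": a reimplementation living in the test code.
    private double naiveMean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    @Test
    public void productionMeanMatchesNaiveMean() {
        double[] values = { 1.0, 2.0, 4.0, 8.0 };
        // "Algorithm B": the production code under test (hypothetical class).
        assertEquals(naiveMean(values), Statistics.mean(values), 1e-9);
    }
}]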
    • William Pietri
      Message 2 of 13 , Sep 1, 2005
        On Wed, 2005-08-31 at 23:44 -0700, Steven Gordon wrote:
> There is an obvious danger in not having a "Chinese wall" between test code
        > and production code, and then doing deep refactorings:
        > A test that used to be:
        > - apply algorithm A (implemented independently in the test code)
        > - apply algorithm B (implemented in the production code)
        > - see if the results are the same
        > becomes:
        > - apply algorithm A (refactored to have become algorithm B itself if you
        > look deep enough)
        > - apply algorithm B
        > - see if the results are the same


        I grant the theoretical possibility, but I can't think of a case where
        I've come close to having this happen. I think that's mainly because I
        don't test an algorithm against a reimplementation; I generally test
        algorithms against literal data.
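[A test against literal data, in the spirit William describes, might look
roughly like this sketch (the Statistics class is hypothetical); the expected
value is worked out by hand rather than computed by a second implementation:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class MeanLiteralDataTest {

    @Test
    public void meanOfKnownValuesIsKnownResult() {
        double[] values = { 1.0, 2.0, 4.0, 8.0 };
        // (1 + 2 + 4 + 8) / 4 = 3.75, checked by hand.
        assertEquals(3.75, Statistics.mean(values), 1e-9);
    }
}]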

        Thinking about it, it seems weird to me that I can't remember a single
        case of testing a complex algorithm against a reimplementation. But I
        guess it makes sense: if I'm doing this in a TDD fashion, I'd be
        developing A and B in parallel. I'd worry that some error in my thinking
        would lead me to do the same thing wrong in both code bases, so parallel
        duplicate development wouldn't feel satisfying.

        William


        --
        William Pietri <william@...>
      • Brian Slesinsky
        Message 3 of 13 , Sep 1, 2005
          On Aug 31, 2005, at 7:38 PM, Tim King wrote:
          >>
          >> You have a class that includes a method for writing a complex record
          >> structure to disk, another for reading it back, and then several that
          >> perform operations on those records.
          >>

          This might be an issue if you're using a binary format, but usually I
          try to use human-readable formats (XML, CSV, and so on) so I can just
          create the test data by hand in a text editor. Also, I try to make the
          test data as small and easy to understand as possible. There should
          be just enough data to exercise the code.

          So part of the answer is "don't do that if you can avoid it".

If you really do need to read something complex (someone gives you a
file and says "parse this"), another approach is to check in your sample
files under a directory somewhere. My favorite directory structure looks
like this:

test/
    src/
        com/example/mypackage/MyTest.java
    data/
        MyTest/
            inputFile.bin

          Keeping your test data as data means that you know right away when you
          do something that breaks backward compatibility (assuming that
          matters). A round-trip test is also a good thing to have but it's a
          different test.
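[A test that reads a checked-in sample file from the layout above might look
roughly like this sketch; the RecordReader class and the assertions are
assumptions made for illustration:

package com.example.mypackage;

import org.junit.Test;
import static org.junit.Assert.assertEquals;
import java.io.File;

public class MyTest {

    @Test
    public void parsesCheckedInSampleFile() throws Exception {
        // The sample data lives next to the tests, under test/data/MyTest/.
        File input = new File("test/data/MyTest/inputFile.bin");
        Record record = RecordReader.read(input);  // hypothetical production reader
        assertEquals(42, record.getFieldCount());  // literal, hand-checked expectation
    }
}]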

          For checking code that writes a file, most of the time you don't parse
          the output because the output is deterministic. A string comparison is
          sufficient, or a binary comparison if it's binary. If the output
          isn't entirely predictable (which is something to avoid), it's usually
          possible to get away with regular expressions and the like.
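[For the writing side, a sketch of the string-comparison approach Brian
mentions; the Record and RecordWriter classes are hypothetical:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class RecordWriterTest {

    @Test
    public void writesExpectedCsv() {
        Record record = new Record("widget", 3);     // hypothetical domain object
        String output = RecordWriter.toCsv(record);  // hypothetical production writer
        // The expected output is a literal string, since the format is deterministic.
        assertEquals("widget,3\n", output);
    }
}]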

          - Brian
        • Steven Gordon
          Message 4 of 13 , Sep 1, 2005
I have seen unit tests that do verify one implementation with another, but
they were definitely not produced by doing TDD correctly.

If TDD is done correctly in small steps using specific literal data, I just
do not see how we could end up with much duplication at all between your
test support code and your production code. If there was enough duplication
to consider merging some test support code with some production code via
refactoring, I would be wary that some tests were indeed verifying a
production code implementation via a parallel test code implementation.

Explicitly not allowing any post-TDD merging of the production code and
test support code should not pose much of a problem (if any) on properly
done unit tests, but would prevent the "short circuiting" of improperly done
unit tests.
            Steven Gordon

            On 9/1/05, William Pietri <william@...> wrote:
            >
            > On Wed, 2005-08-31 at 23:44 -0700, Steven Gordon wrote:
            > > There is an obvious danger in not having a "chinese wall" between test
            > code
            > > and production code, and then doing deep refactorings:
            > > A test that used to be:
            > > - apply algorithm A (implemented independently in the test code)
            > > - apply algorithm B (implemented in the production code)
            > > - see if the results are the same
            > > becomes:
            > > - apply algorithm A (refactored to have become algorithm B itself if you
            > > look deep enough)
            > > - apply algorithm B
            > > - see if the results are the same
            >
            >
            > I grant the theoretical possibility, but I can't think of a case where
            > I've come close to having this happen. I think that's mainly because I
            > don't test an algorithm against a reimplementation; I generally test
            > algorithms against literal data.
            >
            > Thinking about it, it seems weird to me that I can't remember a single
            > case of testing a complex algorithm against a reimplementation. But I
            > guess it makes sense: if I'm doing this in a TDD fashion, I'd be
            > developing A and B in parallel. I'd worry that some error in my thinking
            > would lead me to do the same thing wrong in both code bases, so parallel
            > duplicate development wouldn't feel satisfying.
            >
            > William
            >
            >
            > --
            > William Pietri <william@...>
          • Ken Boucher
            Message 5 of 13 , Sep 1, 2005
              > If TDD is done correctly in small steps using specific literal data, I just
              > do not see how we could end up with much duplication at all between your
              > test support code and your production code. If there was enough duplication
              > to consider merging some test support code with some production code via
              > refactoring, I would be wary that some tests were indeed verifying a
              > production code implementation via a parallel test code implementation.
              > Explicitly not allowing any post-TDD merging of the production code and
              > test support code should not pose much of a problem (if any) on properly
              > done unit tests, but would prevent the "short circuiting" of improperly done
              > unit tests.
              > Steven Gordon

              Wow. "correctly" and "properly done", both in one post. Maybe I can learn
              something by having the errors of my ways pointed out to me.

I like to use the code that reads in the file to check that the code that
writes the file is right. I tend to do that a lot. I suppose I could just
check the bytes on the drive, but I like to know the two are in sync.

On the other hand, I like to use a simple, easy-to-understand process to
check a complicated, hard-to-understand process that is needed because
it's extremely efficient. So for those tests, I have a complete
duplication of functionality.

              Which of these is wrong, so I can stop doing it? Or am I misunderstanding
              what you've said entirely?
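[The first pattern Ken mentions, using the reader to check the writer, is
essentially a round-trip test. A rough sketch, with hypothetical names:

import org.junit.Test;
import static org.junit.Assert.assertEquals;
import java.io.File;

public class RecordRoundTripTest {

    @Test
    public void writtenRecordReadsBackUnchanged() throws Exception {
        Record original = new Record("widget", 3);  // hypothetical domain object
        File file = File.createTempFile("roundtrip", ".bin");
        file.deleteOnExit();
        RecordWriter.write(original, file);         // hypothetical production writer
        Record readBack = RecordReader.read(file);  // hypothetical production reader
        // Passes whenever the writer and the reader agree with each other,
        // which is a weaker claim than asserting against known bytes on disk.
        assertEquals(original, readBack);
    }
}]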
            • Steven Gordon
              Message 6 of 13 , Sep 1, 2005
I was generalizing. Now I will try to get specific:

Are you writing OS-level code for reading files, or utilizing the code that
the OS provides for reading files? If the latter, then there is no need to
TDD code you are not writing.

If you still feel you must verify that the OS call is reading the file
correctly, then a unit test should read the file into a buffer and then
assert that the expected contents are there (byte by byte if you are really
paranoid, otherwise just a few data points, along with asserting that the
amount of data is as expected).

Then all your other tests can create whatever objects you are testing
directly from a buffer, and you can trust any production code that creates
a tested object from a buffer that was read from a file.

Rereading the file a second time in the test code would only establish that
both ways of reading the file give the same result. But this does not prove
much, because both ways of reading the file could well be executing virtually
identical OS code under the covers. If one way of reading the file gave you
the wrong data, the other way could too, so why not just assert that reading
the file gives you the exact data you expect?

A test that directly asserts that the expected data is present after reading
a known file is also much less complex; there is then no test code so similar
to the production code that refactoring to reduce duplication might seem to
be in order.
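[A sketch of the kind of test Steven describes, reading a known file into a
buffer and asserting a few data points plus the length; the file name and the
expected bytes are made up for illustration:

import org.junit.Test;
import static org.junit.Assert.assertEquals;
import java.nio.file.Files;
import java.nio.file.Paths;

public class KnownFileTest {

    @Test
    public void knownFileHasExpectedContents() throws Exception {
        byte[] buffer = Files.readAllBytes(
                Paths.get("test/data/KnownFileTest/sample.bin"));
        assertEquals(16, buffer.length);  // the amount of data is as expected
        assertEquals(0x4D, buffer[0]);    // spot-check a few known bytes
        assertEquals(0x5A, buffer[1]);
    }
}]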
                On 9/1/05, Ken Boucher <yahoo@...> wrote:
                >
                > > If TDD is done correctly in small steps using specific literal data, I
                > just
                > > do not see how we could end up with much duplication at all between your
                > > test support code and your production code. If there was enough
                > duplication
                > > to consider merging some test support code with some production code via
                > > refactoring, I would be wary that some tests were indeed verifying a
                > > production code implementation via a parallel test code implementation.
                > > Explicitly not allowing any post-TDD merging of the production code and
                > > test support code should not pose much of a problem (if any) on properly
                > > done unit tests, but would prevent the "short circuiting" of improperly
                > done
                > > unit tests.
                > > Steven Gordon
                >
                > Wow. "correctly" and "properly done", both in one post. Maybe I can learn
                > something by having the errors of my ways pointed out to me.
                >
> I like to use the code that reads in the file to check that the code that
> writes the file is right. I tend to do that a lot. I suppose I could just
> check the bytes on the drive, but I like to know the two are in sync.
>
> On the other hand, I like to use a simple, easy-to-understand process to
> check a complicated, hard-to-understand process that is needed because
> it's extremely efficient. So for those tests, I have a complete
> duplication of functionality.
                >
                > Which of these is wrong, so I can stop doing it? Or am I misunderstanding
                > what you've said entirely?
              • Willem Bogaerts
                Message 7 of 13 , Sep 2, 2005
                  Ken Boucher wrote:
                  >> If TDD is done correctly in small steps using specific literal data, I just
                  >>do not see how we could end up with much duplication at all between your
                  >>test support code and your production code.

My opinion exactly. At least, it _used_ to be. I once worked on an
application that stored measurements from sinusoid signals. Storing a
measurement was one of the first things I did (test-first). When this
was shown to the customer, he said: "Great, but can you ensure that any
wave measured has at least 5 measurements? I'd like this business rule
because 5 measurements give me a notion of the quality of the measurements
themselves."
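[A first test for that business rule might look roughly like the sketch
below; the Wave class and its methods are hypothetical:

import org.junit.Test;
import static org.junit.Assert.assertFalse;

public class WaveMeasurementRuleTest {

    @Test
    public void waveWithFewerThanFiveMeasurementsIsRejected() {
        Wave wave = new Wave();                   // hypothetical domain class
        for (int i = 0; i < 4; i++) {
            wave.addMeasurement(i, Math.sin(i));  // only four measurements
        }
        assertFalse(wave.isStorable());           // rule: at least 5 required
    }
}]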

This is a pattern that I encountered more and more afterwards. Test-driving
a data structure is easy and involves only small sets of data. But
developing business rules is quite another story. I used to put storage and
business rules in one application layer because I did not see a reason to
break them apart. Now I do. I found out the hard way.

Another pattern I have found is the emergence of a "storage creator".
Assuming a relational database, my first class's test just has some
ad-hoc code that creates a database and one table (and removes them upon
exit). My second class's test creates the same database and another table,
with similar code.

This ad-hoc code gets refactored into one class that can create a whole
database with every table and reference. This may seem like a lot of work,
but it is a useful class to have: for business rules (there will be
another class that fills the tables with some predefined test data) and
for deployment of the application.
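[A "storage creator" along these lines might be sketched as follows (JDBC
against an in-memory HSQLDB database; the schema is an assumption made for
illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Test-support class that builds the whole schema and tears it down again,
// refactored out of the ad-hoc setup code of the individual class tests.
public class TestDatabaseCreator {

    private Connection connection;

    public Connection create() throws Exception {
        connection = DriverManager.getConnection("jdbc:hsqldb:mem:testdb", "sa", "");
        try (Statement stmt = connection.createStatement()) {
            stmt.execute("CREATE TABLE wave (id INT PRIMARY KEY)");
            stmt.execute("CREATE TABLE measurement (id INT PRIMARY KEY, "
                    + "wave_id INT, reading DOUBLE, "
                    + "FOREIGN KEY (wave_id) REFERENCES wave (id))");
        }
        return connection;
    }

    public void drop() throws Exception {
        try (Statement stmt = connection.createStatement()) {
            stmt.execute("DROP TABLE measurement");
            stmt.execute("DROP TABLE wave");
        }
        connection.close();
    }
}]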

So, yes, TDD can be a lot of work. But that is because testing can be a
lot of work, especially for business rules. Nobody said TDD would be
fast, just that it would be faster than doing similar tests afterwards
AND making more initial mistakes.

                  Best regards,
                  Willem Bogaerts
                • Kevin Wheatley
                  Message 8 of 13 , Sep 2, 2005
                    Willem Bogaerts wrote:
> Another pattern I have found is the emergence of a "storage creator".
> Assuming a relational database, my first class's test just has some
> ad-hoc code that creates a database and one table (and removes them upon
> exit). My second class's test creates the same database and another table,
> with similar code.
>
> This ad-hoc code gets refactored into one class that can create a whole
> database with every table and reference. This may seem like a lot of work,
> but it is a useful class to have: for business rules (there will be
> another class that fills the tables with some predefined test data) and
> for deployment of the application.

                    I've got a very similar class that creates test files that are
                    supposed to be read by a file reader. I use it to inject all kinds of
                    faults into the system to test the error paths in the code. I started
                    with a simple class that creates a file and then removes it on
                    teardown of the test, then I slowly developed ways to add content to
                    the file procedurally, as I built up the file reader.
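[A test-file builder of the kind Kevin describes might be sketched roughly
as follows; the names are hypothetical, and the fault-injection method simply
truncates the file to exercise the reader's error path:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

// Test-support class that builds input files procedurally, including
// deliberately faulty ones, and removes them again on teardown.
public class TestFileBuilder {

    private final File file;

    public TestFileBuilder() throws IOException {
        file = File.createTempFile("reader-test", ".dat");
    }

    public TestFileBuilder appendLine(String line) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file, true)) {
            out.write((line + "\n").getBytes("UTF-8"));
        }
        return this;
    }

    public TestFileBuilder truncateTo(long bytes) throws IOException {
        // Inject a fault: chop the file short to exercise an error path.
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(bytes);
        }
        return this;
    }

    public File getFile() {
        return file;
    }

    public void delete() {  // called from the test's teardown
        file.delete();
    }
}]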

I found it useful, as you have the test data and the tests all in one
place - in the test source code. The only downside is that when I get an
externally generated faulty file with a new kind of failure, I may end up
with just a "blob of data" test.

So far, of course, I've not had any files that false-positived through
the main code ... so maybe that's theoretical anyway.

                    Kevin

                    --
                    | Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this |
                    | Senior Technology | My employer for certain |
                    | And Network Systems Architect | Not even myself |