19

A well-tested codebase has a number of benefits, but testing certain aspects of the system results in a codebase that is resistant to some types of change.

An example is testing for specific output--e.g., text or HTML. Tests are often (naively?) written to expect a particular block of text as output for some input parameters, or to search for specific sections in a block.
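For illustration, a minimal sketch of the kind of brittle assertion being described; the renderGreeting function and the markup are invented for this example:

    #include <cassert>
    #include <string>

    // Hypothetical function under test (invented for illustration).
    std::string renderGreeting(const std::string& name) {
        return "<div class=\"greeting\"><p>Hello, " + name + "!</p></div>";
    }

    int main() {
        // Fragile: the assertion pins the exact markup, so renaming the CSS class
        // or swapping <p> for <span> fails the test even though the behavior -
        // greeting the user by name - is unchanged.
        assert(renderGreeting("Ada") == "<div class=\"greeting\"><p>Hello, Ada!</p></div>");
        return 0;
    }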

Changing the behavior of the code, to meet new requirements or because usability testing has resulted in change to the interface, requires changing the tests as well--perhaps even tests that are not specifically unit tests for the code being changed.

  • How do you manage the work of finding and rewriting these tests? What if you can't just "run 'em all and let the framework sort them out"?

  • What other sorts of code-under-test result in habitually fragile tests?

2
  • How is this significantly different from programmers.stackexchange.com/questions/5898/…? Commented Sep 21, 2010 at 14:39
  • 5
    That question mistakenly asked about refactoring--unit tests should be invariant under refactoring. Commented Sep 21, 2010 at 15:08

5 Answers

13

I know the TDD folks will hate this answer, but a large part of it for me is to choose carefully where to test something.

If I go too crazy with unit tests in the lower tiers, then no meaningful change can be made without altering the unit tests. If the interface is never exposed and not intended to be reused outside the app, then this is just needless overhead on what might otherwise have been a quick change.

Conversely, if what you are trying to change is exposed or re-used, every one of those tests you will have to change is evidence of something you might be breaking elsewhere.

In some projects this may amount to designing your tests from the acceptance tier down rather than from the unit tests up, and having fewer unit tests and more integration-style tests.

It does not mean that you cannot still identify a single feature and code until that feature meets its acceptance criteria. It simply means that in some cases you do not end up measuring the acceptance criteria with unit tests.
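As a rough sketch of what measuring at the boundary instead of the unit level can look like (the ShoppingCart facade and its internals are invented for illustration):

    #include <cassert>
    #include <map>
    #include <string>

    // Hypothetical facade; the internals below are deliberately not tested directly.
    class ShoppingCart {
    public:
        void add(const std::string& sku, int qty) { items_[sku] += qty; }

        // Total in cents, applying a bulk discount internally.
        int total() const {
            int sum = 0;
            for (const auto& [sku, qty] : items_) sum += priceOf(sku) * qty;
            return applyDiscount(sum);
        }

    private:
        // Internal helpers: free to change shape without touching any test.
        static int priceOf(const std::string& sku) { return sku == "apple" ? 50 : 100; }
        static int applyDiscount(int cents) { return cents >= 1000 ? cents * 9 / 10 : cents; }

        std::map<std::string, int> items_;
    };

    int main() {
        // Acceptance-style check: only the behavior visible at the boundary is pinned.
        ShoppingCart cart;
        cart.add("apple", 30);                 // 30 * 50 = 1500, discounted to 1350
        assert(cart.total() == 1350);
        return 0;
    }

The internal helpers can be reshaped freely; only a change in the behavior visible at the facade forces a test change.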

3
  • I think you meant to write "outside the module", not "outside the app". Commented Sep 21, 2010 at 19:24
  • SamB, it depends. If the interface is internal to a few places within one app, but not public, I would consider testing at a higher level if I thought the interface was likely to be volatile. Commented Sep 21, 2010 at 19:36
  • 1
    I've found this approach to be very compatible with TDD. I like starting in the upper layers of the application, nearer to the end user, so I can design the lower layers knowing how the upper layers need to use them. Essentially, building top-down allows you to more accurately design the interface between one layer and another. Commented Jul 18, 2018 at 0:44
5

I just completed a major overhaul of my SIP stack, rewriting the entire TCP transport. (This was a near-refactoring, on a rather grand scale relative to most refactorings.)

In brief, there's a TIdSipTcpTransport, subclass of TIdSipTransport. All TIdSipTransports share a common test suite. Internal to TIdSipTcpTransport were a number of classes - a map containing connection/initiating-message pairs, threaded TCP clients, a threaded TCP server, and so on.

Here's what I did:

  • Deleted the classes I was going to replace.
  • Deleted the test suites for those classes.
  • Left the test suite specific to TIdSipTcpTransport (and there was still the test suite common to all TIdSipTransports).
  • Ran the TIdSipTransport/TIdSipTcpTransport tests, to make sure they all failed.
  • Commented out all but one TIdSipTransport/TIdSipTcpTransport test.
  • If I needed to add a class, I'd add it, writing tests to build up enough functionality that the sole uncommented test passed.
  • Lather, rinse, repeat.

I thus knew what I still needed to do, in the form of the commented-out tests (*), and knew that the new code was working as expected, thanks to the new tests I wrote.

(*) Really, you don't need to comment them out. Just don't run them; 100 failing tests isn't very encouraging. Also, in my particular setup compiling fewer tests means a faster test-write-refactor loop.
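Not the original Delphi/DUnit suites, but a rough sketch of the "everything disabled except one test" pattern with a bare-bones runner; the test names are hypothetical:

    #include <cstdio>

    // Hypothetical tests for the rebuilt transport; only one is enabled at a time.
    void testConnectionMapTracksInitiatingMessage() { /* ... assertions ... */ }
    // void testClientSendsOverExistingConnection()  { /* enable once the map test passes */ }
    // void testServerAcceptsAndDispatches()         { /* enable after the client test passes */ }

    int main() {
        // The commented-out entries double as a to-do list of behavior still to rebuild.
        testConnectionMapTracksInitiatingMessage();
        std::puts("enabled tests passed");
        return 0;
    }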

1
  • I did this too, some months ago, and it worked quite well for me. However, I couldn't really apply this method when pairing with a colleague on the ground-up redesign of our domain model module (which in turn triggered the redesign of all the other modules in the project). Commented Jun 22, 2011 at 9:04
3

When tests are fragile, I find it's usually because I'm testing the wrong thing. Take, for example, HTML output. If you check the actual HTML output, your test will be fragile. But you aren't interested in the actual output; you are interested in whether it conveys the information that it should. Unfortunately, doing that requires making assertions about the contents of users' brains, and so can't be done automatically.

You can:

  • Generate the HTML as a smoke test to make sure it actually runs
  • Use a template system, so you can test the template processor and the data sent to the template without testing the exact template itself (see the sketch just below).
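A minimal sketch of that second bullet, assuming a hypothetical view-model struct and controller function (names invented): the test pins the data handed to the template, while the markup stays free to change:

    #include <cassert>
    #include <string>
    #include <vector>

    // Hypothetical view model: the data the page needs, independent of markup.
    struct OrderPage {
        std::string customerName;
        std::vector<std::string> lineItems;
        int totalCents;
    };

    // Hypothetical controller logic that builds the view model.
    OrderPage buildOrderPage() {
        return OrderPage{"Ada", {"apple", "pear"}, 150};
    }

    int main() {
        // Assert on the data sent to the template, not on the rendered HTML.
        OrderPage page = buildOrderPage();
        assert(page.customerName == "Ada");
        assert(page.lineItems.size() == 2);
        assert(page.totalCents == 150);
        // Rendering page through the real template can stay a separate smoke test.
        return 0;
    }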

The same sort of thing happens with SQL. If you assert the actual SQL your classes attempt to run, you are going to be in trouble. You really want to assert the results. Hence I use a SQLite in-memory database during my unit tests to make sure that my SQL actually does what it's supposed to.
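A minimal sketch of that approach in C++, assuming the sqlite3 C API is available; the schema and query are hypothetical stand-ins for whatever the code under test generates:

    #include <cassert>
    #include <sqlite3.h>

    int main() {
        // Open a throwaway in-memory database; nothing touches disk.
        sqlite3* db = nullptr;
        assert(sqlite3_open(":memory:", &db) == SQLITE_OK);

        // Hypothetical schema and data produced by the code under test.
        const char* setup =
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);"
            "INSERT INTO users (name) VALUES ('Ada'), ('Grace');";
        assert(sqlite3_exec(db, setup, nullptr, nullptr, nullptr) == SQLITE_OK);

        // Assert on the result of the query, not on its exact SQL text.
        sqlite3_stmt* stmt = nullptr;
        assert(sqlite3_prepare_v2(db, "SELECT COUNT(*) FROM users;", -1, &stmt, nullptr) == SQLITE_OK);
        assert(sqlite3_step(stmt) == SQLITE_ROW);
        assert(sqlite3_column_int(stmt, 0) == 2);

        sqlite3_finalize(stmt);
        sqlite3_close(db);
        return 0;
    }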

3
  • It might also help to use structural HTML. Commented Sep 21, 2010 at 19:23
  • @SamB certainly that would help, but I don't think it'll solve the problem completely Commented Sep 21, 2010 at 20:23
  • of course not, nothing can :-) Commented Sep 21, 2010 at 21:25
1

The one rule to follow in order to keep the tests working as you redesign is:

Do Black-Box testing; avoid White-Box testing.

In other words:

Test against the interface, not against the implementation.

This follows naturally from one of the principles listed in the book Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley, 1994) by the Gang of Four (Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides), which says to program to an interface, not an implementation.

With black-box testing, it is possible to completely rewrite a software component from scratch to make it exhibit the same behavior but with a totally different implementation, and use the existing test to verify that the new version behaves exactly like the old version. It is also possible to have multiple independent teams of developers come up with entirely different approaches to solving the same problem, and write a single test to verify the correctness of all approaches.
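A minimal sketch of that property, with invented names: one black-box test written purely against an abstract interface, run against two independent implementations:

    #include <cassert>
    #include <list>
    #include <vector>

    // Hypothetical interface; the test below knows nothing beyond this contract.
    struct Stack {
        virtual ~Stack() = default;
        virtual void push(int value) = 0;
        virtual int pop() = 0;          // removes and returns the last value pushed
        virtual bool empty() const = 0;
    };

    // Two independent implementations with completely different internals.
    struct VectorStack : Stack {
        void push(int v) override { data.push_back(v); }
        int pop() override { int v = data.back(); data.pop_back(); return v; }
        bool empty() const override { return data.empty(); }
        std::vector<int> data;
    };

    struct ListStack : Stack {
        void push(int v) override { data.push_front(v); }
        int pop() override { int v = data.front(); data.pop_front(); return v; }
        bool empty() const override { return data.empty(); }
        std::list<int> data;
    };

    // One black-box test; any conforming implementation must pass it unchanged.
    void testStackBehaviour(Stack& s) {
        assert(s.empty());
        s.push(1);
        s.push(2);
        assert(s.pop() == 2);
        assert(s.pop() == 1);
        assert(s.empty());
    }

    int main() {
        VectorStack v;
        ListStack l;
        testStackBehaviour(v);
        testStackBehaviour(l);
        return 0;
    }

Either implementation can be rewritten from scratch without touching testStackBehaviour.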

Unfortunately, widespread modern testing practices prohibit this.

Specifically, Unit Testing is white-box testing by nature, because in its desire to achieve defect localization it sets out to test each component in strict isolation, which requires mocking its dependencies. The moment you start using mocks, you are engaging in white-box testing, so you are in an ocean of pain.

Other people have identified this problem; Ian Cooper speaks about it in his "TDD, where did it all go wrong" talk (https://www.infoq.com/presentations/tdd-original/), and, in an attempt to avoid sounding so blasphemous as to proclaim that Unit Testing is wrong, he suggests that in the context of Test-Driven Development (TDD) the term Unit Testing does not refer to isolating the components under test from each other, but rather to isolating the unit tests from each other. I fully agree that it is the tests that should be kept isolated, but I consider this re-definition of the term arbitrary and unwarranted. Unit Testing has already been precisely defined, and according to the existing definition it is problematic; I have no problem at all with sounding blasphemous, so I am about to say it. Are you ready? Here it goes:

Unit Testing is Wrong.

There, I said it.

What to use instead of Unit Testing? Incremental Integration Testing.

Incremental Integration Testing is black-box testing. It achieves defect localization not by eliminating the dependencies, but instead by requiring that the order of execution of the tests must be chosen so that when a component is tested, all of its dependencies have already been tested.

Incremental Integration Testing is described in detail here:

https://blog.michael.gr/2022/10/incremental-integration-testing.html

(Disclosure: I am the author of that post.)
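As a rough sketch of the ordering idea (not taken from the linked post; the components are invented): dependencies are tested first and then used for real, un-mocked, by the tests of the components that depend on them:

    #include <cassert>

    // Lower-level component, tested first.
    struct Tokenizer {
        // Counts whitespace-separated tokens in a string.
        static int count(const char* s) {
            int n = 0; bool in = false;
            for (; *s; ++s) {
                bool ws = (*s == ' ' || *s == '\t');
                if (!ws && !in) ++n;
                in = !ws;
            }
            return n;
        }
    };

    // Higher-level component using the real Tokenizer (no mock).
    struct WordCounter {
        int words(const char* s) const { return Tokenizer::count(s); }
    };

    void testTokenizer()   { assert(Tokenizer::count("a bc  d") == 3); }
    void testWordCounter() { WordCounter wc; assert(wc.words("hello world") == 2); }

    int main() {
        // Execution order mirrors the dependency order: if testWordCounter fails
        // while testTokenizer passed, the defect is localized to WordCounter.
        testTokenizer();
        testWordCounter();
        return 0;
    }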

-1

First create a NEW API that does what you want the new behavior to be. If this new API happens to have the same name as an OLDER API, then I append _NEW to the new API's name.

    int DoSomethingInterestingAPI();

becomes:

    int DoSomethingInterestingAPI_NEW( int takes_more_arguments );
    int DoSomethingInterestingAPI_OLD();
    int DoSomethingInterestingAPI() {
        return DoSomethingInterestingAPI_NEW (whatever_default_mimics_the_old_API);
    }

OK - at this stage, all your regression tests still pass, using the name DoSomethingInterestingAPI().

NEXT, go through your code and change all calls to DoSomethingInterestingAPI() to the appropriate variant of DoSomethingInterestingAPI_NEW(). This includes updating/rewriting whatever parts of your regression tests need to be changed to use the new API.

NEXT, mark DoSomethingInterestingAPI_OLD() as [[deprecated]]. Keep around the deprecated API as long as you like (until you've safely updated all code that might depend on it).
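A sketch of how those declarations might look at this stage; the deprecation message and the forwarding default are invented:

    // Old behavior kept temporarily for anyone still calling it.
    [[deprecated("use DoSomethingInterestingAPI_NEW instead")]]
    int DoSomethingInterestingAPI_OLD();

    int DoSomethingInterestingAPI_NEW(int takes_more_arguments);

    // The original name forwards to the new implementation with a default
    // that mimics the old behavior, so existing callers still compile.
    inline int DoSomethingInterestingAPI() {
        return DoSomethingInterestingAPI_NEW(/*takes_more_arguments=*/0);
    }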

With this approach, any failure in your regression tests is either a bug in that regression test or identifies a bug in your code - exactly as you would want. This staged process of revising an API by explicitly creating _NEW and _OLD versions allows bits of the new and old code to coexist for a while.

Here is a good (hard) example of this approach in practice. I had a function BitSubstring() where I had used the approach of having the third parameter be the COUNT of bits in the substring. To be consistent with other APIs and patterns in C++, I wanted to switch to begin/end as arguments to the function.

https://github.com/SophistSolutions/Stroika/commit/003dd8707405c43e735ca71116c773b108c217c0

I created a function BitSubstring_NEW with the new API, and updated all my code to use that (leaving NO MORE CALLS to BitSubString). But I left in the implementation for several releases (months) - and marked it deprecated - so everyone could switch to BitSubString_NEW (and at that time change the argument from a count to begin/end style).

THEN - when that transition was completed, I did another commit deleting BitSubString() and renaming BitSubString_NEW->BitSubString () (and deprecated the name BitSubString_NEW).

3
  • Never append suffixes that carry no meaning or are self-deprecating to names. Always strive to give meaningful names. Commented Jul 17, 2018 at 2:58
  • You completely missed the point. First - these aren't suffixes that "carry no meaning". They carry the meaning that the API is transitioning from an older one to a newer one. In fact, that's the whole point of the QUESTION I was responding to, and the whole point of the answer. The names CLEARLY communicate which is the OLD API, which is the NEW API, and which is the eventually target name of the API once the transition is complete. AND - the _OLD/_NEW suffixes are temporary - ONLY during the API change transition. Commented Jul 17, 2018 at 13:49
  • Good luck with NEW_NEW_3 version of API three years later. Commented Jul 17, 2018 at 15:11
