Unit Testing: Tell Your Doubles Apart

Wahome
Oct 7, 2019
Owl among cats

Have you ever stopped to think:

Why do we write tests?

Most people would say that we do it to verify that code works as expected. While that’s true, it’s not the whole truth. After all, correctness can also be checked through manual tests, so there has to be something more to it; something more fundamental.

A software application consists of many building blocks, which are often entangled. Yet in the midst of this entanglement, we want to move fast and with confidence.

That’s why tests should:

  • Give confidence that the code does what it should.
  • Provide feedback that is fast, accurate, reliable and predictable.
  • Make maintenance easier, something that is commonly overlooked when writing tests.

A unit test provides a written contract that the piece of code must satisfy.

Unit testing is an important part of the software development process because, done correctly, it helps detect flaws early, before they become harder to find in later testing stages.

A unit test exercises the smallest piece of testable software in the application to determine whether it behaves as expected. That piece, the unit, is the smallest piece of code that can be logically isolated in a system.

The word “isolated” in that definition is rather important. The term “unit test” itself puts a lot of emphasis on the size of these tests, which many find rather unfortunate.

A Unit of Test (Unit Under Test)

The essence of the various kinds of tests highlighted

The size of the unit under test is not strictly defined. A unit can be almost anything you want it to be — a specific piece of functionality, a program, or a particular method within the application.

Object-oriented design tends to treat a class as the unit of test while procedural or functional approaches might consider a single function as a unit.

In his September 2005 blog post “A Set of Unit Testing Rules”, author and XP practitioner Michael Feathers suggests that a test is not a unit test if:

  • It talks to the database
  • It communicates across the network
  • It touches the file system
  • It can’t run at the same time as any of your other unit tests
  • You have to do special things to your environment (such as editing config files) to run it.

“Tests that do these things aren’t bad. Often they are worth writing, and they can be written in a unit test harness. However, it is important to be able to separate them from true unit tests so that we can keep a set of tests that we can run fast whenever we make our changes”, he explains.

Where exactly to draw that line ought to be left to the team, based on its understanding of the system and how it is tested.

Attempts like this notwithstanding, the size of a unit of test is really situational. The exact definition matters less than it seems: although a class is often assumed to be the unit, it’s common to take a bunch of closely related classes and treat them as a single unit.

In an earlier post, “Microservice Testing: A New Dawn”, I explored the unit of test in a microservice architecture and concluded that the most important unit of functionality a microservice provides is an abstraction over the I/O to its persistent backend, and that this abstraction should therefore become the hermetic unit of base functionality under test.

A microservice architecture is the natural consequence of applying the Single Responsibility principle at the architectural level. Domain logic often manifests as complex calculations and a collection of state transitions. Since this kind of logic is highly state-based, there is little value in trying to isolate the units. As far as possible, then, real domain objects should be used for all collaborators of the unit under test.

The smaller the unit, the better it is.

The smaller the unit under test, the easier it is to express its behaviour in a unit test, since the unit’s branch complexity is lower. Smaller tests give you a much more granular view of how your production code is performing, and they also run faster.

Test Behaviour not Implementation

“Not sure if code is working or tests are broken”

Testing is where art meets science, and at its core lies understanding what to test for, a skill that comes with experience.

Testing for the wrong things can create a suite of tests that is ugly, fragile and, worse still, prone to false positives.

The more test code resembles implementation code, the less useful it becomes. Tests that are independent of implementation details are easier to maintain because they don’t need to change every time the implementation does.

In most cases, tests should focus on testing your code’s public API, and your code’s implementation details shouldn’t need to be exposed to tests.
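As an illustration, here is a minimal Python sketch using a hypothetical ShoppingCart class (not part of the original post), with pytest-style test functions and plain asserts. The first test checks behaviour through the public API only; the second inspects the private _items list and would break if the storage changed, even though total() kept returning the right answer.

```python
class ShoppingCart:
    """Hypothetical example class; _items is an implementation detail."""

    def __init__(self):
        self._items = []  # internal storage, free to change

    def add(self, name: str, price: float) -> None:
        self._items.append((name, price))

    def total(self) -> float:
        return sum(price for _, price in self._items)


def test_total_reflects_added_items():
    # Behaviour-focused: uses only the public API (add, total).
    cart = ShoppingCart()
    cart.add("book", 12.5)
    cart.add("pen", 2.5)
    assert cart.total() == 15.0


def test_items_are_stored_in_a_list():
    # Implementation-coupled: breaks if _items becomes a dict or a Counter,
    # even though total() would still be correct.
    cart = ShoppingCart()
    cart.add("book", 12.5)
    assert cart._items == [("book", 12.5)]
```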

However, there are less common cases where testing implementation details is necessary, e.g. when you want to ensure that your implementation reads from a cache instead of a datastore.
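A sketch of that cache case, assuming a hypothetical UserRepository whose cache exposes get/set and whose datastore exposes load; the doubles come from Python’s standard unittest.mock module. Here the implementation detail, that a cache hit never reaches the datastore, is exactly what the first test asserts.

```python
from unittest.mock import Mock


class UserRepository:
    """Hypothetical repository that checks a cache before the datastore."""

    def __init__(self, cache, datastore):
        self._cache = cache
        self._datastore = datastore

    def find(self, user_id):
        user = self._cache.get(user_id)
        if user is None:
            user = self._datastore.load(user_id)
            self._cache.set(user_id, user)
        return user


def test_cache_hit_does_not_touch_datastore():
    cache = Mock()
    cache.get.return_value = {"id": 1, "name": "Ada"}
    datastore = Mock()

    repo = UserRepository(cache, datastore)

    assert repo.find(1) == {"id": 1, "name": "Ada"}
    # The implementation detail under test: no datastore read on a cache hit.
    datastore.load.assert_not_called()


def test_cache_miss_loads_and_populates_cache():
    cache = Mock()
    cache.get.return_value = None
    datastore = Mock()
    datastore.load.return_value = {"id": 2}

    repo = UserRepository(cache, datastore)

    assert repo.find(2) == {"id": 2}
    datastore.load.assert_called_once_with(2)
    cache.set.assert_called_once_with(2, {"id": 2})
```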

Test setup may also need to change if some critical parts of the implementation change (e.g. if your class takes a new dependency in its constructor, the test needs to pass in this new dependency when it creates the class), but the test itself typically shouldn’t need to change if the code’s user-facing behaviour doesn’t.

Solitary vs Sociable

Sociable vs solitary unit tests. Image from https://martinfowler.com/bliki/UnitTest.html

According to Martin Fowler, a more fundamental distinction when talking about unit tests is whether they are sociable or solitary.

Sociable unit testing focusses on testing the behaviour of modules by observing changes in their state. This treats the unit under test as a black box tested entirely through its interface.

Solitary unit testing looks at the interactions and collaborations between an object and its dependencies, which are replaced by test doubles.

Consider the example of testing an order class’s price method. The price method needs to invoke (delegate to) some functionality in the product and customer classes. If your unit of test is solitary, you don’t want to use the real product or customer classes here, because a fault in the customer class would cause the order class’s tests to fail. On the other hand, if the hermetic unit of base functionality under test requires use of the actual product and customer classes, then your unit of test is sociable.
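A minimal Python sketch of both styles, assuming hypothetical Product, Customer and Order classes in which price() multiplies the product’s unit price by the quantity and subtracts a customer discount. The sociable test uses the real collaborators; the solitary one replaces them with unittest.mock doubles, so a bug in Customer.discount could not make it fail.

```python
from unittest.mock import Mock


class Product:
    def __init__(self, unit_price: float):
        self.unit_price = unit_price


class Customer:
    def __init__(self, discount_rate: float):
        self.discount_rate = discount_rate

    def discount(self, amount: float) -> float:
        return amount * self.discount_rate


class Order:
    def __init__(self, customer, product, quantity: int):
        self.customer = customer
        self.product = product
        self.quantity = quantity

    def price(self) -> float:
        # Delegates to both collaborators.
        base = self.product.unit_price * self.quantity
        return base - self.customer.discount(base)


def test_price_sociable():
    # Real collaborators: a bug in Customer.discount would fail this test too.
    order = Order(Customer(discount_rate=0.25), Product(unit_price=20.0), quantity=3)
    assert order.price() == 45.0


def test_price_solitary():
    # Collaborators replaced by doubles: only Order.price is exercised.
    customer = Mock()
    customer.discount.return_value = 15.0
    product = Mock(unit_price=20.0)
    order = Order(customer, product, quantity=3)
    assert order.price() == 45.0
    customer.discount.assert_called_once_with(60.0)
```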

These styles are not competing and are frequently used in the same codebase to solve different testing problems.

Command Query Responsibility Segregation

CQRS pattern. Image from https://docs.microsoft.com/en-us/azure/architecture/patterns/cqrs

Methods that return a result without changing the state of the system are called queries. Methods that perform actions that change the state of the system, without returning a value, are called commands.

A good practice is to separate an object’s methods into these two distinct categories. This practice, known as Command Query Separation (CQS), developed into what is now popularly known as the Command Query Responsibility Segregation pattern, simply shortened to CQRS.

Seeing as the CQRS pattern is not the crux of this post, I shall not dwell on it. Martin Fowler’s CQRS bliki is a good read on the subject, and the Microsoft Azure cloud design patterns documentation is another great resource for further reading.

The significance of this separation for testing, however, lies in the choice of test doubles. Tests on query methods should prefer stubs, since what needs verifying is the return value, while tests on command methods, like a method that sends an e-mail, should prefer mocks, since the interaction with a collaborator is the only observable outcome.
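A sketch of that rule of thumb, assuming a hypothetical InvoiceService with one query (amount_due) and one command (send_reminder). In Python’s unittest.mock, stubs and mocks are the same Mock class; what differs is whether the test only feeds canned values in or also verifies the calls that were made.

```python
from unittest.mock import Mock


class InvoiceService:
    """Hypothetical service with one query method and one command method."""

    def __init__(self, ledger, mailer):
        self._ledger = ledger
        self._mailer = mailer

    # Query: returns a value and changes nothing.
    def amount_due(self, customer_id: str) -> float:
        return sum(self._ledger.unpaid_invoices(customer_id))

    # Command: changes the world (sends an e-mail) and returns nothing.
    def send_reminder(self, email: str) -> None:
        self._mailer.send(to=email, subject="Payment reminder")


def test_amount_due_with_stub():
    ledger = Mock()
    ledger.unpaid_invoices.return_value = [100.0, 50.0]  # canned indirect input
    service = InvoiceService(ledger, mailer=Mock())
    assert service.amount_due("c-42") == 150.0  # verify the return value


def test_send_reminder_with_mock():
    mailer = Mock()
    service = InvoiceService(ledger=Mock(), mailer=mailer)
    service.send_reminder("jane@example.com")
    # Nothing is returned, so verify the interaction instead.
    mailer.send.assert_called_once_with(to="jane@example.com", subject="Payment reminder")
```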

So what’s the difference between a mock and a stub? Aren’t they all just the same? And what are test doubles?

Test Doubles


In case the penny hasn’t dropped yet, and you’re still wondering, they are not all quite the same.

A test double is an object that can stand in for a real object in a test, similar to how a stunt double stands in for an actor in a movie.

Test double is a generic term for any case where a production object or procedure is replaced with another for testing purposes. In automated unit testing, a test double replaces an object that the System Under Test (SUT), the unit of test, depends on.

Mock objects were introduced at XP2000 in the paper “Endo-Testing: Unit Testing with Mock Objects”, and it took a while for them to gain popularity afterwards. Their role in software development was still being fleshed out in 2004, when “Mock Roles, Not Objects” was published.

The word “mock” is often used informally to refer to the whole taxonomy of stand-in objects used in tests, the most common of which are stubs, mocks, and fakes. While all test doubles share the same objective of standing in for real production objects, it’s important to distinguish between the different types, since each has a different use.

1. Dummy

Dummies are objects that are passed around but never actually used. They usually just fill parameter lists: the test never uses them, but its setup needs them.

If you’re setting up to crash test a car, don’t put your mate in it. Use a dummy.

Consider the example of testing a PaymentService that has a Logger dependency:

DummyLogger
PaymentService with a dependency on Logger
PaymentServiceTest with usage of LoggerDummy
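A minimal Python sketch of those three pieces, assuming Logger exposes a single log method and PaymentService has a hypothetical fee_for method that never logs. This dummy raises if it is ever called, which is one common choice; a no-op body would work just as well.

```python
from abc import ABC, abstractmethod


class Logger(ABC):
    @abstractmethod
    def log(self, message: str) -> None: ...


class DummyLogger(Logger):
    """Exists only to satisfy PaymentService's constructor."""

    def log(self, message: str) -> None:
        # A dummy is never meant to be used; fail loudly if it is.
        raise AssertionError("DummyLogger.log should never be called")


class PaymentService:
    def __init__(self, logger: Logger):
        self._logger = logger

    def fee_for(self, amount_cents: int) -> int:
        # Hypothetical 2% fee rule; this path never touches the logger.
        return amount_cents * 2 // 100

    def pay(self, amount_cents: int) -> None:
        self._logger.log(f"paid {amount_cents} cents")  # only this path logs


def test_fee_for_with_dummy_logger():
    service = PaymentService(DummyLogger())  # the dummy fills the parameter list
    assert service.fee_for(10_000) == 200
```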

2. Fake

Fakes are objects with working implementations that replace a real object when a simplified version of it is needed, typically to gain speed or to eliminate side effects. They usually take some shortcut and contain a simplified version of the production logic, which makes them unsuitable for production.

If you’re testing something that interacts with nukes, don’t launch the bloody nukes. Use a paper fake for now.

A great example of this is an in-memory database, which we can use purely for testing while the real database is used in production.

Fakes are usually written by hand; no mocking framework is required to create them.

Consider the example of testing a PaymentService that invokes an AuthenticationService to check if users have access:

PaymentService with a dependency on AuthenticationService
FakeAuthenticationService
PaymentServiceTest with usage of a FakeAuthenticationService
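A minimal Python sketch, assuming AuthenticationService exposes a single has_access(user_id) method and PaymentService a hypothetical pay method. The fake has real, working logic, backed by an in-memory set instead of whatever backend the production service talks to.

```python
from abc import ABC, abstractmethod


class AuthenticationService(ABC):
    @abstractmethod
    def has_access(self, user_id: str) -> bool: ...


class FakeAuthenticationService(AuthenticationService):
    """A working, in-memory implementation: no network, no real backend."""

    def __init__(self):
        self._authorised = set()

    def grant(self, user_id: str) -> None:
        self._authorised.add(user_id)

    def has_access(self, user_id: str) -> bool:
        return user_id in self._authorised


class PaymentService:
    def __init__(self, auth: AuthenticationService):
        self._auth = auth

    def pay(self, user_id: str, amount_cents: int) -> str:
        if not self._auth.has_access(user_id):
            raise PermissionError(f"{user_id} may not make payments")
        return f"receipt:{user_id}:{amount_cents}"


def test_authorised_user_can_pay():
    auth = FakeAuthenticationService()
    auth.grant("alice")
    assert PaymentService(auth).pay("alice", 500) == "receipt:alice:500"


def test_unknown_user_is_rejected():
    service = PaymentService(FakeAuthenticationService())
    try:
        service.pay("mallory", 500)
    except PermissionError:
        return
    raise AssertionError("expected PermissionError")
```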

3. Mocks

Mocks are used to test interactions between objects. They verify “indirect output” of the code under test by defining expectations on how that code should interact with the double, before the tested code is executed.

A mock acts as a “higher level” stub that is pre-programmed with expectations, including how to respond both to calls it expects and to calls it doesn’t.

These objects are useful in cases where there are no other visible state changes or return results that can be verified. They can throw an exception if they receive a call they don’t expect and are checked during verification to ensure they got all the calls they were expecting. For example, if you want to ensure that code reading from disk doesn’t do more than one disk read, you can use a mock to verify that the method that does the read is only called once.

Consider the example of testing a PaymentService that invokes an AuthenticationService to check if users have access:

PaymentService with a dependency on AuthenticationService
PaymentServiceTest with mocked AuthenticationService
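A hand-rolled Python sketch, assuming AuthenticationService exposes a single has_access(user_id) method and PaymentService a hypothetical pay method. Expectations are programmed before PaymentService runs, an unexpected call fails immediately, and verify() checks that every expected call was made.

```python
from abc import ABC, abstractmethod


class AuthenticationService(ABC):
    @abstractmethod
    def has_access(self, user_id: str) -> bool: ...


class MockAuthenticationService(AuthenticationService):
    """Hand-rolled mock: expectations are programmed first, then verified."""

    def __init__(self):
        self._expected = []  # (user_id, answer) pairs, in the order expected
        self._calls = 0

    def expect_has_access(self, user_id: str, answer: bool) -> None:
        self._expected.append((user_id, answer))

    def has_access(self, user_id: str) -> bool:
        if self._calls >= len(self._expected) or self._expected[self._calls][0] != user_id:
            # Unexpected call: fail immediately rather than return a guess.
            raise AssertionError(f"unexpected call: has_access({user_id!r})")
        answer = self._expected[self._calls][1]
        self._calls += 1
        return answer

    def verify(self) -> None:
        # Fail if any expected call never happened.
        missing = self._expected[self._calls:]
        if missing:
            raise AssertionError(f"expected calls not made: {missing}")


class PaymentService:
    def __init__(self, auth: AuthenticationService):
        self._auth = auth

    def pay(self, user_id: str, amount_cents: int) -> str:
        if not self._auth.has_access(user_id):
            raise PermissionError(f"{user_id} may not make payments")
        return f"receipt:{user_id}:{amount_cents}"


def test_pay_checks_access_exactly_once():
    auth = MockAuthenticationService()
    auth.expect_has_access("alice", True)  # expectation set before the SUT runs
    PaymentService(auth).pay("alice", 500)
    auth.verify()  # every expected call happened; extra calls would have raised
```

Python’s unittest.mock takes a record-then-assert approach instead: with auth = create_autospec(AuthenticationService, instance=True) and auth.has_access.return_value = True, the test would finish with auth.has_access.assert_called_once_with("alice").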

4. Stubs

A stub is an object that provides preset return values to method calls, usually not responding at all to anything outside what’s programmed in for the test. Stubs are used to provide “indirect input” to the system under test.

They have no logic, and only return what they’re told to.

Stubs are useful when there’s need for an object to return specific values in order to get code under test into a certain state.

With stubs, the number of calls made (if any) doesn’t matter, and while they are usually easy to write by hand, a mocking framework is often a convenient way to reduce boilerplate.

Consider the example of testing a PaymentService that invokes an AuthenticationService to check if users have access:

PaymentService with a dependency on AuthenticationService
AuthenticationServiceStub implementing AuthenticationService interface.
PaymentServiceTest with stubbed AuthenticationService
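A minimal Python sketch of the stub, assuming AuthenticationService exposes a single has_access(user_id) method and PaymentService a hypothetical pay method. The stub ignores its argument and returns whatever answer it was constructed with.

```python
from abc import ABC, abstractmethod


class AuthenticationService(ABC):
    @abstractmethod
    def has_access(self, user_id: str) -> bool: ...


class AuthenticationServiceStub(AuthenticationService):
    """Returns a preset answer; has no logic and records nothing."""

    def __init__(self, answer: bool):
        self._answer = answer

    def has_access(self, user_id: str) -> bool:
        return self._answer


class PaymentService:
    def __init__(self, auth: AuthenticationService):
        self._auth = auth

    def pay(self, user_id: str, amount_cents: int) -> str:
        if not self._auth.has_access(user_id):
            raise PermissionError(f"{user_id} may not make payments")
        return f"receipt:{user_id}:{amount_cents}"


def test_pay_succeeds_when_access_granted():
    service = PaymentService(AuthenticationServiceStub(answer=True))
    assert service.pay("alice", 500) == "receipt:alice:500"


def test_pay_fails_when_access_denied():
    service = PaymentService(AuthenticationServiceStub(answer=False))
    try:
        service.pay("bob", 500)
    except PermissionError:
        return
    raise AssertionError("expected PermissionError")
```

With a mocking framework, auth = Mock() followed by auth.has_access.return_value = True gives the same canned answer with less code.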

5. Spies

Spies, just like mocks, act as “higher level” stubs that record information about what happened to the object and how it was called by the tested code.

A key difference between spies and mocks is that a mock is a bare-bones shell instance of an object, entirely instrumented to track interactions, while a spy wraps an existing instance.

A spy records which functions were called, with what arguments, when, and how often. It is used to verify the “indirect output” of the tested code by checking expectations about how that code interacted with the test double, e.g. an email service that records how many messages it has sent, or a login service that records which parameters were used to call a method on it.

Consider the example of testing a PaymentService that invokes an AuthenticationService to check if users have access:

PaymentService with a dependency on AuthenticationService
PaymentServiceTest with a spied AuthenticationService
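A minimal Python sketch, assuming a hypothetical allow-list implementation of AuthenticationService with a single has_access(user_id) method. Mock(wraps=...) from unittest.mock wraps the real instance: every call still reaches the real object, and the spy records it for inspection afterwards.

```python
from abc import ABC, abstractmethod
from unittest.mock import Mock, call


class AuthenticationService(ABC):
    @abstractmethod
    def has_access(self, user_id: str) -> bool: ...


class AllowListAuthenticationService(AuthenticationService):
    """Hypothetical real implementation backed by an allow-list."""

    def __init__(self, allowed):
        self._allowed = set(allowed)

    def has_access(self, user_id: str) -> bool:
        return user_id in self._allowed


class PaymentService:
    def __init__(self, auth: AuthenticationService):
        self._auth = auth

    def pay(self, user_id: str, amount_cents: int) -> str:
        if not self._auth.has_access(user_id):
            raise PermissionError(f"{user_id} may not make payments")
        return f"receipt:{user_id}:{amount_cents}"


def test_pay_consults_auth_service():
    real = AllowListAuthenticationService({"alice"})
    spy = Mock(wraps=real)  # delegates every call to `real` and records it

    service = PaymentService(spy)
    assert service.pay("alice", 500) == "receipt:alice:500"  # real behaviour kept

    # Inspect what happened, after the fact.
    assert spy.has_access.call_count == 1
    assert spy.has_access.call_args_list == [call("alice")]
```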

Summary — A Cautionary Tale

Unit tests exercise the smallest pieces of testable software in the application to determine whether they behave as expected.

Tests that are independent of implementation details are easier to maintain. In most cases, tests should focus on testing your code’s public API, and your code’s implementation details shouldn’t need to be exposed to tests.

A test double is an object that can stand in for a real object in a test, similar to how a stunt double stands in for an actor in a movie.

Tests on query methods should prefer stubs, since their return values are what needs verifying, while tests on command methods should prefer mocks.

Do not mock what you do not own

Care must be exercised when applying test doubles. They are about object communication and interface discovery; using them for isolation, especially from third-party code, tends towards misuse. In fact, a general rule of thumb when mocking is “do not mock what you do not own”.

When your tests use the same collaborators as your application, they break when they should. The value of this cannot be overstated.

Wrappers and Anti-Corruption Layers are more appropriate tools than mock objects for avoiding contamination by third-party code.
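A minimal Python sketch of the wrapper approach, assuming a hypothetical vendor SDK client with a create_charge(payload) method that returns a dict with a "status" key. Domain code depends on a PaymentGateway interface that we own, so tests double that interface; the thin adapter around the SDK would typically be covered by integration tests against the real SDK rather than by mocks.

```python
from abc import ABC, abstractmethod


class PaymentGateway(ABC):
    """Our own interface, shaped around our domain; this is what we double."""

    @abstractmethod
    def charge(self, amount_cents: int, card_token: str) -> bool: ...


class ThirdPartyPaymentGateway(PaymentGateway):
    """Thin wrapper translating our interface to a hypothetical vendor SDK."""

    def __init__(self, sdk_client):
        self._client = sdk_client

    def charge(self, amount_cents: int, card_token: str) -> bool:
        response = self._client.create_charge({"amount": amount_cents, "source": card_token})
        return response.get("status") == "succeeded"


class CheckoutService:
    def __init__(self, gateway: PaymentGateway):
        self._gateway = gateway

    def checkout(self, total_cents: int, card_token: str) -> str:
        return "confirmed" if self._gateway.charge(total_cents, card_token) else "declined"


class StubGateway(PaymentGateway):
    # Doubles the interface we own, not the vendor's SDK.
    def __init__(self, approve: bool):
        self._approve = approve

    def charge(self, amount_cents: int, card_token: str) -> bool:
        return self._approve


def test_checkout_declined():
    assert CheckoutService(StubGateway(approve=False)).checkout(999, "tok") == "declined"


def test_checkout_confirmed():
    assert CheckoutService(StubGateway(approve=True)).checkout(999, "tok") == "confirmed"
```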

Thank you for reading. I sincerely hope it was a nice read.

You can catch me at:

GitHub: kwahome

Twitter: @kwahome_
