Microservice Testing: A New Dawn

“A best effort verification of the system — and making smart bets and tradeoffs given our needs”

What is testing anyway? Software testing, to be precise.

To a great many people, software testing means something along the lines of: evaluating and verifying that a software product or application does “what it’s supposed to do” before being shipped to production. More formally, it is any activity aimed at evaluating an attribute or capability of a program or system and determining that it meets its “required results”.

Although crucial to software quality and widely deployed by programmers and testers, software testing remains an elusive art, perhaps due to our limited understanding of the ever-evolving principles of software.

There’s a pretty good explanation for this, one that requires a trip down memory lane; we shall get to it in a short while.

In the meantime, let’s explore Cindy Sridharan’s 2017 blog post titled Testing Microservices, the sane way, in which she makes a revolutionary observation about what testing really is:

“… Bigger companies can afford this level of sophistication, but for the rest of us treating testing as what it really is — a best effort verification of the system — and making smart bets and tradeoffs given our needs appears to be the best way forward.” ~ Cindy Sridharan

Right. Time for that trip to the Smithsonian museum.

Golden Age of Software Testing

The 1990s were the golden age of software testing.

As an industry, there was still quite a lot to figure out. Global or local data? File and variable naming conventions? Time constraints versus memory utilization? Library, procedure or inline code? Use or reuse? But the biggest of them all: fixing bugs that occurred in the field, when the only way to get bug reports was by post, phone or email, and the only way to update software was by mailing out a new set of floppies.

You see, once upon a time we were not very experienced at writing code. And after shipping that code, fixing it was a really, really painful proposition. No wonder so much time and effort was put into testing: there was no choice but to double-check developers’ work and try to ensure that as few bugs as possible made it into the released product.

The 1990s were also the era in which the monolith was the default architectural style. Applications composed of other applications had begun to surface (the first experiments date back to the 1980s), but monoliths were all the rage.

While things have moved along considerably (microservices and what not), we might still be hung up on the ’90s, at least on the testing end of things, as Cindy points out:

“As an industry, we’re beholden to test methodologies invented in an era vastly different to the current one we’re in.

People still seem to be enamored with ideas such as full test coverage (so much so that at certain companies a merge is blocked if a patch or a new feature branch ends up fractionally decreasing the test coverage of the codebase), test-driven development and complete end-to-end testing at the system level.”

The Distributed Monolith

João Vazao Vasques makes a bold assertion that he calls “an important truth, our north star, that will guide us on this journey” in his 2018 blog post Your Distributed Monoliths are secretly plotting against you:

“Most implementations of microservices are nothing more than distributed monoliths.”

In his presentation at the London DevOps Enterprise Summit in June 2017, Matthew Skelton spoke about the types of “software monoliths” that can creep into a project:

  • Application monolith: single block of code deployed as a unit
  • Joined at the DB: difficult to change separately
  • Monolithic build: rebuild everything; one gigantic CI build
  • Monolithic releases: coupled release; smaller components bundled together into a “release”
  • Monolithic thinking: standardization; “one-size-fits-all” for teams

The “testing monolith”, similar to the monolithic build and release types that Matthew describes, has been suggested as a sixth type of monolith. Continuous Delivery consultant Steve Smith has gone so far as to argue that end-to-end testing should be considered harmful.

Steve’s point of view is that (monolithically) spinning everything up in order to verify the presence — or lack thereof — of issues in a system is fundamentally incompatible with Continuous Delivery. Besides, it’s a proposition that greatly suffers from the fallacy of decomposition and the cheap investment fallacy: the idea that testing a whole system will be cheaper than testing its constituent parts.

“Any advantage you gain by talking to the real system is overwhelmed by the need to stamp out non-determinism” ~ Martin Fowler
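To make Fowler’s point concrete, here’s a minimal sketch in Python of stamping out that non-determinism: the client logic is exercised against a deterministic stub rather than the real downstream system. PricingClient and its injected transport are hypothetical names for illustration, not taken from any of the articles cited here.

```python
import unittest
from unittest import mock

# Hypothetical client for a downstream pricing service; the transport
# is injected so tests can swap the real network call for a stub.
class PricingClient:
    def __init__(self, http_get):
        self._http_get = http_get

    def discounted_price(self, sku: str) -> float:
        payload = self._http_get(f"/prices/{sku}")  # a remote call in production
        return round(payload["price"] * (1 - payload["discount"]), 2)

class DiscountedPriceTest(unittest.TestCase):
    def test_applies_discount_deterministically(self):
        # A deterministic stub: no network, no shared environment, no flakiness.
        stub = mock.Mock(return_value={"price": 100.0, "discount": 0.15})
        client = PricingClient(http_get=stub)
        self.assertEqual(client.discounted_price("sku-42"), 85.0)
        stub.assert_called_once_with("/prices/sku-42")

if __name__ == "__main__":
    unittest.main()
```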

Testing is more than just debugging.

A further complication has to do with the dynamic nature of programs. If a failure occurs during preliminary testing and the code is changed, the software may now work for a test case that it didn’t work for previously. But its behavior on pre-error test cases that it passed before can no longer be guaranteed. To account for this possibility, testing should be restarted, an expense that is often prohibitive.

An interesting analogy, paralleling the difficulty of software testing with that of pesticides, is the Pesticide Paradox defined by Boris Beizer:

“Every method you use to prevent or find bugs leaves a residue of subtler bugs against which those methods are ineffectual.”

Full Stack in a Box

Real-time graph of microservice dependencies at Amazon in 2008. https://twitter.com/Werner/status/741673514567143424

The full-stack-in-a-box testing strategy entails replicating a cloud environment locally and testing everything in one local instance.

As you must be imagining — cringing even — it’s no mean feat given how elaborate and fragile a setup it is, often requiring a team in its own right to build, maintain, troubleshoot and evolve the infrastructure.

“If anyone so much as sneezes, my service becomes untestable.” ~ Tyler Treat

Cindy Sridharan recounts the trials and tribulations of her first-hand experience with this “fallacy”, in a setup of just two services. She explains that at one of her previous companies, they tried to spin up the entire stack in a Vagrant box — the Vagrant repo itself was called something along the lines of “full-stack in a box” — the idea being that a simple vagrant up would enable any engineer to spin up the stack in its entirety on their laptop.

“… asking to boot a cloud on a dev machine is equivalent to becoming multi-substrate, supporting more than one cloud provider, but one of them is the worst you’ve ever seen (a single laptop)” ~ Fred Hébert

“Software complexity (and therefore that of bugs) grows to the limits of our ability to manage that complexity.” ~ Boris Beizer

The Spectrum of Testing

The spectrum of testing. Image adapted from Cindy Sridharan

Historically, testing has referred to an activity confined to a pre-production or pre-release phase, often carried out by siloed testing/QA teams.

Driven by the “build it / run it / own it” ethos popularized by Amazon — at whose core is the rationale that being woken up at 2 am by your service’s pager, over and over, is quite a powerful incentive to find and fix root causes and to focus on quality while writing code — this model is slowly being phased out. Development teams are now responsible for testing as well as operating the services they author.

“Good testing involves balancing the need to mitigate risk against the risk of trying to gather too much information” ~ Jerry Weinberg

You can’t test in quality, but you can code it in.

The spectrum of testing illustrated above, adapted from Cindy Sridharan, broadly partitions software testing into pre-production testing and testing in production. This partitioning, she says, can be used to encompass a variety of activities, including many practices that traditionally fell under the umbrella of “release engineering” or Operations or QA.

“…but it encompasses some of the most common forms of testing seen in the wild all the same.”

Pre-production Testing

Test Pyramid

Pre-production testing is predominantly, as Cindy states:

“A best effort verification of the correctness of a system as well as a best effort simulation of the known failure modes”

Software bugs will almost always exist in any software module of moderate size: not because programmers are careless or irresponsible, but because the complexity of software is generally intractable — and humans have only a limited ability to manage complexity. It’s also true that for any complex system, design defects can never be completely ruled out.

Well, if that’s the case, what is to be said about the scope of pre-production testing?

“The scope of pre-production testing is only as good as our ability to conceive good heuristics that might prove to be a precursor of production bugs.”

That scope is heavily curtailed by the implicit assumptions the system is built upon and by the plethora of biases held by the software engineers on a development team, because invariably the person (or team) writing the code also writes the tests, code reviews notwithstanding.

The infographic below highlights the essence of some of the pre-production test methods:

Image from https://martinfowler.com/articles/microservice-testing/#conclusion-summary

The Unit of Test

Single Responsibility Principle. Image from LearnStuff.io

A microservice architecture is the natural consequence of applying the Single Responsibility principle at the architectural level. Microservices are thus built on the notion of splitting up units of business logic into standalone services, where every individual service is responsible for a standalone piece of business or infrastructural functionality.

Microservices are often inherently stateful entities: they encapsulate state and behavior, akin to an Object or an Actor.

“Not all I/O is equal”

Consider the example of testing a microservice that is responsible for managing inventory. It’s certainly more prudent to verify that items are created successfully in the database than it is to ascertain that HTTP parsing works as expected. Granted, a bug in the HTTP parsing library can act as a single point of failure for such a service, and is hence an important aspect to verify; but it’s also clearly subservient to the primary responsibility of the service.

“What is of the essence here is that the most important unit of functionality a microservice provides happens to be an abstraction of the underlying I/O involved to its persistent backend, and as such should become the hermetic unit of base functionality under test.”
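As a minimal sketch of that idea, assuming a hypothetical InventoryService and an in-memory SQLite database standing in for the persistent backend, the service plus its datastore (not the raw I/O) becomes the hermetic unit under test:

```python
import sqlite3
import unittest

# Hypothetical inventory service: its core functionality is an
# abstraction over I/O to its persistent backend.
class InventoryService:
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS items (sku TEXT PRIMARY KEY, quantity INTEGER)"
        )

    def add_item(self, sku: str, quantity: int) -> None:
        self._conn.execute("INSERT INTO items VALUES (?, ?)", (sku, quantity))
        self._conn.commit()

    def quantity_of(self, sku: str) -> int:
        row = self._conn.execute(
            "SELECT quantity FROM items WHERE sku = ?", (sku,)
        ).fetchone()
        return row[0] if row else 0

class InventoryServiceTest(unittest.TestCase):
    def test_created_items_are_persisted(self):
        # An in-memory database keeps the test hermetic while still
        # exercising the real persistence path.
        service = InventoryService(sqlite3.connect(":memory:"))
        service.add_item("sku-42", 10)
        self.assertEqual(service.quantity_of("sku-42"), 10)

if __name__ == "__main__":
    unittest.main()
```

The design choice worth noting: the datastore sits inside the test boundary, so the test verifies items being created successfully in the database instead of mocking away the very I/O that constitutes the service’s primary responsibility.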

Testing (QA) In Production

QA in Production. Image from https://martinfowler.com/articles/qa-in-production.html

“I’m more and more convinced that staging environments are like mocks — at best a pale imitation of the genuine article and the worst form of confirmation bias.” ~ Cindy Sridharan

In reality, every deploy is a test; in production (because every deploy is a unique, never-to-be-replicated combination of an artifact, environment, infrastructure, and time of day). Every user performing an action on your system is a test; in production. Increasing scale and changing traffic patterns are tests; in production.

Oh! There’s more.

Distributed systems exist in a perpetual state of partial degradation. Failure is the only constant. Failure is happening on your systems right now, in a hundred ways you aren’t aware of and may never learn about. So obsessing over individual errors will drive you straight to the madhouse.

“This is an industry that’s largely in denial about failure, and the denial is only just beginning to lift.” ~ Charity Majors

On the contrary. There’s a lot of daylight between throwing your code over the wall, past any form of pre-production or pre-release safeguards, and waiting to get paged; and shipping with alert eyes on it as it goes out, watching your instrumentation, and actively flexing the new code. The job of modern software engineers is not done until they have watched users use their code in production.
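As a sketch of what “alert eyes on it” can look like in practice, consider a hypothetical post-deploy smoke check; the health endpoint, response shape and latency budget below are all assumptions for illustration:

```python
import json
import sys
import time
import urllib.request

# Hypothetical post-deploy smoke check: hit the new release's health
# endpoint a few times and fail loudly if it misbehaves, so a bad
# deploy is caught while eyes are still on it.
HEALTH_URL = "https://inventory.example.com/healthz"  # assumed endpoint
ATTEMPTS = 5
MAX_LATENCY_SECONDS = 0.5  # assumed latency budget

def smoke_check() -> bool:
    for attempt in range(1, ATTEMPTS + 1):
        start = time.monotonic()
        try:
            with urllib.request.urlopen(HEALTH_URL, timeout=2) as resp:
                body = json.load(resp)
                elapsed = time.monotonic() - start
                if resp.status != 200 or body.get("status") != "ok":
                    print(f"attempt {attempt}: unhealthy response {body}")
                    return False
                if elapsed > MAX_LATENCY_SECONDS:
                    print(f"attempt {attempt}: too slow ({elapsed:.3f}s)")
                    return False
        except OSError as exc:
            print(f"attempt {attempt}: request failed: {exc}")
            return False
    return True

if __name__ == "__main__":
    # A non-zero exit wired into the deploy pipeline can trigger a rollback.
    sys.exit(0 if smoke_check() else 1)
```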

Pre-production testing is great for finding defects you expect to happen, but many production defects are surprises.

Developers need to get comfortable with the idea of testing and evolving their systems based on the sort of accurate feedback they can only derive by observing the way these systems behave in production. Sole reliance on pre-production testing won’t stand them in good stead, not just for the future but also for the increasingly distributed present of even the most nominally non-trivial architecture.

The difference between Observability and Monitoring boils down to the known-unknowns and the unknown-unknowns.
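A small sketch of that distinction, with hypothetical field names throughout: monitoring asserts on failure modes you predicted (the known-unknowns), while observability emits rich, high-cardinality events so that questions you never predicted (the unknown-unknowns) can still be answered after the fact:

```python
import json
import time

# Monitoring: a predefined check for a failure mode you already
# anticipated (a known-unknown).
def check_error_rate(errors: int, requests: int, threshold: float = 0.01) -> bool:
    return requests > 0 and errors / requests > threshold  # alert if True

# Observability: emit a wide, structured event per request with
# high-cardinality fields, so that questions you never anticipated
# can still be answered after the fact.
def emit_request_event(**fields):
    event = {"timestamp": time.time(), **fields}
    print(json.dumps(event))  # in reality, shipped to an event store

emit_request_event(
    service="inventory",   # hypothetical field names throughout
    endpoint="/items",
    status=500,
    duration_ms=742,
    user_id="user-1337",
    build_sha="deadbeef",
    region="eu-west-1",
)
```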

Summary

Because a microservice architecture relies more on over-the-wire (remote) dependencies and less on in-process components, your testing strategy and test environments need to adapt to these changes.

Given how broad a spectrum testing is, there’s really no one true way of doing it right. Any approach is going to involve making compromises and tradeoffs.

Your application is being tested in production every single day by the people who use it. You just need to find a way to use all the data users are already generating.

Finding the right balance of pre-production and production quality practices can help you gain a more realistic and holistic understanding of the quality of your system.

Thank you for reading. I sincerely hope it was a nice read.

You can catch me at:

GitHub: kwahome

Twitter: @kwahome_

LinkedIn: https://www.linkedin.com/in/kelvinwahome

First, solve the problem. Then, write the code.
