This article is a commentary for the “Mutation and Contract Testing” lecture presented at CodeteCON #KRK4. If you would like to see the slides, you can find them at https://github.com/mbryla/contract-testing. You can read this and more articles by this author at https://medium.com/@mateusz.bryla.

Are you sure that your tests do what they’re supposed to? Are you certain that your integration suite fails only if your code is to blame? If not then I might have an answer for you.

Introduction

The purpose of this article is to introduce two testing concepts that have been around for quite some time, but have never gained enough popularity to become widely used: mutation testing and contract testing. Both add substantial value to how your projects can be tested, and both require little effort.

Mutation Testing

Concept

The idea behind spending programmers’ precious time (precious especially for the end client) on designing, creating and maintaining tests is to increase the overall safety of the project in several respects:

  • confidence that the code we deploy to production will not fail on Friday evening
  • expectation that a new intern’s commit won’t silently break the legacy code that’s a nightmare to delve into
  • certainty that the bug that took us the last two weeks to fix will not come back

“Quis custodiet ipsos custodes?” – this question, asked by the Roman poet Juvenal, summarizes one of the biggest problems with tests written by humans: who watches the guardians? Since we base the safety of our project on our tests, we should be 100% certain that they work correctly. But how can we achieve that? Sure, we can perform extensive code reviews when adding tests, but in my experience developers often give the tests only a quick look or disregard them completely. We could rely on static analysis of the test code, but what about logical errors? In the end we can always write more tests. At least some of them will work, right? But do we really want to keep the precious pipeline time busy with useless tests? Here comes the saviour: mutation testing. It helps us verify that our tests do exactly what we expect them to do.

The concept behind this testing approach is actually really simple. Let’s imagine we’re hired to design a solution for vehicles that want to cross a valley. We’ll build a bridge.

mutations problem

To be sure that what we designed satisfies the needs of the client, we’ll create a test – a single lorry (or a truck, if you prefer to cross the Colorado River) that starts on one side of the valley and, using our solution, is expected to arrive at the other side. We can build the bridge first, or follow TDD (test-driven development) and send the lorry to its destruction before the construction is done. Which approach we choose does not matter for mutation testing, since it comes into play only after we have created our test suite.

mutations test

The bridge is complete and so is our test with a single lorry that tries to cross it. Now we can use this new testing concept to verify that our tests do what we expect them to. The first step is to perform a mutation – random modification of the production code (not the test). In our example we have mutated the bridge by removing all its support pillars.

After we have mutated the production code, we execute all our test suites on the modified implementation. The expected result in our bridge construction project would be a collapsed bridge with our lorry falling down into the river… right?

mutations expected result

But what if the result is different from that? What if the test passes and our truck somehow manages to make it to the other side of the valley even though the support pillars are gone? Well, then either the bridge does not really need these seemingly important construction elements or, more probably, there is something wrong with our test case – and this is exactly what we have been looking for.

Principle

Coming back to the world of programming, the principle of mutation testing can be summarized in the following steps.

  1. Run your test suites
  2. Mutate a piece of production code (create mutants)
  3. Run your tests again (try to kill mutants)
  4. Find mutations for which no test failed (find surviving mutants)
  5. Repeat steps 2 – 4 with different mutations
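The steps above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not how any real framework works internally: the isAdult rule, the hand-written mutants and the two tiny "suites" are all hypothetical, and a real tool generates its mutants automatically.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;
import java.util.function.Predicate;

class MutationLoopSketch {
    // Hypothetical production code: a person is an adult from the age of 18.
    static final IntPredicate ORIGINAL = age -> age >= 18;

    // Hand-written mutants standing in for what a framework would generate (step 2).
    static Map<String, IntPredicate> mutants() {
        Map<String, IntPredicate> mutants = new LinkedHashMap<>();
        mutants.put("boundary: >= becomes >", age -> age > 18);
        mutants.put("negated: >= becomes <", age -> age < 18);
        mutants.put("return value: always true", age -> true);
        return mutants;
    }

    // A weak suite that never checks the boundary; returns true when all checks pass.
    static boolean weakSuite(IntPredicate isAdult) {
        return isAdult.test(30) && !isAdult.test(5);
    }

    // A stronger suite that also checks both sides of the boundary.
    static boolean strongSuite(IntPredicate isAdult) {
        return weakSuite(isAdult) && isAdult.test(18) && !isAdult.test(17);
    }

    // Runs the suite against every mutant and returns the names of the survivors.
    static List<String> survivors(Predicate<IntPredicate> suite) {
        if (!suite.test(ORIGINAL)) {                           // step 1
            throw new IllegalStateException("suite must pass on unmutated code");
        }
        List<String> survivors = new ArrayList<>();
        for (Map.Entry<String, IntPredicate> mutant : mutants().entrySet()) {
            boolean allTestsPassed = suite.test(mutant.getValue()); // step 3
            if (allTestsPassed) {
                survivors.add(mutant.getKey());                // step 4
            }
        }                                                      // step 5: next mutant
        return survivors;
    }

    public static void main(String[] args) {
        System.out.println("weak suite:   " + survivors(MutationLoopSketch::weakSuite));
        System.out.println("strong suite: " + survivors(MutationLoopSketch::strongSuite));
    }
}
```

With the weak suite the boundary mutant survives, because no test ever asks about an 18-year-old. The stronger suite kills all three mutants.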

To produce mutants, our production code is processed using various mutation operators. These range from the most basic ones, like conditionals boundary operators, which as the name suggests are designed to check boundary conditions, to more sophisticated ones like class method operators, which can remove method calls, replace class members with nulls and default values, and much more. A few examples are presented in the following list.

  • Conditional
    • Replaces “<” with “<=”
    • Replaces “==” with “!=”
  • Math
    • Replaces “+” with “-”
    • Replaces “*” with “/”
  • Return value
    • Returns true if the original method returns false
    • Returns 0 if the original method returns an integer not equal to 0
  • Method call
    • Removes a void method call
    • Removes a non-void method call and uses a default value in place of its result

After all the mutations have been generated, the testing framework runs your test suites against the mutated production code. The expected result is at least one failing test for every mutation. If no tests fail, it means that the mutated situation is not covered by anything in the suite – so either the tests are incorrect or they do not cover enough logic.

If at least one test fails, we say that the mutant was killed – this is the desired state. When all the tests pass, we say that the mutant survived – it marks places that might need some improvement. There are other possible results, usually when mutation operators make the code unusable, leading to errors or infinite loops. This is why each test run has a configurable timeout to prevent stalling the whole suite.
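The possible outcomes, including the timeout, can be sketched with the standard java.util.concurrent API. The Outcome names and the run method are assumptions made for this illustration. The looping "mutant" in main is written to honor thread interruption so that the sketch terminates cleanly, which a genuinely broken mutant would not necessarily do.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

class MutantOutcomeSketch {
    enum Outcome { KILLED, SURVIVED, TIMED_OUT, ERROR }

    // testSuite returns true when every test passed against the mutated code.
    static Outcome run(BooleanSupplier testSuite, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable);
            thread.setDaemon(true); // a stuck mutant must not keep the JVM alive
            return thread;
        });
        Callable<Boolean> task = testSuite::getAsBoolean;
        Future<Boolean> result = executor.submit(task);
        try {
            boolean allTestsPassed = result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return allTestsPassed ? Outcome.SURVIVED : Outcome.KILLED;
        } catch (TimeoutException e) {
            result.cancel(true);        // e.g. the mutation produced an infinite loop
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            return Outcome.ERROR;       // the mutation made the code throw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.ERROR;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(() -> false, 1000));
        System.out.println(run(() -> true, 1000));
        System.out.println(run(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                Thread.onSpinWait();
            }
            return true;
        }, 100));
        System.out.println(run(() -> { throw new IllegalStateException("broken mutant"); }, 1000));
    }
}
```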

The final result consists of a regular coverage report and a mutation testing report. The latter shows the production code with red lines marking the code that produced surviving mutants.

Setup

Mutation testing frameworks are available for almost every programming language: Pitest for Java, Stryker for JavaScript and Scala, and many others. In most cases the setup only requires you to add a single plugin or snippet to your continuous integration pipeline configuration. For instance, in Java projects you just have to add the Pitest plugin to your Maven pom.xml file. If you want more fine-grained control, like selecting mutation operators, narrowing down the mutated classes or blacklisting specific methods, you just need to provide additional configuration properties.
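A minimal sketch of such a plugin entry for Pitest could look like the fragment below. The version number and the com.example.shop package are placeholders chosen for this illustration, so check the Pitest documentation for the current release. The configuration block is optional and only shows where the finer-grained settings go.

```xml
<plugin>
  <groupId>org.pitest</groupId>
  <artifactId>pitest-maven</artifactId>
  <!-- Example version only; use the current release. -->
  <version>1.15.0</version>
  <configuration>
    <!-- Narrow down the mutated classes and the tests run against them. -->
    <targetClasses>
      <param>com.example.shop.*</param>
    </targetClasses>
    <targetTests>
      <param>com.example.shop.*</param>
    </targetTests>
    <!-- Select specific mutation operators instead of the defaults. -->
    <mutators>
      <mutator>CONDITIONALS_BOUNDARY</mutator>
      <mutator>MATH</mutator>
      <mutator>VOID_METHOD_CALLS</mutator>
    </mutators>
    <!-- Skip methods that are not worth mutating. -->
    <excludedMethods>
      <param>toString</param>
    </excludedMethods>
  </configuration>
</plugin>
```

With the plugin in place, the analysis can be started with mvn test-compile org.pitest:pitest-maven:mutationCoverage.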

Summary

So what are the disadvantages of running mutation tests? There are only two that come into play. The first one is a somewhat high number of false positives – in my experience roughly 25%. They include cases like debug methods and equivalent mutations (when the mutated code behaves the same way as the original). As a result, a human has to manually go through all the marked lines and assess whether a fix or additional tests are required.

The second one is the time needed to analyze all the possible mutations. Assuming a reasonably large codebase consisting of 1000 classes with 10 unit tests each, a brute force analysis of 10 mutations per class would take 28 hours (assuming 1 ms per single unit test). Thankfully this number can be greatly reduced by running only those unit tests that actually cover the mutated production code – the aforementioned example should then take around 2 minutes. This is a decent time, but with a higher number of mutation operators and a larger codebase we can expect between 10 and 20 minutes of runtime, which is way too much to be executed on every commit or pull request.
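The arithmetic behind those two numbers can be written out as follows. The sketch assumes that the brute force approach runs the complete suite for every mutant, and that in the targeted approach each mutant is covered only by the 10 tests of its own class. Both are simplifications made here to reproduce the estimate.

```java
import java.time.Duration;

class MutationRuntimeEstimate {
    // Brute force: every mutant is checked against every test in the codebase.
    static Duration bruteForce(int classes, int testsPerClass, int mutationsPerClass, Duration perTest) {
        long mutants = (long) classes * mutationsPerClass;
        long testsInSuite = (long) classes * testsPerClass;
        return perTest.multipliedBy(mutants * testsInSuite);
    }

    // Targeted: every mutant is checked only against the tests of its own class.
    static Duration targeted(int classes, int testsPerClass, int mutationsPerClass, Duration perTest) {
        long mutants = (long) classes * mutationsPerClass;
        return perTest.multipliedBy(mutants * testsPerClass);
    }

    public static void main(String[] args) {
        Duration perTest = Duration.ofMillis(1);
        // 10,000 mutants x 10,000 tests x 1 ms = 100,000 s, just under 28 hours
        System.out.println(bruteForce(1000, 10, 10, perTest).getSeconds() + " s");
        // 10,000 mutants x 10 tests x 1 ms = 100 s, a bit under 2 minutes
        System.out.println(targeted(1000, 10, 10, perTest).getSeconds() + " s");
    }
}
```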

The only tangible advantage of mutation testing is its ability to provide some metric of the quality of our tests. This may seem like something not worth the effort. But take into consideration that, besides code coverage, there is no other way to assess how well your test suites protect your production code from bugs. Introducing mutation tests into your continuous integration pipeline is not a cumbersome task, and it may yield some really interesting insights about your test suites.

Contract Testing

Concept

There are three basic scopes of application tests: unit (tests of sub-components), integration (tests of the whole product with mocked boundaries) and end-to-end (tests of multiple applications working together as a whole). Contract testing tries to solve problems that occur in integration tests (and sometimes end-to-end tests) – namely unreliable and hard-to-maintain integrations with external services.

There are only two approaches to integration tests with third party services. The first one is to use test doubles, more commonly known as mocks. Here we have two sub-options – we can either code the mocks ourselves or use mocks provided by the external service development team. The second approach is to use what is called a sandbox – a live instance or a group of services running on the third party infrastructure and exposed for integration testing (see PayPal sandbox). Sandboxes also appear in a slightly modified version, with sandbox accounts on the production instance of the third party service (see Facebook sandbox).

The biggest problem with any of the above approaches is that developers do not pay the same amount of attention to their sandbox environments as they do to the production ones. The documentation is always full of sentences like “almost identical instance” or “with few exceptions, the behavior on the test site exactly mirrors”. The bug tickets always appear with the lowest priority in the backlog. There are rarely any SLAs for the availability of the sandbox environments, and when there are, they are often not met. Test doubles provided by third party services are often faulty and differ from the actual implementation. If you have ever spent a few hours debugging both your production code and test suites only to realise that the tests failed because of the sandbox, then you know how troublesome it can be. The worst case scenario is when the answer to the question “Why is this test suite failing? Should we roll back?” is “Not really. It’s the sandbox of service X – it fails most of the time”. And trust me – I’ve heard those. If you experience any of those problems, then contract testing can help you with that.

The idea behind contract testing is based on two elementary definitions. An interface is a “shared boundary across which two or more separate components of a computer system exchange information”. We can visualize an example interface as a set of endpoints exposed by a producer component to a number of consumer components. Based on an interface, we can define a contract as a set of specific requests and responses exchanged over that interface between the producer and its consumers.

An example of such a contract in the well-known world of RESTful interfaces could be an HTTP GET request made by the consumer to the /echo/hello resource, to which the producer responds with an HTTP 200 OK containing the JSON object {"echo": "hello"} as the body of the response.

contract testing

The important remark here is that an interface might exist not only between separate applications (or, as the hip approach teaches us, between separate microservices), but also within the application itself – between layers of domain-driven components, libraries and logic modules.

Principle

You can see where we’re heading with this. Instead of testing our code against test doubles, we run our integration tests against a contract. The same can then be done by the team responsible for the external service. Mutually testing against the same contract guarantees compliance without relying on the availability and stability of the services – it shifts this responsibility onto the contract.
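On the producer side, such a check can be sketched in plain Java (records require Java 16 or newer). The Request, Response and Contract types and the producer method are hypothetical stand-ins for a real service, and the body is compared as an exact string, whereas dedicated libraries usually compare JSON structurally.

```java
import java.util.function.Function;

class ProducerContractCheck {
    record Request(String method, String path) {}
    record Response(int status, String body) {}
    record Contract(Request request, Response response) {}

    // The /echo/hello example expressed as data.
    static final Contract ECHO_HELLO = new Contract(
            new Request("GET", "/echo/hello"),
            new Response(200, "{\"echo\":\"hello\"}"));

    // Hypothetical producer implementation that echoes the last path segment.
    static Response producer(Request request) {
        if (request.method().equals("GET") && request.path().startsWith("/echo/")) {
            String word = request.path().substring("/echo/".length());
            return new Response(200, "{\"echo\":\"" + word + "\"}");
        }
        return new Response(404, "");
    }

    // The producer honors the contract if the contract's request yields the contract's response.
    static boolean satisfies(Function<Request, Response> implementation, Contract contract) {
        return implementation.apply(contract.request()).equals(contract.response());
    }

    public static void main(String[] args) {
        System.out.println("producer honors contract: "
                + satisfies(ProducerContractCheck::producer, ECHO_HELLO));
    }
}
```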

Testing against a contract is as easy as testing against an integration. Libraries dedicated to contract testing create mocked services reflecting the contents of the contracts. The only modification in the code is to point your consumers to a mock created from the contract instead of the sandbox integration environment.
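The consumer side can be illustrated with nothing but the JDK (Java 11 or newer). In this sketch a hand-rolled stub built on the JDK's built-in HttpServer stands in for what a contract testing library would generate, and the fetchEcho consumer method is hypothetical. The stub also ignores the HTTP method, which a real library would match as well.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class ConsumerAgainstStub {
    // Contents of the contract: GET /echo/hello answers 200 with this body.
    static final String CONTRACT_PATH = "/echo/hello";
    static final int CONTRACT_STATUS = 200;
    static final String CONTRACT_BODY = "{\"echo\":\"hello\"}";

    // Starts a local stub on a free port that replies exactly as the contract says.
    static HttpServer startStub() throws IOException {
        HttpServer stub = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        stub.createContext(CONTRACT_PATH, exchange -> {
            byte[] body = CONTRACT_BODY.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(CONTRACT_STATUS, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        stub.start();
        return stub;
    }

    // Hypothetical consumer code: the base URL is the only thing that changes under test.
    static String fetchEcho(String baseUrl) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/echo/hello")).GET().build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("unexpected status " + response.statusCode());
        }
        return response.body();
    }

    // Points the consumer at the stub instead of a sandbox and returns what it received.
    static String demo() {
        try {
            HttpServer stub = startStub();
            try {
                return fetchEcho("http://127.0.0.1:" + stub.getAddress().getPort());
            } finally {
                stub.stop(0);
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```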

contract test

The ground rule of this approach is that the contract is binding. It is always up to date with the respective version of the APIs provided by the producer. It is only natural to ask “why should we expect external service developers to pay more attention to the contracts they provide than to the sandbox?”. Well, we shouldn’t, but we can expect it at least to some degree. First of all, generating the contracts can and should be automated. Dedicated libraries (more about them later) let you generate them from your code and publish them alongside your API specification (go OpenAPI! ;)). Secondly, the resources needed to take care of the sandbox are orders of magnitude higher than those necessary to keep the contracts up to date. And finally, having the contracts as part of your environment is also a good way for the producers to verify their code against what they claim to provide in the API – so the motivation to keep the contracts correct is much higher.

But what if there are no contracts provided for the service I am using? Well, then you can always create them on your own. Test your code against the contract you’ve created and validate the contract by testing it against the sandbox. You will know straight away whenever your code or the external service strays from what is expected. And if the test suites fail on the sandbox integration side, then unless there are some unexpected API changes you can rest peacefully.

Setup

One of the libraries for contract testing is Spring Cloud Contract (Java). You can read their docs or check out this project on GitHub containing simple producer and consumer Java applications using this library for contract testing. It is also worth checking out Pact for Ruby, JS, Go and others.

Configuration of a producer consists of adding the spring-cloud-contract-maven-plugin to the project’s pom.xml, along with a base class for the tests and a contract description listing example requests and responses. This will automatically generate acceptance tests that verify the producer’s compliance with the contract it provides, along with the contract itself in the form of a jar file that can be published as an artefact with the other binaries of the project.

The configuration of the consumer is much simpler. The pom.xml file should list the contract jar as a dependency. Using a provider of choice (WireMock in the case of the example project), the contract is then served as a mocked service for all the integration suites.

Summary

The biggest disadvantage of this approach is that you have to create and take care of the contract yourself if it is not provided by your external integration service. This requires additional effort that you have to convince your business to agree to. But if the contract is already provided, then there are no clear reasons not to try this approach. The benefits of contract testing are usually visible only to those who have had to deal with frequent, unexpected problems with their sandboxes and test doubles – they can see the value of this approach straight away. And if you ever find yourself in that situation, think for a second whether it is worth spending some time introducing contract testing to save countless hours of fighting with sandboxes.

Final remarks

The concept of mutation testing was first proposed by Richard Lipton in 1971. It was formally defined 7 years later and implemented for the first time by Timothy Budd in 1980 – see history of the mutation tests. Contract testing is considerably younger. At the beginning it was described as Consumer Driven Contracts – see Ian Robinson’s article from 2006. Clearly both of these techniques have been known for quite some time, but they have never gained widespread popularity. We can only try to guess why. Personally I believe that mutation tests were just too resource-consuming at that time, while the benefits of contract testing only became obvious once we stepped into the microservices era. The most important thing to remember is that both are powerful tools that can bring a lot of value if used correctly at the right moment. A precision chisel in the toolbox of a real craftsman. I hope that after reading this article you’ll have enough curiosity to try and use it to carve some code 😉

mateusz.bryla

Dreamer, programmer, consultant, trainer. Founder at Lingmates, Senior Software Engineer at Codete.