Fixing Flaky Tests: The Fast Lane to Success
You’ve integrated a test suite into your pipeline for good reasons: to create the best possible product, improve user satisfaction, and debug issues quickly. But what can you do when your test suite seems to be getting in the way of your CI/CD pipeline and keeping you from your ultimate goal of speeding new releases out the door?
Flaky tests are tests that fail or succeed seemingly randomly when executed on the exact same code. The frustration is obvious. In fact, a run of flaky tests can shatter your team’s confidence in the test suite as a whole and create the temptation to bypass certain tests.
Flaky tests slow down the pipeline and interfere with productivity—not just because you can’t release, but because you’re wasting precious time fixing the tests. What happened to all those productivity gains your test suite was supposed to offer?
Let’s look at a couple of basic strategies to determine whether you have a truly flaky test on your hands. Then, we’ll explore several tips that experienced developers rely on to keep products speeding on down that pipeline.
Is Your Test Really Flaky?
The only way to know if a test is flaky is to run it. When the code hasn’t changed between test runs, yet failures keep popping up, the test is almost certainly flaky. In that case, the test itself needs troubleshooting and repair, which can create a major bottleneck in your pipeline.
You can run the test, or even the entire test suite, a number of times. This can involve running tests in the same order or changing the order of tests, which can be a valuable tool in turning up inter-test dependencies that you haven’t accounted for.
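Here is a rough sketch of what rerunning a suite might look like, using Python's standard unittest module. The helper names (run_repeatedly, flaky_tests) and the DemoTests class are made up for illustration. The idea is simply to run every test several times on unchanged code, optionally in a shuffled order, and flag any test that fails in some runs but not in others.

```python
import random
import unittest


def run_repeatedly(test_case_class, runs=20, shuffle=True, seed=0):
    """Run each test in a TestCase class `runs` times; return failure counts per test."""
    rng = random.Random(seed)  # fixed seed so a failing order can be replayed
    names = list(unittest.TestLoader().getTestCaseNames(test_case_class))
    failures = {name: 0 for name in names}
    for _ in range(runs):
        order = list(names)
        if shuffle:
            rng.shuffle(order)  # varying the order can expose inter-test dependencies
        for name in order:
            result = unittest.TestResult()
            test_case_class(name).run(result)
            if not result.wasSuccessful():
                failures[name] += 1
    return failures


def flaky_tests(failures, runs):
    """Tests that failed in some runs but not all, on unchanged code."""
    return sorted(name for name, count in failures.items() if 0 < count < runs)


class DemoTests(unittest.TestCase):
    """Hypothetical suite with one deliberately unstable test."""

    calls = 0

    def test_stable(self):
        self.assertEqual(1 + 1, 2)

    def test_unstable(self):
        # Fails on every third call to imitate an intermittent failure.
        type(self).calls += 1
        self.assertNotEqual(type(self).calls % 3, 0)
```

Calling flaky_tests(run_repeatedly(DemoTests, runs=6), runs=6) would report only test_unstable. A test that fails in every run is broken rather than flaky, which is why the helper excludes it.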
Fortunately, a few tools are available to simplify this process and help you clear that bottleneck:
- Java developers can use the Test Retry Gradle Plugin, along with other flaky-test analysis features, to automate the process, though automating the reruns won’t save you any run time.
- You can custom-code an internal “Flakybot” tool similar to the one that developers at Spotify created to help their engineers determine whether tests were flaky before merging code.
- Foresight makes debugging simpler with CI tools for troubleshooting tests, giving you visibility into what’s happening and why across your CI process, down to every step of each job in a CI workflow.
All of these techniques and tools are essentially there to help you determine whether or not your tests are flaky. But what’s the best way to fix a flaky test once you’ve discovered it? That depends on what the problem actually is.
Best-Practice Fixes for Flaky Tests
Once you’ve identified which test or tests are flaky, you still have to go in and identify what’s going wrong. Here are a few ways that expert developers tackle fixing flaky tests.
One surprisingly low-tech tool the development team at Spotify created in their effort to fix testing bottlenecks was a simple table that shows a limited number of basic statistics for each test, such as execution time. This will let your whole team keep an eye on trends over time.
Every organization deals with flaky tests. When this happens at Google, they quarantine the flaky test and keep running the rest of the suite. This removes the roadblock but adds another problem: the quarantined tests still need to be run, usually in a separate test suite, to cover the resulting gap in testing. If the isolated test runs perfectly once it’s in quarantine, the problem was probably inter-test dependence or something similar, such as state that wasn’t cleaned up between tests.
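One lightweight way to approximate a quarantine in a Python unittest suite is a skip decorator driven by an environment variable. This is only a sketch, not a description of Google's tooling. The decorator name quarantined and the variable RUN_QUARANTINED are assumptions made for this example.

```python
import os
import unittest


def quarantined(reason):
    """Skip a known-flaky test unless RUN_QUARANTINED=1 (a hypothetical variable name)."""
    enabled = os.environ.get("RUN_QUARANTINED") == "1"
    return unittest.skipUnless(enabled, f"quarantined: {reason}")


class CheckoutTests(unittest.TestCase):
    """Hypothetical suite: one healthy test, one test placed in quarantine."""

    def test_total(self):
        self.assertEqual(2 * 3, 6)

    @quarantined("intermittent timeout, under investigation")
    def test_payment_gateway(self):
        self.assertTrue(True)  # placeholder body standing in for the flaky test
```

The main pipeline runs without the variable set, so the quarantined test is reported as skipped instead of blocking the build. A separate job can set RUN_QUARANTINED=1 to keep exercising it and cover the gap.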
When a flaky test passes in quarantine, state left behind by tests that ran before it is the likely cause. Cleaning up thoroughly between tests keeps one test from derailing another.
Ensure that all state and data are removed. While most test suites handle cleanup, there can still be problems that might be undetected, such as clean-up errors that are silently ignored and databases that have been modified by previous tests. When it comes to databases, using transactions can give you a simple way to roll back after a test run.
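The transaction approach can be sketched with Python's built-in sqlite3 and unittest modules. The in-memory database and the users table are stand-ins chosen for this example. Each test opens a transaction in setUp and rolls it back in tearDown, so nothing a test writes survives into the next one.

```python
import sqlite3
import unittest


def make_connection():
    # In-memory SQLite stands in for the real test database (an assumption).
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN and ROLLBACK statements below control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (name TEXT)")
    return conn


class UserTests(unittest.TestCase):
    conn = make_connection()  # shared by all tests, like a shared test database

    def setUp(self):
        self.conn.execute("BEGIN")

    def tearDown(self):
        self.conn.execute("ROLLBACK")  # discard everything the test wrote

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

    def test_insert_user(self):
        self.conn.execute("INSERT INTO users VALUES ('alice')")
        self.assertEqual(self.count(), 1)

    def test_table_starts_empty(self):
        # Passes regardless of test order because the insert above is rolled back.
        self.assertEqual(self.count(), 0)
```

Without the rollback, test_table_starts_empty would pass or fail depending on whether test_insert_user happened to run first, which is exactly the kind of order dependence that shows up as flakiness.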
Tests that run asynchronously and use network resources may hit timeouts, which leads to flaky results, because available network bandwidth varies with the number of services using it. Timeout values that are too short are simple to fix: define them in one place, and you’ll be able to adjust them easily for the conditions at test time. If at all possible, before running complex tests that rely on asynchronous services, check that the service is available to avoid timeout issues.
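Both ideas fit in a few lines of Python. In this sketch the environment variable names (TEST_CONNECT_TIMEOUT, TEST_STARTUP_TIMEOUT) and the helper names are hypothetical. Timeouts live in one dictionary that can be overridden per environment, and a polling helper waits for a service to accept connections before the tests start.

```python
import os
import socket
import time

# One place to tune every timeout; the environment variable names are hypothetical.
TIMEOUTS = {
    "connect": float(os.environ.get("TEST_CONNECT_TIMEOUT", "2.0")),
    "startup": float(os.environ.get("TEST_STARTUP_TIMEOUT", "30.0")),
}


def tcp_probe(host, port):
    """Return True if a TCP connection to host:port can be opened right now."""
    try:
        with socket.create_connection((host, port), timeout=TIMEOUTS["connect"]):
            return True
    except OSError:
        return False


def wait_until(probe, timeout=None, interval=0.1):
    """Poll probe() until it returns True or `timeout` seconds have passed."""
    if timeout is None:
        timeout = TIMEOUTS["startup"]
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A suite could call wait_until(lambda: tcp_probe("localhost", 5432)) in its setup and skip or fail fast when it returns False, rather than letting every individual test time out. The host and port here are placeholders.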
Sometimes, just to get to the heart of the issue quickly when a test is flaky, you can use a double of the service you’re testing. This is usually some type of dummy service or stub that mimics the behavior of the service being tested in a highly simplified way. Obviously, this is not an ideal situation, because it’s not accurately representing the service itself. But it can provide much-needed information. And there are ways to ensure—for example, with contract tests—that the calls to the double produce the exact same results as calls to the original service.
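As a small illustration, here is what a stub and a matching contract check might look like in Python. The pricing service, its get_price method, and every class name below are invented for this sketch. The contract class holds the checks once, so the same checks can be run against the stub and, in a separate suite, against the real service.

```python
import unittest


class StubPriceService:
    """Test double: returns canned prices instead of calling a real pricing service."""

    def __init__(self, prices):
        self._prices = dict(prices)

    def get_price(self, sku):
        if sku not in self._prices:
            raise KeyError(sku)
        return self._prices[sku]


def order_total(price_service, items):
    """Code under test: sum of price * quantity for each (sku, quantity) pair."""
    return sum(price_service.get_price(sku) * qty for sku, qty in items)


class PriceServiceContract:
    """Contract checks; mix into a TestCase that defines make_service()."""

    def test_known_sku_returns_number(self):
        price = self.make_service().get_price("apple")
        self.assertIsInstance(price, (int, float))

    def test_unknown_sku_raises_key_error(self):
        with self.assertRaises(KeyError):
            self.make_service().get_price("no-such-sku")


class StubHonoursContract(PriceServiceContract, unittest.TestCase):
    def make_service(self):
        return StubPriceService({"apple": 3})
```

A second subclass whose make_service() returns a client for the real service would run the identical checks there, which is what keeps the stub from drifting away from the behavior it imitates.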
Sometimes flakiness is caused by data that can’t be determined ahead of time and that changes unpredictably, such as the system clock. In fact, any test with time-based elements can be flaky, so look for date and time issues while you’re tracking down a culprit. To sidestep the problem, wrap these data sources in your code and substitute hard-coded data during testing.
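Wrapping the clock might look like the following Python sketch. The class and function names are illustrative only. Production code receives a SystemClock, while tests pass a FixedClock so the result no longer depends on when the test happens to run.

```python
from datetime import datetime, timezone


class SystemClock:
    """Thin wrapper around the real clock, used in production."""

    def now(self):
        return datetime.now(timezone.utc)


class FixedClock:
    """Test replacement that always returns the same hard-coded instant."""

    def __init__(self, instant):
        self._instant = instant

    def now(self):
        return self._instant


def greeting(clock):
    """Time-dependent code under test: it asks the clock it is given."""
    return "Good morning" if clock.now().hour < 12 else "Good afternoon"


# In a test, the time is pinned, so the assertion holds at any hour of the day.
morning = FixedClock(datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc))
assert greeting(morning) == "Good morning"
```

Had greeting() called datetime.now() directly, a test asserting "Good morning" would pass before noon and fail after it, with no change to the code.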
Is Your Memory Leaking?
Finally, take a good look at how your test code is using memory. According to Google, memory leaks are a potential culprit for flaky tests, because they eat up valuable system resources. Keep a close eye on memory consumption for clues: if it goes up every time you run the test, you probably have a leak. One way to catch this is to use resource allocation pools as wrappers that create a barrier between your code and the actual memory. When excess memory is requested, the code will fail, and you’ll be on your way to solving the problem.
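To watch memory consumption across repeated runs, Python's standard tracemalloc module is one option. The sketch below covers only that monitoring step, not the allocation-pool technique. The helper name memory_growth and its parameters are made up for this example. It calls a function many times and reports how many traced bytes are still held afterwards.

```python
import gc
import tracemalloc


def memory_growth(func, warmup=3, runs=20):
    """Return the bytes of traced memory still held after calling func() `runs` times."""
    for _ in range(warmup):
        func()  # let caches and lazy initialisation settle before measuring
    gc.collect()
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(runs):
        func()
    gc.collect()  # drop garbage so only memory that is still referenced counts
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


_retained = []


def leaky():
    # Hypothetical leak: every call keeps another 100 kB alive in a module-level list.
    _retained.append(bytearray(100_000))


def clean():
    bytearray(100_000)  # allocated, then released as soon as the call returns
```

Comparing memory_growth(leaky) with memory_growth(clean) shows the pattern described above: growth that scales with the number of runs points to a leak, while a well-behaved function stays close to zero.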
Get Back in the Fast Lane
Testing is an integral part of every modern CI/CD pipeline. Certainly, it’s better to find and fix problems before the product is released, but when the tests themselves seem to be putting up roadblocks and you’ve got deadlines to meet, it’s natural to feel frustrated.
Flaky tests can be ignored within the CI/CD process for a long time until the entire test suite becomes unreliable. They can have a huge range of causes—from the test itself to the framework and resources it uses, its service and library dependencies, and even the OS and hardware. It can sometimes seem nearly impossible to identify the cause. The techniques described here can help you identify the problem and, once identified, fix things quickly so your team can get back to their full productivity.
With tools that give you total insight into your environment during testing, you can pinpoint the source of problems more easily.
Foresight is a comprehensive monitoring solution that provides insights and analytics for both CI/CD pipelines and tests. It also automatically assesses the risk level of software changes and suggests optimization and prioritization tips for automated tests in CI pipelines.