
5 Causes of Flaky Tests and How to Mitigate Them

Monitoring and debugging failed tests is one of the biggest challenges in maintaining software. In this article, we look at what causes flaky tests and how to mitigate them.
Burak Kantarcı

With the ocean of open-source tools available today, building functional software is easy. Making high-quality software that meets the expectations of customers, however, is not.

Delivering a high-quality product requires thorough testing. In practice, that means adding a CI/CD pipeline to our workflow so that every feature is well tested.

Automated tests provide some relief, but as we add features, the number of tests keeps growing. After a while, we notice that some of the tests are flaky: on a good day they pass, and on a bad day they fail, even though the code hasn’t changed. Eventually, monitoring and debugging failed tests becomes one of the biggest challenges in maintaining our software.

In this article, we’ll look at what causes flaky tests and how to mitigate them.

Why Tests Become Flaky

There are many reasons why tests become flaky. Here are five of the most common.

1. Your Tests Depend on Each Other

When implementing automated tests, we usually need to set up the environment the tests run in. To save time, a quality engineer might chain the tests together: one test creates a user, the next authenticates that user, and a third performs a specific action (for example, creating a new blog).

This kind of setup often leads to flaky tests because the order in which the tests run can change. For example, the authentication test might run before the user-creation test, with the blog-creation test running last. In that order, the authentication test fails because the user doesn’t exist yet, and the blog-creation test fails because it depends on the login the authentication test was supposed to perform.

If you make your tests as isolated as possible, the order they run in won’t matter. More importantly, they won’t interfere with each other when they run concurrently or in parallel.
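A minimal Python sketch of what isolation looks like, assuming a hypothetical in-memory `FakeBlogApp` in place of the real application: each test builds its own fresh app and creates the user it needs through a shared setup helper (`make_user`, also hypothetical), so no test relies on another having run first. In Jest, a `beforeEach` hook would play the role of `make_user`.

```python
import random


class FakeBlogApp:
    """Hypothetical in-memory stand-in for the real application under test."""

    def __init__(self):
        self.users = {}  # email -> password
        self.blogs = []  # (author_email, title)

    def create_user(self, email, password):
        self.users[email] = password

    def authenticate(self, email, password):
        return self.users.get(email) == password

    def create_blog(self, email, password, title):
        if not self.authenticate(email, password):
            raise PermissionError("user does not exist or wrong password")
        self.blogs.append((email, title))
        return len(self.blogs)


def make_user(app):
    """Per-test setup: each test creates the user it needs in its own app."""
    email, password = "author@example.test", "secret"
    app.create_user(email, password)
    return email, password


def test_authenticate():
    app = FakeBlogApp()
    email, password = make_user(app)
    assert app.authenticate(email, password)


def test_create_blog():
    app = FakeBlogApp()
    email, password = make_user(app)
    assert app.create_blog(email, password, "My first post") == 1


if __name__ == "__main__":
    tests = [test_authenticate, test_create_blog]
    # Shuffle the order several times: isolated tests pass whichever runs first.
    for _ in range(5):
        random.shuffle(tests)
        for test in tests:
            test()
    print("all orders passed")
```

Because neither test reads state left behind by the other, shuffling the order changes nothing.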

2. Your Concurrent or Parallel Tests Share Resources

We know that running tests concurrently or in parallel will reduce the time it takes to execute them. But that’s easier said than done.

To run your tests in parallel, you’ll need to isolate them as much as possible. We already mentioned that the tests should not depend on each other, but they shouldn’t share resources either.

For example, suppose a test for creating a new blog uses the account “aklahomabest@gmail.com”, and another test that checks the delete-user function uses the same account. Running these two tests in parallel can make the blog-creation test fail: if the delete test removes the account first, the system can’t find the user to authenticate.
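One common way to avoid this collision is to give every test its own data instead of a shared fixed account. The Python sketch below assumes a hypothetical `SharedUserService` standing in for a shared test database, and a hypothetical `unique_email` helper that generates a fresh address per test with `uuid4`. Even with 100 test runs spread across 8 threads, the delete test only ever removes the account it created.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class SharedUserService:
    """Hypothetical service backed by one shared store, like a real test database."""

    def __init__(self):
        self._users = set()
        self._lock = threading.Lock()

    def create_user(self, email):
        with self._lock:
            self._users.add(email)

    def delete_user(self, email):
        with self._lock:
            self._users.discard(email)

    def exists(self, email):
        with self._lock:
            return email in self._users


def unique_email(prefix):
    """Give every test its own account instead of reusing one fixed address."""
    return f"{prefix}-{uuid.uuid4().hex}@example.test"


def blog_test(service):
    email = unique_email("blog")
    service.create_user(email)
    # A real test would now create a blog as this user; here we only check
    # that no other test has deleted the account in the meantime.
    return service.exists(email)


def delete_user_test(service):
    email = unique_email("delete")
    service.create_user(email)
    service.delete_user(email)
    return not service.exists(email)


if __name__ == "__main__":
    service = SharedUserService()
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(t, service) for t in [blog_test, delete_user_test] * 50]
        results = [f.result() for f in futures]
    print(all(results))  # True
```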

3. Your Tests Rely on Third-Party Software

Making good software requires many unit tests. However, setting up a mock database and mocks for third-party services can be challenging and time-consuming. That’s when a developer might decide to call a mock service hosted by a cloud provider instead.

As a result, the test passes in the local environment but fails in the CI/CD pipeline, because the network in the pipeline’s environment blocks access to the external service.

To avoid this issue, mock third-party software as much as possible. This keeps your unit tests isolated so that they can run in any environment. Moreover, removing the dependency on third-party software will make your build faster.
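A minimal Python sketch of mocking a third-party dependency with the standard library’s `unittest.mock.Mock`. The `BlogPublisher` class and its `http_client` (with a `post(path, json=...)` method returning an object that has `status_code` and `json()`) are hypothetical, loosely modeled on common HTTP clients. Because the client is injected, the test never touches the network, so it behaves the same locally and in the pipeline.

```python
from unittest.mock import Mock


class BlogPublisher:
    """Hypothetical code under test that calls an external HTTP API."""

    def __init__(self, http_client):
        # The client is injected so tests can pass a fake instead of the real network.
        self.http_client = http_client

    def publish(self, title):
        response = self.http_client.post("/posts", json={"title": title})
        if response.status_code != 201:
            raise RuntimeError(f"publish failed: {response.status_code}")
        return response.json()["id"]


def test_publish_returns_new_id():
    # Configure the fake: post() returns a response with status 201 and a JSON body.
    fake_client = Mock()
    fake_client.post.return_value = Mock(status_code=201)
    fake_client.post.return_value.json.return_value = {"id": 42}

    publisher = BlogPublisher(fake_client)

    assert publisher.publish("Hello") == 42
    fake_client.post.assert_called_once_with("/posts", json={"title": "Hello"})


if __name__ == "__main__":
    test_publish_returns_new_id()
    print("ok")
```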

4. Your Tests Involve Caching Data

Sometimes tests accidentally depend on cached data. For example, we might have a service that shows the gene data of all users from Canada. Since gene data is sensitive, we mask it for regular users.

To make the data load quickly, we add a Redis cache instead of querying the Elasticsearch database on every request. Redis is a great fit for caching because it stores data in memory. It can also persist data to disk with a little configuration.

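If a regular user is upgraded to a power user, they should be able to view all the data, including the sensitive information. However, because of the Redis cache, they instead see the previously cached data without the sensitive information. A test for this upgrade then passes or fails depending on whether a stale entry is already in the cache, for example one left behind by an earlier test.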

As a quick solution, we can clear the Redis cache before running this kind of test.
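A small Python sketch of the problem and the quick fix, under clearly hypothetical assumptions: a dict named `backend` stands in for Elasticsearch, a dict named `cache` stands in for Redis, and the cache key ignores the caller’s role. The `reset_cache` helper clears the cache at the start of each test; against a real Redis test database, redis-py’s `flushdb()` would serve the same purpose.

```python
def mask(record):
    """Hide the sensitive field for regular users (returns a copy)."""
    return {**record, "genes": "***"}


class GeneDataService:
    """Hypothetical service: `backend` stands in for Elasticsearch, `cache` for Redis."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}

    def users_from(self, country, role):
        # The cache key ignores the caller's role, which is what makes tests flaky.
        key = f"genes:{country}"
        if key not in self.cache:
            rows = self.backend[country]
            if role != "power":
                rows = [mask(r) for r in rows]
            self.cache[key] = rows
        return self.cache[key]


BACKEND = {"CA": [{"name": "Ada", "genes": "ACGT"}]}
SERVICE = GeneDataService(BACKEND)  # shared across tests, like a real Redis instance


def reset_cache():
    # Quick fix: start every test with an empty cache.
    SERVICE.cache.clear()


def test_regular_user_sees_masked_data():
    reset_cache()
    assert SERVICE.users_from("CA", role="regular")[0]["genes"] == "***"


def test_power_user_sees_full_data():
    reset_cache()
    assert SERVICE.users_from("CA", role="power")[0]["genes"] == "ACGT"


if __name__ == "__main__":
    # Without reset_cache(), the second test would fail whenever the first ran before it.
    test_regular_user_sees_masked_data()
    test_power_user_sees_full_data()
    print("ok")
```

Clearing the cache makes the test deterministic, but it only hides the stale-entry bug in the service itself; including the role in the cache key, or invalidating the entry when a user is upgraded, would be one way to address the root cause.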

5. Your End-to-End Tests Use Sleep Instead of Flexible Waits

Automating end-to-end tests ensures that our application behaves the way users expect. But these tests are rarely easy to write. On some screens, we need to wait for the page to load the data the test checks, and a less-experienced tester might reach for a fixed sleep of 2 or 3 seconds.

This approach is a reliable source of flaky tests. Depending on the environment and the load at the time, the page might take 1 second to load, or 5. So the test might pass in the local environment but fail in the CI/CD pipeline.

Instead of sleeping for a fixed 2 or 3 seconds, we’re better off waiting until a specific element or part of the page is shown, with a timeout of about 10 seconds.
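Selenium already ships explicit waits (in the JavaScript bindings used with Jest, `driver.wait()` with conditions from `until`, such as `until.elementLocated()`), and a real test should use those. To show what such a wait does under the hood, here is a minimal, library-independent Python sketch of a hypothetical `wait_until` helper: it polls a condition at a short interval and gives up only when the timeout expires, so a fast page costs almost nothing and a slow page still gets its full timeout.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns that truthy value (for example, the element that was found),
    or raises TimeoutError if the condition never becomes true.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


if __name__ == "__main__":
    ready_at = time.monotonic() + 0.3  # pretend the element renders after 0.3 s
    element = wait_until(
        lambda: "movie-form" if time.monotonic() >= ready_at else None,
        timeout=2,
    )
    print(element)  # movie-form
```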

Mitigating Flaky Tests

Now that we understand why tests become flaky, how do we identify them? And most importantly, how do we fix them?

First, we need to set up the infrastructure to monitor test runs. From the collected results, we can find out which tests repeatedly switch between passing and failing, and compile a list of them.
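One simple way to build that list, sketched in Python under the assumption that run results can be exported as hypothetical `(test_name, commit_sha, passed)` tuples: flag a test only if it both passed and failed on the same commit. Since the code didn’t change between those runs, mixed outcomes point to flakiness, whereas a test that fails consistently is more likely a genuine bug. The sample history is illustrative, not real data.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return the names of tests that both passed and failed on the same commit.

    `runs` is a list of (test_name, commit_sha, passed) tuples, a hypothetical
    export format for CI results.
    """
    outcomes = defaultdict(set)
    for test_name, commit, passed in runs:
        outcomes[(test_name, commit)].add(passed)
    return sorted({name for (name, _), seen in outcomes.items() if seen == {True, False}})


if __name__ == "__main__":
    history = [
        ("creates a movie", "a1b2c3", True),
        ("creates a movie", "a1b2c3", False),
        ("logs in", "a1b2c3", True),
        ("logs in", "a1b2c3", True),
        ("deletes a movie", "d4e5f6", False),  # fails every time: a real bug, not flakiness
        ("deletes a movie", "d4e5f6", False),
    ]
    print(find_flaky_tests(history))  # ['creates a movie']
```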

Monitor Flaky Tests

Once we have that list, we rerun only those tests many times. This gives us as many logs as possible, so we can find out why they fail. At this stage, run the tests sequentially rather than concurrently or in parallel, to rule out failures caused by interference between tests.
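A minimal Python sketch of that rerun loop, assuming each test can be called as a plain function (the `rerun` helper and the `sometimes_fails` test are hypothetical). It runs the test sequentially, never in parallel, and keeps the attempt number and full traceback of every failure, which is the raw material for finding the cause.

```python
import traceback


def rerun(test_fn, times=20):
    """Run one test sequentially `times` times and collect every failure."""
    failures = []
    for attempt in range(1, times + 1):
        try:
            test_fn()
        except Exception:
            # Keep the attempt number and the full traceback for later analysis.
            failures.append((attempt, traceback.format_exc()))
    return failures


if __name__ == "__main__":
    calls = {"n": 0}

    def sometimes_fails():
        # Hypothetical flaky test: fails on every third run.
        calls["n"] += 1
        assert calls["n"] % 3 != 0, "element not found"

    failures = rerun(sometimes_fails, times=9)
    print(f"{len(failures)} of 9 runs failed")  # 3 of 9 runs failed
    print([attempt for attempt, _ in failures])  # [3, 6, 9]
```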

Once you’re confident that the tests don’t share data, you can run them concurrently again. Ideally, with enough workers to run every test at once, the total run time approaches that of your longest-running test.
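A small Python sketch of that timing claim, using hypothetical tests that simply sleep for a given duration and the standard library’s `ThreadPoolExecutor`. With one worker, the total is roughly the sum of the durations; with one worker per test, it is roughly the longest single duration. Real suites add setup overhead and are limited by the number of available workers, so treat this as the best case.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_test(duration):
    """Hypothetical isolated test that just takes `duration` seconds."""
    def test():
        time.sleep(duration)
    return test


def run_all(tests, workers):
    """Run all tests with `workers` threads and return the wall-clock time taken."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for future in [pool.submit(t) for t in tests]:
            future.result()  # re-raise any test failure
    return time.monotonic() - start


if __name__ == "__main__":
    tests = [make_test(d) for d in (0.1, 0.2, 0.3)]
    print(f"sequential: {run_all(tests, workers=1):.2f}s")  # roughly 0.1 + 0.2 + 0.3
    print(f"parallel:   {run_all(tests, workers=3):.2f}s")  # roughly the longest test, 0.3
```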

To monitor the tests, we need a good tool. We could combine open-source tools, but that can be daunting and takes a lot of time and resources to implement. With Foresight, we can easily monitor the tests from a single dashboard.

Example Flaky Test

To better demonstrate the flaky test and how to identify and fix it, let’s look at a real-life example.

We already have a running application that lets users create new movie content. You can see a screenshot of the app below.

Figure 1: Foresight movie application

We’re going to write an end-to-end test that checks whether a user can create a new movie. We’ll use Selenium with Jest (a JavaScript testing framework that runs on Node.js) and run it on BrowserStack. You can see the full test code in this gist.

Let’s run the tests a few times and see what comes up. To view the historical results of a specific test, up to the most recent run, go to the Foresight Dashboard.

Figure 2: Foresight repositories list

From the dashboard, click on the repository we are using, which is “thundra-browserstack-nodejs.”

Figure 3: Overall test run for recent commits

Here we see the overall test run of all the recent commits. Click on one commit and go to the “Apply Foresight in” test.

Figure 4: The test result for the specific commit

Click on the test we want to check, then go to the performance tab.

Figure 5: History of a test

Here we can see all of the recent runs of this test, and only one of them passed.

The test fails this often because it uses a fixed sleep every time it needs to wait for an element to appear on the web page.

Let’s update the test to wait for elements to become visible instead of sleeping for a fixed time. You’ll find the updated code in this gist.
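The gist itself isn’t reproduced here, but a toy Python model can show why this change helps. It assumes a hypothetical `FakePage` whose element appears after a given load time, with time simulated by a counter so nothing actually sleeps. The old approach sleeps a flat 3 seconds and checks once; the updated approach polls every 0.5 seconds up to a 10-second timeout. The load times are illustrative values, not measurements from the example app.

```python
class FakePage:
    """Hypothetical page whose element appears `load_time` seconds after opening.

    Time is simulated with a counter, so the example runs instantly.
    """

    def __init__(self, load_time):
        self.load_time = load_time
        self.now = 0.0

    def sleep(self, seconds):
        self.now += seconds

    def element_visible(self):
        return self.now >= self.load_time


def fixed_sleep_check(page, seconds=3.0):
    """Old approach: sleep a flat amount, then look once."""
    page.sleep(seconds)
    return page.element_visible()


def explicit_wait_check(page, timeout=10.0, interval=0.5):
    """Updated approach: poll until the element is visible or the timeout expires."""
    waited = 0.0
    while not page.element_visible():
        if waited >= timeout:
            return False
        page.sleep(interval)
        waited += interval
    return True


if __name__ == "__main__":
    load_times = [1.0, 2.5, 4.0, 6.0]  # illustrative values, not measurements
    print([fixed_sleep_check(FakePage(t)) for t in load_times])  # [True, True, False, False]
    print([explicit_wait_check(FakePage(t)) for t in load_times])  # [True, True, True, True]
```

The fixed sleep passes only when the page happens to load faster than the sleep, which is exactly the pass-on-a-good-day pattern described earlier; the explicit wait adapts to each load time and fails only when the page exceeds the timeout.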

When we run the test a few more times, every run passes: the test is now stable.

Figure 6: The test is now stable after the code update

Foresight captures test history, so we can easily identify flaky tests and understand what went wrong. This saves precious time for both developers and quality engineers. It also frees the DevOps team to build infrastructure for the application instead of building tools and services for monitoring tests.

Conclusion

Having flaky tests in your test environment is a dangerous thing. If you can’t mitigate that flakiness, your team can’t trust the tests. Without confidence in their tests, your team will have to rely on manual tests, which will slow down the release cycle.

Monitoring and investigating flaky tests is a daunting task. Without the right tool, developers and quality engineers can spend days debugging them. Luckily, with tools like Foresight, we can monitor flaky tests and get insights into them from a single place, which helps your team fix each one systematically.

Gain confidence in your software release cycle with Foresight.