CI/CD Insights and Analytics

Generating Workflow Telemetry Data for GitHub Actions

We've implemented an open-source GitHub Action to generate and collect workflow telemetry data for CI pipelines. This Action monitors the resource metrics of the CI machine where the workflows run.
Oguzhan Ozdemir

Introduction

With the recent trends in the observability ecosystem, monitoring your CI/CD processes has become much more important. Monitoring your test performance and overall CI performance is usually the first step in this direction. However, we think monitoring the resource metrics of the CI machine itself is just as important.

For this reason, we've implemented a new GitHub Action that you can add to your workflows to monitor the I/O, CPU, and memory usage of your CI runs.

See the action for yourself: https://github.com/thundra-io/workflow-telemetry-action

Scenarios Where We Tried Different Caches

The action works in a simple way. It starts where you place it in your workflow, collects the telemetry data in memory, and processes it in the post-action step to create graphs, which are added to the run's details and posted to the related pull request as a comment. That way, it has minimal impact on your overall CI performance, with little to no overhead.
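As a sketch of how this looks in practice, the workflow below adds the telemetry step at the start of a job so it can monitor everything that follows. The version tag and the build command are assumptions for illustration; check the repository README for the current usage.

```yaml
name: CI with telemetry

on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Place the telemetry step early so it monitors the rest of the job
      - name: Collect Workflow Telemetry
        uses: thundra-io/workflow-telemetry-action@v1

      - name: Checkout
        uses: actions/checkout@v3

      # Hypothetical build step for a Maven project like the demo in this post
      - name: Build and test
        run: mvn -B verify
```

The action's post-step then renders the collected metrics as graphs in the run's details and as a pull request comment.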

To create an example for this post, we’ve tested the action in one of our demo projects. We’ve tested different scenarios where we tried different caches to see what the output would be like.

Scenario #1: With No Cache

Without any cache, we get a result like the following graph for the run’s Network IO.

No cache

In this workflow, we have a simple Java application with integration tests, so the tests use Docker. In this graph, the last three peaks are Docker images being pulled; the first peaks are Maven packages being downloaded.

Scenario #2: With Maven Cache

With a Maven cache in the workflow, we see that the Network IO graph doesn't change much in terms of the amount of data transferred. However, we see a drastic decrease in the duration of the workflow.

With Maven cache

As seen in the image above, Network Read doesn't change a lot with the Maven cache. When we dig a little into why this happens, we see that GitHub's actions/cache still uploads and fetches the cache over the network, outside of your job's steps, which is understandable. The upside is that it does this in one bulk transfer, so the process is much faster than fetching each package separately. But it's using the network nonetheless. You can see the implementation of this behavior here.
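For reference, a Maven cache like the one in this scenario is typically added with actions/cache. The snippet below is a common pattern rather than our exact setup: the cache key is derived from hashing the project's pom.xml files, so the cache is invalidated whenever dependencies change.

```yaml
- name: Cache Maven packages
  uses: actions/cache@v3
  with:
    # Local Maven repository where downloaded packages land
    path: ~/.m2/repository
    # New key (and thus a fresh cache) whenever any pom.xml changes
    key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
    restore-keys: |
      ${{ runner.os }}-maven-
```

As noted above, restoring this cache is still a network transfer, just a single bulk one instead of many per-package downloads.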

Scenario #3: Maven and Docker Cache Together

When we add a Docker cache to the workflow, we finally see the change we were hoping for.

Maven and Docker cache together

With the Maven and Docker caches together, all the network reads happen at the start of the run, and for the rest we see very little network read activity. Although the Docker cache didn't change the workflow run's duration that much, we still see some improvement.
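One common way to cache Docker layers on GitHub Actions (not necessarily the exact setup used in this scenario) is to build with Buildx and use the GitHub Actions cache backend:

```yaml
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v2

- name: Build image with layer cache
  uses: docker/build-push-action@v4
  with:
    context: .
    push: false
    # Reuse image layers from previous runs via the GitHub Actions cache
    cache-from: type=gha
    cache-to: type=gha,mode=max
```

With cached layers available, only changed layers need to be rebuilt or pulled, which matches the flat network-read profile seen in the graph.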

Conclusion

To summarize, this action generates CPU, memory, and IO metrics and posts them in the build summary of the workflow run.


To get the usage details and the code for this action, visit the GitHub repository. In the future, we want to give developers the ability to export these metrics to their observability tool of choice, as well as to Foresight, to get further details about their CI/CD runs.

In the meantime, we are open to feedback and suggestions on how to improve our approach to CI observability.