Breaking Down the Codecov Attack: Finding a Malicious Needle in a Code Haystack

April 28, 2021 Nimrod Stoler

Codecov Breach Deconstruction

Earlier this month, San Francisco-based technology company Codecov discovered that attackers had compromised its software platform — used by more than 29,000 customers worldwide to test software code — in the latest digital supply chain attack to make headlines. While that was troubling enough, there was an added hitch.  Although the attack was identified and reported in April, the tampering reportedly started back in January. And it may have continued undetected had it not been for some astute observations by a customer.

The ripple effects and long-term ramifications of this attack have not yet been determined, and investigation is still ongoing. However, based on reports, we can examine the Tactics, Techniques and Procedures (TTPs) used by attackers to place a needle in a haystack of code, surreptitiously infect Codecov’s CI/CD pipeline and potentially gain access to thousands of customer networks. What’s clear is that this attack, like so many before it, followed a familiar path: target and steal credentials to get to the intended target.

The Codecov Breach

Codecov produces an array of code testing software, and the software that was reportedly impacted during this attack was made specifically for CI/CD pipelines. When developers at a customer organization finish testing, they will often download a script directly from Codecov’s servers, which will check the code coverage of the testing apparatus. It will then report back to Codecov’s servers.

An online statement by Codecov details the initial breach discovery: “On Thursday, April 1, 2021, we learned that someone had gained unauthorized access to our Bash Uploader script and modified it without our permission. The actor gained access because of an error in Codecov’s Docker image creation process that allowed the actor to extract the credential required to modify our Bash Uploader script.”

This uploader tool works with popular development platforms like GitHub. They use secrets and other credentials that enable interaction between applications and other tools in the CI/CD pipeline, along with access to cloud resources.

After gaining a foothold and obtaining the necessary credentials, the attackers created a backdoor by hiding a single line of malicious code within the approximately 1,900 lines of code that made up the uploader. They did so with relative ease, as two-factor authentication was not required to access the uploader, according to reports.

Each time a developer downloaded the Codecov testing script, the malicious software would begin running on the customer organization’s test machines. This allowed the attackers to export the secrets, credentials and other sensitive data stored in the victim’s continuous integration environments and send them to an attacker-controlled server outside of the customer’s infrastructure.

The particulars of this attack share commonalities with the large-scale SolarWinds breach and present an interesting perspective on the CI/CD pipeline and how efforts to protect these dynamic environments often fall short. The Codecov supply chain attack was clearly designed to extrapolate weaknesses and scale efforts for maximum impact. It took advantage of the fact that Codecov’s script was an unusually large one, based heavily on environment variables — sets of dynamic name-value pairs used by Linux and Windows operating systems — that often contain hard-coded API keys, database credentials and more. These secrets are stored and used heavily within CI/CD pipelines. And because there is often very limited security oversight around how they are managed and protected, these secrets are easy targets for attackers who pinpoint and harvest them.

For months, the attackers apparently had code execution access into each and every system that was using the Codecov script. They avoided arousing suspicion by hiding the malicious code inside a larger code series — and by concentrating their efforts on those environment variables. Based on the line of script the attackers used — one that specifically centers on sending Git repository URLs to the attacker-controlled server — it appears that GitHub was the focus.

The Discovery

It’s not abnormal for an organization to consider code downloaded from a business partner to be trustworthy, and so customers don’t always pay much attention to the granular details of the code, such as its digital fingerprint. The fact that Codecov attaches a signature to its proprietary code, however, is ultimately what led to the discovery of the attack, months after the initial breach. When the signature on one machine didn’t match up to that on another, a red flag was sent up by a customer and the attack revealed. Had the attackers changed the code signature, they might well have been able to operate unnoticed for an indefinite amount of time.

The Takeaway

Development environments are complex, with numerous places where secrets and credentials can be, and often are inadvertently exposed. For example, while code repositories such as GitHub are an essential part of the development process, credentials can be inadvertently exposed by making code public and allowing attackers to include malicious code within the builds. An organization’s code and intellectual property can be tampered with or stolen from repositories if credentials are compromised.

While there is no one vendor or tool that can completely prevent digital supply chain attacks like this from happening, there are steps organizations can take to strengthen their security postures and minimize risk:

Perform Code Signature Checks. By simply checking the software’s digital fingerprint to verify its integrity, the “dwell time” of this attack could have been limited to mere days or even hours, rather than months.

Mandate Multi-Factor Authentication (MFA). By looking at this attack process from the attackers’ point of view, the Codecov breach was made a lot simpler because they didn’t need a second authentication factor or another piece of approval to insert their code.

Do Not Store Credentials and Secrets in Environment Variables. The reasons are numerous: most notably, it is extremely hard to track usage of environment variables, as they are passed down to child processes that allow for unintended access and break the principle of least privilege. Instead, if an application requires a secret be handed over in an environment variable, use a secrets manager to help ensure only authenticated users get access to the clear text secrets.

Implement Threat Detection Capabilities. CI/CD pipelines are highly automated, which means there is little human interaction, making it easier for attackers to fly under the radar. Having threat detection capabilities in place could help detect anomalies and potential breaches easier and earlier.

Finally, with cybersecurity, most things boil down to effective and consistent communication. Developers simply don’t always have the awareness of inherent security issues along CI/CD pipelines, nor do they often have a clear sense of what their tools are actually doing with credentials and secrets. Making the dangers of vagueness and ambiguity clear to developers is a core part of Shift Left, and it’s an important — if not always welcome — way forward to more stringent security.

Developers may sometimes look at security specialists as just one more layer that makes their jobs harder. But there’s obviously a compelling need for this important layer. And while putting in the time and effort to clearly identify the risks involved and the measures needed to mitigate them may mean extra steps now, it could very well result in fewer security headaches later.


Previous Article
Checklist for 6 Approaches to Engage Developers
Checklist for 6 Approaches to Engage Developers

Security teams can use this checklist to take six actionable approaches to more effectively engage with dev...

Next Article
Securing Cloud-Native Apps and CI/CD Pipelines at Scale
Securing Cloud-Native Apps and CI/CD Pipelines at Scale

Read this white paper to understand how PwC applies its strong capabilities in working with clients to iden...