GitHub Repositories Leak Thousands of Secrets, Study Shows

April 11, 2019 Jessica Sirkin

In case you were ever in doubt about how well users are protecting credentials in GitHub repositories and other code repositories, researchers at North Carolina State University recently discovered many thousands of leaked secrets and credentials.

I spoke with CyberArk Labs’ Security Research Team Lead Lavi Lazarovitz (@__Curi05ity__) to get a better understanding of how serious this situation is, who it will affect and whether we’re looking at the beginning of a larger trend. But first, some background:

The North Carolina State University researchers Michael Meli, Matthew McNeice and Bradley Reaves scanned billions of GitHub files as part of an academic study that found that over 100,000 of the service’s code repositories contain exposed authentication secrets, such as cryptographic keys and API tokens, and thousands more repositories are leaking new, unique secrets every day. Researchers scanned nearly 13% of GitHub’s public repositories to collect this information.

The researchers used two approaches to identify leaked secrets. The first was querying a GitHub repository search engine for almost six months. According to the paper, this method discovered, in real time, 99% of newly committed files containing secrets. The second used BigQuery, a web service for analyzing massive datasets, to query a weekly snapshot of GitHub activity, which is what provided the researchers with their scan of 13% of GitHub’s public repositories.

In their scan, the researchers found 85,311 unique Google API keys, 37,781 unique RSA private keys and 47,814 unique Google OAuth IDs. The researchers also estimated that many of the secrets enabled access to sensitive systems or data, and that their exposure caused real risk.
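To make this kind of scan concrete, here is a minimal sketch of pattern-based secret detection in Python. The regular expressions are commonly cited formats for the three secret types named above; they are assumptions for illustration, not the study's actual rules, and the names `SECRET_PATTERNS` and `find_secrets` are hypothetical.

```python
import re

# Illustrative patterns only (assumed formats, not taken from the study).
SECRET_PATTERNS = {
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "rsa_private_key": re.compile(r"-----BEGIN RSA PRIVATE KEY-----"),
    "google_oauth_id": re.compile(
        r"[0-9]+-[0-9A-Za-z_]{32}\.apps\.googleusercontent\.com"
    ),
}


def find_secrets(text):
    """Return a sorted list of (line_number, kind) for each pattern hit in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, kind))
    return sorted(hits)
```

A match only shows that a string looks like a secret. Confirming that it is live and sensitive takes further checks, which this sketch does not attempt.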

How serious a problem is this?

Lazarovitz: It clearly highlights a huge and very real problem. While the researchers didn’t release the names of specific organizations with exposed secrets, they did share that they were dealing with some large, prominent organizations. This included AWS credentials for a site used by millions of college applicants in the U.S. and AWS secrets for a major government agency in a Western European country.

The other significant finding from the study was that 81% of the secrets discovered took two weeks or longer to be removed. This means that the developers using these sensitive secrets probably weren’t aware that their secrets were exposed, or drastically underestimated the risk the exposure poses. Furthermore, even developers who were aware of the exposed keys and the need to keep them safe might delete the key but fail to wipe the entire GitHub repository. In this case, the keys could still be found in the GitHub repository’s commit history, which holds changes made to the repository, including deleted keys.
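The commit-history point can be sketched in a few lines of Python. The function below scans patch text, such as the output of `git log -p --all`, for key-like strings on both added and removed lines, so a key that was later deleted still turns up. The key pattern is an assumed Google API key format and `secrets_in_history` is a hypothetical name; this is an illustration, not a complete scanner.

```python
import re

# Assumed Google API key format, used here only for illustration.
KEY_PATTERN = re.compile(r"AIza[0-9A-Za-z_\-]{35}")


def secrets_in_history(patch_text):
    """Return the set of key-like strings on any added or removed diff line."""
    found = set()
    for line in patch_text.splitlines():
        # Skip file headers such as "+++ b/config.py" and "--- a/config.py".
        if line.startswith(("+++ ", "--- ")):
            continue
        # "+" marks a line a commit added, "-" a line a commit removed.
        if line.startswith(("+", "-")):
            found.update(KEY_PATTERN.findall(line))
    return found
```

Because the removed line is still part of the recorded patch, deleting a key in a later commit does not take it out of the history. The exposed key should be treated as compromised and revoked.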

Why are secrets and credentials exposed in GitHub and leaked? What are the repercussions of GitHub repositories leaking secrets?

Lazarovitz: Developers write code and scripts that become part of the build or manage the build. The code and scripts typically need secrets and other credentials to do their work: interacting with other applications and with other tool chains in the CI/CD pipeline, accessing cloud resources and so on. The code and scripts are, of course, version controlled and stored in code repositories. DevOps processes require speed and agility, which effectively makes GitHub repositories and other code repositories an essential part of the CI/CD pipeline. But that’s not the problem. The problem is that there is very little security or oversight for how these credentials are managed or protected. It’s too easy for developers to hardcode credentials and, once the code is marked public, the organization’s cloud access keys are in the public domain. It’s an easy mistake for a developer to make when their primary focus is getting the next new feature out to customers.

What do you recommend GitHub users do to keep their keys secure? What behaviors and technologies are necessary to use GitHub responsibly?

Lazarovitz: There are some easy fixes, and many organizations do seem to be proactive. First and foremost, I’d strongly recommend never embedding credentials in code. Hardcoding secrets and credentials in code is a terrible business practice and the easiest thing to mitigate. Hardcoded credentials are one of the most common ways for credentials to leak to a repo. Instead, secure and manage credentials in a vault and use API calls or other mechanisms to securely use them. Ideally, secrets would never be exposed in an application. Other important controls to limit the privileged attack surface include changing or managing credentials on a regular basis or after use, or using short-lived or temporary tokens that are valid for a limited time. There are many valid approaches to take control.
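As a rough illustration of two of these controls, the Python sketch below reads a credential at run time instead of hardcoding it, and models a token that expires after a fixed lifetime. It assumes the deployment tooling injects the secret as an environment variable, which stands in here for a vault API call. All names (`load_secret`, `ShortLivedToken`, `issue_token`) and the 15-minute default lifetime are hypothetical.

```python
import os
import secrets
import time
from dataclasses import dataclass


def load_secret(name, env=os.environ):
    """Fetch a secret injected at run time; never fall back to a literal."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; refusing to use a hardcoded value")
    return value


@dataclass(frozen=True)
class ShortLivedToken:
    value: str
    expires_at: float  # Unix timestamp after which the token is rejected

    def is_valid(self, now=None):
        now = time.time() if now is None else now
        return now < self.expires_at


def issue_token(ttl_seconds=900, now=None):
    """Create a random token that is only valid for ttl_seconds."""
    now = time.time() if now is None else now
    return ShortLivedToken(secrets.token_urlsafe(32), now + ttl_seconds)
```

With this shape, the repository holds only the name of the secret, so making the code public exposes nothing usable. A leaked short-lived token also stops working once its lifetime ends.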

Is this an example of a larger problem or trend? Is this kind of thing something we should expect to see more of?

Lazarovitz: As more and more organizations turn to external services, repositories and resources, whether as part of their digital transformation or as cloud-native organizations, we should, unfortunately, expect credential leakage to grow. Basically, if credentials are hardcoded, the cost of a misconfigured code repository, build pipeline or other tool in the tool chain is exposed IP and credentials. So every organization that migrates code or apps using a GitHub repository or other code repository, DevOps or similar processes adds to the probability that credentials will leak.

What can attackers who get their hands on these keys do with them? What are they most likely to do with them?

Lazarovitz: Public repos are available to the whole world and don’t require great skill to scrape; hacking tools are readily available for attackers to misuse. And they do: attackers regularly trawl GitHub, for example, for cloud access keys and other easily monetized credentials. They can also look at the history of code commits to find exposed credentials. The first attackers to find unprotected credentials are mostly opportunistic attackers who use the cloud access keys to take over compute resources and run crypto miners for immediate profit. In other cases, the attackers might sell the data, or encrypt it and ask for ransom.

What are your key takeaways? Should organizations avoid code repositories?

Lazarovitz: Of course not. GitHub repositories and other code repositories, whether private or public, are an essential part of the application development process. Instead, organizations need to establish processes and approaches to avoid hardcoding credentials in the first place, or even storing credentials on GitHub. Basically, as the research highlights, organizations and individuals are, for whatever reason, exposing potentially valuable credentials to the public and to attackers. Please don’t, as many organizations have found that it does not end well.

To learn more about CyberArk Labs research, visit the Threat Research Blog.
