May 15, 2025

EP 7 - Resilience in Identity Management: Avoiding Single Points of Failure

In this episode of Security Matters, host David Puner sits down with Eric Olden, co-founder and CEO of Strata Identity, and a pioneer in modern identity management. Eric shares his career journey, from founding Symplified to leading Oracle’s global identity division, and discusses the critical importance of resilience in identity systems.

Discover how organizations can eliminate single points of failure, test their backup plans and ensure their digital operations remain robust even in the face of unexpected outages. Eric also delves into the concept of identity orchestration, explaining how it can unify multiple identity systems and enhance security.

Tune in to learn about the latest trends in identity management, including the intersection of AI and identity, and gain insights into how businesses can proactively assess and mitigate risks associated with identity outages.

Don’t miss this engaging conversation filled with practical advice and forward-thinking strategies to help safeguard your organization’s identity infrastructure.

David Puner:
You are listening to the Security Matters podcast. I’m David Puner, a senior editorial manager at CyberArk, the global leader in identity security.

You are running late for a meeting. You log into your company portal—nothing. You try again—still nothing. Your workplace communication platform’s down, your VPN stalls, no access to files. You assume it’s temporary, but 10 minutes later, you hear someone down the hall holler, “Everything’s offline.” Turns out your identity provider failed—and there’s no backup.

This isn’t theoretical. Our guest today recently worked with a major company that accidentally deleted 5,000 Active Directory groups. No backup, no rollback. Just panic. In another case, hurricane season knocked out cloud connectivity for retail stores. The stores still had power, but without identity access, operations ground to a halt.

Today, Eric Olden, co-founder and CEO of Strata Identity—and a pioneer of modern identity management—joins us to talk about resilience. Specifically: how to fail gracefully, how to eliminate single points of failure, and why it’s not enough to have a plan B—you need to test that plan B before disaster strikes.

You’re right on time for this one. Let’s get into it.

David Puner:
Eric Olden, CEO and founder at Strata Identity—thanks so much for coming on the podcast. Welcome.

Eric Olden:
Thanks for having me, David.

David Puner:
You’ve had an extensive career in identity management, from founding Securant and Symplified to leading Oracle’s global identity division, with lots of other interesting and notable experiences along the way. So to start things off, how did you come to be known as the father of modern identity management, and how did you come to co-found Strata Identity?

Eric Olden:
Most things start, in some way, as a happy accident. In my case, I was in college when the web had just come out in 1994 or 1995. I was looking around thinking, “Wow, this internet thing—there’s something here.” Early on, people were saying it was going to stay academic and wouldn’t turn commercial, but I didn’t buy that. Eventually, I figured the number one thing people would need was security—you need a way to have confidence and trust.

Early web applications didn’t have any of what we take for granted today, like sessions that recognize it’s you from one click to the next. So one of the first things I was trying to figure out from an engineering standpoint was how to create a dynamic website—and what that would look like if you could maintain a session. Once we figured out how to do that, we could create access control systems.

My best friend and I started a company in our twenties, and one thing led to another. We had really found something people needed. Next thing you know, we had a 300-person company. We were about to go public around 2000—then the dot-com crash happened. People still needed security, though, so instead of an IPO, we were acquired. That helped kick off the identity management market. At first, we didn’t even call it “identity”—we called it web access management. People didn’t really start using the term “identity management” until the early 2000s.

I was the CTO, and I’m proud of our work on the SAML standard—Security Assertion Markup Language. That was also sort of a happy accident. We had big bank customers that needed a way to recognize a user from New York, London, and Tokyo as the same person. Early tech couldn’t handle that, so we helped co-author a standard to make it possible.

That showed me that if you want to make a big difference in an industry, you look at how you can influence it—make things easier through standards. So, those are really the two main ways my career started: first on the technical side, then building products and contributing to standards.

After that first acquisition, I took some time to catch my breath. When my non-compete expired, I wanted to start another company. That became Symplified, because around 2005–2006, people were moving to this new thing called software-as-a-service. I figured if they were going to use SaaS apps, they’d need the same level of confidence and trust they had with on-prem systems, but built for the cloud, starting again with standards. That was the whole promise behind Symplified.

Coincidentally, Symplified was acquired by RSA, the same company that had acquired my first company, Securant. After that, I found myself at Oracle, sitting on the other side of the table, so to speak, at one of the world’s biggest software companies. It gave me a new lens: when you’re trying to help hundreds of thousands of customers do things, your approach changes. I had a great experience at Oracle.

But as cloud adoption continued, I started thinking about multi-cloud—not just Oracle Cloud, but Amazon, Azure, on-prem, all at once. From the customer’s perspective, the question became: how do we make our apps work securely when they’re spread across all these environments?

That’s when I realized nobody had figured that out. It was a really hard problem—but incredibly rewarding from an engineering perspective if you could solve it. So we built a company around that challenge.

That’s how we started Strata—to build the VMware of identity. The idea is to make it easy to use whatever cloud you want, whatever identity system you need, and have it all just work.

David Puner:
Thank you for taking us along that career ride. I think if there’s anything we’ve established here at the top of the episode, it’s that you’ve got foresight. So I’m looking forward to seeing what’s in your crystal ball later in the episode.

So let’s turn to identity infrastructure and resilience. What are some of the critical dependencies in identity systems that organizations need to be aware of to ensure resilience?

Eric Olden:
One of the big things people often overlook is just how critical identity is to digital operations. If you think about it as a Tier 0 service, identity is right up there with electricity and bandwidth.

For decades, people have had backup strategies for the data layer—how to ensure a transaction doesn’t get corrupted, how to implement redundancy. But they haven’t applied that thinking to the identity layer.

That’s partly because most organizations are all-in on one identity provider. But that’s always felt untenable to me. As someone who’s built a lot of distributed systems, I know you have to avoid single points of failure. If you design with redundancy from the beginning, you’ve got a real shot at achieving resilience.

That’s why we built a product to make it easy to fail over from one identity provider to another—without compromising security or trust in the application.

A lot of times, people think about single points of failure only after something breaks. Take the CrowdStrike outage from last year—it wasn’t directly an identity issue, but it exposed the danger of having just one way to operate. When that system failed, suddenly businesses were offline and had no idea when they’d come back. Airlines couldn’t board passengers. Movie theaters couldn’t sell tickets.

That really made a lot of C-level executives ask, “Where are we exposed?” So in that mindset, you start tracing through your operations and identifying those single points of failure.

Step one is identifying them. Step two is recognizing that not all risks are equal—you have to triage. Organizations typically operate under constrained budgets. They can’t solve everything at once, so they have to decide what to protect first. Risk modeling becomes essential.

And finally, you need to avoid a false sense of confidence. By that I mean: having a plan B doesn’t help if you haven’t tested it. I’ve talked to CISOs and CIOs who said they had a backup plan—but never ran it because simulating an outage would’ve been too disruptive. Then, when the real outage hit, they realized it would’ve been cheaper to test it than to clean up the mess afterward.

To fix that, test your assumptions. Document your backup plan. Validate it. If you’ve already seen what happens in a failure scenario, you’ll be far better prepared from a risk perspective. Much better than saying, “Well, we paid a vendor, and they gave us an SLA.” SLAs aren’t a safety net. Maybe you’ll get a refund on your compute costs, but what if the outage costs you $12 million in lost ticket sales?

That’s why you want to get ahead of it.
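
Illustration (added to this transcript, not spoken in the episode): the triage step described above can be made concrete with a simple expected-loss ranking. The Python sketch below assumes that risk is approximated as outage frequency times outage duration times hourly loss, which is one common simplification and not a method the guest describes. The application names and every number are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AppRisk:
    """Hypothetical risk inputs for one application that depends on identity."""
    name: str
    outages_per_year: float   # assumed frequency of identity outages
    hours_per_outage: float   # assumed average duration of one outage
    loss_per_hour: float      # assumed loss in dollars per hour offline

    @property
    def expected_annual_loss(self) -> float:
        # Simplified model: frequency x duration x cost per hour.
        return self.outages_per_year * self.hours_per_outage * self.loss_per_hour


def triage(apps: Iterable[AppRisk]) -> List[AppRisk]:
    """Order applications from highest to lowest expected annual loss."""
    return sorted(apps, key=lambda a: a.expected_annual_loss, reverse=True)


# Hypothetical portfolio; replace with figures from your own risk assessment.
APPS = [
    AppRisk("intranet-wiki", outages_per_year=2.0, hours_per_outage=2.0, loss_per_hour=1_000.0),
    AppRisk("ticketing", outages_per_year=0.5, hours_per_outage=4.0, loss_per_hour=250_000.0),
    AppRisk("warehouse-scanner", outages_per_year=1.0, hours_per_outage=3.0, loss_per_hour=40_000.0),
]

if __name__ == "__main__":
    for app in triage(APPS):
        print(f"{app.name}: ${app.expected_annual_loss:,.0f} expected loss per year")
```

A ranking like this only decides what to protect first under a constrained budget. It does not replace actually testing the backup plan for the applications at the top of the list.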

David Puner:
What are some other industry-specific examples of identity infrastructure outages that have impacted businesses? What are some common cases you see—or just notable examples others can learn from?

Eric Olden:
There are a few common ways outages happen. First, everything’s connected today, and a lot of times the problem is downstream—caused by a vendor or someone in your supply chain.

Human error is another big one. We recently worked with a company that accidentally deleted 5,000 Active Directory groups—and didn’t have a backup. That’s a stressful realization. You’re thinking, “If I could just go back 30 seconds and undo that click…”

These errors aren’t usually malicious—they’re accidents. But they’re still costly.

Weather is another factor. We’ve got customers in retail, especially in the Southeast U.S., who deal with hurricane season. When that happens, you really see how connected everything is. In those cases, it’s often not the cloud provider that goes down—the data centers are secure and hardened. It’s the connectivity that fails. The route to the cloud breaks.

So when that happens, it’s important to have a fallback—to be able to fail over on-premises so your store can keep running. Whether that means selling cleanup supplies or just staying operational, that resilience really matters.

David Puner:
Obviously these organizations learn the hard way when these things happen. But what about the ones trying to be proactive? How can they approach assessing the financial, reputational, and regulatory risk associated with identity outages?

Eric Olden:
The best-case scenario is you plan with the assumption that an outage will happen. You’re not planning to fail—you’re planning for failure. It’s a mindset, kind of like the “assume breach” concept in zero trust. In this case, it’s “assume outage.” So: what do you do when it happens? How do things continue to work?

That kind of resilient mindset is key. Document the plan. Run tabletop exercises so it’s not just one person who knows what to do—get different teams involved. It’s like red team exercises, but instead of looking for vulnerabilities, you’re testing resilience.

You also want to structure your software so it has no single point of failure. Make sure your critical systems can be monitored, and if something goes wrong, you can remediate quickly. That might mean a person stepping in and saying, “System A is down—we’re switching to system B.” Or it could be automated, depending on the circumstances.

Sometimes you want to wait and see if something recovers on its own. Other times, you need to act instantly. It depends.

And there’s the reputational aspect. People think their brand is strong and users will be forgiving—but that’s not always true. If you’re a bank and customers can’t access their money, that’s a crisis. People get upset fast because it triggers this basic human emotion: “I need this thing—and I can’t get to it.”

That’s why making that stitch in time isn’t just about saving nine—it could save you nine million. So take it seriously, and get ahead of it.

David Puner:
You mentioned having a plan. You also mentioned failover. What are failover mechanisms, and how do they work in identity orchestration? And what exactly is identity orchestration?

Eric Olden:
Identity orchestration is a new way of managing multiple identity systems. It’s built on a concept called an identity fabric.

Let’s start there. Your identity fabric is all the different identity services you use—authentication mechanisms, access controls, risk signals, all of it. Historically, those were all siloed—different systems in different places.

Identity orchestration lets you unify them under a common abstraction layer. Think of it as a control plane that integrates those systems and allows you to orchestrate how users move through them in real time.

So for example, a user could authenticate with Microsoft, check access with Oracle, and then use passwordless login via HYPR—all in one flow. The orchestration system ties all of that together.

To the end user, it looks seamless. Let’s say you’re opening a new bank account. You want it to be easy, but the bank needs to verify that you’re a real person—not a bot—and that you’re allowed to open an account based on location or other criteria.

With orchestration, you can create a multi-step onboarding journey. Step one: collect basic user info (name, address, etc.) and save it in an identity system. Step two: use a tool like 1Kosmos to verify that the user is human, using bot detection, CAPTCHA, or even liveness validation in this AI era. Step three: if those checks pass, the system creates an account and issues credentials, maybe even passwordless ones.

That whole process spans multiple systems. Orchestration stitches them together and manages the flow.

Now, in terms of resilience and failover—this orchestration system can monitor identity providers in real time. Before sending traffic to one, it checks if it’s available. If it’s not, it routes the user to a backup provider. That’s the failover part.

The orchestrator acts as a smart proxy, dynamically sending traffic to the best available identity provider. And that’s key—because this logic isn’t built into the identity provider itself. If your main provider goes down, it’s not going to reroute anything. The orchestrator can.

So that’s how orchestration supports both the identity fabric concept and resilience across your identity systems.
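
Illustration (added to this transcript, not spoken in the episode): the "check availability before sending traffic, otherwise route to a backup" behavior described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Strata's implementation. The `IdentityProvider` type, the `is_healthy` probe, and the provider names are hypothetical. A real probe would call the provider's own health or metadata endpoint.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class IdentityProvider:
    """Hypothetical handle for one identity provider behind the orchestrator."""
    name: str
    is_healthy: Callable[[], bool]  # assumed probe; returns True when reachable


class NoProviderAvailable(RuntimeError):
    """Raised when every configured provider fails its health check."""


def choose_provider(providers: Sequence[IdentityProvider]) -> IdentityProvider:
    """Return the first provider, in priority order, whose probe succeeds."""
    for provider in providers:
        try:
            if provider.is_healthy():
                return provider
        except Exception:
            # A probe that raises (timeout, DNS failure, ...) counts as unhealthy.
            continue
    raise NoProviderAvailable("no identity provider passed its health check")


if __name__ == "__main__":
    # Simulated state: the primary is down, the backup is up.
    primary = IdentityProvider("primary-idp", lambda: False)
    backup = IdentityProvider("backup-idp", lambda: True)
    print(choose_provider([primary, backup]).name)
```

The design point mirrors the one made in the conversation: the routing decision lives outside the identity providers, so a provider that is down does not need to take part in its own failover.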

David Puner:
And not to be too much of a homer here, but identity orchestration can be part of identity security, correct?

Eric Olden:
Absolutely. Most of our customers—including some of the world’s biggest banks and retailers—use orchestration to power zero trust, continuous access, and continuous authentication.

Those architectures are usually made up of tools from different vendors. Orchestration is what helps them work together—so yes, it’s a critical piece of identity security, especially when you’re managing multiple identity providers.

David Puner:
So then getting into the human factor of all this—how important is it for organizations to regularly simulate identity outages, and what are some best practices for conducting these simulations?

Eric Olden:
There are a couple of ways to approach it, and it really depends on your level of risk and what downtime would cost your business.

For very high-risk, high-cost applications, you can set up the orchestrator to do load balancing between two identity providers. So if one goes down, the other one is already handling half the traffic—and you can immediately route everything over to it. That’s called active-active.

For most Tier 2 apps—business-critical but not mission-critical—you might go with active-passive. You have one primary provider, and if something happens, you fail over to the secondary. These are the apps where being offline for a few seconds or even a few minutes won’t sink your business, but still matters.

And of course, your failover system itself shouldn’t have any single points of failure. You can run your orchestration software on a Kubernetes cluster behind a global load balancer, which gives you redundancy across the orchestrator proxies too.

Testing is critical. If you’re doing active-active, you’re basically testing constantly because traffic’s always flowing to both systems. With active-passive, test it at least once a quarter—once a month is better. Best case? Have the ability to run ad hoc tests. Simulate a “break glass” scenario on a Tuesday before a big system upgrade on Friday. Make it a normal part of your ops.

Because let’s be honest: not testing is like skipping rental car insurance thinking, “What are the odds I’ll need it?” And then boom—someone backs into you. For 20 bucks, you could’ve avoided a $2,000 problem. Same idea here: invest in the test.
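
Illustration (added to this transcript, not spoken in the episode): the difference between active-active and active-passive routing can be shown with a small Python sketch. It assumes the orchestrator already knows which providers are healthy. The provider names, the `route` function, and the choice to split users by a SHA-256 hash of a user ID are assumptions made for the example, not details from the conversation.

```python
import hashlib
from enum import Enum
from typing import Sequence, Set


class Mode(Enum):
    ACTIVE_ACTIVE = "active-active"
    ACTIVE_PASSIVE = "active-passive"


def route(user_id: str, providers: Sequence[str], healthy: Set[str], mode: Mode) -> str:
    """Pick an identity provider for one request.

    providers is listed in priority order; healthy is the set of providers
    currently passing health checks (how that is determined is out of scope).
    """
    candidates = [p for p in providers if p in healthy]
    if not candidates:
        raise RuntimeError("no healthy identity provider")
    if mode is Mode.ACTIVE_PASSIVE:
        # Everyone uses the highest-priority healthy provider.
        return candidates[0]
    # Active-active: spread users across all healthy providers with a stable
    # hash, so if one provider drops out the survivors absorb its share.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(candidates)
    return candidates[bucket]


if __name__ == "__main__":
    idps = ["idp-a", "idp-b"]
    print(route("alice", idps, {"idp-a", "idp-b"}, Mode.ACTIVE_PASSIVE))
    print(route("alice", idps, {"idp-b"}, Mode.ACTIVE_ACTIVE))
```

In the active-active case both providers see production traffic all the time, which is why that setup is effectively under constant test. The active-passive path to the secondary is only exercised when someone deliberately marks the primary unhealthy, hence the advice to schedule that test.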

David Puner:
Are you always getting that extra coverage when you rent a car?

Eric Olden:
I do. Maybe I go overboard, but yeah. I use a credit card that offers coverage by default—and then I buy the extra rider on top of that. Knock on wood, I haven’t been in an accident. And I’m not trying to tempt fate.

David Puner:
During an identity outage, how can dynamic policies help prioritize critical roles and automate enforcement to maintain operations?

Eric Olden:
A big part of managing availability is understanding what an outage looks like.

The source of the outage could vary—it might be a storm, a cut fiber line, or a malicious actor. So the first step is understanding what counts as an outage. For example, if your identity provider returns a 500 HTTP error, that’s a clear sign the server is down.

Then you’ve got to decide: do we fail over, or wait it out? That’s where dynamic policies come into play. Let’s say you have a shipping application, and it can tolerate a five-minute outage. You may configure a rule: “Wait five minutes, then take action at six.” If it’s still down by then, odds are it won’t resolve quickly—so the system fires off an alert or initiates failover.

Sometimes it’s better to wait—maybe the server’s just re-indexing a database. But other times, you have to move fast. Let’s say it’s a trading application and your system slows down—not even fully down yet, just slow. That latency can have huge downstream consequences. In that case, you might want to fail over before the full outage hits.

So it’s not always black-and-white. You need nuance in your definitions of what an outage is—and different protocols for different scenarios.

And whatever you do: log everything. Audit everything. After an incident, you’re going to want to dig into the logs to see what went wrong. If those logs aren’t detailed or complete, it’s incredibly frustrating. Without knowing what caused the outage, it’s much harder to prevent the next one.
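
Illustration (added to this transcript, not spoken in the episode): the per-application policies described above, where a shipping app waits out a five-minute grace period and a trading app fails over on slowness alone, might be expressed roughly as follows in Python. The `FailoverPolicy` fields, the `decide` function, and the 200 ms latency threshold are hypothetical. Only the five-minute tolerance and the use of an HTTP 500-class response as an outage signal come from the conversation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "none"            # provider looks fine
    WAIT = "wait"            # outage seen, still inside the grace period
    FAIL_OVER = "fail_over"  # switch to the backup provider


@dataclass(frozen=True)
class FailoverPolicy:
    grace_seconds: float                    # how long an outage is tolerated
    max_latency_ms: Optional[float] = None  # if set, slowness alone triggers failover


def decide(policy: FailoverPolicy, status_code: int,
           latency_ms: float, seconds_unhealthy: float) -> Action:
    """Map one health observation to an action under the given policy."""
    if policy.max_latency_ms is not None and latency_ms > policy.max_latency_ms:
        return Action.FAIL_OVER  # slow is treated as down for this app
    if status_code < 500:
        return Action.NONE       # no server error reported
    if seconds_unhealthy > policy.grace_seconds:
        return Action.FAIL_OVER  # outage has outlasted the grace period
    return Action.WAIT


# Hypothetical policies for the two examples in the conversation.
SHIPPING = FailoverPolicy(grace_seconds=300.0)
TRADING = FailoverPolicy(grace_seconds=0.0, max_latency_ms=200.0)

if __name__ == "__main__":
    print(decide(SHIPPING, 500, 50.0, 120.0))  # two minutes in: keep waiting
    print(decide(SHIPPING, 500, 50.0, 360.0))  # minute six: fail over
    print(decide(TRADING, 200, 450.0, 0.0))    # healthy but slow: fail over
```

Each decision returned by a function like this is also a natural thing to write to the audit log, together with the observation that produced it, so the post-incident review can reconstruct why a failover did or did not happen.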

David Puner:
So, speaking of nuances—or lack thereof—let’s talk regulatory standards. What strategies can organizations adopt to keep their identity continuity plans current and aligned with evolving regulatory standards?

Eric Olden:
Regulatory issues are a huge part of our world. Most of our customers are large, multinational corporations, which means they operate across multiple jurisdictions. A company based in the U.S. might have operations in Europe, for example—so when you think about regulations, you have to consider where your users are, not just where you are.

In Europe, they’ve led the way with identity-related regulation—most notably GDPR. American companies realized that even if they’re based here, having French users, for instance, means you’re handling their data—and it needs to be protected under European rules. Non-compliance can lead to fines, and nobody wants that.

More recently, there’s a regulation out of the EU called DORA—the Digital Operational Resilience Act. It focuses specifically on ensuring resiliency and minimizing disruption. It’s essentially asking: do you have a plan? What are you doing to avoid outages? And if they happen, how do you respond?

While DORA originates in the EU, it’s having a ripple effect. American companies are paying attention because it’s not just about legal risk—it’s a good idea. Even if the regulations aren’t enforced domestically, the principles of resilience are worth adopting everywhere.

David Puner:
You mentioned earlier that you’re known as the father of modern identity management. What we haven’t mentioned yet is that you also authored Identity Orchestration for Dummies, published in 2024. As a pioneer in the field, how does identity orchestration differ from traditional IAM approaches, and what benefits does it offer?

Eric Olden:
I can’t really claim full credit for being “the father of identity”—success has many parents, and failure’s an orphan, right? But I do take pride in being part of the early work, like helping define SAML and now this push into orchestration.

The key difference with identity orchestration is that it’s an overarching layer. That’s actually why we named our company Strata—“strata” means layers. Just like the stratosphere contains all the clouds, Strata’s orchestration layer helps manage what happens inside all your clouds.

Orchestration and identity fabrics give you a way to make everything work together—on-prem systems, cloud platforms, legacy tools, and modern solutions. It’s like virtualization for identity, and to me, it feels like everything’s coming full circle.

David Puner:
Really interesting. So, to wrap things up on a broader note: what emerging trends in identity management are you most excited about? How do you see them shaping the future of IAM?

Eric Olden:
I’m really fascinated by the intersection of AI and identity.

David Puner:
Can’t believe it took this long in the interview to mention AI. That’s probably a record for us.

Eric Olden:
Right? The hype’s been off the charts. But this one’s different—AI is evolving fast, and it’s changing things in real ways. Originally it was machine learning, and now it’s generative AI. The Turing test—Alan Turing’s idea for distinguishing humans from machines—is becoming harder to apply.

Humans are identities. And protecting against threats—like someone pretending to be a human when they’re not—is becoming more complex. We’ve got phishing, deepfakes, and AI-driven identity impersonation. Passwords just won’t cut it anymore. AI can brute-force passwords or replay credentials.

One of the wildest recent examples? A deepfake on Zoom that mimicked a CFO. The attacker convinced someone to wire $25 million, thinking they were taking instructions from a real person. It worked because the fake looked and sounded real.

So now we’re at the point where we need to detect whether something is a human or an AI—not just in login screens, but across systems. And we also have to secure our APIs and data when AI systems like ChatGPT or LLMs are accessing them.

This is just the beginning. The landscape is shifting fast, and identity management is right in the middle of it.

David Puner:
And teaser alert for our listeners—by the time this episode drops, our new Identity Security Threat Landscape Report will be live. And spoiler: machine identities now outnumber human identities by more than 80 to 1. So yeah, there are a lot of identities out there.

Eric Olden:
That’s for sure.

David Puner:
Eric Olden—a man with many plans—thank you for coming on Security Matters. Really enjoyed speaking with you.

Eric Olden:
Thanks for having me, David. It’s been great.

David Puner:
All right—there you have it. Thanks for listening to Security Matters. If you liked this episode, follow us wherever you get your podcasts so you can catch new episodes when they drop. And if you feel so inclined, please leave us a review. We’d appreciate it—and so will the algorithmic winds.

Got comments or questions? Want to suggest a guest? Drop us a line at [email protected].

We’ll see you next time.