June 7, 2023

EP 29 – Synthetic Identity: Unmasking a New AI-Fueled Cyber Threat

Scattered across the internet are jigsaw puzzle pieces containing your personal information. If reassembled by an attacker, these puzzle pieces could easily compromise your identity. Our returning guest today is Len Noe, CyberArk’s resident transhuman (a.k.a. cyborg), whose official titles these days are Technical Evangelist, White Hat Hacker and Biohacker. Noe joins host David Puner to shed light on the concept of synthetic identity, which involves gathering publicly available, unprotected data and then using AI chatbots and platforms like ChatGPT along with predictive analytics to correlate the data and generate deep digital portraits of individuals. Then, thinking like an attacker, Noe dives into how this new digital clairvoyance has the potential to up threat actors’ games and what organizations and individuals should be doing to combat it. Noe also shares his POV on the implications for cybersecurity and his concerns about sharing personal and proprietary information with AI chatbots and platforms. 

[00:00:00.300] – David Puner
You’re listening to the Trust Issues podcast. I’m David Puner, a senior editorial manager at CyberArk, a global leader in Identity Security.

[00:00:24.110] – David Puner
What’s the name of the street you grew up on? What’s your mother’s maiden name? Are password recovery prompts like these the keys to the human experience? How about these classic fun ones? What’s your zip code? Just opt-in for text and you’ll get 15% off your first order. Like this.

[00:00:46.460] – David Puner
We have these simple, seemingly benign exchanges with all sorts of brands and other online entities all day, every day. But what are we giving up in these exchanges, both today and in the long run? It’s all about data. Your data, your digital identity. And when you pair all that data with your physical identity, when your physical and online identities get put into a blender, an AI chatbot blender to go with the latest greatest in mixers, you wind up with a synthetic identity puree.

[00:01:21.340] – David Puner
But what’s this mean for you and your organization? That brings us to today’s guest, Len Noe, who’s CyberArk’s resident technical evangelist, white hat hacker, and biohacker. The biohacker part of that title is a nod to Len technically being transhuman or a cyborg. And you can hear a lot about that if you go back and check out our conversation from last year in Trust Issues episode number two. That conversation, along with this one, is just some of the publicly available content that gives us, or anyone, data that somewhat defines Len Noe.

[00:01:59.140] – David Puner
I’m mentioning it because on today’s episode, Len’s going to explain synthetic identity as he sees it, a guy who gets paid to think like an attacker. Here’s my conversation with the always-engaging Len Noe. Welcome back to the podcast, Len.

[00:02:17.120] – Len Noe
Good to be here.

[00:02:17.960] – David Puner
Great to have you back. You’re CyberArk’s technical evangelist, white hat hacker, and biohacker.

[00:02:24.450] – Len Noe
I am a white hat, a technical evangelist, and a biohacker. Essentially, that means that I am an augmented human being that has utilized technology in the form of subdermal implants to actually produce offensive attack vectors that can actually be spawned through physical contact.

[00:02:42.770] – David Puner
Today, we’re going to talk about synthetic identity. What is synthetic identity?

[00:02:47.930] – Len Noe
It’s a term that’s being thrown around quite a bit in different ways. Google it, and you might come up with an explanation that it’s where somebody’s going to use stolen PII and deepfakes to try and create a way to counterfeit someone’s identity to either phish information or gain access through impersonating somebody else.

[00:03:10.140] – Len Noe
My definition is what would happen if we took a look at all of the different little bits of information that we leave on the internet through standard use? What if I was able to actually collect all of that information, almost like doxing someone, then correlate that information through some type of AI chat model?

[00:03:30.780] – Len Noe
I use a lot of OSINT, and if you’re not familiar with that term, it stands for open-source intelligence. This is all public information, things that you put out there on the internet, anything from places you’ve lived, vehicle registrations, school attendance, utility bills. All of that stuff is publicly available.

[00:03:53.370] – Len Noe
But the problem is all those little bits and pieces of information are spread out across multiple different databases, so it’s very hard to make correlating points between them. But if you take something like ChatGPT, take all of that information and put it into something whose main intended purpose is basically to find those types of correlations, I was actually able to interact with a digital version of myself that had understanding as well as information about my physical existence.
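
The correlation step described here can be illustrated with a minimal sketch. It is not the process Noe used (he describes feeding data into ChatGPT); it only shows, under stated assumptions, how facts scattered across separate sources become one profile once they share a key. The source names, field names, and the fictional person below are all made up for illustration.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so 'Jane  Doe' and 'JANE DOE' match."""
    return " ".join(name.lower().split())


def correlate(sources: dict[str, list[dict]]) -> dict[str, dict]:
    """Merge records from several sources into one profile per normalized name."""
    profiles: dict[str, dict] = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            key = normalize(record["name"])
            for field, value in record.items():
                if field != "name":
                    # Keep the value together with the source it came from.
                    profiles[key][field] = (value, source_name)
    return dict(profiles)


# Hypothetical sample data for a fictional person; none of it is real.
sources = {
    "vehicle_registry": [{"name": "Jane Doe", "vehicle": "1998 pickup"}],
    "school_records": [{"name": "jane  doe", "school": "Example High"}],
    "address_history": [{"name": "JANE DOE", "street": "Example Street"}],
}

profile = correlate(sources)["jane doe"]
for field in sorted(profile):
    value, source = profile[field]
    print(f"{field}: {value} (from {source})")
```

Each source on its own reveals one fact. The merged profile already contains candidate answers to two common recovery questions, which is why running this kind of check against your own data can be a useful exposure audit.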

[00:04:24.290] – David Puner
Interact with a digital version of yourself.

[00:04:27.000] – Len Noe

[00:04:27.810] – David Puner
It sounds futuristic, but I guess that’s what we’re living here with ChatGPT and its brethren platforms. What does that look like when you’re talking to yourself and how did you do that?

[00:04:38.330] – Len Noe
Basically what I did is I used the tool called BeenVerified, and I’m not endorsing anything. It was just the first one that came up in my Google search. These are paid services. All you really need is someone’s name. Any additional information beyond that just makes it easier to refine the results. But I put in Leonard Noe, and I live in Texas.

[00:05:00.030] – Len Noe
From just that, I was able to get things like my mother’s name, my father’s name, my brother’s name. It found the first apartment that I ever moved into after I left my parents’ house. I’m pushing 50 years old. It had vehicles that were registered to me. It had my neighbors. It had my ex-in-laws from previous marriages.

[00:05:22.410] – Len Noe
The scary thing about these services, these people search services, is if you look at the header page that comes out on any report, the first thing it says is this can’t be used for employment purposes, credit checks, business transactions, hiring of household workers, educational qualifications. If you remove everything that you can’t use this type of information for, the only thing that’s left is the fact that we are the product.

[00:05:53.870] – Len Noe
It’s our information that’s for sale. Every single service, everything we buy, everything we do, wants more and more information about us. It may not necessarily be our name or our social security number, but everything wants more information. The purposes are typically for targeted marketing and advertising. At the end of the day, it’s the human experience that’s actually for sale. This is information that most people would never allow someone else to have, but yet we agreed to let people have it as part of EULAs.

[00:06:28.290] – David Puner
When we forget these usernames and passwords, which, of course, is all the time, how is it that we’re seemingly asked the same recovery questions for every self-service portal? How safe is that whole equation? How does it factor into the challenges around synthetic identity?

[00:06:47.740] – Len Noe
It’s a great question. Every single thing we do stems from identity. We can talk about Identity Security. We can talk about the business definitions, but at the end of the day, the truth is the lines between business life and personal life really don’t exist anymore. We see BYOD. How many people are accepting emails and doing business functions on their personal mobile devices? Even if it was an actual business resource.

[00:07:19.370] – Len Noe
Let’s say it’s a laptop. What’s the first thing that’s going to happen when that laptop walks out of our environment? It’s going to be plugged into a home network, or it’s going to be logged into a home WiFi. The truth is, once these things leave our environment, they’re out in the wild and it’s up to us as well as the people that we give those resources to keep security in the front of their minds, and be vigilant and realize that everything out there is out to get in and it’s our job to keep them out.

[00:07:48.730] – David Puner
Aside from vigilance, we will get into what organizations and individuals can do. But I wanted to get back to what you were talking about, about interacting with your own… Was it synthetic identity?

[00:08:01.860] – Len Noe

[00:08:01.860] – David Puner
What was that? What did it look like?

[00:08:04.190] – Len Noe
It started off pretty simple. The first step was using the paid service, like I said, BeenVerified. I got that report. Then I used three different open-source tools. One of them was called Blackbird, which is a username scan; SpiderFoot, which is a Swiss army knife for searches on the web; and then a third tool called IKY, which is short for I Know You. That one basically just keys off of an email.

[00:08:29.060] – Len Noe
I ran those searches, found websites with my user accounts on it. Did a search for my name on the internet. I’m a public speaker. I took all of that information, transcribed everything, and I shoved it into ChatGPT. I think the first question I asked it was, “Tell me about Len Noe.” This is after I’ve already populated the data set with the information I was able to get from the doxing and the open-source tools.

[00:08:56.060] – Len Noe
I’ll be honest with you, the first thing it spit out looked like it was just a bio that you would see for a talk I would be giving. I wasn’t really impressed. I said, “Tell me more about Len Noe.” That’s when the whole thing just started to open up. It started making connections between interviews I’ve given and presentations that I’ve done. It was able to actually start telling me about me from a perspective that I didn’t give.

[00:09:27.130] – Len Noe
I said, “All right, tell me about Len’s family.” It spit out my mom, my dad, my brothers, my kids. It even was able to pull from one of my interviews that I gave that my grandkids like to call me Robo Papa. All of this is out there, but as humans, we may not have the capability of looking at an interview I did a month ago and find a connection to an interview I did two years ago.

[00:09:54.970] – Len Noe
I went even deeper. Where did I go to school? Where did I live? I was actually able to fish out pretty much every single answer to those self-service recovery portals that you were just talking about.

[00:10:06.310] – David Puner
Wow. The implications for this, first and foremost, are that the answers to those questions are out there. Having ChatGPT give you the readout, essentially, is something that anyone can do.

[00:10:19.970] – Len Noe
Oh, yeah. This is only one way that you could look at this collection of data. I may be able to fish out self-service recovery questions, but if I have that deep of an understanding of a person’s identity, I have the ability to try and do a spear phish, or, by utilizing some deepfakes, I would be able to potentially impersonate the target.

[00:10:45.030] – Len Noe
Without even trying to get to their credentials, I might be able to do more of a social engineering-style attack. Based on the data set, I would be able to answer questions that typically somebody outside of that individual may not be able to answer.

[00:11:00.090] – David Puner
What are the implications from a security offensive and defensive standpoint?

[00:11:05.100] – Len Noe
Well, from an offensive standpoint, I think this really does open up the story about identity, much more than what we see from an actual business definition. When we talk about identity, one of the things that comes up is the concept of zero trust. Trust but verify. What’s the authentication or the validator that you are who you say you are at most pharmacies? What’s your date of birth?

[00:11:32.920] – Len Noe
Let’s be honest, that is one of the easiest things to find out. But yet we’re still allowing this as a validation method to provide potentially dangerous narcotics. From an offensive standpoint, I think that this actually exponentially increases the ability for spear phishing.

[00:11:58.790] – Len Noe
These identity services that you can pay for to just buy people’s information just so that you can have it. I don’t know if that needs to be something that’s regulated or if nothing else, people need to at least be aware of the information that is actually marketable and that people can buy on them.

[00:12:17.440] – David Puner
If this opens up a lot of opportunity for threat actors, they’re now armed with these new AI and machine learning capabilities, how can they be stopped or thwarted?

[00:12:31.110] – Len Noe
Let’s look at this from a multifaceted answer. Finding out someone’s birthday or their mother’s name or the street they grew up on, these typical standard question-answers for self-service recovery portals. If this is information that we know is available to be gleaned from public and open sources, don’t give the answer that they expect it to be.

[00:12:55.620] – Len Noe
For starters, I would definitely recommend getting away from direct login. I think single sign-on with adaptive multifactor authentication on the front end is probably going to be the best solution from an enterprise perspective. If we’re talking about consumer or individualism, I would say don’t use your mother’s maiden name. Maybe you spell it backward. Maybe you use a nickname for your mother.

[00:13:22.840] – Len Noe
If the information is out there and you know that it’s out there, don’t use it. Change it. Obfuscate it. We’re back to the same thing that we’ve talked about, David, you and I, multiple times. How do I validate an individual? How do I know that this is the person I’m talking to?
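
One way to act on "change it, obfuscate it" is sketched below. This goes a step beyond the backward spelling or nickname Noe suggests: it assumes you are willing to treat recovery answers like passwords, generate them randomly, and keep them in a password manager. The function names and sample questions are hypothetical.

```python
import secrets


def random_recovery_answer(num_bytes: int = 16) -> str:
    """Return an unguessable, URL-safe string to use instead of a real answer."""
    return secrets.token_urlsafe(num_bytes)


def build_answers(questions: list[str]) -> dict[str, str]:
    """Give every recovery question its own independent random answer."""
    return {question: random_recovery_answer() for question in questions}


questions = ["Mother's maiden name?", "Street you grew up on?"]
answers = build_answers(questions)
for question, answer in answers.items():
    # Store each pair in a password manager; the answers cannot be recalled from memory.
    print(f"{question} -> {answer}")
```

The trade-off is that a random answer cannot be looked up in any public record, but it also cannot be remembered, so this approach only works if the answers are stored somewhere safe.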

[00:13:41.100] – David Puner
We’ve been talking about this synthetic identity and the challenges around it quite a bit from an individual level. But you did mention earlier how our home life and work life identities factor into all this. What, from an organizational or enterprise level, should we be thinking about?

[00:14:00.250] – Len Noe
For starters, adaptive multifactor. The best option we have is a true adaptive multifactor to validate that I am the person that I say I am. Multifactor does not mean two factor. Two is a minimum. Username, password, fingerprinted device, geographic location, times of operation. From an enterprise perspective, what we need to start looking at is exactly what you had just said, and that’s the crossover between personal and business.
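
The idea of adaptive multifactor can be sketched as a simple risk score over the signals listed above (device fingerprint, geographic location, times of operation). This is an illustrative toy, not CyberArk's or any vendor's implementation; the weights, thresholds, factor names, and the `LoginContext` type are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class LoginContext:
    known_device: bool   # device fingerprint has been seen for this user before
    usual_location: bool  # geographic location matches the user's history
    hour: int            # local hour of the login attempt, 0-23


def risk_score(ctx: LoginContext, usual_hours: range = range(7, 20)) -> int:
    """Add up risk points for each signal that looks unusual (weights are arbitrary)."""
    score = 0
    if not ctx.known_device:
        score += 2
    if not ctx.usual_location:
        score += 2
    if ctx.hour not in usual_hours:
        score += 1
    return score


def required_factors(ctx: LoginContext) -> list[str]:
    """Two factors are the floor; riskier contexts demand more."""
    factors = ["password", "authenticator_app"]
    score = risk_score(ctx)
    if score >= 2:
        factors.append("hardware_key")
    if score >= 4:
        factors.append("manual_review")
    return factors


print(required_factors(LoginContext(known_device=True, usual_location=True, hour=10)))
print(required_factors(LoginContext(known_device=False, usual_location=False, hour=3)))
```

The design choice worth noting is that the baseline never drops below two factors, matching the point that two is a minimum, while unfamiliar devices, locations, or hours step the requirement up rather than simply blocking the user.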

[00:14:30.800] – Len Noe
If I’m not able to get to you from a business perspective, and I know that you’re doing business functions on your personal device, why would I go after your corporate asset that I know has multiple layers of controls on it when I can go after your personal device, which might be less hardened and just ride the applications in the private tunnels back into the enterprise? Personal life and enterprise life, that demarcation is no longer there.

[00:14:58.440] – Len Noe
What happens in our physical life can affect our digital experience, and what can happen in our digital experience can have direct effects on our physical life. We’ve actually seen a synthetic identity be monetized recently. There was actually a YouTube influencer that recently worked with an AI company to create a virtual girlfriend based off of all of her content that she had made through her YouTube videos.

[00:15:24.520] – Len Noe
We can even look a little bit further back to last year, where a Russian advertising company called Deepcake had utilized Bruce Willis’ likeness from the movie Die Hard to sell telecoms. We’re seeing more of this virtual or synthetic identity coming out. I think this is actually going to lead to new legislation around legalities, liabilities. Basically, who would be liable if a synthetic version of someone broke the law, said something offensive? We’re seeing it in real life now. Where it goes from here is completely up in the air.

[00:16:04.500] – David Puner
One of the things that I was thinking about as we were talking about this is now that you’ve fed all that information about yourself into ChatGPT, what becomes of that? The follow-up question to that is there are people inside corporations, organizations of all sorts right now typing things into ChatGPT on company devices. What are the implications there?

[00:16:27.830] – Len Noe
Oh, my God. Huge. From a business perspective, huge. There have been multiple very large tech companies who, through utilizing ChatGPT to try and either reformat a document or to validate code, they have actually let loose proprietary information out into the wild. Once it’s on ChatGPT, it’s pretty much fair game and it’s open source at that point.

[00:16:55.820] – Len Noe
Do I have any personal concerns? Not really. Because like I said, everything that I put in there, somebody else could go and do the exact same scan. They could run the exact same people searches and they would get the exact same information. If they put it into ChatGPT, they would get the exact same answers.

[00:17:15.760] – Len Noe
That, to me, is the bigger thing that I’m trying to bring to light. We all look at our activities online, especially on personal devices, we act like there’s nothing to be afraid of. God help me if I’m talking about social media and all these little quizzes and, “Oh, get your rockstar name.” It only takes the street you grew up on and your mother’s maiden name.

[00:17:39.300] – David Puner
Are you saying there’s an ulterior motive for those fun quizzes that we get thrown all the time?

[00:17:45.280] – Len Noe
We could talk about Cambridge Analytica if we had another three hours, but yeah, that’s exactly what I’m saying. A lot of these little quizzes, they’re either phishing you for data or they’re based in psychology to try and get a better understanding, to turn around and market to you. As I said in the beginning, the human experience is what they’re going after. They’re trying to be able to sell to you as an individual, based on your behavioral patterns. The more information that we give them, the better they are at their job.

[00:18:16.230] – David Puner
It was interesting to me that after looking at your presentation that had the video where you did the search in ChatGPT, I asked it the same question, “Who is Len Noe?” It basically told me, I don’t know.

[00:18:30.270] – Len Noe
Because that was actually only in my individual chat. Now, that’s not to say that OpenAI does not have access to the data that I gave it. It just hasn’t been released to the rest of the population that are utilizing the tool yet.

[00:18:44.350] – David Puner
All of this data that it is receiving right now is not necessarily then accessible to everyone right away. We don’t know what they’re doing with it, but at some point, something will be done with it.

[00:18:58.160] – Len Noe
They’re doing something with everything that’s being put in. The fact that OpenAI is a private company, they don’t necessarily have to disclose what they’re doing with that data in the backend. But if you take a look at the EULA that you have to agree to in order to even get access to the tool, it says everything that goes into this becomes accessible to OpenAI.

[00:19:20.800] – David Puner
Interesting. I don’t remember signing that.

[00:19:23.150] – Len Noe
That’s why they say, RTFM, read that manual. That’s the point. We don’t really look at what we’re giving up in order to utilize things. All this PII that we’re giving away is what I’m actually going back and scraping. When was the last time you actually looked at an app that you installed on your mobile device and actually looked at the permissions that it was asking for?

[00:19:46.810] – Len Noe
Everybody out there, they want what they want, they want it when they want it. It’s those of us like me and CyberArk and other security organizations, we’re the ones that go back and go, “Hey, wait a minute. This is what you’re giving them access to.” I think a lot of this comes down to behavioral modification at an individual level.

[00:20:06.060] – Len Noe
Think about it. If you’re at the hardware store and you’re buying a hammer, why do they need to know where you live? The way that a lot of this is marketed, it’s not coming out and telling you, “Hey, just because you want to use this, I’m going to know every single contact in your contact list, and I’m going to be able to see all of the emails that come in.” If that was in big, bold letters, nobody would sign up for these apps.

[00:20:29.360] – Len Noe
Remember, nobody works for free. All these free apps and these free utilities, they’re getting paid somehow. If you can’t figure out what they’re selling, the product is probably you. We see ChatGPT. We had an end date on the original data set. I think it was 2021 that it didn’t know anything past. Now we’re seeing ChatGPT 4. I think once these AI chat models get the ability to spider, we don’t need search engines anymore.

[00:20:59.450] – David Puner
You are taking this synthetic identity show on the road this summer?

[00:21:04.160] – Len Noe

[00:21:04.890] – David Puner
And going to lots of places with it. Where are you going to be?

[00:21:07.660] – Len Noe
We’re actually going to be debuting this work at Identiverse in Las Vegas. After that, I’ve got some events in Johannesburg, South Africa, Cape Town, Bahrain, Qatar, Abu Dhabi, Dubai. We really hope that everybody gets a chance to come out and see this because this will affect enterprises as well as Joe Average sitting down the block.

[00:21:35.570] – David Puner
If there’s one thing that our audience should take away from this conversation about synthetic identity, what would it be?

[00:21:44.940] – Len Noe
We are ultimately responsible for our own safety and our own security at the end of the day. I think when we look at what we leave out there as far as breadcrumbs, we need to be conscious that the internet is not this friendly utopian society. You wouldn’t walk down a dark alley with your wallet out. Why would you do the same thing out on the net?

[00:22:13.570] – Len Noe
The other thing that I would say would be the key takeaway is if I can find all of these answers to your recovery portal questions, change them. Change them so you are the only one that knows what these answers are.
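
A quick self-audit along these lines might look like the sketch below: it flags any recovery answer that matches a fact already discoverable about you. The questions, answers, and the set of "public facts" are hypothetical placeholders; in practice the facts would come from whatever you can find about yourself in public sources.

```python
def normalize(text: str) -> str:
    """Keep only lowercase letters and digits so trivial variations still match."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def exposed_answers(answers: dict[str, str], public_facts: set[str]) -> list[str]:
    """Return the questions whose answers match a publicly discoverable fact."""
    public = {normalize(fact) for fact in public_facts}
    return [q for q, a in answers.items() if normalize(a) in public]


# Hypothetical data for a fictional person.
my_answers = {
    "Street you grew up on?": "Example Street",
    "First pet?": "x7Qv-unrelated-string",
}
public_facts = {"example street", "Example High"}

for question in exposed_answers(my_answers, public_facts):
    print(f"Change this answer: {question}")
```

Exact matching after normalization is deliberately simple; it will miss abbreviations or misspellings, so a clean result here is not proof that an answer is safe.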

[00:22:28.000] – David Puner
You sent me a preview of a synthetic identity presentation you’re going to be giving on your road show. One of the things that I thought was really interesting in there had to do with counterfeit passports. Could you maybe explain to the audience a little bit about what was going on there?

[00:22:43.690] – Len Noe
Absolutely. In the presentation, I show how much it would cost to get a passport here in North America. It’s about $2,500 US. Then I showed the same type of document for a country in Europe. The reason I even showed that was the European Darknet site, actually in their ordering instructions, it says, “After you’ve completed your purchase, please send us an email with your age, your race, and your gender so that we can actually find a matching data set.”

[00:23:17.410] – Len Noe
The reason I wanted to highlight that is where do you think they were actually coming up with those data sets to match to? These would be actually some of those stolen identities that we’ve been talking about through the majority of this podcast. But you can’t just have a passport that ties to nothing.

[00:23:33.590] – Len Noe
I actually have to compromise an individual and then use his information as the backstory for a legitimate passport. The reason they’re asking you is because they have to find somebody that matches you close enough that you shouldn’t be discovered when you actually attempt to utilize the fake document.

[00:23:53.260] – David Puner
Len, wild stuff as always. Really appreciate you coming back on and we look forward to catching up with you again real soon.

[00:24:01.560] – Len Noe
Thank you for the opportunity. I’ll keep doing my crazy research and maybe we’ll talk again soon.

[00:24:15.980] – David Puner
Thanks for listening to Trust Issues. If you like this episode, please check out our back catalog for more conversations with cyber defenders and protectors. Don’t miss new episodes. Make sure you’re following us wherever you get your podcasts. Let’s see. Oh, yeah. Drop us a line if you feel so inclined. Questions, comments, suggestions, which come to think of it are like comments. Our email address is [email protected]. See you next time.