White FAANG: Devouring Your Personal Data

December 3, 2024 Lior Yakim

Generated using Ideogram

Abstract

Privacy is a core aspect of our lives. We have the fundamental right to control our personal data, physically or virtually. However, as we use products from external vendors, particularly the FAANG companies (Facebook, Amazon, Apple, Netflix, Google), our digital footprint is continuously being expanded. Fortunately, FAANG provides a service that enables us to export our data to a local drive in just seconds. From now on, we will refer to it as the “export service.”

While ease of use benefits the customer experience, it often comes at a cost.

Attackers can use the export service to obtain extensive information about you, including PII (Personally Identifiable Information). To achieve their goal, they have a variety of initial entry points to choose from.

Some notable examples include cookie stealing, planting malware on your device or even gaining physical access to your data. Each of these operations could give threat actors access to your information. We’re going to show you how this can permanently ruin someone’s life.

This article will explore the hidden threats within your sensitive data and outline the mitigation steps you and your company can take to enhance your security posture.

Introduction

In Jack London’s wonderful tale White Fang, he describes the story of a lone wolf navigating a wilderness teeming with predators. Many characters in the story exploit Fang’s survival skills, much as threat actors can exploit the personal data collected by FAANG. While Fang does not necessarily intend to harm anyone, we all know that with great power comes great responsibility, and FAANG’s excessive data harvesting can quickly become a predator’s tool.

One of FAANG’s loyal customers is Joe. He is just an average guy who works for a large tech company and is generally security aware. We have analyzed his exported information using the export services of Apple, Google and Meta.

Based on our analysis, we will explore adversarial scenarios in which the personal data that each of these companies holds about Joe could be exploited. For each scenario, we will present the associated risks through pseudocode as a proof of concept (POC). These POCs use various input files from the respective export services, while de-identification preserves Joe’s anonymity.

We primarily focus on the following adversarial tactics: defense evasion, lateral movement and reconnaissance. Reconnaissance is a challenging tactic for defenders, as there’s often very little they can do about it in terms of prevention.

Our story starts with a compromised personal account (that is, an unprivileged user account at Apple, Google or Meta), from which we demonstrate how attackers can collect and exploit our personal data.

The full code examples, including the input file paths from the export that direct to the sensitive information, can be found in the following repository: WhiteFAANG.

Predators in the Wild

Using personal accounts in corporate environments is much more common than you might think.
Many seasoned tech industry professionals and non-tech users utilize their personal Google accounts for various tasks. Some examples include file sharing and writing documents, effectively creating a “creative” bypass of email filtering and DLP (Data Loss Prevention) systems. According to a CyberArk survey conducted with more than 14,000 participants, approximately 63% of employees reported using personal accounts on their work laptops, with Google being the most used platform.

Not only do these employees put their entire organization at risk of possible exfiltration attempts, but they also expose it to accidental password synchronization issues.

One company that suffered from this kind of behavior is Cisco. In 2022, a threat actor compromised a Cisco employee’s “personal” Google account. Unfortunately, the employee entered sensitive corporate credentials while connected to his Google account, enabling the passwords to synchronize with the Google account. The attacker could then leverage these credentials to access the corporate VPN using MFA bypass techniques.

Another incident involving a personal Google account was the Okta support system breach. David Bradbury (Okta’s Chief Security Officer) said, “An employee had signed in to their personal Google profile … the username and password of the service account had been saved into the employee’s personal Google account.” The personal account was the initial entry point used by the attacker, which later became a major incident for Okta, affecting significant customers such as Cloudflare, 1Password and BeyondTrust.

At this point, it has become clear that our personal accounts serve as a common attack vector for threat actors. We will now cover adversarial scenarios for Apple, Google and Meta under the base assumption of a compromised personal account. Let’s start with Apple.

Apple

The output of Apple’s export service is structured in the following manner:

Figure 2 Apple export directory structure

The adversarial use cases are:

Find Physical Devices

Today, most organizations integrate multi-factor authentication (MFA) as a core practice in the authentication process. The MFA challenge is usually performed using edge devices such as mobile phones.

To bypass a corporate MFA flow, we first need to map Joe’s available edge devices and collect as much metadata as possible. The export allows us to gather the precise OS version of the active device and learn about Joe’s patching habits. An adversary can use these insights to select vulnerabilities that match the device’s exact version.

Another meaningful piece of information available to us through the export involves Bluetooth accessories. The information is available from “AccessoryDeviceInfo.json” as part of the export.

Not only can we map Joe’s Bluetooth devices, but we can also access their respective MAC addresses.

By analyzing the mapping we can exploit specific vulnerabilities, such as the recent AirPods CVE-2024-27867, which enables attackers to eavesdrop on an AirPods microphone using only a MAC address. It is worth noting that Android users who use AirPods do not receive automatic updates and are, therefore, likely susceptible to this vulnerability.

Find a trusted device:
Read JSON file "Devices Registered with Apple Messaging.json"
Print os-version from devices array

Find Bluetooth devices:
Read JSON file "AccessoryDeviceInfo.json"
For accessory in devices
    print 'Accessory Name'
    print 'Bluetooth Mac Address'

Output
iPhone OS,17.2.1,18D42
AirPods Pro 1a:2b:3c:4a:5b:6c
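
For readers who want to audit what their own export reveals, the Bluetooth step above can be sketched in Python. This is a minimal sketch, not the article's repository code: the field names come from the pseudocode, while the top-level "devices" layout and the sample data are assumptions that may not match the real export schema.

```python
import json

# Hypothetical sample shaped after the pseudocode above; the real
# AccessoryDeviceInfo.json layout may differ.
SAMPLE_ACCESSORIES = """
{
  "devices": [
    {"Accessory Name": "AirPods Pro", "Bluetooth Mac Address": "1a:2b:3c:4a:5b:6c"},
    {"Accessory Name": "Keyboard"}
  ]
}
"""


def list_accessories(raw_json: str) -> list[tuple[str, str]]:
    """Return (name, MAC) pairs for every accessory that exposes both fields."""
    data = json.loads(raw_json)
    pairs = []
    for accessory in data.get("devices", []):
        name = accessory.get("Accessory Name")
        mac = accessory.get("Bluetooth Mac Address")
        if name and mac:  # skip entries without a recorded MAC address
            pairs.append((name, mac))
    return pairs


if __name__ == "__main__":
    for name, mac in list_accessories(SAMPLE_ACCESSORIES):
        print(name, mac)
```

Entries without a MAC address are skipped rather than reported, so the result lists only accessories that are actually exposed by the export.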

Find ISP (Internet Service Provider) and Mobile Carrier Name

Reconnaissance is an integral part of every attack. We can look for mobile carrier information, which might help us move laterally to Joe’s primary mobile device using social engineering. The export contains a great deal of PII, including the last four digits of Joe’s credit card number (which can be found in “Billing Information History.csv”). These digits are used as a standard identification method by banks and help desks, including those of Joe’s mobile carrier and ISP.

We can combine vishing (voice phishing) with the last four digits of his card to manipulate the help desk employee to our advantage. In nature, this situation can be likened to quicksand. The target, in this case the help desk employee, feels safe and unsuspecting, much like someone walking on what they believe to be solid ground. Once lured in, they sink deeper and deeper into the trap, revealing the hidden dangers only after the damage is done. In the past, we have seen actual incidents that included MFA resetting (an MGM attack), SIM swapping (an attack on US insurers) and many others.

Another possible attack vector is identifying vulnerable equipment, such as unpatched routers supplied by the ISP we discovered. Unpatched products lack critical security updates and, therefore, pose a serious risk.

Find IP company:
Read CSV file "iTunes Payment Stack - Activity.csv"
Extract IP Company column from file
Find value in the column where value != None
Print value

Find mobile carrier:
Read CSV file "Subscription Click Activity.csv"
For entry in file
    Use Regex to find pattern: carrier": "([^"]+)
    print match

Output
Verizon
AT&T
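
The carrier lookup above can be sketched in Python using the same regular expression as the pseudocode. The column names and the sample rows below are hypothetical, since the article does not show the layout of "Subscription Click Activity.csv"; only the pattern itself comes from the source.

```python
import csv
import io
import re

# Same pattern as in the pseudocode: capture the value that follows carrier": "
CARRIER_PATTERN = re.compile(r'carrier": "([^"]+)')

# Hypothetical sample; the real file's columns may differ.
SAMPLE_CSV = '''Event Date,Event Details
2024-01-05,"{""carrier"": ""AT&T"", ""action"": ""click""}"
2024-01-06,"{""action"": ""view""}"
2024-01-07,"{""carrier"": ""AT&T"", ""action"": ""view""}"
'''


def find_carriers(csv_text: str) -> list[str]:
    """Return each distinct carrier name found in any cell, in first-seen order."""
    carriers = []
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            for match in CARRIER_PATTERN.findall(cell):
                if match not in carriers:
                    carriers.append(match)
    return carriers


if __name__ == "__main__":
    print(find_carriers(SAMPLE_CSV))
```

Scanning every cell rather than one named column keeps the sketch independent of the exact column layout, at the cost of a slightly broader search.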

Find All Developers Who Created the Apps on Your iOS Device (Top 10, Sorted by Date)

Adversaries commonly use software supply chains to avoid defense mechanisms by targeting the weakest link. This attack vector can be leveraged by using our downloaded applications. The export contains information about every application we have ever downloaded to our iOS device. Each app developer represents an extension of trust, enlarging the attack surface.

We can flag the weakest app developers as candidates for a supply chain attack. Joe inherently trusts these providers, and compromising their CI/CD flows could, in turn, compromise his device.

Looking at the big picture, Joe has trusted more than 1,000 development organizations over the years. Did he intend to trust such a tremendous number of developers scattered across more than 50 countries? How many of these countries do he and his government consider hostile?

Find paid app providers:
Read CSV file "Store Transaction History.csv"
Extract Seller column
Drop duplicates
Count occurrences
print count
Sort by 'Item Purchased Date' in descending order
Print top 10 rows

Find free app providers:
Read CSV file "Store Transaction History - Free Apps.csv"
Perform the same steps as for the paid apps

Output

The user trusted 431 different paid app providers and 842 free app providers.

Here are the top 10 paid providers:

Seller
Audible, Inc.
JoyTunes
Apple Inc.
Sony Music
Microsoft Corporation
Netflix, Inc.
Spotify Ltd.
Adobe Inc.
Disney
Nintendo Co., Ltd.

Here are the top 10 free providers:

Seller
Apple Inc.
Duolingo
Grammarly, Inc
Google LLC
OpenAI, L.L.C.
Meta Platforms, Inc.
Snap Inc.
Zoom Video Communications, Inc.
TikTok Inc.
Imangi Studios (known for “Temple Run”)
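
The seller summary above can be sketched in Python as follows. The "Seller" and "Item Purchased Date" columns are named in the pseudocode; the date format and the sample rows are assumptions, and the sketch interprets "top 10, sorted by date" as the sellers with the most recent purchases.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample; the ISO date format is an assumption.
SAMPLE_TRANSACTIONS = """Seller,Item Purchased Date
"Audible, Inc.",2024-03-01
JoyTunes,2024-02-11
"Audible, Inc.",2023-12-25
Apple Inc.,2022-07-04
"""


def summarize_sellers(csv_text, top_n=10, date_format="%Y-%m-%d"):
    """Return (number of distinct sellers, sellers ordered by latest purchase)."""
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        seller = row["Seller"]
        purchased = datetime.strptime(row["Item Purchased Date"], date_format)
        # Keep only the most recent purchase date per seller.
        if seller not in latest or purchased > latest[seller]:
            latest[seller] = purchased
    ranked = sorted(latest, key=latest.get, reverse=True)
    return len(latest), ranked[:top_n]


if __name__ == "__main__":
    count, top = summarize_sellers(SAMPLE_TRANSACTIONS)
    print(f"The user trusted {count} different app providers.")
    for seller in top:
        print(seller)
```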

Find the Top 3 Most Common Event Locations Based on Joe’s Personal Calendar

The export gives us access to Joe’s schedule via his calendar. It enables us to identify patterns in Joe’s routines, including the exact locations in which he is expected to be.

This information gives us various options if we wish to target Joe. We can map the physical security posture of the discovered locations to find the weakest link, which could enable physical damage, asset theft or espionage.

Based on our research, Apple even records events that were deleted from your calendar (deletions might indicate a desire to hide something).

Read ICS file "Joe.ics"
Read calendar from file
For event in calendar
    extract location
Count occurrences per location
Print top 3 rows

Output

Location	Count
Shake Shack, 400 W 8th St, Los Angeles, CA 90014, United States	9
Corgi Cafe, C. de la Indústria, 78, Gràcia, 08025 Barcelona, Spain	6
South Jersey Sports Center, 100 Pike Rd Bldg C, Mt Laurel Township, NJ 08054, United States	3
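
The location count above can be sketched with the Python standard library alone. This is a simplified reading of the iCalendar format, assuming folded lines and escaped commas are the only quirks present; a dedicated iCalendar parser would be more robust. The sample calendar is hypothetical.

```python
from collections import Counter

# Hypothetical sample calendar; the third event uses a folded LOCATION line.
SAMPLE_ICS = r"""BEGIN:VCALENDAR
BEGIN:VEVENT
LOCATION:Shake Shack\, Los Angeles
END:VEVENT
BEGIN:VEVENT
LOCATION:Shake Shack\, Los Angeles
END:VEVENT
BEGIN:VEVENT
LOCATION;LANGUAGE=en:Corgi Cafe\,
  Barcelona
END:VEVENT
END:VCALENDAR
"""


def unfold(ics_text):
    """Join continuation lines (starting with a space or tab) to the previous line."""
    lines = []
    for raw in ics_text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def top_locations(ics_text, top_n=3):
    """Count LOCATION values across all events and return the most common ones."""
    counts = Counter()
    for line in unfold(ics_text):
        name, _, value = line.partition(":")
        # Property parameters (e.g. ;LANGUAGE=en) follow the name after a semicolon.
        if name.split(";")[0].upper() == "LOCATION" and value.strip():
            counts[value.replace("\\,", ",").strip()] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    for location, count in top_locations(SAMPLE_ICS):
        print(location, count)
```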

Google

The output of Google’s export service is structured in the following manner:

Figure 3 Google export directory structure

The adversarial use cases are:

Find the Top 3 User Agents Used by Joe

Identities are at the heart of most security incidents. Therefore, large enterprises implement security controls, which often include UBA (user behavior analytics) and ITDR (Identity Threat Detection and Response).

To bypass these security controls, an attacker can simulate the victim’s actions using the victim’s most common user agents. A user agent is a string that represents a client, including its software version and operating system. Like a wolf in sheep’s clothing, the threat actor disguises himself by adopting the victim’s identity, waiting for the right opportunity to strike.

Although user agents are considered a “weak” user identifier in comparison to other methods (IPs, session tokens, etc.), security vendors commonly integrate them into a multi-layered anomaly detection engine.

Read HTML file ".SubscriberInfo.html"
Extract user agent table from the file
Group BY "Raw User Agents"
Count occurrences
Sort by count in descending order
Print top 3 rows

Output

User-Agents	Count
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36	8
Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1	5
Mozilla/5.0 (Linux; Android 13; SM-G991U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.5790.136 Mobile Safari/537.36	3
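
The grouping step above can be sketched in Python with the standard-library HTML parser. The "Raw User Agents" column name comes from the pseudocode; the other columns, the table layout and the placeholder user agent strings in the sample are assumptions about the export file.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical sample table; real exports may be laid out differently.
SAMPLE_HTML = """
<table>
<tr><th>Timestamp</th><th>IP Address</th><th>Raw User Agents</th></tr>
<tr><td>t1</td><td>198.51.100.7</td><td>UA-desktop</td></tr>
<tr><td>t2</td><td>198.51.100.7</td><td>UA-desktop</td></tr>
<tr><td>t3</td><td>203.0.113.9</td><td>UA-phone</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def top_user_agents(html_text, column="Raw User Agents", top_n=3):
    """Count values in the named column and return the most common ones."""
    parser = TableRows()
    parser.feed(html_text)
    counts = Counter()
    col = None
    for row in parser.rows:
        cells = [cell.strip() for cell in row]
        if col is None:
            # Skip rows until the header row that names the column is found.
            if column in cells:
                col = cells.index(column)
            continue
        if len(cells) > col and cells[col]:
            counts[cells[col]] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    for agent, count in top_user_agents(SAMPLE_HTML):
        print(agent, count)
```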

Find Sensitive Data for Blackmail Purposes Using Past Searches

According to our research, Google export contains every term you have searched for on Google in recent years. Adversaries could leverage this goldmine of confidential information for blackmailing purposes.

An attacker can build a list of custom keywords that he considers sensitive (focusing on addiction, financial trouble, etc.), and use it against Joe for blackmail.

The export even includes terms typed locally in the browser URL without the user hitting “enter.” This behavior gives Joe a false sense of privacy, as Joe believes his keystrokes are local to his endpoint.

Read HTML file "My Activity.html"
For word in sensitive word list
  if word in file.text
     print word, text

Output

“loan”, You searched and visited “affordable loans for everyone!”
“overdue”, You searched and visited – “Your bill is overdue? Contact our lawyers now!”
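
To check how much of your own search history would be flagged this way, the keyword scan above can be sketched in Python. The sketch assumes the activity entries have already been extracted from the HTML file as plain strings; the keyword list and sample entries are illustrative only. It uses a case-insensitive substring match, which is why “loan” also matches “loans”.

```python
# Illustrative keyword list and entries; both are assumptions for this sketch.
SENSITIVE_WORDS = ["loan", "overdue"]

SAMPLE_ENTRIES = [
    "You searched and visited affordable loans for everyone!",
    "You watched a cooking tutorial",
    "You searched and visited Your bill is overdue? Contact our lawyers now!",
]


def flag_sensitive_entries(entries, keywords):
    """Return (keyword, entry) pairs where the keyword appears in the entry."""
    hits = []
    for entry in entries:
        lowered = entry.lower()
        for word in keywords:
            if word.lower() in lowered:
                hits.append((word, entry))
    return hits


if __name__ == "__main__":
    for word, entry in flag_sensitive_entries(SAMPLE_ENTRIES, SENSITIVE_WORDS):
        print(f'"{word}", {entry}')
```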

Meta

Find Joe’s Physical Location

The export can be used to find the exact user location (including postal code). Its accuracy depends on whether Joe has enabled the GPS settings on his mobile device. Otherwise, Meta will only use IP and publicly available information.

Knowing Joe’s home address allows us to target his Wi-Fi network using known methods, such as the Evil Twin and deauthentication attacks.

Another possible adversarial route is the use of social engineering or espionage practices.

Read HTML file "primary_location.html"
Find div '_2ph_ _a6-p'
For section in div
    if section isdigit
       assign location and break
Print location

Output

Address	postalCode
redacted address in US	redacted postal code

Find the Top 20 Posts That Captivated Joe’s Attention the Most

One of the more interesting data pieces adversaries may use against you is your post preferences. We found that the data collected about you over the past couple of years is so detailed that it records every Facebook post you view and the exact number of seconds it spends on your screen.

This highly intrusive data collection enables us to create a sophisticated social engineering campaign based on Joe’s interests by sorting Joe’s posts based on the time spent on each one.

We can then use an LLM to auto-generate a phishing email tailored for Joe.

Define file pattern as page_.html
Define post array
For file in pattern
    read HTML file
    extract table
    extract headers
    for row in table
        extract cells
        map headers to cell value
        if cells > 2 and third cell contains a link
           extract URL
           extract Time
           add {URL, Time} entry to post array
Sort post array by Time in descending order
Pick top 20 posts
Print top posts

Output

Content	secondsViewd
https://www.facebook.com/groups/politicsForFun/permalink/163343214453678/	2984.7
https://www.facebook.com/loanMasterMoneyTalks/videos/332116553356/	2690.2

Recap

What do we know about Joe so far?

Based on “personal” information only, we managed to map Joe’s most critical assets. These include his active MFA device, which likely serves as the active MFA factor for his corporate assets. Knowing the exact device metadata information (such as OS version) enables us to target this device effectively.

We have learned much about our victim’s digital footprint, including his common user agents, mobile carrier, IP company and primary location. The information gathered has helped us with defense evasion and reconnaissance of our target.

Using his Bluetooth devices (which are commonly used in a corporate office environment), we listened in on Joe’s microphone and heard his deepest secrets. Is he having any problems with his wife? Or maybe he has been kind enough to share the password to our corporate VPN?

The voice samples we obtain can be fed into a Deepfake model, facilitating easier social engineering attacks.

These adversarial examples are just the tip of the iceberg. Hundreds of other use cases lurk in the shadows, waiting to exploit your data. Some notable unexplored examples include guessing security questions using AI, extracting sensitive documents from Google Drive for social engineering purposes, password guessing to reduce brute-force search space and many more.

Attackers can leverage the scenarios we present in the article to create a fruitful reconnaissance framework that can be easily expanded to include various new techniques.

Joe’s life may never be the same, as the threat actor targeted all of Joe’s most important domains, as described in the diagram below.

Figure 5 Joe’s life ruined diagram

Mitigation

So, what can we do to protect ourselves?

Use strong phishing-resistant MFA for all accounts, while ensuring proper password complexity. Many do not view social media accounts as “sensitive” and ignore critical security controls.
Do not sync passwords between your personal and work accounts. It is easy to do this accidentally, so the best practice is to avoid using personal accounts in corporate environments altogether. If you have already synced one of your passwords to your Google account, visit Google Password Manager to view and remove it.
Monitor personal account export actions as sensitive operations and respond accordingly. This detection should be integrated with an effective ITDR strategy.
Request deletion of your personal data for idle accounts. This is your legal right under GDPR and similar regulations, known as the “right to be forgotten.”
Perform a secure disk wipe of the local export by overwriting the relevant section of the local drive with random data to permanently delete it. This can be done using the shred command on Linux or SDelete (Sysinternals) on Windows. Use caution when operating these tools, as files will not be recoverable after deletion.
Use an enterprise-grade protected browser. These products enable auditing capabilities while reducing the attack surface originating from web session data, password and form autofill syncs, risky commands, and personal data uploads and downloads.
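
To illustrate the overwrite-then-delete idea behind the secure wipe step, here is a minimal Python sketch. It is not a replacement for shred or SDelete: the function name and parameters are hypothetical, and on SSDs, copy-on-write filesystems or journaling filesystems an in-place overwrite is not guaranteed to reach the original blocks.

```python
import os
import secrets
import tempfile


def overwrite_and_delete(path, passes=1, chunk_size=1024 * 1024):
    """Overwrite a file in place with random bytes, then remove it.

    Illustrative only: storage hardware and filesystems may keep old copies.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                block = secrets.token_bytes(min(chunk_size, remaining))
                handle.write(block)
                remaining -= len(block)
            # Push the overwritten data out of Python and OS buffers.
            handle.flush()
            os.fsync(handle.fileno())
    os.remove(path)


if __name__ == "__main__":
    # Demonstrate on a throwaway temporary file, never on real data by accident.
    fd, demo_path = tempfile.mkstemp()
    os.write(fd, b"exported personal data")
    os.close(fd)
    overwrite_and_delete(demo_path)
    print(os.path.exists(demo_path))
```

As with the dedicated tools, the file cannot be recovered through normal means afterwards, so double-check the path before running anything like this.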

Conclusion

In the predatory landscape in which we operate, threat actors can use our personal information against us. This not only allows threat actors to harm us, but it also puts our employer at risk. We should take responsibility by treating personal information as a critical asset, including security hygiene practices (MFA, password complexity) and proactive observability measures (detection and response) set by the blue teams. By employing these practices, we can avoid the next major breach while keeping our due right to privacy.

The attack vectors we have covered are expected to evolve drastically in the future in terms of both variety and quantity. Let’s ensure that, as a community, we are well-prepared and vigilant in securing our precious information.

Lior Yakim is a threat researcher at CyberArk Labs.
