Fuzzing RDP: Holding the Stick at Both Ends

August 27, 2021 Shaked Reiner and Or Ben-Porath

 


Introduction

This post describes the work we’ve done on fuzzing the Windows RDP client and server, the challenges of doing so, and some of the results.

The Remote Desktop Protocol (RDP) by Microsoft continues to receive attention from the security community, from the several critical vulnerabilities discovered in 2019 that had the potential to compromise millions of internet-facing servers to the use of RDP as one of the main initial access vectors by attackers. These risks have been further amplified by pandemic-driven work from home.

Our initial interest in this project was VM-related. Because the default way of connecting to an Azure Windows machine or a Hyper-V virtual machine is RDP, we decided RDP was the perfect target.

Like most successful projects, ours too began with some intense Googling, and we quickly stumbled upon a great BlackHat Europe 2019 talk on RDP fuzzing by Park, Jang, Kim, and Lee. The speakers found a couple of vulnerabilities in a matter of just a few hours using a not-so-fast fuzzer, so we decided to take on the challenge to build on top of their work, expand the fuzzing capability to other channels, improve its performance and find our very own RDP Remote Code Execution (RCE).

Unfortunately (or fortunately, depending on your point of view), not all fuzzing projects result in critical vulnerabilities (see Survivorship Bias). We were unable to find our RDP RCE (yet), but we did manage to find a handful of bugs, and gain a better understanding of the protocol, its components, and the fuzzing process and tools. The fuzzing infrastructure we created is generic enough to be helpful in fuzzing other targets.

In this post, we’re going to share our process of doing all that’s mentioned above. First, we’ll provide an overview of RDP and the fuzzing setup we created, then we’ll share the challenges we faced and how we dealt with them, and finally, we’ll review a couple of bugs we uncovered through this process.

RDP Overview

The Remote Desktop Protocol is a popular protocol for remote access to Windows machines. Malwaretech recently described it as a “protocol for protocols”. RDP allows multiple channels to operate in every connection, each with a different purpose. Every channel therefore has its own data-handling code, structure definitions, and data flows, so there are, in effect, multiple protocols within RDP.


Channels within an RDP Connection

RDP channels can be either static or dynamic, but the distinction between the two is not crucial for the purpose of this article. If you’d like to know more about the inner workings of an RDP connection, we recommend reading our previous post on it. In case you want to learn more about the fuzzing process, you came to the right place.

Challenges in RDP Fuzzing

In a “standard” fuzzing scenario, you have a program that reads input controlled by the fuzzer, which could be a file or a stream of data of any sort. The program then handles the data while the fuzzer monitors the code coverage the data generated. Based on that coverage, the fuzzer mutates the input, sends the mutated input to the program again, and the process repeats.

RDP fuzzing is different since we must have an RDP connection active at all times. The data the fuzzer can input to the program needs to be sent as a Protocol Data Unit (PDU) on top of a specific channel (which should also be active during the fuzzing time) in an open connection. As mentioned before, each channel is its own protocol, so the fuzzing needs to happen on a channel-by-channel basis. This introduces the following challenges to the fuzzing process (which may also apply to other protocol/networking-related fuzzers):

  1. Client-Server architecture – In traditional fuzzing, the fuzzer can simply run the target application and supply its inputs. In a client-server scenario, the target application runs on one side of the connection, while the inputs are sent from the other side. In the case of RDP, the two sides are usually on different machines.
  2. Statefulness – RDP is a stateful protocol, meaning that you have to consider the state of the connection while fuzzing test cases. This can badly affect fuzzing stability.
    • Stability in the context of fuzzing usually means consistency – the confidence that identical inputs will generate identical code paths (for more info see the AFL docs).
  3. Multi-input fuzzing – When fuzzing a target that accepts a file as an input (file-format fuzzing), all the input of the fuzzer to the target is contained within a single file. On the contrary, when you fuzz a protocol, you may need to send a few consecutive messages in order to reach interesting code paths.
  4. Finding the target code – When you use coverage-guided fuzzing, you usually need to indicate to the fuzzer at which point it needs to start monitoring the code coverage (i.e., what’s the target function that handles your inputs?). RDP has many components that are responsible for its operation. Locating the correct one may be a difficult task in some cases.

These four challenges were the main ones we anticipated coming into this project. In the next section, we’ll discuss how we overcame these challenges, as well as additional ones that arose at a later stage of the work.

Technical Details: The Challenges and How to Overcome Them

In this section, we’ll go over the implementation details, and the challenges we encountered during the work process. We’ll also explain how we tackled them in order to get a working fuzzing setup. We have created a project repo on GitHub that contains all the code we wrote during this project. If you’re interested in this area, you may find this helpful when getting started.

Client-Server Architecture

In this project, we wanted to fuzz Windows’ RDP server as well as its RDP client. The motivation for fuzzing the RDP server is obvious: an attacker could use it to compromise a Windows server remotely and gain access to it. The motivation for fuzzing the RDP client is different. Think of a scenario in which an attacker has already compromised an RDP server. They then prepare their exploit and wait until a victim RDP client connects to it. Once the victim is connected, the attacker can compromise the victim’s machine as well through the RDP client. This can happen when an administrator connects to a server they manage, and it can even be used as a VM escape, because Hyper-V utilizes RDP to access its virtual machines (using the “enhanced session” feature).

A fuzzer for RDP needs to have the following basic components:

  • Instrumentation engine that tracks code coverage and detects crashes
  • Mutation engine that generates new inputs
  • Input sender that sends the fuzzer’s test cases over the appropriate target channel
  • Target binary that is tracked by the instrumentation engine

As you may imagine, the fuzzing setup for the client and the server ought to be quite different, but there are some similarities. We should start with those.

The basic fuzzing setup used (for both client and server fuzzing)

On the target side we have the following:

  • Fuzzer – custom-built afl-fuzz.exe
  • Instrumentation engine – custom-built winafl.dll, using custom-built DynamoRIO with the in_app instrumentation mode

See next section for the functionality added to WinAFL and DynamoRIO in our custom builds

  • Inputs (or test cases) – mutated PDUs that are written to an intermediary file in a directory shared between the target and the input sender

And on the input sender side we have one component:

  • Agent – reads intermediary files and sends inputs on the targeted channel

The idea of the setup is quite simple: we try to send our fuzzed inputs on a “live” RDP connection, mocking (virtually) nothing.

To do so, we decoupled the generation of inputs from their transmission to the target, allowing the inputs to move from one side of the RDP connection to the other before being processed by the target. Additionally, we used WinAFL’s in-app instrumentation mode to not interrupt the normal execution flow.

In order for coverage-guided fuzzing to work, there must be a one-to-one correspondence between inputs and the code paths they trigger. To achieve that, we developed “background fuzzing”: differentiating the fuzzer PDUs from the regular PDUs, and tracking code paths only for the former. This was essential because we want the fuzzer to track the coverage of our own test cases only, rather than that of unrelated PDUs that are sent over the connection.

To illustrate, let’s see what this might look like when fuzzing the RDPSND virtual channel, which redirects audio from the RDP server to the client. According to the docs, the first byte of every PDU represents the type of message sent.


Source: [MS-RDPEA]

Supported values for the msgType field are 0x01 through 0x0D. In this case we can use the most significant bit of the first byte as our fuzzing marker in the following manner:

  1. The agent turns on the most significant bit of the first byte before sending a PDU.
  2. WinAFL checks the most significant bit of the first byte before the message is handled. If the bit is on, WinAFL turns the bit back off and tracks the code coverage of this message. If the bit is off, WinAFL just ignores the message and doesn’t track any coverage.
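
The two steps above can be sketched in a few lines of Python. This is only an illustration of the bit manipulation, not the code of our agent or of WinAFL; the function names mark_pdu and unmark_pdu are hypothetical.

```python
from typing import Tuple

FUZZ_MARKER = 0x80  # most significant bit of the first (msgType) byte


def mark_pdu(pdu: bytes) -> bytes:
    """Agent side: turn the marker bit on before sending the PDU."""
    if not pdu:
        raise ValueError("empty PDU")
    return bytes([pdu[0] | FUZZ_MARKER]) + pdu[1:]


def unmark_pdu(pdu: bytes) -> Tuple[bool, bytes]:
    """Target side: report whether the PDU came from the fuzzer and
    return it with the marker bit turned back off."""
    if not pdu:
        return False, pdu
    is_fuzz = bool(pdu[0] & FUZZ_MARKER)
    restored = bytes([pdu[0] & ~FUZZ_MARKER & 0xFF]) + pdu[1:]
    return is_fuzz, restored
```

The trick only works because the valid msgType values (0x01 through 0x0D) never use the top bit, so a regular PDU can never be mistaken for a fuzzer PDU.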

After seeing the similarities between the client setup and the server setup, let’s look at how they differ, starting with the client.

Client Fuzzing Setup

The Windows RDP client is mstsc.exe, but most of the logic that handles virtual channel data is in mstscax.dll which the client loads.

Note that the remote access client of Hyper-V virtual machines, vmconnect.exe, also uses mstscax.dll for its core functionality

Windows RDP Client

For simplicity and efficiency, we executed both client and server (the target and the agent) on the same machine (i.e., using the client to connect to localhost/127.0.0.1). In order to allow parallel fuzzing, we also used mimikatz to patch the server so that it allows concurrent RDP connections.


These are the components of the setup when fuzzing the client:

  • Target – mstsc.exe and a target module of mstscax.dll
  • Instrumentation – DynamoRIO that creates the client process and winafl.dll, a DynamoRIO client that reports code coverage
  • Mutation engine – AFL-Fuzz running on the same machine and writing new test cases to a file
  • Input sender – our RDPFuzzAgent that opens a handle to the server and sends PDUs on the selected virtual channel. The agent takes each test case from the file created by AFL-Fuzz and sends those on our target channel.

Using those components, we were able to achieve a modest execution speed of ~50-100 executions per second. This is by no means fast, but it was faster than the speed shown in the aforementioned research by Park et al., so we were OK with that.

Server Fuzzing Setup

In order to find the target binary that holds the main logic of the RDP server, we can simply look into the Remote Desktop Services service.

PS C:\> gci HKLM:\SYSTEM\CurrentControlSet\Services\$((Get-Service -Name "Remote Desktop Services").Name)


    Hive: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TermService


Name                           Property
----                           --------
Parameters                     ServiceDll             : C:\Windows\System32\termsrv.dll
. . .

The main logic of the RDP server is indeed in termsrv.dll, which is loaded into a svchost.exe process with the following command line: C:\Windows\System32\svchost.exe -k NetworkService -s TermService.

The original plan for fuzzing the server was similar to the client’s, that is to fuzz a few instances of the target in parallel on the same machine which requires running multiple instances of TermService. This turned out to be quite a challenging task since Windows does not support it by default. When we tried to do that manually, we even saw a few hardcoded strings within termsrv.dll that point to TermService and its registry keys, so we decided to focus our efforts elsewhere and just use multiple VMs to fuzz the server in parallel.

In the client fuzzing setup, we used the server-side API call WTSVirtualChannelWrite() to send the fuzzer inputs to the target. Unfortunately, we couldn’t find a similar API that would let us send inputs to the server in an RDP connection. Hence, we chose to use a custom-built FreeRDP (a popular open-source RDP client) from an Ubuntu machine to send inputs to the fuzzed server. Note that this is not an ideal setup for fuzzing: these constraints made server fuzzing run at roughly 1/10 of the speed of client fuzzing.

server fuzzing stats

These are the components of the setup when fuzzing the server:

  • Target – depends on the channel being fuzzed. This can be termsrv.dll, audiodg.exe, rdpinput.exe, etc. (see “Locating relevant code” later in this post to understand how we figured out the target for each channel).
  • Instrumentation – in this setup, the fuzzer has no control over the initialization of the target process. So, in order to track code coverage, we must instrument a running target. Our fuzzer of choice, WinAFL, has a few instrumentation platforms it can use, and we chose DynamoRIO for its extensibility and robustness.
    • DynamoRIO (and hence also WinAFL) does not support attaching to a live process out of the box. We have modified this pull request to implement it, and altered WinAFL to use this functionality.
    • It is worth noting that this attach functionality opens the door to fuzzing processes that were “unfuzzable” using WinAFL before, namely processes whose creation the user has no control over. In particular, it allows fuzzing of Windows services.
  • Mutation engine – AFL-Fuzz running on the Windows machine and writing new test cases to a file in a shared folder.
  • Input sender – our build of FreeRDP, which connects to the server on the Windows machine, monitors the shared folder for new test cases and sends each test case on the targeted RDP channel.
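
The input sender's job of watching the shared folder can be illustrated with a small polling routine. This is a Python sketch under our own assumptions, not the logic of our FreeRDP build: the function poll_once, the send callback, and the choice to detect new test cases by file name, modification time and size are all hypothetical.

```python
import os
from typing import Callable, Set, Tuple

# A test case is identified by name, modification time and size, so a file
# that the fuzzer overwrites in place is picked up again (an assumption).
Seen = Set[Tuple[str, int, int]]


def poll_once(shared_dir: str, seen: Seen, send: Callable[[bytes], None]) -> int:
    """Send every test case in shared_dir that has not been sent yet.

    Returns the number of test cases handed to send() in this pass.
    """
    sent = 0
    for name in sorted(os.listdir(shared_dir)):
        path = os.path.join(shared_dir, name)
        if not os.path.isfile(path):
            continue
        stat = os.stat(path)
        key = (name, stat.st_mtime_ns, stat.st_size)
        if key in seen:
            continue
        with open(path, "rb") as handle:
            send(handle.read())  # e.g. write the bytes on the target channel
        seen.add(key)
        sent += 1
    return sent
```

Calling poll_once in a loop with a short sleep gives a simple sender; every round trip through the file system adds latency, which is one reason a setup of this kind is slower than the client one.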

Statefulness

When fuzzing the first channel (on the client side), we encountered a problem: as soon as the client detected two invalid messages, it would terminate the connection immediately.

To avoid this issue, we introduced some protocol grammar enforcement in the agent’s logic, i.e., limiting the space of allowed inputs. Among other things, we implemented:

  • Message size restrictions
    • minimum, maximum
    • divisibility (e.g., if a message contains a variable-length array of 4-byte elements, it should be divisible by 4)
  • Values restrictions
    • only allow specific values
    • minimal and maximal values
    • value must be the PDU size
    • value must differ in each PDU

We extracted some of the RDP grammar from Microsoft’s documentation of RDP and its extensions, some from reverse engineering the relevant binaries, and the rest from tracing failed target executions.

For example, this logic can be used to only allow messages that begin with one of the supported msgTypes, followed by the size of the PDU and a unique identifier, and whose total size is between 22 and 122 and leaves a remainder of 2 modulo 4.
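
A rule set like the one in this example might be enforced along the following lines. This Python sketch is not our agent's code, and the field layout it assumes (a one-byte msgType, then a two-byte little-endian size, then a four-byte identifier) is hypothetical and chosen only to make the example concrete.

```python
import struct

ALLOWED_MSG_TYPES = tuple(range(0x01, 0x0E))  # supported msgTypes 0x01..0x0D
MIN_SIZE, MAX_SIZE = 22, 122                  # size limits from the example


def enforce_grammar(raw: bytes, unique_id: int) -> bytes:
    """Coerce a mutated buffer into a PDU that satisfies the example rules."""
    # Clamp the total size to [MIN_SIZE, MAX_SIZE], zero-padding short inputs.
    body = raw[:MAX_SIZE].ljust(MIN_SIZE, b"\x00")
    # Trim until the size leaves a remainder of 2 modulo 4. Both limits
    # already satisfy this, so trimming never drops below MIN_SIZE.
    while len(body) % 4 != 2:
        body = body[:-1]
    # Map the fuzzer's first byte onto one of the supported message types.
    msg_type = ALLOWED_MSG_TYPES[body[0] % len(ALLOWED_MSG_TYPES)]
    # Overwrite the (assumed) header: msgType, PDU size, unique identifier.
    header = struct.pack("<BHI", msg_type, len(body), unique_id & 0xFFFFFFFF)
    return header + body[len(header):]
```

Everything after the header is left exactly as the mutation engine produced it, which keeps the enforcement as narrow as possible.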

It’s important to mention that these enforcements limit the ability of the mutation engine to change the test cases as it pleases, so interesting mutations may be missed. For that reason, we tried to enforce as little as we could, while still ensuring the connection would not get closed too often.

Another important point here is that those grammar enforcements are not the only option when dealing with these kinds of issues. In the case of one specific channel (GFX), we resorted to patching the actual target function we were fuzzing so that it wouldn’t close the connection in case of an invalid set of messages. This allowed us to continue fuzzing invalid messages and keep the connection open at all times. Here too there is a risk: you may find crashes that will not reproduce in the original code (without the patch). This is a great example of the delicate balance fuzzing requires between getting sufficient execution speed, maintaining the original functionality of the target program, and preserving the freedom of the mutation engine.

Multi-input Fuzzing

It stands to reason that most logic, and hence most bugs, depend on a sequence of messages rather than a single one. It was not a coincidence that a majority of the bugs we found involved at least two messages.

To uncover these bugs, we introduced multi-input fuzzing. We used a fuzzer dictionary that identifies the start of a new message and its type. The agent would then split the input into multiple PDUs according to those dictionary words, and send them one after the other.

So, a multi-input input might look something like this:

___cmd07 <1st PDU data>
___cmd02 <2nd PDU data>
___cmd03 <3rd PDU data>

which the agent translates to three messages with msgType 7, 2, and 3, and their respective content.
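
The splitting step can be sketched as follows. This Python snippet is an illustration rather than our agent's code: it assumes the two characters after ___cmd are hexadecimal digits and that everything up to the next token is the PDU payload, and the max_pdus cap is a hypothetical parameter.

```python
import re
from typing import List, Tuple

# Dictionary token that starts a new message: ___cmd followed by two
# hex digits giving the msgType (the digit format is an assumption).
TOKEN = re.compile(rb"___cmd([0-9A-Fa-f]{2})")


def split_test_case(data: bytes, max_pdus: int = 8) -> List[Tuple[int, bytes]]:
    """Split one fuzzer input into a list of (msgType, payload) pairs."""
    matches = list(TOKEN.finditer(data))
    pdus = []
    for i, match in enumerate(matches[:max_pdus]):
        # The payload runs from the end of this token to the next token.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(data)
        pdus.append((int(match.group(1), 16), data[match.end():end]))
    return pdus
```

An input without any token yields no PDUs, so the agent can simply skip it.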

To maintain the one-to-one correspondence between the inputs the fuzzer created and the code coverage they triggered, we introduced a second marker that identifies the last message in a sequence. WinAFL completes the cycle and creates the next input only when the call that handles a message carrying this “last in sequence” marker returns.

While multi-input fuzzing was crucial (and fruitful) to our efforts, we also found it necessary to limit the number of PDUs for each test case. That is because the fuzzer is drawn to inputs that lead to a different code sequence, and repeating the same message 100 times leads to a different code sequence than sending it once.

Reproduction Issues

After about one week of fuzzing, the first crash appeared. However, the crash didn’t reproduce when we tried running the same input again. This happened quite often, and it was likely due to the stateful nature of the protocol. In other words, one test case got the client to a specific state, which was then “utilized” by a subsequent test case to crash the target.

To understand unreproducible crashes, we modified WinAFL to create a memory dump of the target process whenever a crash is detected.

Crash Analysis Automation

Creating dumps on crashes solved one problem, but created another: once a crash is found, it is very likely to be encountered repeatedly. Generally, WinAFL tries to detect identical crashes and notify only on “unique crashes”, but our multiple message fuzzing made this detection very hard. Think about a case where a single message causes the target to crash. The fuzzer can create any set of messages with this one at the end. Each of these message sets will crash the target, and will also result in a different coverage bitmap, since the messages and their handling are different (except for the last one that actually matters). This will cause WinAFL to report a unique crash every time.

We had to automate the analysis of the crashes for two reasons. First, it is tedious work to analyze each crash manually (only to find that it is old news), and second, the disk was quickly filled up with memory dumps.

To overcome this problem, we wrote a WinDBG script that analyzes a crash and extracts the crashing stack from it. We then ran a PowerShell script that periodically analyzes the crashes and keeps only those that contain new stacks (and emails us the good news).
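
The deduplication idea can be illustrated in Python (our actual tooling used WinDBG and PowerShell, so this is a re-implementation sketch, not the scripts we ran). It assumes stack frames are available as module!function+0xOFFSET strings; the helper names and the depth of five frames are hypothetical choices.

```python
import hashlib
import re
from typing import Iterable, Set

# Instruction offsets differ between otherwise identical crashes, so drop them.
OFFSET = re.compile(r"\+0x[0-9a-fA-F]+$")


def stack_signature(frames: Iterable[str], depth: int = 5) -> str:
    """Hash the top `depth` frames of a crashing stack, ignoring offsets."""
    top = [OFFSET.sub("", frame.strip()) for frame in list(frames)[:depth]]
    return hashlib.sha256("\n".join(top).encode("utf-8")).hexdigest()


def is_new_crash(frames: Iterable[str], seen: Set[str]) -> bool:
    """Return True (and remember the stack) only for stacks not seen before."""
    signature = stack_signature(frames)
    if signature in seen:
        return False
    seen.add(signature)
    return True
```

Dumps for which is_new_crash returns False can be deleted right away, which keeps the disk from filling up. Ignoring offsets trades precision for noise reduction: two distinct bugs in the same function with the same callers would be merged.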

Long Startup Time

In the client fuzzing setup, it took more than 10 seconds from the moment the target (mstsc.exe) was created by the fuzzer to the moment the connection was established and the first message could be sent. Hence, it was crucial to perform as many iterations as possible without restarting the target. We achieved this by using the -fuzz_iterations parameter of AFL-Fuzz and supplying as many iterations as possible (before things start to break).

Multi-channel Fuzzing

Like multi-input fuzzing, some of the logic requires a sequence of messages on different channels. For example, sending camera data from the client to the server is supported using multiple channels, as explained in the docs.


Source: [MS-RDPECAM]

Thus, if we wish to fuzz the server by sending camera inputs, we must do so on at least two different channels.

Our solution was also similar — the fuzzer dictionary also determined the channel on which the message was to be sent.

Locating relevant code

Since RDP has many different components in Windows, it can be challenging to even locate the target function we need to fuzz.

PS C:\> gci -Include *.exe, *.dll, *.sys -Recurse C:\Windows\ -ErrorAction SilentlyContinue | ?{[System.Diagnostics.FileVersionInfo]::GetVersionInfo($_).FileDescription -match "RDP|Remote Desktop"} | Measure-Object | select count


Count
-----
  191

To do that quickly, we created a small database of all the symbols that might be related to our project.

The idea was to download all the PDBs that are relevant to our version of Windows, extract all function names from them and dump them into a file (linking back to the exe/sys/dll) so that we can quickly search for function names and locate the function related to our current target channel.

This helped a lot. Since almost all of the dynamic channel receive functions match the following pattern C<class-name>::OnDataReceived, we could quickly look at the list of those functions and figure out what is probably related to the channel we are targeting.
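
Searching such a symbol dump for receive handlers can be as simple as the Python sketch below. The dump format it assumes (one module name and one symbol per line, separated by a tab) and the function name find_receive_handlers are hypothetical; the real database may be laid out differently.

```python
import re
from typing import Dict, Iterable, List

# Matches the C<class-name>::OnDataReceived naming pattern.
HANDLER = re.compile(r"^C\w+::OnDataReceived$")


def find_receive_handlers(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group matching receive handlers by the module that defines them."""
    handlers: Dict[str, List[str]] = {}
    for line in lines:
        module, _, symbol = line.rstrip("\n").partition("\t")
        if HANDLER.match(symbol):
            handlers.setdefault(module, []).append(symbol)
    return handlers
```

Passing an open file object as lines works as well, since the function only iterates over it once.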

Bug Case Studies

In this section, we’ll share technical details of two of the bugs we found during this project.

AUDIO_PLAYBACK Channel (Server → Client)

The AUDIO_PLAYBACK_DVC virtual channel is used to play sounds from the server on the client. Its normal flow consists of two sequences: initialization and data transfer. In normal usage of the protocol, the initialization sequence occurs once at the beginning, followed by many data transfer sequences.

    1. Initialization Sequence – used to establish the version and formats to be used in the following Data sequences


Source: [MS-RDPEA]

    2. Data Transfer Sequence – sound data from the server to be played on the client


Source: [MS-RDPEA]

The Wave and WaveInfo PDUs contain an index to the format array exchanged in the initialization sequence that determines the format of the transmitted audio data.


Source: [MS-RDPEA]

When a format change occurs — i.e., a Wave or WaveInfo PDU arrives with an index different than the last one used — the client verifies that the new index is valid.

// in mstscax!CRdpAudioController::OnNewFormat

if ( (unsigned int)new_format_index >= this->formatArray_size )

However, as long as the format index remains the same, this verification is skipped (OnNewFormat() is not called, and the verification code is in it). Here is pseudocode of the relevant parts.

// in mstscax!CRdpAudioController::OnWaveData

last_format_index = this->last_format_index;
format_index_from_pdu = *((_WORD *)pdu + 3); //pdu is controlled by the server
if ( last_format_index != format_index_from_pdu )
{
    CRdpAudioController::OnNewFormat(this, (__int64 *)format_index_from_pdu); // this is where the bound check is being made
                                                                              // but only if the format index is different than the last index
    last_format_index = *((unsigned __int16 *)pdu + 3);
    this->last_format_index = last_format_index;
}
formats_array = (AUDIO_FORMAT **)this->formatArray;
current_format = formats_array[last_format_index]->wFormatTag; // crashes here

The vulnerable flow that triggers this bug in the client is as follows:


A Vulnerable Flow

  1. The server sends the client a Server Audio Formats PDU with 0x1A formats, causing the client to allocate a format array with this size.
  2. The server sends the client a Wave2 PDU that uses format 0x5 from the array with sound data
    • The client checks whether this format is identical to the last sent format.
    • If so, it uses the last decoder (dereferenced from the format item in the formats array) and if not, it will load the new decoder function pointer from the formats array.
  3. The server sends the client a Server Audio Formats PDU again — this time with only 0x2 formats, causing the client to free the previous formats array and allocate a new one with the new size.
  4. The server finally sends another Wave2 PDU using the last used format 0x5.
    • Since the format didn’t change, the client doesn’t perform any validity checks.
    • The client then performs an out-of-bound read trying to read the sixth format from a 2-format array and crashes.

This allows an attacker to cause a reallocation of the format array using an additional Server Audio Formats PDU and then specify an invalid index that was previously valid and used, causing the client to read out of bounds of the formats array and crash.
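
The flawed state handling can be reproduced with a small Python model. This is a simplified model of the logic described above, not the code in mstscax.dll: the class and method names are hypothetical, rejecting an invalid new index with ValueError is an assumption, and the IndexError stands in for the out-of-bounds read.

```python
class AudioClientModel:
    """Toy model of the client's format array and cached format index."""

    def __init__(self) -> None:
        self.formats = []
        self.last_format_index = None

    def on_formats(self, count: int) -> None:
        # Server Audio Formats PDU: the array is reallocated, but the
        # cached index from earlier Wave PDUs is NOT reset.
        self.formats = ["format-%d" % i for i in range(count)]

    def on_wave(self, index: int) -> str:
        # The bounds check runs only when the format index changes.
        if index != self.last_format_index:
            if index >= len(self.formats):
                raise ValueError("invalid format index")
            self.last_format_index = index
        # With a stale cached index this lookup goes out of bounds.
        return self.formats[self.last_format_index]
```

Replaying the four steps of the flow (0x1A formats, a wave with index 5, 0x2 formats, another wave with index 5) makes the last call fail inside the array lookup, past the point where the check should have stopped it.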

Note that this bug relies heavily on multi-input fuzzing, and we could not have found it without this feature.

AUDIO_INPUT Channel (Client → Server)

The AUDIO_INPUT virtual channel is used to send sound input from the client to the server. On the server side, the audio input data is handled by the elevated audiodg.exe process.

As in the AUDIO_PLAYBACK_DVC channel, the client and server first exchange an array of sound formats they support.


Source: [MS-RDPEAI]

The Sound Formats PDU begins with a header of nine bytes, which includes the command, number of formats, and size of the packet, followed by an array of formats, each of variable length (plus an optional field of extra data).


Source: [MS-RDPEAI]

The code that handles the Sound Formats PDU is in rdpendp.dll. It first verifies that the packet size is at least nine bytes, and then reads the header and verifies that the size from the header is not larger than the size of the packet.

// in rdpendp!CAudioInputHandler::OnFormatsReceived

if ( size < 9 )
{
    // ...
}
// ...
size_from_msg = *(_DWORD *)(data + 5);
if ( size_from_msg > size )
{
    // ...
}

The same function then subtracts nine from the size it read from the header and reads the number of formats specified in the header, as long as the remaining length is sufficiently large.

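Nothing in these checks requires the size from the header to be at least nine, so the subtraction is not protected against an integer underflow. A header size smaller than nine causes the subtraction to wrap around to a very large value, and the program then reads “formats” from after the end of the packet.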

// in rdpendp!CAudioInputHandler::OnFormatsReceived

underflowed_size = size_from_pdu - 9;
format_definition_offset = (unsigned __int16 *)(pdu + 9);
if ( num_formats )
{
    while ( underflowed_size >= 0x12 )
    {
        format_definition_size = format_definition_offset[8];
        total_format_size = format_definition_size + 18;
        if ( underflowed_size < (unsigned __int64)(format_definition_size + 18) )
            break;
        (*class_fomats_array)[format_index] = (struct SNDFORMATITEM *)operator new[](total_format_size);
        local_format = (*class_fomats_array)[format_index];
        if ( !local_format )
        {
            status = E_OUTOFMEMORY;
            goto CLEAN_AND_RETURN;
        }
        memcpy_0(local_format, format_definition_offset, total_format_size);
        format_definition_offset = (unsigned __int16 *)((char *)format_definition_offset + total_format_size);
        underflowed_size -= total_format_size;
        if ( ++format_index >= num_formats )
            goto LABEL_50;
    }
    goto INVALID_ARG_EXIT;
}
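
The wraparound itself is easy to reproduce. The Python sketch below models only the arithmetic, under the assumption that the size is held in a 32-bit unsigned integer; the helper names are hypothetical and this is not the code in rdpendp.dll.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit unsigned integer (an assumption)
HEADER_LEN = 9


def passes_checks(packet_size: int, size_from_pdu: int) -> bool:
    """The two validations shown earlier: packet length and header size."""
    return packet_size >= HEADER_LEN and size_from_pdu <= packet_size


def remaining_after_header(size_from_pdu: int) -> int:
    """Unsigned subtraction of the header length, with wraparound."""
    return (size_from_pdu - HEADER_LEN) & MASK32


# A nine-byte packet whose header claims a size of zero passes both checks,
# yet the remaining size wraps to 0xFFFFFFF7, far above the 0x12 threshold.
assert passes_checks(packet_size=9, size_from_pdu=0)
assert remaining_after_header(0) == 0xFFFFFFF7
```

With the remaining size that large, the comparisons inside the loop no longer constrain how far past the packet the format entries are read.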

Note that we have no control over the initialization of the audiodg process. Thus, we could not have found this bug without the DynamoRIO attach functionality.

Summary

In this blog post, we presented our process of trying to tackle a challenging fuzzing target: Windows’ RDP client and server. We wanted to share our process for a few reasons.

First, we believe it’s important to share the process even if you were unable to get to your original goal (an RCE for example). This can help you reflect on your own process — what seemed to work well and what could have been improved. This can also help the security community learn from past experiences.

Second, even though setting up a fuzzing environment may be a complex process, we think it’s a goal worth pursuing — even with more challenging targets like the one we presented here. RDP is a very complex protocol with many components and different code bases. Combining this with the fact that it is so popular (over 4 million internet-facing servers based on Shodan.io) makes it a very lucrative target for attackers. This means that as the security community, we need to make great efforts to make it more secure.


Internet facing RDP servers. Source: shodan.io

Even though Microsoft has been doing some great work securing their products in recent years, we believe that there are still more vulnerabilities in RDP components waiting to be fuzzed. With the understanding we gained during this process and the information shared here, we can drive future research forward. Examples of projects worth pursuing include improving the fuzzing performance, expanding it to other channels we didn’t have the chance to fuzz, using an emulator-based fuzzer, or even performing manual analysis on interesting parts of the code.

Finally, to show the generality of our fuzzing solution, we have since applied it to several RPC servers with promising preliminary results. Expect to see more on that in the future.

 
