
Writing an XDP Network Filter with eBPF

Jeremy Erickson May 7th, 2020 (Last Updated: May 7th, 2020)

01. Introduction

At KubeCon 2019, a number of great talks presented eBPF as an incredibly powerful tool for monitoring, creating audit trails, and even high-performance networking. One talk from the Cilium team, describing how eBPF and XDP could free Kubernetes from iptables, caught our attention because we were exploring Kubernetes and Istio at the time. Mostly because it sounded like a really neat technology, we spent some spare cycles leveling up our understanding of how eBPF works and where it can be used.

In this post, I'm going to focus on XDP rather than on eBPF in general. Whereas most uses of eBPF observe the system, XDP lets you act on raw network traffic at one of the earliest points a packet enters the system, before the kernel's networking stack has had a chance to process it. NICs that support "Offloaded" XDP can even run XDP programs on the NIC hardware itself, so packets are processed without the CPU ever seeing them! That is so cool!

02. What is XDP

XDP stands for eXpress Data Path. It provides a high-performance data path in the Linux kernel for processing network packets as they arrive at the NIC. Essentially, you attach an XDP program to a network interface, and the program is invoked for every packet received on that interface. That's it. Real simple.

When you attach your XDP program to an interface, you can attach it in one of three modes:

  1. Native XDP - The XDP program is run by the NIC's driver in its early receive path, before the kernel allocates its own data structures for the packet. This requires support from the NIC driver.
  2. Offloaded XDP - The XDP program is loaded onto the NIC itself and executes entirely off the CPU. This requires support from the NIC hardware itself.
  3. Generic XDP - The XDP program runs later, as part of the kernel's normal network receive path. This mode does not offer the performance benefits of Native or Offloaded XDP, but it works with any network device on kernels since 4.12, which makes it great for testing XDP programs or running them on hardware without XDP support.

Once the packet has been handed to your XDP program, you can inspect it and even modify it in place, within the limits the BPF verifier allows. When you're done, your program's return value tells the XDP Packet Processor what to do with the packet next.

XDP_DROP tells the Packet Processor to drop the packet without any further processing. This is useful for discarding DoS attack traffic as early as possible: a userspace application can analyze traffic patterns and update, in real time, the filters the XDP program applies.

XDP_PASS indicates that the packet should be passed up to the normal network stack for further processing. The packet can be modified before this happens, or left alone.

XDP_TX and XDP_REDIRECT both send the packet onward immediately, bypassing the normal network stack. XDP_TX transmits the (likely modified) packet right back out the same network interface it came in on. XDP_REDIRECT sends it somewhere else: out through a different NIC, to another CPU through a BPF cpumap, or to a userspace process through an AF_XDP socket (via an xskmap).

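XDP_ABORTED signals an error in the program: the packet is dropped and the xdp:xdp_exception tracepoint fires. It is meant for error paths, not as a general-purpose way to drop packets; use XDP_DROP for that.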
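For reference, these five return codes are the members of enum xdp_action in <linux/bpf.h>. The Python sketch below mirrors their numeric values and models, in user space, the kind of decision a DoS filter makes. It is only an illustration: the classify function and the blocklisted address are hypothetical, and a real filter would make this decision inside the XDP program itself.

```python
from enum import IntEnum


class XdpAction(IntEnum):
    """Numeric values of enum xdp_action from <linux/bpf.h>."""
    XDP_ABORTED = 0   # program error: packet dropped, xdp_exception tracepoint fires
    XDP_DROP = 1      # drop the packet silently
    XDP_PASS = 2      # hand the packet to the normal network stack
    XDP_TX = 3        # bounce the packet back out the interface it arrived on
    XDP_REDIRECT = 4  # send it to another NIC, CPU, or AF_XDP socket


def classify(src_ip: str, blocklist: set) -> XdpAction:
    """Hypothetical user-space model of a DoS filter: drop blocklisted sources."""
    if src_ip in blocklist:
        return XdpAction.XDP_DROP
    return XdpAction.XDP_PASS


blocklist = {"203.0.113.7"}  # hypothetical attacker address (documentation range)
print(classify("203.0.113.7", blocklist).name)   # XDP_DROP
print(classify("198.51.100.1", blocklist).name)  # XDP_PASS
```

In the split the DoS paragraph describes, user space would maintain the blocklist and the XDP program would only look it up.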

03. Sample Problem

It's always good to start small. For this experiment, I wanted a program that would actually change something, yet be small and easy to test and visualize. To that end, I picked the simple problem of rewriting the destination port of UDP packets from 7999 to 7998.

To try it out, open three terminals. In the first two, run one of the following commands each:

nc -kul 127.0.0.1 7999
nc -kul 127.0.0.1 7998

These are our listening processes. We use nc (netcat) to open sockets that listen (-l) for UDP packets (-u) arriving at 127.0.0.1 on ports 7999 and 7998. The -k flag tells netcat to keep listening after it has received a packet, so it can receive more packets from other clients.

In our third terminal, run:

nc -u 127.0.0.1 7999

Then, on the next line, type some text followed by <Enter>. You should see the text echoed in the first terminal, which is listening on port 7999. Once our XDP program is attached to the lo (loopback) device, the packet will be modified en route and delivered to the other terminal, listening on port 7998, instead.

04. XDP Loader

Before we can run our XDP program (shown in the next section), we need a loader that attaches it to the data path. For this, we'll use the amazing bcc toolkit, which makes this part easy.

main.py

#!/usr/bin/env python3

from bcc import BPF

device = "lo"                            # (1)
b = BPF(src_file="filter.c")             # (2)
fn = b.load_func("udpfilter", BPF.XDP)   # (3)
b.attach_xdp(device, fn, 0)              # (4)

Our loader starts by importing the BPF class from the bcc library. In (1), we name the network interface we plan to attach to: the loopback device, since the packets we want to modify will traverse it.

In (2), bcc compiles filter.c (covered later) into BPF bytecode using its embedded Clang/LLVM toolchain. The kernel's verifier, which checks that the program is valid and safe to run, comes into play only when the program is actually loaded into the kernel in the next step.

In (3), we load udpfilter, the function in our BPF program that will handle incoming packets, into the kernel as an XDP program.

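In (4), we attach our XDP function to the device we specified in (1), passing 0 for flags. With no mode flag set, the kernel uses native XDP if the driver supports it and falls back to generic XDP otherwise, which is what happens on the loopback device. To request a specific mode, such as native or offloaded XDP, we would set it in the flags parameter.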
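As a sketch of what setting the flags parameter looks like, the constants below are the XDP attach-mode flags defined in <linux/if_link.h>, which bcc's attach_xdp() passes through to the kernel. The xdp_flags helper and its mode names are hypothetical conveniences, not part of bcc. Offloaded mode also requires the program to be loaded for the specific device, which this sketch does not cover.

```python
# XDP attach-mode flags as defined in <linux/if_link.h>.
XDP_FLAGS_SKB_MODE = 1 << 1   # generic XDP
XDP_FLAGS_DRV_MODE = 1 << 2   # native XDP
XDP_FLAGS_HW_MODE = 1 << 3    # offloaded XDP

# Hypothetical mode names mapped to flag values.
MODE_FLAGS = {
    "auto": 0,                # kernel picks native if supported, else generic
    "generic": XDP_FLAGS_SKB_MODE,
    "native": XDP_FLAGS_DRV_MODE,
    "offload": XDP_FLAGS_HW_MODE,
}


def xdp_flags(mode: str) -> int:
    """Return the attach flags for a mode name (hypothetical helper, not part of bcc)."""
    try:
        return MODE_FLAGS[mode]
    except KeyError:
        raise ValueError(f"unknown XDP mode: {mode!r}") from None


# In main.py, forcing generic mode would look like:
#   flags = xdp_flags("generic")
#   b.attach_xdp(device, fn, flags)
#   ...
#   b.remove_xdp(device, flags)
print(xdp_flags("generic"), xdp_flags("native"), xdp_flags("offload"))  # 2 4 8
```

Passing the same flags to remove_xdp() as to attach_xdp() keeps the detach consistent with the mode that was requested.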

Once attached to the network interface, our XDP function begins processing packets immediately. We need just a few more lines to wrap up the loader application.

main.py (continued)

try:
    b.trace_print()                      # (5)
except KeyboardInterrupt:
    pass

b.remove_xdp(device, 0)                  # (6)

In (5), the loader enters a loop that reads messages printed by our BPF program and echoes them to the screen. This runs indefinitely, so we wrap it in a try/except block that catches Ctrl-C (KeyboardInterrupt) and lets the program move on to cleanup.

In (6), once the user has asked the program to exit, we detach our XDP program from the network interface.

05. XDP Application

Now that we have a loader that can get our XDP program in place, we need an XDP program to actually parse and modify the packets we want.

// filter.c

#define KBUILD_MODNAME "filter"
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <linux/udp.h>

We'll start off with the headers we need: linux/bpf.h for the core BPF definitions (including struct xdp_md and the XDP_* return codes), linux/in.h for IPPROTO_UDP, and the headers that define the Ethernet, IPv4, and UDP header structures. We have to parse those headers ourselves, because XDP runs before the kernel has done any of that for us.
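To see how much data each of these header structs covers, here is a user-space sketch using Python's struct module. The format strings are our own transcription of the on-the-wire layouts of struct ethhdr, struct iphdr (without options), and struct udphdr; the "!" prefix selects network byte order with no padding.

```python
import struct

# "!" = network byte order, no padding, matching the packed on-the-wire layout.
ETH_HDR = struct.Struct("!6s6sH")          # struct ethhdr: h_dest, h_source, h_proto
IPV4_HDR = struct.Struct("!BBHHHBBH4s4s")  # struct iphdr (no options): version/ihl, tos,
                                           # tot_len, id, frag_off, ttl, protocol,
                                           # check, saddr, daddr
UDP_HDR = struct.Struct("!HHHH")           # struct udphdr: source, dest, len, check

print(ETH_HDR.size, IPV4_HDR.size, UDP_HDR.size)  # 14 20 8

# Byte offset of the UDP destination port in an IPv4 frame without IP options:
print(ETH_HDR.size + IPV4_HDR.size + 2)           # 36
```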

// filter.c

int udpfilter(struct xdp_md *ctx) {

Here, we define a function called udpfilter that will be called for every packet that arrives on our network interface. Notice that its name matches the one we passed to load_func at (3) in our loader program. The xdp_md struct passed into our function describes the packet: it holds the start and end of the packet data, along with metadata such as the index of the interface the packet arrived on.

// filter.c

  bpf_trace_printk("got a packet\n");                   // (7)
  void *data = (void *)(long)ctx->data;                 // (8)
  void *data_end = (void *)(long)ctx->data_end;         // (8)
  struct ethhdr *eth = data;                            // (9)

In (7), we start with a very simple confirmation that everything is working: every time we get a packet, regardless of its contents, we print that we received one. bpf_trace_printk() writes to the kernel's trace pipe, which b.trace_print() in our loader program reads and prints.

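In (8), we pull pointers to the start and end of the packet out of the xdp_md struct. The double cast is needed because xdp_md declares these fields as 32-bit integers; the verifier rewrites the accesses into real pointers. At this point, all we have is bytes.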

In (9), we point an Ethernet header struct pointer at the start of the packet data. This lets us read packet fields by name through the struct's layout instead of computing byte offsets by hand, a common technique in low-level networking code.

// filter.c

  if ((void*)eth + sizeof(*eth) <= data_end) {

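However, before we touch the Ethernet header, we need to verify that the packet actually contains enough data to hold it. This check is not optional: the BPF verifier rejects any program that might read packet memory without first proving that the access stays before data_end.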
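The same pointer-plus-bounds-check pattern can be modeled in user space. In this sketch (an illustration, not XDP code), integer offsets stand in for ctx->data and ctx->data_end, and struct.unpack_from plays the role of casting a pointer to struct ethhdr. parse_eth is a hypothetical helper.

```python
import struct
from typing import Optional, Tuple

ETH_HDR = struct.Struct("!6s6sH")  # struct ethhdr: h_dest, h_source, h_proto


def parse_eth(pkt: bytes) -> Optional[Tuple[bytes, bytes, int]]:
    """Read the Ethernet header only if the packet is long enough to hold it."""
    data, data_end = 0, len(pkt)        # stand-ins for ctx->data and ctx->data_end
    eth = data                          # "point" the header at the start of the packet
    if eth + ETH_HDR.size <= data_end:  # same test as eth + sizeof(*eth) <= data_end
        return ETH_HDR.unpack_from(pkt, eth)
    return None                         # too short: leave the packet alone


frame = bytes(12) + b"\x08\x00"         # zero MAC addresses + EtherType 0x0800 (IPv4)
print(hex(parse_eth(frame)[2]))         # 0x800
print(parse_eth(frame[:10]))            # None
```

In Python an out-of-range read would raise an exception; in XDP it is the verifier, not a runtime check, that forces you to write the comparison.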

// filter.c

    struct iphdr *ip = data + sizeof(*eth);
    if ((void*)ip + sizeof(*ip) <= data_end) {

The same is true for the IPv4 header.

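NOTE: This example assumes that every packet it receives is IPv4. The code never checks the EtherType (eth->h_proto), so IPv6 packets, ARP packets, and anything else would be misread as IPv4 headers and could even be rewritten by mistake. It also assumes an IPv4 header without options: it locates the UDP header at sizeof(*ip) (20 bytes) rather than at ip->ihl * 4. Since this is only a quick example, we accept these simplifying assumptions.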
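For comparison, here is a sketch of the more defensive parsing the note describes, written as a user-space Python function over a raw frame rather than as XDP code. It checks the EtherType, honours the IPv4 header length (IHL) field, and bounds-checks the UDP header before returning the offset of the destination port. udp_dest_offset and build_frame are hypothetical helpers; the constants match the kernel headers included above.

```python
import struct
from typing import Optional

ETH_P_IP = 0x0800   # IPv4 EtherType (<linux/if_ether.h>)
ETH_HLEN = 14       # sizeof(struct ethhdr)
IPPROTO_UDP = 17    # <linux/in.h>


def udp_dest_offset(frame: bytes) -> Optional[int]:
    """Return the offset of the UDP destination port in an Ethernet frame, or None."""
    if len(frame) < ETH_HLEN + 20:                      # Ethernet + minimal IPv4 header
        return None
    (eth_proto,) = struct.unpack_from("!H", frame, 12)
    if eth_proto != ETH_P_IP:                           # skip IPv6, ARP, everything else
        return None
    ip = ETH_HLEN
    version, ihl = frame[ip] >> 4, (frame[ip] & 0x0F) * 4  # IHL counts 32-bit words
    if version != 4 or ihl < 20 or frame[ip + 9] != IPPROTO_UDP:
        return None
    udp = ip + ihl                                      # skips any IPv4 options
    if udp + 8 > len(frame):                            # whole UDP header must be present
        return None
    return udp + 2                                      # dest follows the 2-byte source port


def build_frame(dport: int, options: bytes = b"") -> bytes:
    """Hypothetical helper: zero-MAC Ethernet + IPv4 (+options) + empty UDP datagram."""
    ihl_words = 5 + len(options) // 4
    lo_addr = bytes([127, 0, 0, 1])
    ip_hdr = struct.pack("!BBHHHBBH4s4s", (4 << 4) | ihl_words, 0, ihl_words * 4 + 8,
                         0, 0, 64, IPPROTO_UDP, 0, lo_addr, lo_addr)
    udp_hdr = struct.pack("!HHHH", 40000, dport, 8, 0)
    return bytes(12) + struct.pack("!H", ETH_P_IP) + ip_hdr + options + udp_hdr


print(udp_dest_offset(build_frame(7999)))                    # 36
print(udp_dest_offset(build_frame(7999, options=bytes(4))))  # 40
```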

// filter.c

      if (ip->protocol == IPPROTO_UDP) {
        struct udphdr *udp = (void*)ip + sizeof(*ip);
        if ((void*)udp + sizeof(*udp) <= data_end) {
          if (udp->dest == htons(7999)) {               // (10)
            bpf_trace_printk("udp port 7999\n");
            udp->dest = htons(7998);                    // (11)
          }
        }
      }
    }
  }
  return XDP_PASS;                                      // (12)
}

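After additional bounds checking and mapping the udphdr struct onto the data, in (10) we check whether the packet's destination port is 7999. The value 7999 is 0x1F3F. In the packet, the port is stored in network byte order (big-endian), as the bytes 1F 3F, while on a little-endian host such as x86 the literal 7999 is stored as 3F 1F. Reading the packet's port field directly as a host integer on such a machine would therefore yield 0x3F1F (16159), not 7999. To compare them correctly, we convert the literal with htons ("host to network short"), which swaps the bytes on little-endian hosts and does nothing on big-endian ones.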
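A quick way to check the byte-order arithmetic is a few lines of Python, where struct's "<" and "!" prefixes select little-endian and network (big-endian) byte order, and socket.htons behaves like its C counterpart.

```python
import socket
import struct
import sys

port = 7999
print(hex(port))                        # 0x1f3f
print(struct.pack("<H", port).hex())    # 3f1f  (in memory on a little-endian host)
print(struct.pack("!H", port).hex())    # 1f3f  (network byte order, as in the packet)

wire = struct.pack("!H", port)          # the two bytes of udp->dest
print(int.from_bytes(wire, "little"))   # 16159: misread as a little-endian integer
print(int.from_bytes(wire, "big"))      # 7999

# socket.htons swaps the bytes on little-endian hosts and is a no-op on big-endian
# ones, so comparing the raw field against htons(7999) works on both.
expected = 16159 if sys.byteorder == "little" else 7999
print(socket.htons(port) == expected)   # True
```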

If the destination port is in fact 7999, in (11) we change it to 7998. Because the udp pointer still points into the original packet buffer, we are modifying the raw bytes of the packet itself, not a copy.
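One thing filter.c does not do is update the UDP checksum, which covers the destination port along with the rest of the UDP header and payload. On lo this appears to go unnoticed, most likely because the kernel skips checksum validation for loopback traffic, but on a real interface a rewritten datagram may be rejected as corrupt. A common fix is the incremental update from RFC 1624, which adjusts the existing checksum using only the old and new values of the changed field. The sketch below models it in user space: it computes a full checksum once, rewrites the port in place, and patches the checksum incrementally. The helper names are hypothetical, and an XDP version would have to perform the same arithmetic in C.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum, padding odd-length input with a zero byte."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total


def udp_checksum(src_ip: bytes, dst_ip: bytes, udp: bytes) -> int:
    """Full UDP-over-IPv4 checksum: pseudo-header plus segment, check field taken as 0."""
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp))
    segment = bytes(udp[:6]) + b"\x00\x00" + bytes(udp[8:])
    csum = ~ones_complement_sum(pseudo + segment) & 0xFFFF
    return csum or 0xFFFF                          # 0 would mean "no checksum"


def rewrite_dest_port(udp: bytearray, new_port: int) -> None:
    """Rewrite the destination port in place and patch the checksum (RFC 1624, eqn. 3)."""
    old_port, check = struct.unpack_from("!H2xH", udp, 2)
    struct.pack_into("!H", udp, 2, new_port)
    if check == 0:
        return                                     # sender used no checksum; nothing to fix
    # HC' = ~(~HC + ~m + m'), with m the old field value and m' the new one
    total = (~check & 0xFFFF) + (~old_port & 0xFFFF) + new_port
    total = (total & 0xFFFF) + (total >> 16)
    total = (total & 0xFFFF) + (total >> 16)
    struct.pack_into("!H", udp, 6, (~total & 0xFFFF) or 0xFFFF)


# Usage: build a checksummed datagram to port 7999, then redirect it to 7998.
src = dst = bytes([127, 0, 0, 1])
payload = b"hello\n"
udp = bytearray(struct.pack("!HHHH", 40000, 7999, 8 + len(payload), 0) + payload)
struct.pack_into("!H", udp, 6, udp_checksum(src, dst, udp))
rewrite_dest_port(udp, 7998)
print(struct.unpack_from("!H", udp, 6)[0] == udp_checksum(src, dst, udp))  # True
```

The incremental form only touches the two changed bytes and the checksum field, which is what makes it attractive inside a per-packet fast path.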

In (12), regardless of whether we modified the packet or not, we return XDP_PASS to pass the packet up to the normal network stack for further processing.

06. Putting it all together

Now, let's go back to our original example. As before, start the two listeners, each in its own terminal:

nc -kul 127.0.0.1 7999
nc -kul 127.0.0.1 7998

In our third terminal, run:

nc -u 127.0.0.1 7999

Before sending any data, run main.py in a fourth terminal to load our XDP program (filter.c) onto the loopback interface. Loading BPF programs requires elevated privileges, hence sudo.

$ sudo ./main.py
b'     ksoftirqd/2-21    [002] ..s. 367485.247738: 0: got a packet'
b'     ksoftirqd/2-21    [002] ..s. 367485.247802: 0: got a packet'
b'           <...>-728756 [001] ..s1 367485.980134: 0: got a packet'
b'           <...>-728756 [001] ..s1 367485.980157: 0: udp port 7999'
b'           <...>-728756 [001] ..s1 367485.980200: 0: got a packet'

You should see the "got a packet" message for each new packet on the loopback interface. Now, if you type some data into the third-terminal nc instance, you should see the "udp port 7999" message appear as well.

You should also see that your message is received, not by the nc instance listening on port 7999 as before, but by the nc instance listening on port 7998 now.
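If you would rather script this check than type into netcat, here is a small Python sketch using only the standard socket and select modules. open_listener and which_listener_received are hypothetical helpers: they bind UDP listeners on 127.0.0.1, send one datagram to a target port, and report which listener received it.

```python
import select
import socket
from typing import List, Optional


def open_listener(port: int) -> socket.socket:
    """Bind a UDP socket on 127.0.0.1, roughly like `nc -kul 127.0.0.1 <port>`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", port))
    return sock


def which_listener_received(target_port: int, listeners: List[socket.socket],
                            timeout: float = 1.0) -> Optional[int]:
    """Send one datagram to target_port; return the port of the listener that got it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:  # like `nc -u`
        sender.sendto(b"hello\n", ("127.0.0.1", target_port))
    ready, _, _ = select.select(listeners, [], [], timeout)
    for sock in ready:
        sock.recvfrom(2048)                  # drain the datagram
        return sock.getsockname()[1]
    return None                              # nobody received it within the timeout


# Usage (run while main.py is attached to lo):
#   listeners = [open_listener(7999), open_listener(7998)]
#   print(which_listener_received(7999, listeners))
# Without the filter this prints 7999; with it, 7998 is expected.
```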

07. What else?

This post only scratches the surface of what you can do with eBPF and XDP, but hopefully it gives you a general sense of how the main components work together to make potentially complex packet filtering or rewriting decisions.

One thing not covered in this article is how the eBPF/XDP program and the user-space loader can communicate with one another while running, using maps. Maps are key/value data structures that live in the kernel and can be read and written by both the eBPF program and the user-space component, so they can carry data in both directions. Using maps, a user-space component, with access to a rich set of libraries for querying and decision-making, can determine what the eBPF program should do and reconfigure it in real time to do just that.

Some of the coolest work using eBPF and XDP these days is coming out of Cloudflare and the Cilium project. Cloudflare uses XDP extensively as part of its DDoS mitigation strategy, something it details on its blog. Cilium uses eBPF and XDP to provide a high-performance networking plane for Kubernetes and Docker. The sky really is the limit: eBPF and XDP open up an enormous range of possibilities in networking.

08. Code Samples

main.py

#!/usr/bin/env python3

from bcc import BPF

device = "lo"
b = BPF(src_file="filter.c")
fn = b.load_func("udpfilter", BPF.XDP)
b.attach_xdp(device, fn, 0)

try:
    b.trace_print()
except KeyboardInterrupt:
    pass

b.remove_xdp(device, 0)

filter.c


#define KBUILD_MODNAME "filter"
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <linux/udp.h>

int udpfilter(struct xdp_md *ctx) {
  bpf_trace_printk("got a packet\n");
  void *data = (void *)(long)ctx->data;
  void *data_end = (void *)(long)ctx->data_end;
  struct ethhdr *eth = data;
  if ((void*)eth + sizeof(*eth) <= data_end) {
    struct iphdr *ip = data + sizeof(*eth);
    if ((void*)ip + sizeof(*ip) <= data_end) {
      if (ip->protocol == IPPROTO_UDP) {
        struct udphdr *udp = (void*)ip + sizeof(*ip);
        if ((void*)udp + sizeof(*udp) <= data_end) {
          if (udp->dest == htons(7999)) {
            bpf_trace_printk("udp port 7999\n");
            udp->dest = htons(7998);
          }
        }
      }
    }
  }
  return XDP_PASS;
}