“Some say software is eating the world; I would say that BPF is eating software.”
To paraphrase Brad Gerstner, there are two broad approaches to investing: “working the cocktail circuit” or being an “anthropologist”. At the cocktail parties I’ve attended of late, touting ~new ways to program the Linux kernel isn’t always a surefire way to make friends, so anthropology it is. Like many emerging technologies (à la WebAssembly), the opportunity isn’t exclusively eBPF itself, but also the second-order solutions the technology enables.
The market adjacent to eBPF is easy to get excited about. Linux is the world’s most used operating system; the trillion-dollar cloud runs on Linux. Yet despite this popularity, Brendan Gregg has famously likened the (pre-eBPF) Linux kernel to HTML: a static, non-programmable document.
The programmability of the web is what led to the proliferation of web-based applications vs. pages. Similarly, eBPF facilitates the building of kernel-based applications. At this point at the cocktail party, I’m duly asked “why would I want to build kernel-based applications?”, to which I retort “why would I want to build web-based applications?”. The web and the kernel are both software interfaces with assigned roles: each receives specific inputs (data) and exposes specific capabilities (functions) that adjacent applications can use to conduct certain tasks.
Anyway, enough preamble. There are plenty of deeply technical posts elaborating on eBPF and pontificating on its future; this post will do the same, but hopefully in a more approachable and actionable way.
If you’re building eBPF developer tooling or solving problems with eBPF generally I’d love to talk to you / grab coffee if you’re in London: firstname.lastname@example.org
eBPF (extended Berkeley Packet Filter) is a method of programming the Linux kernel. For the uninitiated (the lucky ones), the kernel is the piece of software within the Linux operating system that enables applications in userspace (e.g., Slack) to interact with a computer's underlying hardware (e.g., memory). The kernel’s intermediary role means that it is incredibly privileged: it “sees” all information communicated and decides what to relay/execute.
The kernel has historically been difficult to program due to its central role and strict stability and security requirements. The rate of innovation at the operating system level has thus traditionally been lower compared to functionality implemented outside of the operating system. Prior to eBPF, the kernel was ~safely programmed via: 1) changing the kernel’s source code (which can take years to reach your machine) or 2) loading “kernel modules” (which offer no compatibility guarantees across kernel versions and can crash the kernel outright).
Key Point: Alternative methods of programming the kernel are slow to implement or error-prone.
So, how does eBPF enable the kernel to be programmed in a way that is expedient, performant, and less prone to failure?
At a high level, eBPF programs go through a series of steps to ensure that they’re not susceptible to the failure modes of loadable kernel modules. These steps are encapsulated in the process known as the “eBPF runtime”.
This “eBPF runtime” consists of 3 core processes:
1) Program Development
Firstly, developers write the code that will program the kernel. eBPF programs themselves are typically written in a restricted subset of C, often driven from a higher-level language like Python (via frontends such as BCC), because writing in these languages is far easier than writing eBPF bytecode by hand.
** We’re going to jump into the deep end a little now but stay with me **
Within these programs, developers will specify a “program type” and “hook point”.
What’s most important to note here is that eBPF requires this detail given how privileged the kernel is. It would be a significant security risk to give a single eBPF program unnecessary access to all kernel functionality and/or events.
Let’s walk through an example eBPF program courtesy of Liz Rice with some additional comments from my side!
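The embedded example doesn’t survive in text form here, so below is a close reconstruction of Liz Rice’s well-known BCC “Hello World”, with the program type and hook point called out in comments. Treat it as a sketch: actually running it requires the bcc toolchain and root privileges.

```python
# Reconstruction of Liz Rice's classic BCC "Hello World" (a sketch,
# not the exact snippet originally embedded in this post).

# The eBPF program itself is written in restricted C:
program = r"""
int hello(void *ctx) {
    bpf_trace_printk("Hello World!\n");
    return 0;
}
"""

def load_and_attach():
    from bcc import BPF  # BCC compiles the C above to eBPF bytecode via LLVM

    b = BPF(text=program)
    # Program type: kprobe. Hook point: entry of the clone() syscall,
    # which fires every time a new process starts.
    syscall = b.get_syscall_fnname("clone")
    b.attach_kprobe(event=syscall, fn_name="hello")
    b.trace_print()  # stream the "Hello World!" lines as processes spawn

# load_and_attach()  # uncomment to run (needs bcc installed + root)
```

Note that the C string never executes in userspace: BCC hands it to LLVM, and the resulting bytecode is what the kernel eventually runs.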
This "eBPF code" is then compiled to a specific bytecode format - eBPF bytecode. Post being compiled, the eBPF bytecode is then sent to the kernel via the bpf() system call. System calls or “syscalls” are the APIs exposed by the kernel which allow userspace applications to communicate with the kernel.
Technical Detail: Bytecode is a numeric representation of your human-readable code (e.g., Python). It’s an intermediate state between human-readable code and “machine code”. To avoid getting into the weeds here, just ask yourself what code that’s closer (remember, it’s an intermediate state) to machine code (i.e., a binary language that can command hardware) could enable. One answer is faster interpretation and/or subsequent compilation of that code. This is all you need to know for now about how bytecode relates to eBPF.
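The same intermediate-representation idea exists in CPython, and you can poke at it with the standard library’s dis module. This is an analogy for what “bytecode” means generally, not eBPF bytecode itself:

```python
import dis

def add(a, b):
    return a + b

# Like eBPF bytecode, CPython bytecode is a numeric instruction stream
# sitting between human-readable source and what actually executes.
raw = add.__code__.co_code                       # raw bytes of the instructions
ops = [ins.opname for ins in dis.get_instructions(add)]
print(len(raw), ops)
```

The exact opcodes vary by Python version, but the point stands: a compact numeric form is far cheaper to interpret or further compile than source text.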
2) Program Verification
Now that the bytecode is sent into the kernel, the kernel passes this bytecode through the “eBPF verifier”. The eBPF verifier can be thought of as a function that receives the bytecode as an argument and runs a series of tests to make sure that the bytecode is “safe” to run.
“Safe” means that the user has permission to load eBPF bytecode and that running it won’t crash the kernel, expose arbitrary kernel memory, and so on. Again, note the checks and balances eBPF applies to ensure the kernel is protected from these programs.
Once the bytecode runs through the eBPF verifier, it is either approved or rejected.
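To make the shape of this concrete, here is a deliberately toy Python sketch of “a function that receives bytecode and runs a series of checks”. The specific checks and the limit are illustrative stand-ins; the real kernel verifier does far more (simulating every execution path, tracking register state, etc.):

```python
# Toy model of the verifier's shape: bytecode in, approve/reject out.
# The rules below are hypothetical stand-ins, NOT the kernel's real rules.
MAX_INSNS = 4096  # stand-in for a per-program instruction limit

def toy_verify(insns):
    if len(insns) > MAX_INSNS:
        return (False, "program too large")
    for i, insn in enumerate(insns):
        if insn == "backward_jump":    # an unbounded loop could hang the kernel
            return (False, f"possible unbounded loop at insn {i}")
        if insn == "raw_memory_read":  # could expose arbitrary kernel memory
            return (False, f"unchecked memory access at insn {i}")
    return (True, "ok")

print(toy_verify(["load", "add", "return"]))  # approved
print(toy_verify(["load", "backward_jump"]))  # rejected
```

The design choice to verify statically, before the program ever runs, is what lets eBPF skip the runtime sandboxing overhead that userspace isolation normally requires.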
3) Program Attachment
Now that this intermediate bytecode has been verified as safe to run, the program is attached to the developer’s pre-defined hook point. Remember, the hook point is specified in your code. In our sample code above, the specified hook point is the sys_clone system call which is called every time a new process starts.
The kernel then compiles the bytecode further to “native code” via a JIT (just-in-time) compiler.
Technical Detail x2:
JIT Compiler: JIT compilers compile code during runtime (when the code is being executed) vs. before runtime.
Native Code: Machine code, i.e., instructions in a specific CPU’s instruction set architecture (e.g., x86 or ARM).
So, as some of you may already be thinking, eBPF is ultimately a virtual machine within the kernel. It executes sandboxed programs at near-native speeds.
Key Point: Our initial question asked how eBPF helps program the kernel in a way that is expedient, performant, and less prone to failure. The answer: via the eBPF runtime.
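Tying the three stages together, the whole runtime can be sketched as one pipeline. Every function below is an illustrative stub (none of these are real kernel interfaces):

```python
# End-to-end sketch of the eBPF runtime's three stages.
# All functions are hypothetical stubs for illustration only.

def compile_to_bytecode(source: str) -> bytes:
    return source.encode()  # stand-in for the clang/LLVM eBPF backend

def verify(bytecode: bytes) -> bool:
    return len(bytecode) > 0  # stand-in for the kernel verifier's checks

def jit_compile(bytecode: bytes) -> bytes:
    return bytecode  # stand-in for bytecode -> native machine code

def load_program(source: str, hook: str) -> str:
    bytecode = compile_to_bytecode(source)   # 1) program development
    if not verify(bytecode):                 # 2) program verification
        raise PermissionError("verifier rejected program")
    native = jit_compile(bytecode)           # 3) JIT + attachment
    return f"attached {len(native)}-byte program to {hook}"

print(load_program("int hello(void *ctx) { return 0; }", "sys_clone"))
```

The key property to notice: rejection happens in step 2, before any native code exists, so an unsafe program never executes.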
To bring this post to life, let’s look at some examples of eBPF being used in production.
You may be noticing some common use cases of eBPF here. Namely, observability, security, and networking.
It’s important to note however that eBPF isn’t always a superior method for programming the kernel. Like all technologies, using eBPF comes with its own set of trade-offs. Potential challenges faced using eBPF include:
eBPF gets its fair share of flak for its name, and arguably rightly so. However, if you trace back the technology’s history there’s some romance to be found in the four-letter acronym. To me, it captures the technology’s lineage rather perfectly.
1. Packet Filter
In 1993, Steven McCanne and Van Jacobson presented the paper “The BSD Packet Filter: A New Architecture for User-level Packet Capture” at the Winter USENIX conference.
In the paper, McCanne and Jacobson described the BSD (B) Packet (P) Filter (F). This packet filter leveraged a highly efficient kernel-based virtual machine to do one thing, traffic filtering, in a performant manner while still preserving a boundary between the filter code and the kernel.
Sound familiar? It’s strikingly close to one of Cloudflare’s use cases.
What was truly prescient by the duo however was how they designed the virtual machine for generality. They specified:
There were predecessors to BPF, such as the CMU/Stanford Packet Filter. Steve & Van (we’re on a first-name basis now) worked at Lawrence Berkeley Laboratory.
In 2014 (the same year as Kubernetes!), Linux 3.18 was released, containing the first implementation of an extended (i.e., more usable) BPF. This release, and subsequent releases, brought many improvements to BPF:
Perhaps the most salient point here is that we’re witnessing a step-change in the rate of innovation within the kernel as developers are markedly less constrained by the kernel as a development environment. This rate of innovation will inevitably create new breaking points for the technology which I’ll be keeping an eye on. An emerging example here is the lifecycle management of numerous eBPF programs across multiple nodes - l3af is setting out to solve this problem in a user-friendly way.
Additional eBPF developer tools I’m excited to see emerge include:
As eBPF continues to proliferate, Linux source code maintainers are also increasingly incentivized to add additional support for eBPF. For example, in this Linux v5.7 patch, support was added for Linux Security Modules as hook points. LSMs are deserving of a post of their own — the main point to note here is that as eBPF programs gain access to new hook points, kernel functions, and other development utilities, new applications of the technology will crop up.
As a selfish aside, I’m interested in projects reducing the complexity of runtime enforcement - if you’re working on this say hello >> email@example.com
Aware I may very much sound like a man with a hammer at this point. However, as the aforementioned eBPF support continues, I believe we’ll see a gradual shift towards more traditionally userspace-bound programs being executed within the kernel instead. Why? Because unlike eBPF programs, userspace programs are completely isolated from the hardware that they ultimately rely on; crossing the user/kernel boundary means they can incur a performance penalty on the order of ~25-30% (!).
Whilst innovation within the kernel across networking, observability, and security will likely continue to be eBPF’s core commercial use case, it’s worth doing some blue-sky thinking too. The eBPF instruction set is incredibly simple when compared to an ISA like x86. This makes the eBPF virtual machine far more portable (more complexity = more that can go wrong) than its peers. Thus many (h/t to Ferris Ellis) have speculated on the virtual machine’s usage in entirely new systems beyond the kernel, such as in “smart NICs.”
Technical Detail: Smart Network Interface Cards (NICs) allow the processing of network traffic, to varying degrees, to be offloaded to the NICs themselves (vs. the CPU).
New systems present new constraints. Fortunately for eBPF, constraints are what led to its creation. I’m looking forward to seeing if the technology becomes the standard development environment across increasingly powerful hardware. If so, it may very well have software for breakfast.
If you’re building eBPF developer tooling or solving problems with eBPF generally I’d love to talk to you / grab coffee if you’re in London >> firstname.lastname@example.org