Buildah and Docker Engine (Moby) were initializing containers with a fully populated 'Inheritable' capability set. This violated the principle of least privilege, allowing processes within the container to easily elevate their privileges to the container's maximum bounding set simply by executing binaries with file capabilities set. It's a classic case of "default insecure" configuration.
A deep dive into a subtle but significant flaw in how Buildah and Docker Engine initialized Linux process capabilities. By misconfiguring the Inheritable set, these runtimes allowed unintended privilege escalation within containers, turning the complex mathematics of Linux permissions against the security model.
Linux capabilities are essentially the operating system's attempt to break the monolithic root user into tiny, manageable shards of power. Instead of checking "Is UID 0?", the kernel checks "Do you have CAP_NET_BIND_SERVICE?" It’s a brilliant system in theory, designed to enforce granular security. But in practice, it is a confusing labyrinth of bitmasks that even seasoned kernel developers struggle to navigate correctly.
There are not just "capabilities"; there are sets of capabilities attached to every process: Permitted, Effective, Inheritable, Bounding, and Ambient. It's a five-dimensional chess game of permissions. When a container runtime spins up a new environment, it is responsible for initializing this matrix perfectly. If it messes up even one set, the math breaks down.
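To make the five sets concrete, here is a minimal sketch that pulls them out of the text of a `/proc/<pid>/status` file. The function name `parse_cap_sets` and the `SAMPLE` text are illustrative assumptions, not output captured from a real container.

```python
import re

# The five per-process capability fields exposed in /proc/<pid>/status.
CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def parse_cap_sets(status_text: str) -> dict[str, int]:
    """Return {field: bitmask} for each capability line found in status_text."""
    sets = {}
    for line in status_text.splitlines():
        match = re.match(r"^(Cap\w+):\s*([0-9a-fA-F]+)\s*$", line)
        if match and match.group(1) in CAP_FIELDS:
            sets[match.group(1)] = int(match.group(2), 16)
    return sets


# Illustrative sample (an assumption, not a real capture): empty Inheritable
# and Ambient sets, with the other three sets identical.
SAMPLE = """Name:\tsh
CapInh:\t0000000000000000
CapPrm:\t00000000a80425fb
CapEff:\t00000000a80425fb
CapBnd:\t00000000a80425fb
CapAmb:\t0000000000000000
"""

if __name__ == "__main__":
    for name, mask in parse_cap_sets(SAMPLE).items():
        print(f"{name}: {mask:#018x}")
```

On a live Linux system the same function could be fed the contents of `/proc/self/status` instead of the sample string.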
CVE-2022-27651 is the story of what happens when that initialization is slightly lazy. The developers at Buildah and Moby (Docker) decided to populate the Inheritable set with everything in the Bounding set. On the surface, it looked harmless—after all, the Bounding set prevents the process from getting new powers, right? Wrong. They essentially unlocked the door and left it slightly ajar, waiting for anyone with a setcap binary to kick it open.
To understand the bug, we have to look at the terrifying equation the Linux kernel uses to calculate permissions during an execve(2) system call. When a process executes a binary, its new capabilities are calculated as follows:
$$P'(permitted) = (P(inheritable) \cap F(inheritable)) \cup (F(permitted) \cap cap\_bset)$$
Translated to human English: A process can inherit capabilities only if they are present in both the process's current Inheritable set ($P$) and the file's Inheritable set ($F$).
Standard security practice dictates that the process's Inheritable set ($P(inheritable)$) should be empty by default. This ensures that even if a binary has file capabilities set, they don't automatically transfer to the process unless explicitly intended (usually via the Ambient set).
The Vulnerability: Buildah and Moby ignored this best practice. They initialized the container process with a full Inheritable set, mirroring the Permitted and Bounding sets. This meant the first term of that equation, $P(inheritable)$, already contained every capability the container was allowed to hold, so the intersection was limited only by what the file requested. Consequently, any binary inside the container with File Inheritable bits set would hand those capabilities to whichever process executed it, even a non-root process whose Permitted and Effective sets were empty. It turned a defensive wall into a permeable membrane.
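The effect is easy to see by plugging bitmasks into the formula. The sketch below is a simplified model, not kernel code: it implements only the two terms written in the equation above (the Ambient contribution that the kernel also adds is left out), and the binary it models is hypothetical.

```python
# Capability bit numbers as defined in <linux/capability.h>.
CAP_NET_BIND_SERVICE = 1 << 10
CAP_NET_RAW = 1 << 13
CAP_SYS_CHROOT = 1 << 18


def permitted_after_exec(p_inh: int, f_inh: int, f_perm: int, bounding: int) -> int:
    """P'(permitted) per the formula above; the Ambient term is omitted."""
    return (p_inh & f_inh) | (f_perm & bounding)


# A small illustrative bounding set for the container.
bounding = CAP_NET_BIND_SERVICE | CAP_NET_RAW | CAP_SYS_CHROOT

# Hypothetical binary: CAP_NET_RAW in its file Inheritable set,
# nothing in its file Permitted set.
f_inh, f_perm = CAP_NET_RAW, 0

# Vulnerable runtime: process Inheritable set mirrors the bounding set.
vulnerable = permitted_after_exec(bounding, f_inh, f_perm, bounding)
# Fixed runtime: process Inheritable set is empty.
fixed = permitted_after_exec(0, f_inh, f_perm, bounding)

print(f"vulnerable: {vulnerable:#x}")  # 0x2000, i.e. CAP_NET_RAW
print(f"fixed:      {fixed:#x}")       # 0x0
```

With a full process Inheritable set the file's request passes straight through; with an empty one the intersection is empty and nothing is gained.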
The fix required telling the runtime to stop being so generous. We can see the stark difference in the patch applied to Buildah. The goal was to ensure the Inheritable set is explicitly empty, rather than a mirror of the Bounding set.
Here is the logic shift in the OCI spec generation:
// Before: Inheritable set was implicitly or explicitly synchronized
// with Permitted/Bounding sets during spec generation.
// After: Explicitly zeroing out the Inheritable set
// Commit: e7e55c988c05dd74005184ceb64f097a0cfe645b
g.Config.Process.Capabilities.Inheritable = []string{}
// In setupCapAdd and setupCapDrop functions:
// The code stopped appending added capabilities to the Inheritable list.

By hardcoding the Inheritable slice to an empty list []string{}, the developers restored the expected behavior. Now, even if a file has bits set in F(inheritable), the kernel equation multiplies them by zero ($P(inheritable)$), resulting in no privilege gain.
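The result of that change can also be checked from the outside by inspecting the generated OCI runtime `config.json`, where the sets live under `process.capabilities`. The following is a minimal sketch; the helper name `inheritable_caps` and the two JSON fragments are illustrative assumptions, not output from Buildah or Moby.

```python
import json


def inheritable_caps(oci_config: dict) -> list[str]:
    """Return process.capabilities.inheritable, or [] if it is absent or null."""
    caps = (oci_config.get("process") or {}).get("capabilities") or {}
    return list(caps.get("inheritable") or [])


# Hypothetical config fragments for illustration only.
before = json.loads("""
{"process": {"capabilities": {
    "bounding":    ["CAP_NET_RAW"],
    "permitted":   ["CAP_NET_RAW"],
    "inheritable": ["CAP_NET_RAW"]}}}
""")
after = json.loads("""
{"process": {"capabilities": {
    "bounding":    ["CAP_NET_RAW"],
    "permitted":   ["CAP_NET_RAW"],
    "inheritable": []}}}
""")

for label, config in (("before", before), ("after", after)):
    leaked = inheritable_caps(config)
    print(label, "->", leaked if leaked else "inheritable set is empty")
```

A non-empty list in the "before" shape is the condition this CVE describes; the "after" shape is what the patched spec generation is meant to produce.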
Because this is a logic flaw in environment initialization, "exploitation" is really just standard Linux behavior working in a way administrators didn't plan for. There is no buffer overflow here, just a logic gate left open.
Step 1: The Litmus Test
To check if your container runtime is vulnerable, you don't need a complex C program. You just need grep. Spin up a container and check the status of process 1:
$ grep ^CapInh /proc/1/status
CapInh: 00000000a80425fb <-- VULNERABLE (Non-zero)

If you see all zeros (0000000000000000), you are safe. If you see any non-zero value, your process has inheritable capabilities waiting to be triggered.
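A raw hex mask is not very readable, so it can help to turn it into capability names. The sketch below does that with a lookup table taken from the bit numbers in `<linux/capability.h>`; the function name `decode_caps` is an assumption made for this example, and the `capsh --decode=` command from libcap does the same job if it is installed.

```python
# Capability names indexed by bit number, as defined in <linux/capability.h>.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask: str) -> list[str]:
    """Translate a CapInh-style hex string into a list of capability names."""
    mask = int(hex_mask, 16)
    names = []
    bit = 0
    while mask:
        if mask & 1:
            # Fall back to a numeric label for bits newer than this table.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"cap_{bit}")
        mask >>= 1
        bit += 1
    return names


if __name__ == "__main__":
    for name in decode_caps("00000000a80425fb"):
        print(name)
```

For the mask shown above this lists 14 capabilities, including CAP_NET_RAW, CAP_SYS_CHROOT and CAP_SETFCAP.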
Step 2: The Attack Chain
An attacker with low-privilege access inside a container (e.g., the www-data user) wants to escalate privileges.
1. The attacker locates a binary that already carries file capabilities, such as ping (which often has cap_net_raw set). Alternatively, the attacker can write a file and use setcap, but that requires CAP_SETFCAP in their current set, which they might not have, so a misconfigured binary that already exists is the likelier route.
2. The attacker executes that binary.
3. Because the calling process has a non-empty CapInh, the new process spawns with the capabilities added to its Permitted set. The attacker can now perform actions (like raw socket manipulation or system administration tasks) that were supposed to be restricted to the container root, effectively bypassing user privilege separation within the container.

You might be thinking, "So what? The process is still inside the container! It can't break out to the host!" And you would be technically correct: this CVE does not bypass the Container Bounding Set or namespaces directly. However, it destroys the concept of Defense in Depth.
Modern container security relies on running services as non-root users inside the container. We tell developers: "Don't run as root! Create a node user!" We expect that node user to be restricted.
CVE-2022-27651 renders that advice largely moot if a suitable binary exists on the filesystem. It allows a compromised web application running as a low-privileged user to instantly jump to the maximum privileges allowed to the container. If that container was running with --privileged or permissive capabilities (common in CI/CD pipelines), the attacker now has full control over those capabilities, bringing them one step closer to a full host breakout.
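Whether a "suitable binary" exists comes down to what is stored in a file's `security.capability` extended attribute. The sketch below decodes that attribute following the `vfs_cap_data` layout from `<linux/capability.h>`; the function names and the `SAMPLE` bytes are assumptions for illustration, and `getcap -r /` remains the usual way to survey a filesystem.

```python
import os
import struct

VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
CAP_NET_RAW = 1 << 13


def parse_vfs_cap(raw: bytes) -> dict:
    """Decode a security.capability xattr into permitted/inheritable bitmasks."""
    (magic,) = struct.unpack_from("<I", raw, 0)
    revision = (magic & VFS_CAP_REVISION_MASK) >> 24
    # Revision 1 stores one 32-bit pair; revisions 2 and 3 store two.
    words = 1 if revision == 1 else 2
    permitted = inheritable = 0
    for i in range(words):
        perm, inh = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= perm << (32 * i)
        inheritable |= inh << (32 * i)
    return {
        "revision": revision,
        "effective": bool(magic & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": permitted,
        "inheritable": inheritable,
    }


def file_caps(path: str):
    """Read and decode a file's capabilities (Linux only); None if it has none."""
    try:
        return parse_vfs_cap(os.getxattr(path, "security.capability"))
    except OSError:
        return None


# Synthetic revision-2 attribute: CAP_NET_RAW in the Inheritable set only.
SAMPLE = struct.pack("<IIIII", 0x02000000, 0, CAP_NET_RAW, 0, 0)

if __name__ == "__main__":
    print(parse_vfs_cap(SAMPLE))
```

Any file whose decoded `inheritable` mask is non-zero is a candidate for the chain described above when the calling process also has a non-empty CapInh.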
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:N

| Product | Affected Versions | Fixed Version |
|---|---|---|
| Buildah Containers | <= 1.24.0 | 1.25.0 |
| Moby (Docker) Moby Project | < 20.10.9 | 20.10.9 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-276 (Incorrect Default Permissions) |
| Attack Vector | Local (Container) |
| CVSS | 6.8 (Medium) |
| Impact | Privilege Escalation (Intra-Container) |
| Exploit Status | PoC Available |
| Vulnerable Component | OCI Runtime Spec Generation |