Docker and Buildah accidentally left the 'Inheritable' capability set wide open. By default, containers should start with this set empty. Because it wasn't, a process inside a container could elevate its privileges back up to the Bounding Set limits simply by executing a binary with specific file capabilities attached, effectively bypassing security profiles that rely on dropping capabilities from the Effective set.
A logic flaw in Buildah and Moby (Docker Engine) allowed containers to start with a non-empty Inheritable capability set. This subtle misconfiguration permits attackers to 'resurrect' privileges that were intended to be restricted, bypassing container hardening measures by leveraging file capabilities.
Linux capabilities are the bane of every container security engineer's existence. They are a complex, multi-layered permissions system designed to break root into smaller, manageable chunks. Ideally, you strip away everything a container doesn't need, leaving it with a minimal surface area. You assume that if you drop CAP_NET_RAW from a process's effective set, it's gone. Dead. Buried.
But CVE-2022-27651 proves that in the world of Linux kernels, what is dead may never die; it simply waits in the Inheritable set to be resurrected. This vulnerability, affecting Buildah (<= 1.24.0) and Moby/Docker (before 20.10.14), is a classic case of "default configuration negligence."
Developers assumed that simply defining the Bounding set was enough. They forgot that Linux calculates privileges using a formula that involves three distinct sets: Permitted, Effective, and Inheritable. By leaving the Inheritable set populated by default, they handed attackers a golden key: a mechanism to reclaim privileges that the container runtime ostensibly tried to take away. It’s like locking the front door of your house but leaving the window wide open with a ladder underneath it.
To understand why this is an issue, we have to look at how the Linux kernel decides what permissions a new process gets when you run execve(). This isn't magic; it's a strict boolean algebra formula defined in the kernel source. The formula for the new Permitted set (P'(permitted)) is:
P'(permitted) = (P(inheritable) & F(inheritable)) | (F(permitted) & P(bounding))
Here is the translation for humans:
- P(inheritable): The capabilities the current process wants to pass down.
- F(inheritable): The capabilities the executable file allows to be inherited.
- P(bounding): The absolute ceiling of permissions allowed for this process chain.

The vulnerability exists because Buildah and Docker incorrectly initialized P(inheritable). In a secure setup, P(inheritable) should be empty (all zeros). If it is empty, the first half of that equation becomes 0 & F(inheritable), which results in 0. This forces the process to rely solely on the second half of the equation (File Permitted bits), which are rare and hard to use.
However, the vulnerable code mirrored the Bounding set into the Inheritable set. This means P(inheritable) was fully populated. Consequently, an attacker doesn't need F(permitted). They only need a file with the matching F(inheritable) bit: either a binary in the image that already carries inheritable file capabilities, or one they mark themselves. Marking a file requires CAP_SETFCAP (part of Docker's default capability set), not just ownership of the file. The kernel logic then essentially says: "Oh, the process wants to pass it, and the file is marked to receive it? Granted."
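To make the formula concrete, here is a minimal sketch that evaluates it with plain bitmasks. The `CapSet` type and the `permittedAfterExec` helper are hypothetical names for illustration, and the model follows only the formula quoted above (it leaves out the ambient set and the kernel's special handling of root).

```go
package main

import "fmt"

// CapSet is a bitmask of Linux capabilities (bit n = capability number n).
// Hypothetical helper type for this sketch, not a kernel or library type.
type CapSet uint64

const (
	capDacOverride CapSet = 1 << 1  // CAP_DAC_OVERRIDE
	capNetRaw      CapSet = 1 << 13 // CAP_NET_RAW
)

// permittedAfterExec applies the rule from the text:
// P'(permitted) = (P(inheritable) & F(inheritable)) | (F(permitted) & P(bounding))
func permittedAfterExec(pInh, pBound, fInh, fPerm CapSet) CapSet {
	return (pInh & fInh) | (fPerm & pBound)
}

func main() {
	bounding := capDacOverride | capNetRaw
	fileInh := capDacOverride // file marked with cap_dac_override+i
	var filePerm CapSet       // no file permitted bits

	// Vulnerable runtime: the inheritable set mirrors the bounding set.
	vulnerable := permittedAfterExec(bounding, bounding, fileInh, filePerm)

	// Patched runtime: the inheritable set starts empty.
	patched := permittedAfterExec(0, bounding, fileInh, filePerm)

	fmt.Printf("vulnerable permitted: %#x\n", uint64(vulnerable))
	fmt.Printf("patched permitted:    %#x\n", uint64(patched))
}
```

In the vulnerable case the CAP_DAC_OVERRIDE bit survives into the new Permitted set; with an empty Inheritable set the same file yields nothing.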
The fix is almost laughably simple, highlighting just how subtle this configuration error was. The developers simply forgot to explicitly zero out the inheritable set, so it defaulted to copying the broader configuration.
In the containers/buildah repository, specifically within chroot/run.go, the code constructs the capability map for the new container process. Let's look at the difference.
The Vulnerable Code: The code previously took the OCI spec and just mapped everything 1:1. It assumed that if a capability was in the config, it should be inheritable.
The Fix (Commit e7e55c988c05dd74005184ceb64f097a0cfe645b):
The patch explicitly forces the INHERITABLE key to an empty slice, regardless of what the spec says for other sets.
```go
// container_linux.go / chroot/run.go
capMap := map[capability.CapType][]string{
	capability.BOUNDING:  spec.Process.Capabilities.Bounding,
	capability.EFFECTIVE: spec.Process.Capabilities.Effective,
	// VULNERABILITY FIX START
	// Old behavior implicitly allowed inheritance.
	// New behavior explicitly kills it.
	capability.INHERITABLE: []string{},
	// VULNERABILITY FIX END
	capability.PERMITTED: spec.Process.Capabilities.Permitted,
	capability.AMBIENT:   spec.Process.Capabilities.Ambient,
}
```

> [!NOTE]
> This change ensures that even if `spec.Process.Capabilities.Inheritable` contains dangerous flags, the runtime ignores them and passes an empty set to the actual system call.
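The same idea can be sketched in isolation, without the real runtime types. `ProcessCaps` and `withEmptyInheritable` below are hypothetical stand-ins invented for this example (they are not the OCI spec structs or Buildah's code); the point is only that the Inheritable list is discarded no matter what the input says.

```go
package main

import "fmt"

// ProcessCaps is a hypothetical stand-in for the capability lists
// carried in a container spec. It is not the real OCI type.
type ProcessCaps struct {
	Bounding    []string
	Effective   []string
	Inheritable []string
	Permitted   []string
	Ambient     []string
}

// withEmptyInheritable returns a copy of the input whose Inheritable
// list is always empty, leaving every other list as it was.
func withEmptyInheritable(in ProcessCaps) ProcessCaps {
	out := in
	out.Inheritable = []string{}
	return out
}

func main() {
	spec := ProcessCaps{
		Bounding:    []string{"CAP_DAC_OVERRIDE", "CAP_NET_RAW"},
		Inheritable: []string{"CAP_DAC_OVERRIDE", "CAP_NET_RAW"},
	}
	safe := withEmptyInheritable(spec)
	fmt.Println("bounding:   ", safe.Bounding)
	fmt.Println("inheritable:", safe.Inheritable)
}
```

A check like `len(safe.Inheritable) == 0` is the kind of assertion a regression test for this class of bug could make.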
Let's construct a scenario. You are an attacker who has compromised a web application running inside a Docker container. The sysadmin, trying to be secure, configured the container to drop CAP_DAC_OVERRIDE (which allows reading/writing any file) from the Effective set, but they left it in the Bounding set (which is common default behavior).
Because of CVE-2022-27651, your shell process has CAP_DAC_OVERRIDE inside its Inheritable set. It's dormant, but it's there. You cannot use it yet. But you can wake it up.
Step 1: Verify Vulnerability

First, check your status.

```
$ grep CapInh /proc/self/status
CapInh: 00000000a80425fb
```

If that value is anything other than 0000000000000000, the exploit is possible.
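The hex mask is easier to reason about once it is decoded into names. The following is a rough sketch, assuming the capability numbering from the kernel's `linux/capability.h` (bit 0 is CAP_CHOWN through bit 40, CAP_CHECKPOINT_RESTORE); `decodeCapLine` is a hypothetical helper written for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// capNames lists capability names by bit number, following linux/capability.h.
var capNames = []string{
	"CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
	"CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
	"CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
	"CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
	"CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
	"CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
	"CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
	"CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
	"CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
	"CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
	"CAP_CHECKPOINT_RESTORE",
}

// decodeCapLine parses a line such as "CapInh: 00000000a80425fb" from
// /proc/<pid>/status and returns the names of the capabilities that are set.
func decodeCapLine(line string) ([]string, error) {
	fields := strings.Fields(line)
	if len(fields) != 2 {
		return nil, fmt.Errorf("unexpected status line %q", line)
	}
	mask, err := strconv.ParseUint(fields[1], 16, 64)
	if err != nil {
		return nil, err
	}
	var names []string
	for bit, name := range capNames {
		if mask&(1<<uint(bit)) != 0 {
			names = append(names, name)
		}
	}
	return names, nil
}

func main() {
	names, err := decodeCapLine("CapInh: 00000000a80425fb")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strings.Join(names, ", "))
}
```

For the mask shown above, the decoded list includes CAP_DAC_OVERRIDE and CAP_SETFCAP, both of which matter for the steps that follow.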
Step 2: Weaponize a Binary
You need a binary that will "catch" the inherited capabilities. You can copy bash or write a small C program.
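```
cp /bin/bash ./mybash
```

Step 3: The Magic Spell (File Caps)

Use setcap to set the "File Inheritable" bit. File ownership alone is not enough for this: writing file capabilities requires CAP_SETFCAP, which Docker's default capability set includes. If you don't hold it, look for a binary in the image that already carries inheritable file capabilities.

```
setcap cap_dac_override+i ./mybash
```

Step 4: Execution & Elevation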
When you run ./mybash, the kernel formula triggers. P(inh) has the bit (thanks to the bug). F(inh) has the bit (thanks to you). The result is promoted to Permitted.
Inside ./mybash, you can now issue a syscall to raise that Permitted capability into your Effective set. Suddenly, you have full file system access, effectively undoing the container's security profile. You have successfully resurrected a dead privilege.
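That last step works because the kernel only lets a process raise an Effective bit that is already in its Permitted set. The sketch below models that single rule with bitmasks; `raiseEffective` is a hypothetical helper and does not call capset(2) or touch the real process state.

```go
package main

import (
	"errors"
	"fmt"
)

// CapSet is a bitmask of Linux capabilities (hypothetical type for this sketch).
type CapSet uint64

const (
	capDacOverride CapSet = 1 << 1  // CAP_DAC_OVERRIDE
	capSysAdmin    CapSet = 1 << 21 // CAP_SYS_ADMIN
)

// raiseEffective models the rule that the Effective set must stay a
// subset of the Permitted set. It returns the new Effective set, or an
// error if any requested bit is missing from Permitted.
func raiseEffective(permitted, effective, want CapSet) (CapSet, error) {
	if want&^permitted != 0 {
		return effective, errors.New("EPERM: capability not in the permitted set")
	}
	return effective | want, nil
}

func main() {
	// After exec of the marked binary on a vulnerable runtime,
	// CAP_DAC_OVERRIDE sits in Permitted but not yet in Effective.
	permitted := capDacOverride
	var effective CapSet

	eff, err := raiseEffective(permitted, effective, capDacOverride)
	fmt.Printf("raise CAP_DAC_OVERRIDE: effective=%#x err=%v\n", uint64(eff), err)

	// A capability that never reached Permitted cannot be raised.
	eff, err = raiseEffective(permitted, effective, capSysAdmin)
	fmt.Printf("raise CAP_SYS_ADMIN:    effective=%#x err=%v\n", uint64(eff), err)
}
```

On a patched runtime the Permitted set after exec is empty, so the first call in this model would fail the same way the second one does.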
This vulnerability is a perfect example of why "Defense in Depth" is a requirement, not a buzzword. The immediate impact is Privilege Escalation within the Container Boundary.
It is important to clarify: this does not allow you to break out of the container (Container Escape) by itself, nor does it grant you capabilities that were never in the Bounding set to begin with. If CAP_SYS_ADMIN was stripped from the Bounding set at container creation, this exploit cannot bring it back.
However, it completely undermines the concept of dropping capabilities for process hygiene. Many applications start as root, set up their environment, and then drop privileges (e.g., Nginx or Apache). If the Inheritable set is tainted, a compromised child process can simply re-acquire those dropped privileges. It effectively renders the drop instructions in Docker Compose or Kubernetes manifests partially useless against a determined attacker who can execute arbitrary code.
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H

| Product | Affected Versions | Fixed Version |
|---|---|---|
| Buildah (Containers Project) | <= 1.24.0 | 1.25.0 |
| Moby (Docker) (Moby Project) | < 20.10.14 | 20.10.14 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-276 (Incorrect Default Permissions) |
| CVSS v3.1 | 6.6 (Medium) |
| Attack Vector | Local (Container Internal) |
| Privileges Required | Low |
| User Interaction | None |
| Scope | Unchanged |
The application creates a process with an inheritable capability set that is not properly restricted, allowing child processes to acquire privileges intended to be dropped.