runc < 1.1.5 allows container images to replace the `/proc` directory with a symbolic link. This confuses path-based security modules like AppArmor, causing them to fail to apply or enforce profiles correctly. An attacker can use this to bypass confinement and access sensitive host resources.
A vulnerability in runc allowing attackers to bypass AppArmor and SELinux profiles by crafting container images with a symlinked /proc directory, effectively blinding the host's security controls.
Let’s start with a dirty little secret: containers are lies. They aren't mini-VMs; they are just processes wearing trench coats (namespaces) and standing behind a velvet rope (cgroups). To keep these processes from acting like drunken tourists in your kernel, we rely on the bouncers: AppArmor and SELinux. These Linux Security Modules (LSMs) are the only things stopping a container from peeking at your host's sensitive bits.
runc is the industry-standard CLI tool that actually spawns these containers. It does the heavy lifting for Docker, Kubernetes, and Podman. You trust it to set up the sandbox correctly every single time. But here's the problem with trust: it assumes the other party isn't easily confused by a simple party trick.
CVE-2023-28642 is exactly that kind of party trick. It's a vulnerability that exploits the fundamental way runc sets up the container environment. By simply replacing a standard directory with a symbolic link in a malicious image, an attacker can effectively blind the bouncers, allowing the container to run without the strict supervision you thought was in place.
To understand this exploit, you have to understand how AppArmor works. AppArmor is largely path-based. It says, "This process cannot touch /proc/sys/kernel/shm*." It relies on the filesystem path being exactly what it expects. But filesystems in Linux are flexible, sometimes too flexible.
The vulnerability arises in how runc handles the /proc mount point inside the container. During the container creation phase, runc needs to mount the proc filesystem so the container can see process information. However, if the container image (the rootfs) has replaced the empty /proc directory with a symbolic link pointing somewhere else, vulnerable versions of runc do not stop to ask questions.
When runc proceeds to mount the pseudo-filesystem over this symlink, or when AppArmor tries to resolve paths relative to it, the logic breaks down. By symlinking /proc to a different directory, the attacker changes the canonical path of the files accessed underneath it. Since AppArmor rules are looking for specific paths starting with /proc/, accesses through the symlinked path effectively bypass the profile rules. It's like putting a "Do Not Enter" sign on the front door, but the attacker simply relabeled the back door to "Front Door" and walked right in.
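To make the path-matching problem concrete, here is a toy model in Go. It is not AppArmor's matching engine; `deniedByPathRule` and both example paths are hypothetical and only illustrate how a rule keyed on a `/proc/` prefix says nothing about the same kind of file reached under a different name.

```go
package main

import (
	"fmt"
	"strings"
)

// deniedByPathRule is a toy stand-in for a path-based policy check.
// It only models the idea that a prefix rule looks at the path string,
// not at the object the path finally refers to.
func deniedByPathRule(path string, denyPrefixes []string) bool {
	for _, prefix := range denyPrefixes {
		if strings.HasPrefix(path, prefix) {
			return true
		}
	}
	return false
}

func main() {
	rules := []string{"/proc/sys/"}

	// Hypothetical paths: the second assumes procfs became reachable
	// under another name because /proc was a symlink in the image.
	paths := []string{
		"/proc/sys/kernel/shmmax",
		"/elsewhere/sys/kernel/shmmax",
	}
	for _, p := range paths {
		fmt.Printf("%-32s denied=%v\n", p, deniedByPathRule(p, rules))
	}
}
```

The first path matches the deny rule and the second does not, even though in the attack scenario both would lead to the same kernel tunable.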
The fix for this issue is almost painfully simple, which highlights how subtle the oversight was. The developers simply forgot to check what /proc actually was before mounting over it. They assumed it was a directory because, well, it's always a directory. Right?
Here is a simplified view of the logic that runc 1.1.5 uses to kill this bug. The patch forces runc to inspect the file mode of the mount destination before proceeding:
```go
// Inside the setup code for container mounts
fi, err := os.Lstat(dest)
if err != nil {
	if !os.IsNotExist(err) {
		return err
	}
} else if fi.Mode()&os.ModeSymlink != 0 {
	// The Fix: Explicitly forbid symlinks at /proc
	return fmt.Errorf("%s is a symlink", dest)
}
```

Before this check, runc would blindly follow the symlink or mount on top of it, creating the confused deputy scenario for the LSMs. This snippet adds a hard guardrail: if /proc is a symlink, the container fails to start. Game over.
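If you want to see the guardrail behave, the check above can be wrapped into a small standalone program. This is a sketch, not runc's code: `rejectSymlinkDest`, `demo`, and the temporary directory layout are all made up for illustration. The important detail is `os.Lstat`, which inspects the path itself instead of following it.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// rejectSymlinkDest mirrors the check discussed above. Lstat does not
// follow the final path component, so a symlink is seen as a symlink.
func rejectSymlinkDest(dest string) error {
	fi, err := os.Lstat(dest)
	if err != nil {
		if os.IsNotExist(err) {
			return nil // nothing there yet; the caller may create it
		}
		return err
	}
	if fi.Mode()&os.ModeSymlink != 0 {
		return fmt.Errorf("%s is a symlink", dest)
	}
	return nil
}

// demo builds two throwaway root filesystems (hypothetical layout) and
// reports whether a real proc directory passes and a symlinked one is rejected.
func demo() (dirOK bool, linkRejected bool, err error) {
	root, err := os.MkdirTemp("", "rootfs-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root)

	goodProc := filepath.Join(root, "good", "proc")
	if err := os.MkdirAll(goodProc, 0o755); err != nil {
		return false, false, err
	}

	badRoot := filepath.Join(root, "bad")
	if err := os.MkdirAll(filepath.Join(badRoot, "target"), 0o755); err != nil {
		return false, false, err
	}
	badProc := filepath.Join(badRoot, "proc")
	if err := os.Symlink("target", badProc); err != nil {
		return false, false, err
	}

	return rejectSymlinkDest(goodProc) == nil, rejectSymlinkDest(badProc) != nil, nil
}

func main() {
	dirOK, linkRejected, err := demo()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("real directory accepted:", dirOK)
	fmt.Println("symlink rejected:", linkRejected)
}
```

Swapping `os.Lstat` for `os.Stat` in this sketch would make the symlinked case pass, because `Stat` reports on the link's target, which here is an ordinary directory.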
Exploiting this doesn't require complex memory corruption or ROP chains. It just requires the ability to supply a container image. This is a "configuration" attack, a class that is often the most stable and dangerous kind.
An attacker would craft a Dockerfile like this:
```dockerfile
FROM alpine:latest
# Remove the real proc directory
RUN rm -rf /proc
# Create a symlink pointing to a target we want to mess with
# or simply to confuse the path resolution
RUN ln -s /variable/path /proc
CMD ["/bin/sh"]
```

When a victim runs this image with standard flags (e.g., `docker run --rm -it malicious-image`), runc spins up the environment. Because the /proc mount point is now a symlink, the AppArmor profile loaded for the container (which expects standard paths) fails to attach correctly to the actual file operations.
The result? The process inside the container runs with significantly fewer restrictions than intended. While this doesn't automatically give you root on the host, it removes the safety net. If you have a secondary exploit that AppArmor would usually block (like writing to sensitive /sys files), that exploit now works.
In the grand scheme of container escapes, this is a "soft" escape. It disables the security profile, but it doesn't instantly pop a shell on the host. However, in security, defense in depth is everything. AppArmor is often the only thing preventing a compromised container from interacting with the underlying kernel in dangerous ways.
If you are running multi-tenant Kubernetes clusters or allowing developers to run arbitrary images, this is a critical flaw. It means that any user who can define a pod or run a container can opt out of your security policies without permission. They can potentially read sensitive host configuration, interfere with other processes, or leverage kernel vulnerabilities that are normally suppressed by the default Docker AppArmor profile.
It is also a prerequisite for more complex chains. Many container breakouts rely on the LSM being disabled or bypassed. This vulnerability serves that bypass on a silver platter.
The remediation is straightforward: Update runc.
If you are on version 1.1.5 or later, you are safe. The runtime will detect the symlink shenanigans and refuse to start the container, throwing an error similar to "/proc is a symlink".
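For a quick fleet audit, the version threshold can be turned into a small comparison. This is a rough sketch: `isPatched` is a hypothetical helper, it assumes a plain `major.minor.patch` string (optionally prefixed with `v`), and it conservatively treats a pre-release of 1.1.5 itself as unpatched.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isPatched reports whether a runc version string is at least 1.1.5.
// Build metadata ("+...") is ignored; a pre-release suffix ("-rc.1") on
// exactly 1.1.5 is treated as not patched, which is an assumption.
func isPatched(version string) (bool, error) {
	fixed := [3]int{1, 1, 5}

	core := strings.TrimPrefix(strings.TrimSpace(version), "v")
	if i := strings.Index(core, "+"); i >= 0 {
		core = core[:i]
	}
	preRelease := false
	if i := strings.Index(core, "-"); i >= 0 {
		preRelease = true
		core = core[:i]
	}

	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return false, fmt.Errorf("unexpected version format %q", version)
	}
	var got [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return false, fmt.Errorf("unexpected version format %q", version)
		}
		got[i] = n
	}

	// Compare component by component, most significant first.
	for i := range fixed {
		if got[i] != fixed[i] {
			return got[i] > fixed[i], nil
		}
	}
	return !preRelease, nil
}

func main() {
	for _, v := range []string{"1.1.4", "1.1.5", "v1.2.0", "1.1.5-rc.1"} {
		ok, err := isPatched(v)
		fmt.Println(v, ok, err)
	}
}
```

Treat the result as a heuristic: distribution packages sometimes backport fixes without changing the upstream version number, so check the vendor's advisory as well.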
If you cannot patch immediately (perhaps you're running a legacy orchestrator), your mitigation options are limited to admission control:
- Scan image root filesystems before admission: if /proc is not a directory, reject the image.
- Enabling user namespaces (userns-remap in Docker) provides a strong layer of isolation that doesn't rely solely on AppArmor. Even if AppArmor is bypassed, the root user inside the container is just a nobody on the host.

CVSS vector: CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:L

| Product | Affected Versions | Fixed Version |
|---|---|---|
| runc (Open Container Initiative) | < 1.1.5 | 1.1.5 |
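The admission-control idea from the mitigation list (reject any image where /proc is not a directory) can be sketched with Go's standard `archive/tar` package. This is a minimal illustration under several assumptions: `scanLayer` and `makeLayer` are hypothetical names, the input is a single uncompressed layer tarball, and a real scanner would need to walk every layer of the image and account for whiteout files.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
	"path"
)

// scanLayer walks one uncompressed layer tarball and returns a non-empty
// reason if the layer ships a top-level "proc" entry that is not a directory.
func scanLayer(layer []byte) (string, error) {
	tr := tar.NewReader(bytes.NewReader(layer))
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return "", nil // reached the end without finding a problem
		}
		if err != nil {
			return "", err
		}
		// Normalise "proc", "./proc/" and "/proc" to "/proc".
		name := path.Clean("/" + hdr.Name)
		if name == "/proc" && hdr.Typeflag != tar.TypeDir {
			return fmt.Sprintf("/proc has tar type %q, expected a directory", hdr.Typeflag), nil
		}
	}
}

// makeLayer builds a tiny in-memory layer for the demo below.
func makeLayer(procAsSymlink bool) []byte {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)

	hdr := &tar.Header{Name: "proc/", Typeflag: tar.TypeDir, Mode: 0o555}
	if procAsSymlink {
		hdr = &tar.Header{
			Name:     "proc",
			Typeflag: tar.TypeSymlink,
			Linkname: "/variable/path",
			Mode:     0o777,
		}
	}
	if err := tw.WriteHeader(hdr); err != nil {
		panic(err)
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	for _, symlink := range []bool{false, true} {
		reason, err := scanLayer(makeLayer(symlink))
		if err != nil {
			fmt.Println("scan error:", err)
			continue
		}
		fmt.Printf("symlinked=%v reject=%v %s\n", symlink, reason != "", reason)
	}
}
```

A check like this only helps if it runs before the image reaches a vulnerable runtime, for example in a registry scanning step or an admission webhook.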

| Attribute | Detail |
|---|---|
| CWE ID | CWE-281 (Improper Preservation of Permissions) |
| Attack Vector | Local (Image-based) |
| CVSS | 6.1 (Medium) |
| Privileges Required | None |
| User Interaction | Required (Victim must run image) |
| Impact | Security Bypass (AppArmor/SELinux) |