CVE-2026-24834

Ghost in the Machine: Breaking Out of Kata Containers via Direct Access Memory Corruption

Alon Barad

Software Engineer

Feb 19, 2026 · 7 min read


Executive Summary (TL;DR)

Kata Containers < 3.27.0 allows containers to write to the 'read-only' Guest VM filesystem via `/dev/pmem0`. An attacker with `CAP_MKNOD` can overwrite system binaries in memory, executing code as root on the Guest VM.

A critical privilege escalation vulnerability exists in Kata Containers that allows a containerized attacker to overwrite the underlying Guest VM's read-only root filesystem. By exploiting a flaw in how the Linux `virtio-pmem` driver handles read-only flags, combined with DAX memory mapping, an attacker can modify executable binaries in the guest kernel's memory space. This grants root access to the microVM, bypassing container isolation entirely and, in specific ARM64 configurations, potentially corrupting the host image.

Attack Flow Diagram

The Illusion of Isolation

Kata Containers is built on a simple, powerful promise: containers are scary, so let's wrap them in lightweight Virtual Machines (microVMs). It's the security equivalent of wearing a hazmat suit inside a tank. You get the speed of containers with the isolation of hardware virtualization. The Guest OS (the kernel running inside the microVM) is supposed to be immutable, a read-only sanctuary that orchestrates the container's lifecycle.

But here's the thing about 'read-only' in the world of virtualization: it's only as good as the enforcement mechanism. CVE-2026-24834 is a beautiful example of what happens when three different components—the Linux kernel, the Hypervisor, and the storage driver—all assume someone else is locking the door.

This isn't a simple buffer overflow. It's a logic flaw in how memory-mapped devices (DAX) talk to the guest kernel. It turns out, if you ask the Linux kernel nicely (or rudely, via mknod), it will let you scribble all over the memory pages that are supposed to be your immutable hard drive. This vulnerability allows a standard container process to reach down through the floorboards and rewrite the operating system running underneath it.

The Three-Headed Monster: Root Cause

To understand this exploit, you have to appreciate the comedy of errors that occurred between the Guest Kernel and the Hypervisor. The vulnerability relies on the interaction between virtio-pmem (Persistent Memory), DAX (Direct Access), and Cloud Hypervisor.

1. The Kernel's Apathy: The Linux virtio-pmem driver has a probe path that is essentially gaslighting us. Even if the underlying device is flagged as Read-Only by the hypervisor, the driver code explicitly clears the nd_region->ro flag. It effectively says, 'I see you want this to be read-only, but I'm going to ignore that.' This results in the block layer exposing /dev/pmem* devices as writable (brw-rw----).

2. The Hypervisor's Copy-on-Write: Cloud Hypervisor, trying to be efficient, maps the backing file into the guest using MAP_PRIVATE. This creates a Copy-on-Write (CoW) mechanism. When the guest writes to this memory, it doesn't error out; instead, it allocates a new private page in RAM and writes there. The hypervisor thinks, 'This is fine, they are only dirtying their own RAM, not the disk.'

3. The DAX Bypass: Here is the kill shot. Kata uses DAX to map the root filesystem directly into the guest's address space for performance. Because it's a direct map, the hypervisor's storage emulation layer is bypassed for reads and writes. When the container writes to the pmem device, it's modifying the actual memory pages that the Guest OS thinks are its executable binaries.
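The copy-on-write behavior in point 2 can be reproduced on any Linux machine with an ordinary file, no hypervisor required. The following is a minimal sketch, not Cloud Hypervisor's code: the function name `private_mapping_demo` and the sample byte strings are made up for illustration. It opens a file read-only, maps it with `MAP_PRIVATE`, and shows that a write through the mapping succeeds while the file on disk stays untouched.

```python
import mmap
import os
import tempfile
from typing import Tuple


def private_mapping_demo(original: bytes, patch: bytes) -> Tuple[bytes, bytes]:
    """Write through a MAP_PRIVATE mapping; return (bytes seen in memory, bytes on disk)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(original)
        path = tmp.name
    try:
        # Open the backing file read-only, the way an "immutable" image would be.
        fd = os.open(path, os.O_RDONLY)
        try:
            # MAP_PRIVATE + PROT_WRITE is allowed on a read-only descriptor:
            # writes land in private copy-on-write pages, never in the file.
            view = mmap.mmap(fd, 0, flags=mmap.MAP_PRIVATE,
                             prot=mmap.PROT_READ | mmap.PROT_WRITE)
            try:
                view[:len(patch)] = patch  # no error is raised here
                seen_in_memory = bytes(view[:len(original)])
            finally:
                view.close()
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            on_disk = f.read()
    finally:
        os.unlink(path)
    return seen_in_memory, on_disk


if __name__ == "__main__":
    memory, disk = private_mapping_demo(b"clean-binary", b"DIRTY")
    print("mapping sees:", memory)
    print("file on disk:", disk)
```

The write never fails, which is the point: anything that reads through the same mapping (here, the guest) sees the modified bytes, even though the backing file is pristine.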

The Code: Fixing the Unfixable

The fix required a strategic retreat. The developers realized that virtio-pmem combined with DAX was simply too permissive for a secure rootfs implementation. The patch, specifically commit 6a672503973bf7c687053e459bfff8a9652e16bf, changes the default storage driver and mounting logic.

Here is the logic shift in the configuration generation:

// Before: blindly trusting pmem and DAX
let use_dax = true;
 
// After: Explicitly disabling DAX for rootfs unless strictly controlled
// and favoring virtio-blk-pci which enforces RO at the block layer
let use_dax = !conf.disable_new_netns && !is_arm64;

The key change involves switching from virtio-pmem to virtio-blk-pci. Unlike pmem, virtio-blk operates via block requests. If the Guest OS tries to write to a read-only virtio-blk device, the hypervisor intercepts the request and returns a hard error (EPERM), rather than silently allocating a CoW page in RAM.
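To see what the guest's block layer actually believes about a device, one option is to read the `ro` attribute that Linux exposes under `/sys/block/<device>/`. The sketch below is an illustrative audit helper, not part of the Kata patch; the function names and the `sysfs_root` parameter (there so the logic can be exercised against a fake directory tree) are assumptions made for this example.

```python
from pathlib import Path
from typing import List


def block_device_is_read_only(name: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if the block layer reports device `name` as read-only."""
    # The sysfs 'ro' attribute holds "1" for read-only devices and "0" otherwise.
    flag = Path(sysfs_root, name, "ro").read_text().strip()
    return flag == "1"


def writable_pmem_devices(sysfs_root: str = "/sys/block") -> List[str]:
    """List pmem block devices that the kernel exposes as writable."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root.glob("pmem*")
        if not block_device_is_read_only(entry.name, sysfs_root)
    )


if __name__ == "__main__":
    for device in writable_pmem_devices():
        print(f"/dev/{device} is writable at the block layer")
```

Run inside the guest, any pmem device listed here is one the kernel will accept writes to, regardless of what the hypervisor intended.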

> [!NOTE]
> The patch also specifically targets ARM64. On ARM64 QEMU, NVDIMM read-only support was missing entirely. This meant writes didn't just stay in RAM; they could potentially flush back to the host backing file, corrupting the image for every VM on the server.

Exploitation: Inception-Style Root

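So, how do we weaponize this? We are inside a container, and we want to run code as root in the Guest VM. We need CAP_MKNOD, which is often available in less-restrictive container profiles or older Kubernetes setups.

Step 1: Create the Device

First, we create the raw block device node for the physical memory. The major number for pmem is usually 259.

mknod /dev/pmem0 b 259 0

Step 2: Locate the Target

We can't just write garbage anywhere; we'll crash the kernel. We need to find the physical offset of a binary that the Guest OS executes regularly. systemd-tmpfiles or systemd service binaries are perfect targets. We can calculate this offset by reading the filesystem structure (since we can read /dev/pmem0 just fine).

Step 3: The Overwrite

We seek to the calculated offset and write our payload. Since the hypervisor maps the memory MAP_PRIVATE, our write succeeds and lands in a new physical page allocated to the VM. From then on, that guest-physical address resolves to the new 'dirty' page instead of the page backed by the image file.

# Concept Python Exploit (placeholders are left unfilled on purpose)
# TARGET_BINARY_OFFSET is the offset calculated in Step 2.
with open("/dev/pmem0", "r+b") as fd:
    fd.seek(TARGET_BINARY_OFFSET)
    # Overwrite the target binary in place with the attacker's payload
    fd.write(b"\x7f\x45\x4c\x46...<shellcode>...")

Step 4: Trigger

We wait. When the Guest OS tries to run the binary we just overwrote (e.g., via a cron job or timer), it loads our dirty page from RAM instead of the clean page from the disk image. Our shellcode executes with full root privileges inside the microVM context.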

The Impact: Why You Should Care

You might ask, "So what? I broke out of the container into a VM. I'm still trapped in a VM." That's true, but you've crossed a major security boundary.

Guest Root Access: You now control the kernel that manages the container. You can bypass all network policies enforced at the guest level, intercept traffic from other containers in the same pod (if using shared networking), and manipulate the container runtime agent.

The ARM64 Catastrophe: If you are running on ARM64 with QEMU, this vulnerability escalates from "Container Breakout" to "Host Persistence". Because QEMU on ARM64 lacks the ability to enforce read-only on NVDIMMs, the writes might bypass the CoW safety net and commit to the actual backing file on the host. This means if you reboot the VM, the malware is still there. If other VMs share the same base image, you just infected them all.
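If writes can reach the backing file, operators on the host side may want a way to notice. One hedged option is to compare the base image against a digest recorded when the image was built. The sketch below assumes such a known-good SHA-256 value exists somewhere trustworthy; the function names, the image path, and the digest are all placeholders introduced for this example.

```python
import hashlib
import hmac


def image_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_is_unmodified(path: str, expected_hex: str) -> bool:
    """Compare the image's current digest with a recorded known-good digest."""
    return hmac.compare_digest(image_digest(path), expected_hex.lower())


if __name__ == "__main__":
    import sys

    # Usage: python check_image.py <image path> <known-good sha256 hex>
    image_path, known_good = sys.argv[1], sys.argv[2]
    ok = image_is_unmodified(image_path, known_good)
    print("image matches recorded digest" if ok else "IMAGE HAS BEEN MODIFIED")
```

A mismatch does not say who changed the image, only that it no longer matches what was shipped, which is enough to justify rebuilding it from a trusted source.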

This vulnerability fundamentally breaks the "Hard Multi-tenancy" promise of Kata Containers for affected versions. The barrier between the untrusted workload and the control plane (the Guest OS) is non-existent.

Mitigation: Closing the Hole

Fixing this requires a two-pronged approach: patching the software and hardening the configuration.

1. Upgrade Immediately: Update Kata Containers to version 3.27.0 or later. The patch forces the usage of virtio-blk and disables DAX where it is unsafe.

2. Configuration Hardening: If you cannot upgrade immediately, you must modify your Kata configuration to avoid virtio-pmem. Ensure your configuration.toml uses virtio-blk for the rootfs_driver.
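A quick way to audit a configuration file for this is to scan it for rootfs driver settings that still mention pmem. The sketch below is a rough line-based check, not a TOML parser, and it makes assumptions: it matches any key ending in `rootfs_driver` (the name used above), and the `vm_rootfs_driver` key in the sample text is an assumed spelling that may not match your runtime. Check the key names in your own configuration.toml before relying on it.

```python
import re
from typing import List, Tuple

# Matches simple `key = "value"` assignments. Commented-out lines start with
# '#', which is not a valid key character, so the pattern never matches them.
_ASSIGNMENT = re.compile(r'^\s*([A-Za-z0-9_]+)\s*=\s*"([^"]*)"')


def find_pmem_rootfs_settings(
    config_text: str, key_suffix: str = "rootfs_driver"
) -> List[Tuple[str, str]]:
    """Return (key, value) pairs whose key ends in key_suffix and whose value mentions pmem."""
    findings = []
    for line in config_text.splitlines():
        match = _ASSIGNMENT.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2)
        if key.endswith(key_suffix) and "pmem" in value:
            findings.append((key, value))
    return findings


if __name__ == "__main__":
    # Hypothetical config fragment; the section and key names are assumptions.
    sample = '[hypervisor.clh]\nvm_rootfs_driver = "virtio-pmem"\n'
    for key, value in find_pmem_rootfs_settings(sample):
        print(f"{key} is set to {value!r}; consider a virtio-blk driver instead")
```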

3. Runtime Restrictions: The exploit requires CAP_MKNOD to create the /dev/pmem0 device. Use a strict security context in Kubernetes to drop this capability. Additionally, use Kata's Agent Policy to prevent the container from accessing raw block devices.

securityContext:
  capabilities:
    drop:
      - MKNOD

Without mknod, the attacker cannot instantiate the handle to the memory device, effectively neutralizing the exploit vector even on vulnerable versions.
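To confirm that the capability is really gone, a process inside the container can inspect its own effective capability set. The sketch below parses the `CapEff` line of `/proc/self/status`, which Linux reports as a hexadecimal bitmask, and tests bit 27 (`CAP_MKNOD` in `linux/capability.h`). The helper names are made up for this example.

```python
CAP_MKNOD = 27  # bit index of CAP_MKNOD in the Linux capability bitmask


def effective_capabilities(status_text: str) -> int:
    """Extract the effective capability bitmask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap_mknod(status_text: str) -> bool:
    """Return True if the CAP_MKNOD bit is set in the effective set."""
    return bool((effective_capabilities(status_text) >> CAP_MKNOD) & 1)


if __name__ == "__main__":
    with open("/proc/self/status") as status:
        if has_cap_mknod(status.read()):
            print("CAP_MKNOD is present: device nodes can be created")
        else:
            print("CAP_MKNOD is dropped")
```

Running this in a pod before and after applying the securityContext gives a direct check that the drop took effect.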

Official Patches

Kata Containers: Kata Containers v3.27.0 Release Notes


Technical Appendix

CVSS Score

9.4 / 10

CVSS:3.1/AV:L/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H

Affected Systems

Kata Containers, Cloud Hypervisor, QEMU (ARM64-specific impact)

Affected Versions Detail

Product	Affected Versions	Fixed Version
Kata Containers	< 3.27.0	3.27.0
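When checking a fleet against the affected range, version strings need to be compared numerically, since a plain string comparison would sort "3.9.1" after "3.27.0". The following is a small sketch with made-up helper names. It assumes plain `MAJOR.MINOR.PATCH` versions and simply ignores any pre-release suffix after a hyphen, so a release candidate of the fixed version is treated as fixed, which may not be what you want.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '3.26.0' or 'v3.26.0-rc1' into a tuple of integers, e.g. (3, 26, 0)."""
    core = version.lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_affected(version: str, fixed: str = "3.27.0") -> bool:
    """Return True if `version` is older than the first fixed release."""
    return parse_version(version) < parse_version(fixed)


if __name__ == "__main__":
    for candidate in ("3.9.1", "3.26.0", "3.27.0", "v3.28.1"):
        state = "affected" if is_affected(candidate) else "not affected"
        print(f"{candidate}: {state}")
```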

Attribute	Detail
CWE	CWE-732
CVSS	9.4 (Critical)
Attack Vector	Local (Container to Guest VM)
Privileges Required	None (if CAP_MKNOD present)
Exploit Status	PoC Available
Platform	Linux / KVM