Apr 23, 2026
Argo Workflows fails to properly validate the array index when parsing the pod garbage collection annotation. Submitting a workflow with a malformed annotation causes a persistent Go runtime panic in the controller process, resulting in a denial of service.
CVE-2026-40886 is a high-severity denial-of-service vulnerability in Argo Workflows caused by an unhandled Go runtime panic. A malformed Kubernetes annotation triggers an out-of-bounds array access in the controller's pod informer, leading to a permanent crash loop that halts all workflow orchestration operations.
Argo Workflows relies on a central controller to orchestrate container-native workflows across a Kubernetes cluster. The controller utilizes a Kubernetes Informer to monitor pod state changes and manage resource lifecycles, including garbage collection operations. This vulnerability resides within the controller's pod informer logic, specifically in the function responsible for determining pod garbage collection strategies.
The underlying flaw is categorized as CWE-129: Improper Validation of Array Index. When the controller processes a pod containing the workflows.argoproj.io/pod-gc-strategy annotation, it fails to ensure the annotation string meets expected formatting constraints before accessing the resulting indexed elements. This oversight creates a critical fragility in the core event processing loop.
The security impact is a high-severity Denial of Service (DoS). Because the missing validation leads to a Go runtime panic within a background goroutine, the entire controller process terminates abruptly. Furthermore, the persistent nature of Kubernetes API resources ensures the controller will repeatedly encounter the malformed pod upon restart, resulting in a permanent CrashLoopBackOff state that halts all workflow orchestration cluster-wide.
The root cause of CVE-2026-40886 is an unchecked slice index during string tokenization in the podGCFromPod() function. This function extracts the value of the workflows.argoproj.io/pod-gc-strategy annotation and attempts to parse it into a strategy type and a delay duration. The implementation assumes that the annotation value always contains a forward slash character acting as a delimiter.
The vulnerable code utilizes the strings.Split(val, "/") function from the Go standard library. When passed a string without the specified delimiter, strings.Split returns a string slice containing exactly one element representing the original string. Immediately following this operation, the code explicitly accesses the second element of the resulting slice via parts[1].
In the Go programming language, attempting to access an index beyond the bounds of a slice triggers a synchronous runtime panic. Since this index access occurs without first verifying the length of the parts slice (len(parts) > 1), a maliciously crafted or inadvertently malformed annotation guarantees an out-of-bounds read and subsequent runtime panic.
Compounding the issue, this panic originates inside the pod informer's event handler. In Go, an unrecovered panic within a goroutine unwinds the stack and crashes the entire application process. The controller lacks a global recovery mechanism for this specific informer goroutine, escalating a localized parsing error into a total application failure.
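To illustrate what a recovery boundary around a handler could look like, here is a hedged sketch. The guard function is a hypothetical helper and is not part of Argo Workflows or of the official patch; it only shows how a deferred recover keeps a panic inside one goroutine from terminating the whole process.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// guard (hypothetical) runs fn and converts any panic into an error
// instead of letting it propagate and crash the process.
func guard(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("handler panicked: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Simulated event handler that performs the unchecked access.
		err := guard(func() {
			parts := strings.Split("NoSlash", "/")
			_ = parts[1]
		})
		fmt.Println("goroutine survived:", err)
	}()
	wg.Wait()
	fmt.Println("process still running")
}
```

A boundary like this limits the blast radius of a parsing bug, but it does not replace validating the input at the point where it is parsed.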
An examination of the controller code reveals the specific mechanics of the vulnerability and the simplicity of the subsequent patch. The vulnerable segment in the controller codebase manually structures the PodGC object using hardcoded slice indices.
The vulnerable implementation forces a panic if the parts slice has a length of 1:

    func podGCFromPod(pod *apiv1.Pod) wfv1.PodGC {
        if val, ok := pod.Annotations[common.AnnotationKeyPodGCStrategy]; ok {
            parts := strings.Split(val, "/")
            // CRITICAL VULNERABILITY: parts[1] accessed without bounds checking
            return wfv1.PodGC{Strategy: wfv1.PodGCStrategy(parts[0]), DeleteDelayDuration: parts[1]}
        }
        return wfv1.PodGC{Strategy: wfv1.PodGCOnPodNone}
    }

The official patch applied in commit 4fe54e529eff5519233287251e5adf9a61b9fc67 addresses the flaw by transitioning from strings.Split to strings.Cut. Introduced in Go 1.18, strings.Cut is designed specifically for safe string partitioning and avoids slice allocations entirely.
The patched implementation safely handles missing delimiters:

    func podGCFromPod(pod *apiv1.Pod) wfv1.PodGC {
        if val, ok := pod.Annotations[common.AnnotationKeyPodGCStrategy]; ok {
            strategy, delay, _ := strings.Cut(val, "/")
            // FIX: strings.Cut assigns empty string to delay if "/" is absent
            return wfv1.PodGC{Strategy: wfv1.PodGCStrategy(strategy), DeleteDelayDuration: delay}
        }
        return wfv1.PodGC{Strategy: wfv1.PodGCOnPodNone}
    }

If the forward slash delimiter is absent, strings.Cut returns the original string as the strategy variable and an empty string as the delay variable. This eliminates the out-of-bounds slice access while maintaining intended functionality for properly formatted annotations. The fix is comprehensive and mitigates variant attacks targeting this specific parsing path.
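The strings.Cut behavior can be checked against a few inputs. This is a simplified stand-in under stated assumptions: parseGC is a hypothetical function that returns plain strings rather than the wfv1.PodGC type, so the example runs without the Argo packages.

```go
package main

import (
	"fmt"
	"strings"
)

// parseGC (hypothetical) mirrors the patched parsing step using plain
// strings. strings.Cut splits around the first "/" only and never
// indexes into a slice, so no input can trigger an out-of-range panic.
func parseGC(val string) (strategy, delay string) {
	strategy, delay, _ = strings.Cut(val, "/")
	return strategy, delay
}

func main() {
	inputs := []string{"OnPodCompletion/30s", "NoSlash", "", "a/b/c"}
	for _, v := range inputs {
		s, d := parseGC(v)
		fmt.Printf("%q -> strategy=%q delay=%q\n", v, s, d)
	}
}
```

For "NoSlash" the strategy is the whole string and the delay is empty. For "a/b/c" only the first slash is used, so the delay is "b/c".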
Exploitation of CVE-2026-40886 requires minimal privileges: an attacker needs only the standard role-based access control (RBAC) permissions required to create Workflow resources within the cluster. Network access to the Kubernetes API server or the Argo Workflows API endpoint is necessary to submit the malicious payload.
The attack sequence is deterministic and highly reliable. The attacker submits a YAML manifest defining a new Workflow. Within this manifest, the attacker injects the workflows.argoproj.io/pod-gc-strategy annotation under the podMetadata block, assigning it a string value that lacks a forward slash delimiter.
A verified Proof-of-Concept (PoC) demonstrating this attack involves applying the following workflow manifest:

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      name: crash-podgc
    spec:
      entrypoint: main
      serviceAccountName: default
      podGC:
        strategy: OnPodCompletion
      podMetadata:
        annotations:
          workflows.argoproj.io/pod-gc-strategy: "NoSlash"
      templates:
        - name: main
          container:
            image: alpine:3.18
            command: [echo, "hello"]

Upon submission, the Kubernetes API persists the resource and schedules the underlying pod. The Argo Workflows controller detects the new pod via its informer cache mechanisms. As the controller executes podGCFromPod() during the event processing phase, it parses the malformed annotation, triggers the array index panic, and terminates execution entirely.
The impact of this vulnerability extends far beyond a simple transient application crash due to the design of Kubernetes controllers. Kubernetes operates on a declarative state model. When the controller process crashes, the malformed pod object remains registered persistently in the underlying etcd data store.
When the orchestration platform detects the Argo Workflows controller pod has terminated, the corresponding deployment engine automatically restarts the pod to enforce the desired replica count. Upon initialization, the newly spawned controller synchronizes its local informer cache with the Kubernetes API, immediately pulling the malicious pod state back into its processing queue.
This architectural behavior guarantees a permanent denial of service condition. The workflow controller enters an unrecoverable CrashLoopBackOff state, rendering the entire workflow orchestration system inoperable. Existing workflow execution will halt, and new workflows will remain pending indefinitely.
The assigned CVSS v3.1 score of 7.7 (High) reflects this operational impact. While the vulnerability does not directly permit data exfiltration or arbitrary code execution, the complete loss of workflow controller availability combined with a low exploitation barrier makes this flaw a serious operational risk.
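The 7.7 score can be reproduced from the vector CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:N/I:N/A:H. The sketch below applies the CVSS v3.1 base score equations for the scope-changed case only, with the metric weights taken from the FIRST specification (AV:N = 0.85, AC:L = 0.77, PR:L with changed scope = 0.68, UI:N = 0.85, A:H = 0.56). The function names are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// roundUp follows the CVSS v3.1 Roundup definition: the smallest number
// with one decimal place that is greater than or equal to x.
func roundUp(x float64) float64 {
	i := int(math.Round(x * 100000))
	if i%10000 == 0 {
		return float64(i) / 100000.0
	}
	return (math.Floor(float64(i)/10000) + 1) / 10.0
}

// baseScoreScopeChanged computes the base score for vectors with S:C.
func baseScoreScopeChanged(av, ac, pr, ui, c, i, a float64) float64 {
	iss := 1 - (1-c)*(1-i)*(1-a)
	impact := 7.52*(iss-0.029) - 3.25*math.Pow(iss-0.02, 15)
	if impact <= 0 {
		return 0
	}
	exploitability := 8.22 * av * ac * pr * ui
	return roundUp(math.Min(1.08*(impact+exploitability), 10))
}

func main() {
	// AV:N, AC:L, PR:L (scope changed), UI:N, C:N, I:N, A:H
	score := baseScoreScopeChanged(0.85, 0.77, 0.68, 0.85, 0, 0, 0.56)
	fmt.Printf("%.1f\n", score)
}
```

The impact sub-score comes out near 3.99 and exploitability near 3.11. Their sum multiplied by 1.08 is about 7.67, which rounds up to 7.7.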
The primary and most effective remediation for CVE-2026-40886 is upgrading to a patched release of Argo Workflows. The vulnerability is fully resolved in the official releases of v3.7.14 and v4.0.5. System administrators are strongly advised to deploy these updates immediately, as the exploit requirements are trivial and the disruption severe.
If upgrading the controller is not immediately feasible, defense-in-depth measures can be implemented at the Kubernetes API layer. Administrators can deploy an admission controller, such as OPA Gatekeeper or Kyverno, to validate the workflows.argoproj.io/pod-gc-strategy annotation on all incoming Workflow objects. The admission policy should be configured to reject any requests where the annotation value fails to match the expected formatting schema.
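The core check such an admission policy would enforce can be sketched in Go. This is an illustration under assumptions, not a Gatekeeper or Kyverno policy: checkPodGCAnnotation is a hypothetical function, and the "strategy/delay" shape it requires is inferred from the parsing code discussed earlier rather than from a published schema.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkPodGCAnnotation (hypothetical) rejects values that would make an
// unpatched controller index past the end of the split result. It only
// requires a "/" delimiter and a non-empty strategy; validating the
// delay format is deliberately left out of this sketch.
func checkPodGCAnnotation(val string) error {
	strategy, _, found := strings.Cut(val, "/")
	if !found {
		return errors.New(`pod-gc-strategy annotation must contain a "/" delimiter`)
	}
	if strategy == "" {
		return errors.New("pod-gc-strategy annotation has an empty strategy")
	}
	return nil
}

func main() {
	for _, v := range []string{"OnPodCompletion/30s", "OnPodCompletion/", "NoSlash"} {
		fmt.Printf("%q -> %v\n", v, checkPodGCAnnotation(v))
	}
}
```

In a real webhook this check would be applied to the annotation found under spec.podMetadata.annotations of each incoming Workflow, the location used by the proof of concept.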
For clusters already experiencing active exploitation and trapped in a persistent crash loop, manual administrative intervention is required to restore service. The controller process cannot self-heal from this corrupted state, as the problematic payload resides within the Kubernetes API.
Recovery is achieved by deleting the malicious resource directly from the cluster using standard administrative tooling. Executing kubectl delete workflow <workflow-name> -n <namespace> removes the poisoned state from the API server. Once the resource is purged, the Argo Workflows controller restarts successfully, no longer encounters the malformed pod, and resumes standard orchestration processing.
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:N/I:N/A:H

| Product | Affected Versions | Fixed Version |
|---|---|---|
| Argo Workflows (Argoproj) | >= 3.6.5, <= 3.6.19 | v3.7.14 |
| Argo Workflows (Argoproj) | >= 3.7.0, <= 3.7.13 | v3.7.14 |
| Argo Workflows (Argoproj) | >= 4.0.0, <= 4.0.4 | v4.0.5 |

| Attribute | Detail |
|---|---|
| CVE ID | CVE-2026-40886 |
| CVSS v3.1 Score | 7.7 |
| Attack Vector | Network |
| CWE | CWE-129 |
| Impact | Denial of Service (Availability: High) |
| Exploit Status | Proof of Concept Available |
| KEV Status | Not Listed |
CWE-129 (Improper Validation of Array Index): The software uses untrusted input to calculate an array index but does not validate that the index is within expected bounds.