Feb 18, 2026
Traefik versions before 2.11.35, and 3.x versions before 3.6.7, contain a DoS vulnerability in the ACME TLS-ALPN-01 challenge handler. The code explicitly clears connection deadlines before a blocking TLS handshake, allowing an attacker to hold connections open indefinitely. This leads to file descriptor and goroutine exhaustion.
A deep dive into a Denial of Service vulnerability within Traefik's ACME TLS-ALPN challenge handling. By failing to implement handshake timeouts, Traefik allows unauthenticated attackers to exhaust system resources with stalled connections.
Traefik is the darling of the cloud-native world. It’s the magic reverse proxy that automatically discovers your Docker containers and—crucially—automatically creates HTTPS certificates for them using Let's Encrypt. To do this, it implements the ACME protocol. One specific flavor of this protocol is the TLS-ALPN-01 challenge, which allows validation over port 443 by using a special TLS extension.
Here’s the setup: When traffic hits Traefik, it peeks at the TLS ClientHello. If it sees the ALPN protocol acme-tls/1, it thinks, "Aha! This is for me! I need to prove I own this domain." It hijacks the connection away from your standard application routing and sends it down a specialized "fast path" handler to complete the cryptographic handshake and satisfy the ACME server.
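The ALPN check described above can be sketched with the standard library's `tls.ClientHelloInfo`. This is only an illustration of the idea, not Traefik's routing code: the names `advertisesACME` and `isACMEChallenge` are hypothetical, and Traefik's own router may inspect the ClientHello differently.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// acmeTLS1Protocol is the ALPN protocol ID used by the TLS-ALPN-01 challenge.
const acmeTLS1Protocol = "acme-tls/1"

// advertisesACME reports whether a list of ALPN protocols contains acme-tls/1.
// Hypothetical helper, not part of Traefik.
func advertisesACME(protos []string) bool {
	for _, proto := range protos {
		if proto == acmeTLS1Protocol {
			return true
		}
	}
	return false
}

// isACMEChallenge applies the same check to a parsed ClientHello.
func isACMEChallenge(hello *tls.ClientHelloInfo) bool {
	return advertisesACME(hello.SupportedProtos)
}

func main() {
	challenge := &tls.ClientHelloInfo{SupportedProtos: []string{"acme-tls/1"}}
	browser := &tls.ClientHelloInfo{SupportedProtos: []string{"h2", "http/1.1"}}
	fmt.Println("challenge:", isACMEChallenge(challenge))
	fmt.Println("browser:", isACMEChallenge(browser))
}
```

The point to notice is that this decision is made from the very first client message, before any authentication or proof of work from the peer.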
But here is the irony: in an effort to handle these special validations reliably, the developers made the handler too patient. They created a scenario where a malicious client can say "Hello," and Traefik will wait for the rest of the conversation... forever. It’s the digital equivalent of holding the door open for someone who is standing a mile away and not moving.
To understand the bug, we have to look at how Go handles network timeouts. Go is famous for its "goroutines"—lightweight threads that handle concurrency. In a typical net/http server, there are layers of timeouts (ReadTimeout, WriteTimeout, IdleTimeout) to prevent slow-loris style attacks. But Traefik's TCP router operates a layer below that, closer to the metal.
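For contrast, here is a minimal sketch of what those layered timeouts look like on a plain `net/http` server. The function name `newBoundedServer` and the durations are illustrative assumptions, not values taken from Traefik.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newBoundedServer returns an http.Server whose read, write, and idle phases
// all have upper bounds. The durations are illustrative, not recommendations.
func newBoundedServer(addr string) *http.Server {
	return &http.Server{
		Addr:              addr,
		ReadHeaderTimeout: 5 * time.Second,  // time allowed to read request headers
		ReadTimeout:       10 * time.Second, // time allowed to read the whole request
		WriteTimeout:      10 * time.Second, // time allowed to write the response
		IdleTimeout:       60 * time.Second, // time a keep-alive connection may sit idle
	}
}

func main() {
	srv := newBoundedServer(":8080")
	fmt.Println(srv.ReadTimeout, srv.WriteTimeout, srv.IdleTimeout)
}
```

None of these fields apply to code that takes a raw connection and drives the TLS handshake itself, which is the situation the TCP router is in.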
In pkg/server/router/tcp/router.go, when the router identifies the acme-tls/1 ALPN header, it grabs the raw TCP connection. The vulnerability stems from a specific sequence of operations that would make any paranoid site reliability engineer scream.
First, the code explicitly clears any existing deadlines on the connection (conn.SetDeadline(time.Time{})). It effectively tells the operating system, "Take as long as you need." Then, it initiates a standard TLS handshake using tls.Server(...).Handshake(). This function is blocking. Because the deadline was just nuked, this function will block indefinitely if the client stops sending data.
It’s a classic Resource Exhaustion (CWE-400). The attacker doesn't need to flood the network; they just need to initiate a connection and then do absolutely nothing. Traefik allocates a file descriptor and a goroutine, then pauses execution, waiting for bytes that will never come.
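The blocking behaviour is easy to reproduce in isolation. The sketch below is not Traefik code: it uses `net.Pipe` as an in-memory stand-in for a TCP connection, an empty `tls.Config`, and a hypothetical helper named `handshakeBlocksFor`. The peer never writes a byte, and the server-side handshake stays parked until the pipe is closed.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"time"
)

// handshakeBlocksFor reports whether a server-side TLS handshake is still
// blocked after the given wait while the peer stays completely silent.
func handshakeBlocksFor(wait time.Duration) bool {
	serverSide, clientSide := net.Pipe() // in-memory stand-in for a TCP connection
	defer clientSide.Close()
	defer serverSide.Close() // closing the pipe releases the handshake goroutine

	// The zero time.Time means "no deadline", as in the vulnerable path.
	_ = serverSide.SetDeadline(time.Time{})

	done := make(chan error, 1)
	go func() {
		// Blocks reading a ClientHello that the silent peer never sends.
		done <- tls.Server(serverSide, &tls.Config{}).Handshake()
	}()

	select {
	case <-done:
		return false // handshake returned on its own
	case <-time.After(wait):
		return true // still waiting: one goroutine and one connection are pinned
	}
}

func main() {
	fmt.Println("still blocked:", handshakeBlocksFor(200*time.Millisecond))
}
```

In a real server each such parked handshake also holds a file descriptor, which is what turns a patient handler into a resource leak.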
Let's look at the diff for commit e9f3089e9045812bcf1b410a9d40568917b26c3d. It perfectly illustrates the difference between "it works" code and "it's secure" code.
The Vulnerable Code:
```go
// pkg/server/router/tcp/router.go
return tcp.HandlerFunc(func(conn tcp.WriteCloser) {
	// DEADLY MISTAKE: No context, no timeout, just vibes.
	_ = tls.Server(conn, r.httpsTLSConfig).Handshake()
	// ... logic to handle the challenge ...
})
```

In the snippet above, Handshake() is a blocking call on the underlying conn. Since the caller (the router) had previously cleared deadlines to hand off the connection, this line is an infinite wait trap.
The Fix:
The patch introduces two critical concepts: Contexts and Defers.
```go
return tcp.HandlerFunc(func(conn tcp.WriteCloser) {
	tlsConn := tls.Server(conn, r.httpsTLSConfig)
	// Ensure we clean up the FD no matter what happens
	defer tlsConn.Close()

	// "we expect a validation request to complete in a short period of time"
	// Enforce a hard 2-second limit.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Use the Context-aware handshake
	if err := tlsConn.HandshakeContext(ctx); err != nil {
		log.FromContext(ctx).WithError(err).Debug("Error during ACME-TLS/1 handshake")
	}
})
```

They switched from Handshake() to HandshakeContext(ctx) and imposed a strict 2-second timeout. If the handshake doesn't finish in 2 seconds (eons in CPU time), Traefik kills the connection and frees the resources.
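The same pattern can be exercised outside Traefik. The following is a minimal sketch under stated assumptions: `handshakeWithTimeout` and `silentPeerTimesOut` are hypothetical names, `net.Pipe` stands in for a real socket, and the empty `tls.Config` is only good enough to reach the point where the server waits for a ClientHello.

```go
package main

import (
	"context"
	"crypto/tls"
	"errors"
	"fmt"
	"net"
	"time"
)

// handshakeWithTimeout runs a server-side TLS handshake that is abandoned
// once the timeout elapses, and always closes the connection afterwards.
func handshakeWithTimeout(conn net.Conn, cfg *tls.Config, timeout time.Duration) error {
	tlsConn := tls.Server(conn, cfg)
	defer tlsConn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	return tlsConn.HandshakeContext(ctx)
}

// silentPeerTimesOut reports whether a peer that never sends a ClientHello
// makes handshakeWithTimeout fail with context.DeadlineExceeded.
func silentPeerTimesOut(timeout time.Duration) bool {
	serverSide, clientSide := net.Pipe()
	defer clientSide.Close()

	err := handshakeWithTimeout(serverSide, &tls.Config{}, timeout)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("timed out:", silentPeerTimesOut(200*time.Millisecond))
}
```

Bounding the handshake with a context rather than a connection deadline keeps the limit attached to this one operation, so it does not depend on what earlier code did to the socket's deadlines.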
Exploiting this is trivially easy and requires zero authentication. You don't need a massive botnet; a single laptop can hold enough connections open to degrade a standard Traefik instance.
The Attack Recipe:
1. Open a TCP connection to Traefik's TLS entry point.
2. Send a TLS ClientHello packet. The critical payload is the ALPN extension field containing the string acme-tls/1.
3. Stop sending after the ClientHello. The server will parse the ALPN, switch to the vulnerable handler, and wait for the rest of the handshake (e.g., Key Exchange).

Re-exploitation Potential:
If you simply repeat this loop, you will consume File Descriptors (FDs). A typical Linux default is ulimit -n 1024. Once the Traefik process hits that limit, it can no longer accept new connections, legitimate or otherwise. The web server effectively disappears from the internet.
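To put a number on that, the sketch below reads the process's open-file limit and estimates how many stalled connections fit under it. The helper `stalledConnsToExhaust` and the baseline of 24 descriptors already in use are made-up values for illustration, and `syscall.Getrlimit` is only available on Unix-like systems. Recent Go runtimes may also raise the soft limit toward the hard limit at startup, so the value seen in-process can be higher than what `ulimit -n` prints in a shell.

```go
package main

import (
	"fmt"
	"syscall"
)

// stalledConnsToExhaust returns how many additional stalled connections fit
// under the descriptor limit, given how many descriptors are already in use.
func stalledConnsToExhaust(limit, inUse uint64) uint64 {
	if inUse >= limit {
		return 0
	}
	return limit - inUse
}

func main() {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	soft := uint64(rl.Cur)
	// 24 is a made-up baseline for listeners, log files, and backend sockets.
	fmt.Printf("soft limit %d, room for about %d stalled connections\n",
		soft, stalledConnsToExhaust(soft, 24))
}
```

With the 1024 default and that assumed baseline, roughly a thousand idle sockets are enough, which is well within reach of a single client machine.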
> [!NOTE]
> Since the patch sets a 2-second timeout, an attacker trying to bypass this on a patched version would need to complete the handshake. But once the handshake is done, the connection is either validated (and closed) or rejected (and closed). The infinite hang is gone.
Why does this matter? Traefik is rarely deployed for a personal blog. It is the ingress controller for massive Kubernetes clusters. It is the front door for hundreds of microservices.
If an attacker exploits CVE-2026-22045, they aren't just taking down one website; they are taking down the entire cluster's ability to speak to the outside world. API gateways, frontend apps, authentication services—they all go dark because they share the same ingress point.
While the CVSS score is a "Medium" 5.9 (the vector records no Confidentiality or Integrity impact and rates Attack Complexity as High), the Availability impact is High. For an e-commerce platform or a SaaS provider, "Availability: High" means "Revenue: Zero" for the duration of the attack.
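The 5.9 can be reproduced from the published vector (AV:N/AC:H/PR:N/UI:N/S:U/C:N/I:N/A:H) with the CVSS v3.1 base score equations for unchanged scope. The sketch below hard-codes the metric weights from the CVSS v3.1 specification; the function names `roundUp` and `baseScore` are my own, and only the unchanged-scope case is handled.

```go
package main

import (
	"fmt"
	"math"
)

// roundUp follows the CVSS v3.1 "Roundup" helper: the smallest number with
// one decimal place that is >= x, computed on integers to avoid float noise.
func roundUp(x float64) float64 {
	n := int(math.Round(x * 100000))
	if n%10000 == 0 {
		return float64(n) / 100000
	}
	return (math.Floor(float64(n)/10000) + 1) / 10
}

// baseScore computes the CVSS v3.1 base score for an unchanged-scope vector.
func baseScore(av, ac, pr, ui, c, i, a float64) float64 {
	iss := 1 - (1-c)*(1-i)*(1-a)
	impact := 6.42 * iss
	exploitability := 8.22 * av * ac * pr * ui
	if impact <= 0 {
		return 0
	}
	return roundUp(math.Min(impact+exploitability, 10))
}

func main() {
	// AV:N=0.85, AC:H=0.44, PR:N=0.85, UI:N=0.85, C:N=0, I:N=0, A:H=0.56
	fmt.Println(baseScore(0.85, 0.44, 0.85, 0.85, 0, 0, 0.56)) // 5.9
	// Same vector with AC:L=0.77, for comparison
	fmt.Println(baseScore(0.85, 0.77, 0.85, 0.85, 0, 0, 0.56)) // 7.5
}
```

The comparison line shows how much of the "Medium" rating comes from the High attack complexity: with AC:L the same vector would score 7.5.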
Remediation is straightforward: update to 2.11.35 or 3.6.7.
Configuration Mitigation:
If you cannot patch immediately, you can mitigate this by disabling the TLS-ALPN-01 challenge type in your static configuration. Switch to using the HTTP-01 challenge (which uses standard HTTP requests and likely hits different, safer timeouts) or the DNS-01 challenge (which validates ownership via DNS records and avoids this traffic path entirely).
```yaml
# Mitigation: Don't use this if you can avoid it
certificatesResolvers:
  myresolver:
    acme:
      # Prefer httpChallenge or dnsChallenge over tlsChallenge
      httpChallenge:
        entryPoint: web
```

CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:N/I:N/A:H

| Product | Affected Versions | Fixed Version |
|---|---|---|
| Traefik (Traefik Labs) | < 2.11.35 | 2.11.35 |
| Traefik (Traefik Labs) | >= 3.0.0-beta1, < 3.6.7 | 3.6.7 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-400 (Uncontrolled Resource Consumption) |
| Attack Vector | Network (Remote) |
| CVSS v3.1 | 5.9 (Medium) |
| EPSS Score | 0.00018 (~0.02%) |
| Impact | Denial of Service (DoS) |
| Exploit Status | PoC Available / Trivial |
The software does not properly restrict the size or amount of resources that are requested or influenced by an actor, which can be used to consume more resources than intended.