Jun 18, 2026
Crawl4AI <= 0.8.9 allows arbitrary file write and path traversal, potentially leading to RCE via unauthenticated /crawl endpoints or victim-initiated crawling.
A critical Arbitrary File Write vulnerability exists in Crawl4AI versions 0.8.9 and below. By manipulating download filenames via Content-Disposition headers or suggested_filename values, attackers can write arbitrary files to any location the crawler process is permitted to write to, potentially leading to Remote Code Execution.
Crawl4AI is an open-source Python package widely utilized for web crawling and scraping tasks, particularly in artificial intelligence and machine learning pipelines. The tool features dual crawling strategies: an HTTP-based approach utilizing AsyncHTTPCrawlerStrategy and a browser-based approach powered by Playwright through AsyncPlaywrightCrawlerStrategy. Both engines expose download capabilities intended to store media and file attachments retrieved during crawls.
In versions 0.8.9 and prior, these download features contain an improper pathname limitation vulnerability (CWE-22). The attack surface includes any application execution path where the crawler parses an external, untrusted web resource that redirects to or initiates a file download. Because the crawler does not neutralize directory traversal sequences before processing paths, an attacker can manipulate target outputs to escape the intended downloads directory.
The root cause of this vulnerability lies in the unvalidated trust placed in remote server headers and DOM properties during download resolution. When the HTTP crawler processes a remote resource, it reads the Content-Disposition header and passes it to self._extract_filename(). The resulting filename is combined directly with the downloads path using os.path.join.
In Python, os.path.join(path, *paths) performs no normalization or confinement. Directory traversal sequences (such as ../) are kept verbatim in the joined string and are resolved by the operating system when the file is opened, and an absolute component discards every component that precedes it. If the filename contains parent-directory segments or an absolute path, the final location therefore falls outside the configured target folder. The application then opens this unchecked filepath with aiofiles.open() and writes the raw bytes without any path confinement check.
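The join behavior described here can be reproduced with the standard library alone. The sketch below uses posixpath so that it behaves the same on any platform; the downloads directory and the filenames are hypothetical values chosen for illustration, not paths taken from Crawl4AI.

```python
import posixpath

# Hypothetical downloads directory, used only for illustration.
downloads_path = "/srv/app/downloads"

# Relative traversal: join keeps the "../" segments verbatim.
joined = posixpath.join(downloads_path, "../../../etc/cron.d/job")

# The segments collapse only when the path is normalized
# (or when the operating system resolves it at open time).
resolved = posixpath.normpath(joined)

# Absolute component: join discards everything before it.
absolute = posixpath.join(downloads_path, "/etc/passwd")

print(joined)    # /srv/app/downloads/../../../etc/cron.d/job
print(resolved)  # /etc/cron.d/job
print(absolute)  # /etc/passwd
```

In both cases the final location lies outside downloads_path, which is why a confinement check has to happen after the join rather than being assumed from it.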
Similarly, the browser-based crawling strategy relies on Playwright's download.suggested_filename property. This property is also generated from remote HTTP response metadata. If the crawled webpage triggers a download with a name containing traversal components, os.path.join() behaves identically, leading to an out-of-bounds file write via Playwright's save_as() mechanism.
To understand the vulnerabilities in version 0.8.9, examine the file download paths in crawl4ai/async_crawler_strategy.py before and after the patch. In the vulnerable version, paths are joined directly:
```python
# Vulnerable Path Resolution (Crawl4AI <= 0.8.9)
filename = self._extract_filename(content_disposition, url, content_type)
filepath = os.path.join(downloads_path, filename)
async with aiofiles.open(filepath, 'wb') as f:
    await f.write(raw_bytes)
```

The remediation introduced in commit 60886d1a0c52682e4c83a7cef9dfac417fff6bd2 wraps path resolution in a hardened helper function _safe_download_filepath() and enforces strict symlink-blocking mechanisms:
```python
import hashlib
import os

def _safe_download_filepath(downloads_path: str, filename: str) -> str:
    # Restrict filename strictly to its base name component
    safe_name = os.path.basename(filename or "")
    if not safe_name or safe_name in (".", ".."):
        # Fall back to a deterministic placeholder name
        safe_name = f"download_{hashlib.md5((filename or '').encode()).hexdigest()[:10]}"
    real_root = os.path.realpath(downloads_path)
    real_path = os.path.realpath(os.path.join(real_root, safe_name))
    # Verify path confinement inside the root directory
    if os.path.commonpath([real_root, real_path]) != real_root:
        raise ValueError(f"Unsafe download filename rejected: {filename!r}")
    return real_path
```

To address time-of-check to time-of-use (TOCTOU) race conditions, files are opened with a custom file opener implementing O_NOFOLLOW:
```python
def _nofollow_opener(path, flags):
    return os.open(path, flags | os.O_NOFOLLOW)

# Safe File Write Integration
async with aiofiles.open(filepath, 'wb', opener=_nofollow_opener) as f:
    await f.write(raw_bytes)
```

This implementation prevents the crawler from following malicious symbolic links planted in the directory during execution.
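The effect of O_NOFOLLOW can be observed in isolation. The following is a minimal sketch, assuming a POSIX system where os.O_NOFOLLOW is available; it uses the built-in open() instead of aiofiles to stay dependency-free, and the file names and the write_refuses_symlink() helper are hypothetical.

```python
import os
import tempfile

def _nofollow_opener(path, flags):
    # Same shape as the opener in the patch; O_NOFOLLOW is POSIX-only.
    return os.open(path, flags | os.O_NOFOLLOW)

def write_refuses_symlink() -> bool:
    """Return True if writing through a planted symlink is rejected."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "outside.txt")  # stands in for a sensitive file
        link = os.path.join(root, "report.pdf")     # name the download would use
        with open(target, "w") as f:
            f.write("original")
        os.symlink(target, link)
        try:
            with open(link, "wb", opener=_nofollow_opener) as f:
                f.write(b"attacker bytes")
        except OSError:
            # The open call fails because the final path component is a symlink.
            with open(target) as f:
                return f.read() == "original"
        return False

print(write_refuses_symlink())
```

The flag only protects the final path component, which is presumably why the patch pairs it with the realpath-based confinement check rather than relying on it alone.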
An attacker can exploit this vulnerability through two separate vectors depending on the integration context of Crawl4AI. The first vector involves self-hosted Docker instances where the /crawl API endpoint runs without authentication by default. An attacker sends a POST request specifying a target URL under their control. When Crawl4AI initiates a connection to the attacker-controlled server, the server responds with a malicious payload and a crafted Content-Disposition header:
Content-Disposition: attachment; filename="../../../../../root/.bashrc"
Upon receiving the response, the crawler resolves the target path to /root/.bashrc instead of the expected downloads subdirectory, subsequently overwriting the shell configuration file with attacker-controlled instructions. The next time an interactive shell is initialized within the container, the injected shell code executes.
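The resolution of that header can be traced with the standard library. This is a sketch under stated assumptions: the header is parsed with email.message.Message rather than Crawl4AI's own _extract_filename(), whose internals are not shown here, and the downloads directory and the filename_from_content_disposition() helper are hypothetical.

```python
import posixpath
from email.message import Message

def filename_from_content_disposition(header_value: str) -> str:
    # Stand-in parser; not the crawler's actual implementation.
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename() or ""

header = 'attachment; filename="../../../../../root/.bashrc"'
downloads_path = "/srv/crawler/downloads"  # hypothetical location

name = filename_from_content_disposition(header)

# Unchecked join, as in the vulnerable code path.
unsafe = posixpath.normpath(posixpath.join(downloads_path, name))

# Keeping only the base name confines the write to the downloads directory.
safe = posixpath.join(downloads_path, posixpath.basename(name))

print(unsafe)  # /root/.bashrc
print(safe)    # /srv/crawler/downloads/.bashrc
```

Extra ../ segments beyond the filesystem root are harmless to the attacker, so a generous number of them reaches the root regardless of how deep the real downloads directory is.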
The second vector affects developers utilizing the Crawl4AI Python library locally. If the developer invokes crawler.arun() on an untrusted web page, the target site can trigger a browser-driven download where the Playwright-derived suggested_filename contains a directory escape string, executing the same out-of-bounds file-writing sequence on the host machine.
The impact of an arbitrary file write vulnerability depends heavily on the execution environment and user privileges. Because Crawl4AI is frequently deployed inside Docker containers running as the root user, the write primitive can be used to execute arbitrary commands. Attackers can create files such as /etc/cron.d/malicious_job to register system tasks, or replace ~/.ssh/authorized_keys with their own keys to gain direct network access.
Even in unprivileged user spaces, write capability into Python's package path (e.g., modifying libraries in the active sys.path) allows attackers to inject malicious Python files. This triggers execution when subsequent modules are loaded by the application. Under CVSS 3.1, the vulnerability scores 9.6 (Critical): it is exploitable over the network with low attack complexity and no required privileges, needs user interaction (UI:R), and changes scope with high impact on confidentiality, integrity, and availability.
Remediation requires upgrading the crawl4ai package to version 0.9.0 or higher. This version implements robust path validation using realpath checks, blocks symlink traversal via O_NOFOLLOW, and ensures secure default behavior.
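A deployment can verify which side of the fix it is on at startup. The sketch below is illustrative only: parse_release(), is_affected(), and installed_crawl4ai_is_affected() are hypothetical helpers, and the comparison is a simplification that reads only the leading digits of the first three version components, so pre-release tags such as 0.9.0rc1 are treated as 0.9.0.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED = (0, 9, 0)  # first fixed release according to the advisory

def parse_release(v: str) -> tuple:
    # Simplified parser: leading digits of up to three dotted components.
    nums = []
    for piece in v.split(".")[:3]:
        m = re.match(r"\d+", piece)
        nums.append(int(m.group()) if m else 0)
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)

def is_affected(v: str) -> bool:
    return parse_release(v) < FIXED

def installed_crawl4ai_is_affected():
    """Return True/False for the installed package, or None if it is absent."""
    try:
        return is_affected(version("crawl4ai"))
    except PackageNotFoundError:
        return None

print(is_affected("0.8.9"))  # True
print(is_affected("0.9.0"))  # False
```

For production use, a dedicated version-parsing library would handle pre-release and post-release tags more faithfully than this tuple comparison.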
For environments where immediate upgrading is not feasible, several defensive workarounds should be applied. First, execute the web crawler process under an unprivileged system user with minimal file system write permissions. Second, run the Crawl4AI engine in a separate, isolated volume mount that contains no system-critical configuration directories or executable code. Finally, secure the /crawl endpoint with an explicit API token to restrict unauthorized usage.
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H

| Product | Affected Versions | Fixed Version |
|---|---|---|
| crawl4ai unclecode | <= 0.8.9 | 0.9.0 |
| Attribute | Detail |
|---|---|
| CWE ID | CWE-22, CWE-59 |
| Attack Vector | Network (AV:N) |
| CVSS Score | 9.6 (Critical) |
| Impact | Arbitrary File Write / Remote Code Execution |
| Exploit Status | Proof-of-Concept |
| KEV Status | Not Listed |
The software uses external input to construct a pathname that is intended to identify a file or directory that is located beneath a restricted parent directory, but the software does not properly neutralize special elements within the pathname that can cause the pathname to resolve to a location outside of the restricted directory.