CVEReports

Automated vulnerability intelligence platform. Comprehensive reports for high-severity CVEs generated by AI.


GHSA-2JQ4-Q6VV-4CP3: Arbitrary File Write via Path Traversal in Crawl4AI Downloads

Amit Schendel
Senior Security Researcher

Jun 18, 2026 · 5 min read

Executive Summary (TL;DR)

Crawl4AI <= 0.8.9 allows arbitrary file write and path traversal, potentially leading to RCE via unauthenticated /crawl endpoints or victim-initiated crawling.

A critical arbitrary file write vulnerability exists in Crawl4AI versions 0.8.9 and below. By manipulating download filenames via Content-Disposition headers or suggested_filename values, attackers can write files to any location the crawler process is permitted to write to, potentially leading to Remote Code Execution.

Vulnerability Overview

Crawl4AI is an open-source Python package widely utilized for web crawling and scraping tasks, particularly in artificial intelligence and machine learning pipelines. The tool features dual crawling strategies: an HTTP-based approach utilizing AsyncHTTPCrawlerStrategy and a browser-based approach powered by Playwright through AsyncPlaywrightCrawlerStrategy. Both engines expose download capabilities intended to store media and file attachments retrieved during crawls.

In versions 0.8.9 and prior, these download features contain an improper pathname limitation vulnerability (CWE-22). The attack surface includes any application execution path where the crawler parses an external, untrusted web resource that redirects to or initiates a file download. Because the crawler does not neutralize directory traversal sequences before processing paths, an attacker can manipulate target outputs to escape the intended downloads directory.

Root Cause Analysis

The root cause of this vulnerability lies in the unvalidated trust placed in remote server headers and DOM properties during download resolution. When the HTTP crawler processes a remote resource, it reads the Content-Disposition header and passes it to self._extract_filename(). The resulting filename is combined directly with the downloads path using os.path.join.

In Python, os.path.join(path, *paths) concatenates its components without normalizing or validating them. Directory traversal sequences such as ../ pass through unchanged and are resolved by the operating system when the file is opened, and an absolute component causes every preceding component, including the downloads directory, to be discarded. If the filename contains parent-directory references or an absolute path, the resulting path therefore escapes the configured target folder. The application then opens this unvalidated filepath with aiofiles.open() and writes the raw bytes to it without any path confinement check.
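The join behavior is easy to check in isolation. The sketch below uses posixpath (the POSIX flavor of os.path) so that it behaves the same on any platform. The directory /app/downloads and the filenames are illustrative assumptions, not values taken from Crawl4AI, and normpath is used only to display where the operating system would end up.

```python
import posixpath

# Hypothetical downloads directory, chosen for illustration only.
DOWNLOADS = "/app/downloads"

def join_like_vulnerable_code(filename: str) -> str:
    """Join without validation, then collapse the path to show the real destination."""
    joined = posixpath.join(DOWNLOADS, filename)
    return posixpath.normpath(joined)

# A relative traversal sequence survives the join and climbs out of the directory.
print(join_like_vulnerable_code("../../etc/cron.d/job"))  # /etc/cron.d/job
# An absolute component makes join discard the downloads directory entirely.
print(join_like_vulnerable_code("/root/.bashrc"))         # /root/.bashrc
# A plain file name stays inside the downloads directory.
print(join_like_vulnerable_code("report.pdf"))            # /app/downloads/report.pdf
```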

Similarly, the browser-based crawling strategy relies on Playwright's download.suggested_filename property. This property is also generated from remote HTTP response metadata. If the crawled webpage triggers a download with a name containing traversal components, os.path.join() behaves identically, leading to an out-of-bounds file write via Playwright's save_as() mechanism.

Code Analysis

To understand the vulnerabilities in version 0.8.9, examine the file download paths in crawl4ai/async_crawler_strategy.py before and after the patch. In the vulnerable version, paths are joined directly:

# Vulnerable Path Resolution (Crawl4AI <= 0.8.9)
filename = self._extract_filename(content_disposition, url, content_type)
filepath = os.path.join(downloads_path, filename)
async with aiofiles.open(filepath, 'wb') as f:
    await f.write(raw_bytes)

The remediation introduced in commit 60886d1a0c52682e4c83a7cef9dfac417fff6bd2 wraps path resolution in a hardened helper function _safe_download_filepath() and enforces strict symlink-blocking mechanisms:

import hashlib
import os

def _safe_download_filepath(downloads_path: str, filename: str) -> str:
    # Restrict filename strictly to its base name component
    safe_name = os.path.basename(filename or "")
    if not safe_name or safe_name in (".", ".."):
        safe_name = f"download_{hashlib.md5((filename or '').encode()).hexdigest()[:10]}"
    
    real_root = os.path.realpath(downloads_path)
    real_path = os.path.realpath(os.path.join(real_root, safe_name))
    
    # Verify path confinement inside the root directory
    if os.path.commonpath([real_root, real_path]) != real_root:
        raise ValueError(f"Unsafe download filename rejected: {filename!r}")
    return real_path
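To see how the helper treats hostile input, the following sketch copies it as shown above and runs it against a handful of filenames in a temporary directory. The test names and the resolve_all() wrapper are assumptions added for illustration; they are not part of the Crawl4AI patch.

```python
import hashlib
import os
import tempfile

def _safe_download_filepath(downloads_path: str, filename: str) -> str:
    # Copied from the patched helper quoted in this article.
    safe_name = os.path.basename(filename or "")
    if not safe_name or safe_name in (".", ".."):
        safe_name = f"download_{hashlib.md5((filename or '').encode()).hexdigest()[:10]}"
    real_root = os.path.realpath(downloads_path)
    real_path = os.path.realpath(os.path.join(real_root, safe_name))
    if os.path.commonpath([real_root, real_path]) != real_root:
        raise ValueError(f"Unsafe download filename rejected: {filename!r}")
    return real_path

def resolve_all(names):
    """Hypothetical wrapper: map each input name to the file name it is stored under."""
    results = {}
    with tempfile.TemporaryDirectory() as root:
        real_root = os.path.realpath(root)
        for name in names:
            resolved = _safe_download_filepath(root, name)
            # Every result must sit directly inside the downloads root.
            assert os.path.dirname(resolved) == real_root
            results[name] = os.path.basename(resolved)
    return results

HOSTILE_NAMES = ["report.pdf", "../../../../root/.bashrc", "/etc/passwd", "..", ""]
for name, stored_as in resolve_all(HOSTILE_NAMES).items():
    print(f"{name!r:32} -> {stored_as}")
```

Traversal and absolute names are reduced to their final component, while empty or dot-only names fall back to a generated download_<hash> name, so none of the inputs can leave the root.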

To address time-of-check to time-of-use (TOCTOU) race conditions, files are opened with a custom file opener implementing O_NOFOLLOW:

def _nofollow_opener(path, flags):
    return os.open(path, flags | os.O_NOFOLLOW)
 
# Safe File Write Integration
async with aiofiles.open(filepath, 'wb', opener=_nofollow_opener) as f:
    await f.write(raw_bytes)

With O_NOFOLLOW, the open fails when the final path component is a symbolic link, so a link planted in the downloads directory between the confinement check and the write cannot redirect the output to another location.
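The effect of the opener can be reproduced with a planted symlink. This is a minimal sketch for POSIX systems, assuming the built-in open() as a stand-in for aiofiles.open() (both accept an opener argument). The directory layout and file names are hypothetical.

```python
import os
import tempfile

def _nofollow_opener(path, flags):
    # Same opener as in the patch: refuse a symlink at the final path component.
    return os.open(path, flags | os.O_NOFOLLOW)

def symlink_write_is_blocked() -> bool:
    """Plant a symlink in a scratch downloads directory and try to write through it."""
    with tempfile.TemporaryDirectory() as scratch:
        downloads = os.path.join(scratch, "downloads")
        os.mkdir(downloads)
        # Stand-in for a sensitive file that lives outside the downloads directory.
        target = os.path.join(scratch, "sensitive.txt")
        with open(target, "w") as f:
            f.write("original")
        # The planted link looks like an ordinary download destination.
        link = os.path.join(downloads, "report.pdf")
        os.symlink(target, link)
        try:
            with open(link, "wb", opener=_nofollow_opener) as f:
                f.write(b"attacker data")
        except OSError:
            blocked = True  # the kernel refuses to follow the link
        else:
            blocked = False
        with open(target) as f:
            untouched = f.read() == "original"
        return blocked and untouched

print(symlink_write_is_blocked())
```

On Linux the open call is expected to fail with ELOOP, leaving the target file unchanged.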

Exploitation and Attack Methodology

An attacker can exploit this vulnerability through two separate vectors depending on the integration context of Crawl4AI. The first vector involves self-hosted Docker instances where the /crawl API endpoint runs without authentication by default. An attacker sends a POST request specifying a target URL under their control. When Crawl4AI initiates a connection to the attacker-controlled server, the server responds with a malicious payload and a crafted Content-Disposition header.

Content-Disposition: attachment; filename="../../../../../root/.bashrc"

Upon receiving the response, the crawler resolves the target path to /root/.bashrc instead of the expected downloads subdirectory, subsequently overwriting the shell configuration file with attacker-controlled instructions. The next time an interactive shell is initialized within the container, the injected shell code executes.
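The resolution step can be traced offline. The sketch below parses the header above with the standard library's email.message.Message as a stand-in for Crawl4AI's _extract_filename(), whose implementation is not shown here, and joins the result onto a hypothetical downloads directory the way the vulnerable code does.

```python
import posixpath
from email.message import Message

def filename_from_content_disposition(header_value: str) -> str:
    """Extract the filename parameter using the standard library's header parser."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename() or ""

HEADER = 'attachment; filename="../../../../../root/.bashrc"'
DOWNLOADS = "/app/downloads"  # hypothetical container path, not from the advisory

name = filename_from_content_disposition(HEADER)
# Unvalidated join, then normpath to show the path the OS would resolve.
landing = posixpath.normpath(posixpath.join(DOWNLOADS, name))
print(name)
print(landing)
```

Because surplus ../ components at the filesystem root are ignored, the attacker does not need to know how deep the downloads directory is; a generous number of ../ segments is enough.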

The second vector affects developers utilizing the Crawl4AI Python library locally. If the developer invokes crawler.arun() on an untrusted web page, the target site can trigger a browser-driven download where the Playwright-derived suggested_filename contains a directory escape string, executing the same out-of-bounds file-writing sequence on the host machine.

Impact Assessment

The impact of an arbitrary file write vulnerability depends heavily on the execution environment and user privileges. Because Crawl4AI is frequently deployed inside Docker containers running as the root user, the write primitive can be escalated to arbitrary command execution. Attackers can create files such as /etc/cron.d/malicious_job to register scheduled tasks, or replace ~/.ssh/authorized_keys with attacker-controlled keys to gain direct network access.

Even for an unprivileged user, write access to a directory on Python's sys.path lets an attacker plant or replace Python modules, which execute the next time the application imports them. Under CVSS 3.1, the issue scores 9.6 (Critical): it is exploitable over the network with low attack complexity and no privileges, it requires user interaction (UI:R), and the changed scope combined with high confidentiality, integrity, and availability impact reflects the potential for remote code execution.
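The 9.6 figure can be reproduced from the vector listed in the Technical Appendix (CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H). The sketch below hard-codes the metric weights and scope-changed formulas from the CVSS v3.1 specification for this one vector; it is a worked calculation, not a general-purpose calculator.

```python
import math

# Metric weights from the CVSS v3.1 specification for the values in this vector.
AV_NETWORK = 0.85
AC_LOW = 0.77
PR_NONE = 0.85
UI_REQUIRED = 0.62
CIA_HIGH = 0.56

def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, using integer math."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0

def base_score_scope_changed() -> float:
    """Base score for S:C with C, I and A all High."""
    iss = 1 - (1 - CIA_HIGH) ** 3
    impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * AV_NETWORK * AC_LOW * PR_NONE * UI_REQUIRED
    if impact <= 0:
        return 0.0
    return roundup(min(1.08 * (impact + exploitability), 10))

print(base_score_scope_changed())  # 9.6
```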

Mitigation and Remediation

Remediation requires upgrading the crawl4ai package to version 0.9.0 or higher. This version implements robust path validation using realpath checks, blocks symlink traversal via O_NOFOLLOW, and ensures secure default behavior.
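A deployment script can verify that the installed package meets the fixed version. The following is a minimal sketch, assuming the distribution is installed under the name crawl4ai and that comparing the leading numeric components is sufficient; pre-release suffixes are ignored, which is a simplification.

```python
import re
from importlib import metadata

FIXED_VERSION = (0, 9, 0)  # first patched release according to the advisory

def parse_release(version: str) -> tuple:
    """Return the leading numeric components, e.g. '0.8.9' -> (0, 8, 9)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))

def is_patched(version: str) -> bool:
    """True when the version is at or above the fixed release."""
    release = parse_release(version)
    # Pad short versions such as '0.9' so the tuple comparison is well defined.
    padded = release + (0,) * (len(FIXED_VERSION) - len(release))
    return padded >= FIXED_VERSION

def installed_crawl4ai_is_patched() -> bool:
    """Check the installed package; raises PackageNotFoundError if it is absent."""
    return is_patched(metadata.version("crawl4ai"))

for candidate in ("0.8.9", "0.9.0", "0.10.1"):
    print(candidate, is_patched(candidate))
```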

For environments where immediate upgrading is not feasible, several defensive workarounds should be applied. First, execute the web crawler process under an unprivileged system user with minimal file system write permissions. Second, run the Crawl4AI engine in a separate, isolated volume mount that contains no system-critical configuration directories or executable code. Finally, secure the /crawl endpoint with an explicit API token to restrict unauthorized usage.


Technical Appendix

CVSS Score
9.6 / 10
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H

Affected Systems

Crawl4AI (Python package) <= 0.8.9

Affected Versions Detail

Product                Affected Versions   Fixed Version
crawl4ai (unclecode)   <= 0.8.9            0.9.0

Attribute        Detail
CWE ID           CWE-22, CWE-59
Attack Vector    Network (AV:N)
CVSS Score       9.6 (Critical)
Impact           Arbitrary File Write / Remote Code Execution
Exploit Status   Proof-of-Concept
KEV Status       Not Listed

MITRE ATT&CK Mapping

T1210   Exploitation of Remote Services     Initial Access
T1059   Command and Scripting Interpreter   Execution
T1546   Event Triggered Execution           Persistence

CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')

The software uses external input to construct a pathname that is intended to identify a file or directory that is located beneath a restricted parent directory, but the software does not properly neutralize special elements within the pathname that can cause the pathname to resolve to a location outside of the restricted directory.

Vulnerability Timeline

2026-03-16: Vulnerable download code introduced in version 0.8.8/0.8.9
2026-06-18: Vulnerability identified and reported by Y4tacker
2026-06-18: Patch released in version 0.9.0 and GHSA published


