The root cause is that the crawler trusts server-supplied filenames when it resolves download paths. When the HTTP crawler processes a remote resource, it reads the Content-Disposition header and passes it to self._extract_filename(). The resulting filename is joined directly to the downloads path with os.path.join(), with no validation in between.

In Python, os.path.join(path, *paths) does not sanitize its arguments. Directory traversal sequences (such as ../) are kept verbatim in the joined string and are resolved by the operating system when the file is opened, and an absolute path component causes every component before it to be discarded. If the filename contains parent-directory references, the resulting path therefore escapes the configured target folder. The application then opens this unvalidated filepath with aiofiles.open() and writes the raw bytes without any path confinement check.
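
The join behavior described above can be reproduced in a few lines. This is a minimal sketch, not code from Crawl4AI: the downloads directory and the filenames are made-up values, and posixpath (the POSIX flavor of os.path) is used so the result is the same on any platform.

```python
import posixpath  # POSIX flavor of os.path, so results do not depend on the host OS

downloads_path = "/app/downloads"  # hypothetical downloads directory

# Traversal sequences survive the join untouched...
joined = posixpath.join(downloads_path, "../../etc/cron.d/job")
print(joined)

# ...and collapse to a location outside the root once the path is resolved.
resolved = posixpath.normpath(joined)
print(resolved)

# An absolute component discards everything that came before it.
absolute = posixpath.join(downloads_path, "/root/.bashrc")
print(absolute)
```

Neither result stays under /app/downloads, which is why a confinement check has to run on the resolved path rather than on the joined string.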

Similarly, the browser-based crawling strategy relies on Playwright's download.suggested_filename property. This property is also generated from remote HTTP response metadata. If the crawled webpage triggers a download whose name contains traversal components, os.path.join() behaves identically, and Playwright's save_as() mechanism writes the file outside the downloads directory.

GHSA-2JQ4-Q6VV-4CP3: Arbitrary File Write via Path Traversal in Crawl4AI Downloads

Amit Schendel

Senior Security Researcher

Jun 18, 2026

Executive Summary (TL;DR)

Crawl4AI <= 0.8.9 allows arbitrary file write via path traversal in download filenames, potentially leading to RCE through unauthenticated /crawl endpoints or victim-initiated crawling.

A critical Arbitrary File Write vulnerability exists in Crawl4AI versions 0.8.9 and below. By manipulating download filenames via Content-Disposition headers or suggested_filename values, attackers can write arbitrary files to any location the crawler process is permitted to write to, potentially leading to Remote Code Execution.

[Figure: Attack Flow Diagram]

Vulnerability Overview

Crawl4AI is an open-source Python package widely utilized for web crawling and scraping tasks, particularly in artificial intelligence and machine learning pipelines. The tool features dual crawling strategies: an HTTP-based approach utilizing AsyncHTTPCrawlerStrategy and a browser-based approach powered by Playwright through AsyncPlaywrightCrawlerStrategy. Both engines expose download capabilities intended to store media and file attachments retrieved during crawls.

In versions 0.8.9 and prior, these download features contain an improper pathname limitation vulnerability (CWE-22). The attack surface includes any application execution path where the crawler parses an external, untrusted web resource that redirects to or initiates a file download. Because the crawler does not neutralize directory traversal sequences before processing paths, an attacker can manipulate target outputs to escape the intended downloads directory.

Root Cause Analysis

Code Analysis

To understand the vulnerabilities in version 0.8.9, examine the file download paths in crawl4ai/async_crawler_strategy.py before and after the patch. In the vulnerable version, paths are joined directly:

# Vulnerable Path Resolution (Crawl4AI <= 0.8.9)
filename = self._extract_filename(content_disposition, url, content_type)
filepath = os.path.join(downloads_path, filename)
async with aiofiles.open(filepath, 'wb') as f:
    await f.write(raw_bytes)

The remediation introduced in commit 60886d1a0c52682e4c83a7cef9dfac417fff6bd2 wraps path resolution in a hardened helper function _safe_download_filepath() and enforces strict symlink-blocking mechanisms:

import hashlib
import os

def _safe_download_filepath(downloads_path: str, filename: str) -> str:
    # Restrict filename strictly to its base name component
    safe_name = os.path.basename(filename or "")
    # Fall back to a hash-derived name when nothing usable remains
    if not safe_name or safe_name in (".", ".."):
        safe_name = f"download_{hashlib.md5((filename or '').encode()).hexdigest()[:10]}"

    real_root = os.path.realpath(downloads_path)
    real_path = os.path.realpath(os.path.join(real_root, safe_name))

    # Verify path confinement inside the root directory
    if os.path.commonpath([real_root, real_path]) != real_root:
        raise ValueError(f"Unsafe download filename rejected: {filename!r}")
    return real_path
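
To see how the helper treats different inputs, the sketch below repeats it verbatim and runs it against a handful of filenames. The probe() function, the test filenames, and the pre-planted "escape" symlink are illustrative assumptions that are not part of the patch, and the example assumes a POSIX system where os.symlink() is available.

```python
import hashlib
import os
import tempfile


def _safe_download_filepath(downloads_path: str, filename: str) -> str:
    # Same logic as the helper quoted above, repeated so the sketch runs standalone
    safe_name = os.path.basename(filename or "")
    if not safe_name or safe_name in (".", ".."):
        safe_name = f"download_{hashlib.md5((filename or '').encode()).hexdigest()[:10]}"
    real_root = os.path.realpath(downloads_path)
    real_path = os.path.realpath(os.path.join(real_root, safe_name))
    if os.path.commonpath([real_root, real_path]) != real_root:
        raise ValueError(f"Unsafe download filename rejected: {filename!r}")
    return real_path


def probe(root: str, filename: str) -> str:
    """Hypothetical test harness: confined path relative to root, or 'rejected'."""
    try:
        confined = _safe_download_filepath(root, filename)
        return os.path.relpath(confined, os.path.realpath(root))
    except ValueError:
        return "rejected"


with tempfile.TemporaryDirectory() as root:
    # Hypothetical symlink planted inside the downloads directory, pointing outside it
    os.symlink(tempfile.gettempdir(), os.path.join(root, "escape"))
    candidates = ["report.pdf", "../../../../../root/.bashrc", "..", "escape"]
    results = {name: probe(root, name) for name in candidates}

for name, outcome in results.items():
    print(f"{name!r:35} -> {outcome}")
```

Traversal components are stripped down to the base name, a bare ".." is replaced by a hash-derived name, and a name that resolves through a symlink to a location outside the root is rejected by the commonpath check.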

To address time-of-check to time-of-use (TOCTOU) race conditions, files are opened with a custom file opener implementing O_NOFOLLOW:

import os

def _nofollow_opener(path, flags):
    # O_NOFOLLOW makes os.open() fail if the final path component is a symlink
    return os.open(path, flags | os.O_NOFOLLOW)

# Safe File Write Integration
async with aiofiles.open(filepath, 'wb', opener=_nofollow_opener) as f:
    await f.write(raw_bytes)

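This implementation prevents the crawler from writing through a malicious symbolic link planted at the target path between the confinement check and the write. O_NOFOLLOW applies only to the final path component; symlinks in parent directories are covered by the realpath() check in _safe_download_filepath().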
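
The effect of the opener can be observed with the built-in open(), which accepts the same opener argument. The sketch below is an illustration under assumptions: the file names are invented, the synchronous open() stands in for aiofiles.open(), and a POSIX system is assumed (os.O_NOFOLLOW is not available on Windows).

```python
import os
import tempfile


def _nofollow_opener(path, flags):
    # Same opener as above: fail if the last path component is a symlink
    return os.open(path, flags | os.O_NOFOLLOW)


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "victim.txt")   # stands in for a sensitive file
    link = os.path.join(root, "download.bin")   # symlink planted at the download path
    with open(target, "w") as f:
        f.write("original")
    os.symlink(target, link)

    try:
        with open(link, "wb", opener=_nofollow_opener) as f:
            f.write(b"attacker bytes")
        blocked = False
    except OSError:
        # The kernel refuses to open a symlink when O_NOFOLLOW is set
        blocked = True

    with open(target) as f:
        survived = f.read() == "original"

print("write blocked:", blocked)
print("target untouched:", survived)
```

Without the opener, the same open() call would follow the link and truncate victim.txt.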

Exploitation and Attack Methodology

An attacker can exploit this vulnerability through two separate vectors depending on the integration context of Crawl4AI. The first vector involves self-hosted Docker instances where the /crawl API endpoint runs without authentication by default. An attacker sends a POST request specifying a target URL under their control. When Crawl4AI initiates a connection to the attacker-controlled server, the server responds with a malicious payload and a crafted Content-Disposition header.

Content-Disposition: attachment; filename="../../../../../root/.bashrc"

Upon receiving the response, the crawler resolves the target path to /root/.bashrc instead of the expected downloads subdirectory, subsequently overwriting the shell configuration file with attacker-controlled instructions. The next time an interactive shell is initialized within the container, the injected shell code executes.
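
The path from header to write target can be traced with the standard library. This is a rough sketch rather than the crawler's actual parsing: email.message.Message stands in for self._extract_filename(), whose internals are not shown in this advisory, and /app/downloads is a hypothetical downloads directory.

```python
import posixpath
from email.message import Message

# Header value taken from the example above
header = 'attachment; filename="../../../../../root/.bashrc"'

msg = Message()
msg["Content-Disposition"] = header
filename = msg.get_filename()  # stand-in for the crawler's own filename extraction

downloads_path = "/app/downloads"  # hypothetical downloads directory
resolved = posixpath.normpath(posixpath.join(downloads_path, filename))

print(filename)
print(resolved)
```

The surplus ../ segments are harmless to the attacker: once the path reaches the filesystem root, further parent references have no effect, so the payload works regardless of how deep the downloads directory is.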

The second vector affects developers using the Crawl4AI Python library locally. If the developer invokes crawler.arun() on an untrusted web page, the target site can trigger a browser-driven download whose Playwright-derived suggested_filename contains a directory escape string, producing the same write outside the downloads directory on the host machine.

Impact Assessment

The impact of an arbitrary file write vulnerability depends heavily on the execution environment and user privileges. Because Crawl4AI is frequently deployed inside Docker containers running as the root user, the write primitive can be used to execute arbitrary commands. Attackers can create files such as /etc/cron.d/malicious_job to register scheduled tasks, or replace ~/.ssh/authorized_keys with their own keys to gain direct network access.

Even in unprivileged user spaces, write access to Python's package path (e.g., modifying libraries in the active sys.path) allows attackers to inject malicious Python files, which execute the next time the application imports the affected modules. Under CVSS 3.1, the issue scores 9.6: it is exploitable over the network with low complexity and no privileges, requires user interaction (UI:R), and has a changed scope with high impact on confidentiality, integrity, and availability.

Mitigation and Remediation

Remediation requires upgrading the crawl4ai package to version 0.9.0 or higher. This version implements robust path validation using realpath checks, blocks symlink traversal via O_NOFOLLOW, and ensures secure default behavior.
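
A quick way to confirm whether an environment still needs the upgrade is to compare the installed version against 0.9.0. The sketch below is one possible check, not an official tool: parse_release(), is_affected(), and installed_status() are hypothetical helpers, and the parser is a simplification that looks only at the leading numeric part of the version string, so pre-release suffixes are ignored.

```python
import re
from importlib import metadata

FIXED = (0, 9, 0)  # first patched release according to this advisory


def parse_release(version: str) -> tuple:
    """Leading numeric release segment, e.g. '0.8.9.post1' -> (0, 8, 9)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if not match:
        raise ValueError(f"Unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group().split("."))[:3]
    return parts + (0,) * (3 - len(parts))  # pad '1.0' to (1, 0, 0)


def is_affected(version: str) -> bool:
    return parse_release(version) < FIXED


def installed_status(package: str = "crawl4ai"):
    """True if the installed package is affected, False if not, None if absent."""
    try:
        return is_affected(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


print("0.8.9 affected:", is_affected("0.8.9"))
print("0.9.0 affected:", is_affected("0.9.0"))
print("installed crawl4ai affected:", installed_status())
```

In a CI pipeline, a check like this could fail the build whenever installed_status() returns True.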

For environments where immediate upgrading is not feasible, several defensive workarounds should be applied. First, execute the web crawler process under an unprivileged system user with minimal file system write permissions. Second, run the Crawl4AI engine in a separate, isolated volume mount that contains no system-critical configuration directories or executable code. Finally, secure the /crawl endpoint with an explicit API token to restrict unauthorized usage.


Technical Appendix

CVSS Score

9.6 / 10

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H
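
The 9.6 figure can be reproduced from the vector with the CVSS v3.1 base score equations. The sketch below is a worked calculation, not a general-purpose calculator: the weight table covers only the metric values that appear in this vector, and the function names are illustrative.

```python
import math

# CVSS v3.1 weights for the metric values that appear in this vector only
WEIGHTS = {
    "AV:N": 0.85, "AC:L": 0.77, "PR:N": 0.85, "UI:R": 0.62,
    "C:H": 0.56, "I:H": 0.56, "A:H": 0.56,
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    metrics = vector.split("/")[1:]  # drop the 'CVSS:3.1' prefix
    scope_changed = "S:C" in metrics
    w = {m.split(":")[0]: WEIGHTS[m] for m in metrics if not m.startswith("S:")}

    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total if scope_changed else total, 10))


score = base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H")
print(score)
```

The changed scope (S:C) contributes the 1.08 multiplier, which is what lifts the score above 9.0 despite the user-interaction requirement.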

Affected Systems

Crawl4AI (Python package) <= 0.8.9

Affected Versions Detail

Product	Affected Versions	Fixed Version
crawl4ai (unclecode)	<= 0.8.9	0.9.0

Attribute	Detail
CWE ID	CWE-22, CWE-59
Attack Vector	Network (AV:N)
CVSS Score	9.6 (Critical)
Impact	Arbitrary File Write / Remote Code Execution
Exploit Status	Proof-of-Concept
KEV Status	Not Listed

MITRE ATT&CK Mapping

Technique ID	Technique	Tactic
T1210	Exploitation of Remote Services	Initial Access
T1059	Command and Scripting Interpreter	Execution
T1546	Event Triggered Execution	Persistence

CWE-22

Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')

The software uses external input to construct a pathname that is intended to identify a file or directory that is located beneath a restricted parent directory, but the software does not properly neutralize special elements within the pathname that can cause the pathname to resolve to a location outside of the restricted directory.

Vulnerability Timeline

Date	Event
2026-03-16	Vulnerable download code introduced in version 0.8.8/0.8.9
2026-06-18	Vulnerability identified and reported by Y4tacker
2026-06-18	Patch released in version 0.9.0 and GHSA published
