Feb 12, 2026 · 6 min read
The RecursiveUrlLoader in LangChain JS used `startsWith()` to validate URLs, allowing attackers to bypass domain restrictions and scan internal networks or steal cloud credentials.
A logic flaw in the LangChain JS `@langchain/community` package allows for Server-Side Request Forgery (SSRF) within the `RecursiveUrlLoader`. By bypassing a weak string-prefix validation check, attackers can force the crawler to access internal network resources, local loopback interfaces, or cloud metadata services. Since the output of this loader is typically fed into an LLM for summarization or processing, this vulnerability transforms a simple network scan into a high-fidelity data exfiltration pipeline.
In the modern AI ecosystem, data is oxygen. We build agents, give them tools, and tell them to "go forth and learn." One of the most popular tools in the LangChain arsenal is the RecursiveUrlLoader. It's essentially a web spider in a box—you point it at a documentation site or a wiki, and it recursively scrapes every link it finds to build a knowledge base for your RAG (Retrieval-Augmented Generation) pipeline.
But here's the thing about giving a robot a web browser: unless you put a leash on it, it's going to wander into your backyard. The developers knew this. They implemented a preventOutside flag, enabled by default, intended to keep the spider inside the garden fence (the target domain). Ideally, if you point it at https://docs.example.com, it shouldn't wander off to https://pornhub.com or, more importantly, http://169.254.169.254 (the AWS metadata service).
CVE-2026-26019 is the story of how that leash was made of wet paper. It turns out that validating URLs is hard, and using string manipulation to do it is almost always a death sentence for security.
The root cause of this vulnerability is a classic developer mistake: confusing a string prefix with a security boundary. To enforce the preventOutside rule, the code needed to check if a discovered link belonged to the same origin as the base URL.
Instead of parsing the URL into its component parts (protocol, hostname, port) and comparing them semantically, the code did this:
```typescript
// The 'Security' Check
const isAllowed = !this.preventOutside || link.startsWith(baseUrl);
```

If you are a security researcher, you are likely grinning right now. If you are a developer, let me explain why this is catastrophic. In the world of URLs, `startsWith` is meaningless for origin validation. If my `baseUrl` is `https://example.com`, obviously `https://example.com/page2` passes. But so does `https://example.com.attacker.com`.
This is the "Golden Key" bypass. By simply registering a domain that starts with the target string, or utilizing certain URL formatting tricks (like the @ symbol for authentication segments), an attacker can fool the crawler into thinking an external, malicious, or internal IP is actually part of the allowed zone.
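To make the failure mode concrete, here is a minimal sketch (with invented attacker hostnames) contrasting the prefix check with what a real URL parser sees:

```typescript
// Hypothetical hostnames for illustration only
const baseUrl = "https://example.com";

// Legitimate same-origin link: passes, as intended
console.log("https://example.com/page2".startsWith(baseUrl)); // true

// Attacker-registered domain: ALSO passes the prefix check
console.log("https://example.com.attacker.com/payload".startsWith(baseUrl)); // true

// Userinfo trick: everything before '@' is credentials; the real host is attacker.com
const tricky = "https://example.com@attacker.com/";
console.log(tricky.startsWith(baseUrl)); // true
console.log(new URL(tricky).hostname);   // "attacker.com"
```

A semantic comparison via `new URL(...).hostname` immediately exposes both tricks, which is exactly why string prefixes are the wrong tool here.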
Let's look at the smoking gun. The vulnerability existed in RecursiveUrlLoader prior to version 1.1.14. The fix involved ripping out the naive string comparison and replacing it with a dedicated SSRF protection module that actually understands what a URL is.
The Vulnerable Code:

```typescript
// recursive_url.ts (Pre-patch)
// 🚩 Narrative: "Looks like the base URL? Must be safe!"
if (this.preventOutside && !link.startsWith(this.url)) {
  continue;
}
```

The Patched Code:
```typescript
// recursive_url.ts (Patched)
// 🛡️ Narrative: "Check origin, check IP, check everything."
import { isSameOrigin, validateSafeUrl } from "@langchain/core/utils/ssrf";

// ... inside the loop
if (this.preventOutside && !isSameOrigin(link, this.url)) {
  continue;
}

// The real MVP: Proactive IP validation before fetch
if (!(await validateSafeUrl(link))) {
  throw new Error("Potentially unsafe URL detected");
}
```

The patch introduced `validateSafeUrl`, which does the heavy lifting. It blocks private IP ranges (RFC 1918), loopback addresses (127.0.0.1), and the notorious cloud metadata IP (169.254.169.254). It also handles the semantic origin check correctly, ensuring that `example.com.evil.com` is treated as a different origin than `example.com`.
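The internals of the library's SSRF utilities aren't reproduced here, but the two core ideas are easy to sketch. The following is a simplified, hypothetical approximation using Node's built-in `URL`, `dns`, and `net.BlockList`; the function names `sameOrigin` and `checkUrl` are my own, not the library's:

```typescript
import { lookup } from "node:dns/promises";
import { BlockList } from "node:net";

// Idea 1 — semantic origin comparison: protocol + hostname + port, never string prefixes
function sameOrigin(link: string, base: string): boolean {
  const a = new URL(link);
  const b = new URL(base);
  return a.protocol === b.protocol && a.hostname === b.hostname && a.port === b.port;
}

// Idea 2 — deny-list of private, loopback, and link-local ranges
const blocked = new BlockList();
blocked.addSubnet("10.0.0.0", 8);
blocked.addSubnet("172.16.0.0", 12);
blocked.addSubnet("192.168.0.0", 16);
blocked.addSubnet("127.0.0.0", 8);
blocked.addSubnet("169.254.0.0", 16); // covers the metadata IP 169.254.169.254

// Resolve the hostname BEFORE fetching, so a DNS name can't smuggle in a private IP
async function checkUrl(link: string): Promise<boolean> {
  const { address } = await lookup(new URL(link).hostname);
  return !blocked.check(address);
}
```

Note that resolving once and fetching later still leaves a DNS-rebinding window; production-grade protections pin the resolved IP for the actual request rather than resolving twice.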
So, how do we weaponize this? We need two things: a way to feed the crawler a starting URL (or control a page it visits), and a target worth stealing. In a cloud environment, the target is almost always the Instance Metadata Service (IMDS).
The Setup: The application exposes a feature that uses RecursiveUrlLoader to scrape a URL provided by the user (or a URL the attacker can modify).

The Attack Chain:
1. **Bypass the Scope:** The attacker hosts a malicious page at https://target-site.com.attacker-controlled.net. If the user points the loader here, or if the loader crawls here from a valid site (via an open redirect or comment section), the startsWith check passes because the string matches the prefix.
2. **The Redirection (SSRF):** On the attacker's page, we place a link or an HTTP redirect to http://169.254.169.254/latest/meta-data/iam/security-credentials/.
3. **The Exfiltration:** This is where LangChain makes the vulnerability worse than a standard SSRF. A standard SSRF might just be "blind" (you can send requests but not see responses). But the entire purpose of this component is to read the response body and return it as text to the LLM.
4. **The Payday:** The loader fetches the metadata, believing it to be just another web page. It passes the AWS keys to the LLM. The attacker then asks the LLM: "Summarize the documents you just crawled." The LLM happily replies: "I found some JSON data containing an AccessKeyId and a SecretAccessKey..."
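The redirection step needs nothing more than an HTTP redirect. A hypothetical attacker server (hostname and hosting details invented for illustration) can be as small as this:

```typescript
import { createServer } from "node:http";

// Hypothetical attacker server, e.g. hosted behind example.com.attacker-controlled.net.
// Any crawler that follows redirects gets bounced straight into the metadata service.
const server = createServer((req, res) => {
  res.writeHead(302, {
    Location: "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
  });
  res.end();
});

server.listen(0); // ephemeral port, for demonstration only
```

Because the pre-patch loader only string-checked the initial link, nothing re-validates where a redirect actually lands before the body is read.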
SSRF in AI agents is a distinct beast. In traditional web apps, SSRF is often used to port scan internal networks or hit legacy administrative interfaces. In AI, it is an Identity Theft vector.
Because the RecursiveUrlLoader is designed to ingest unstructured text, it bypasses many of the format-based protections that might stop other SSRF attacks (like expecting HTML). It will happily ingest JSON, XML, or raw text credentials.
If this runs in a cloud environment (AWS, GCP, Azure) without strict IMDSv2 token requirements, a single crawl can result in full account takeover. Furthermore, because the CVSS vector requires User Interaction (someone has to tell the bot to crawl), the score of 4.1 deceptively masks the critical nature of the flaw. If you are building a "Chat with Website" feature, this is a Critical vulnerability for your infrastructure.
The immediate fix is simple: Upgrade @langchain/community to version 1.1.14 or later. This pulls in the new SSRF protections.
However, code fixes are only one layer of defense. You should treat the environment your AI agents run in as hostile.
- **Network egress filtering:** Use iptables or Security Groups to block egress traffic to 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, and 169.254.169.254.
- **Enforce IMDSv2:** IMDSv2 requires a PUT request with specific headers to get a token before reading metadata. Simple GET-based SSRF (like this web crawler) cannot generate those headers, effectively neutralizing the cloud credential theft vector.

CVSS Vector: `CVSS:3.1/AV:N/AC:L/PR:L/UI:R/S:C/C:L/I:N/A:N`

| Product | Affected Versions | Fixed Version |
|---|---|---|
| `@langchain/community` (langchain-ai) | < 1.1.14 | 1.1.14 |
| Attribute | Detail |
|---|---|
| CWE ID | CWE-918 |
| Attack Vector | Network |
| CVSS Score | 4.1 (Medium) |
| Impact | Confidentiality (Low), Scope Changed |
| Vulnerable Logic | String.startsWith() bypass |
| Target Component | RecursiveUrlLoader |