Exploiting this SSRF vulnerability requires the attacker to control an external web server and pass its URL into an application utilizing the HTMLHeaderTextSplitter.split_text_from_url method. The attacker configures their server to respond to incoming HTTP GET requests with a 302 Found status code. The response includes a Location header pointing to the targeted internal resource.

When the vulnerable LangChain application processes the attacker's input, it first validates the external domain. Since the domain resolves to a public IP address, the validation check passes. The application then issues the GET request. The underlying HTTP library receives the 302 response and automatically issues a secondary request to the URL specified in the Location header.

The application retrieves the content from the internal resource, processes it through the HTML splitting logic, and incorporates the resulting text chunks into its normal execution flow. Depending on the application's design, this data may be reflected directly back to the attacker in an HTTP response, stored in a database, or processed by an LLM, making the exfiltrated data accessible to the attacker.

GHSA-FV5P-P927-QMXR: SSRF via Redirect Bypass in LangChain HTMLHeaderTextSplitter

Alon Barad

Software Engineer

Apr 17, 2026 · 6 min read

Executive Summary (TL;DR)

LangChain's HTML text splitter fails to validate HTTP redirects during content retrieval, enabling attackers to bypass SSRF protections and extract internal network data or cloud IAM credentials.

The `langchain-text-splitters` package prior to version 0.3.5 is vulnerable to Server-Side Request Forgery (SSRF) in the `HTMLHeaderTextSplitter.split_text_from_url` method. The vulnerability arises from an incomplete validation mechanism that checks the initial URL but fails to restrict subsequent HTTP redirects, allowing an attacker to access restricted internal resources and cloud metadata services.

Attack Flow Diagram

Vulnerability Overview

The langchain-text-splitters package, a component of the broader LangChain ecosystem, provides utilities for dividing text into smaller, semantically meaningful chunks. This functionality is required for processing large documents before embedding them into vector stores or feeding them to Large Language Models (LLMs). The HTMLHeaderTextSplitter class specifically targets HTML documents, parsing the Document Object Model (DOM) and splitting content based on header tags (<h1>, <h2>, etc.) to maintain logical structure.

A Server-Side Request Forgery (SSRF) vulnerability exists in the split_text_from_url method of this class. The flaw is tracked as GHSA-FV5P-P927-QMXR and carries a CVSS v3.1 score of 6.5. The vulnerability allows an attacker to bypass initial URL validation checks by leveraging HTTP redirects, forcing the server into making unauthorized requests to internal network resources.

The component attempts to restrict outbound requests to safe, public IP addresses to prevent SSRF. It implements a validation step that checks the user-supplied URL against a blocklist of restricted ranges, such as local loopback addresses and cloud metadata service IPs. However, this validation is performed only on the initial URL provided in the method invocation, failing to account for subsequent network routing events.
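A check of this kind can be sketched as follows. This is a minimal illustration, not the package's actual code: the names `is_safe_url`, `is_blocked_ip`, and `BLOCKED_NETWORKS`, and the exact list of ranges, are assumptions made for this example.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Illustrative deny list (hypothetical); the ranges the real package checks may differ.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8",     # IPv4 loopback
        "10.0.0.0/8",      # private
        "172.16.0.0/12",   # private
        "192.168.0.0/16",  # private
        "169.254.0.0/16",  # link-local, includes cloud metadata at 169.254.169.254
        "::1/128",         # IPv6 loopback
        "fc00::/7",        # IPv6 unique local
        "fe80::/10",       # IPv6 link-local
    )
]


def is_blocked_ip(address: str) -> bool:
    """Return True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(address.split("%")[0])  # drop any IPv6 scope id
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # treat ::ffff:a.b.c.d as the embedded IPv4 address
    return any(ip in network for network in BLOCKED_NETWORKS)


def is_safe_url(url: str) -> bool:
    """Accept only http(s) URLs whose host resolves exclusively to non-blocked IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(
            parts.hostname, parts.port or 80, proto=socket.IPPROTO_TCP
        )
    except (socket.gaierror, ValueError):
        return False  # unresolvable host or malformed port
    return bool(infos) and all(not is_blocked_ip(info[4][0]) for info in infos)
```

Note that a function like this inspects only the URL it is handed. Nothing in it knows about URLs the HTTP client may visit later, which is the gap the rest of this report describes.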

Root Cause Analysis

The root cause of this vulnerability lies in the implementation of the HTTP request lifecycle within the split_text_from_url method. When a user supplies a URL to this method, the underlying code executes a validation routine to ensure the destination is not a restricted or internal network address. If the URL passes this check, the method proceeds to fetch the content using an HTTP client.

The critical flaw is the failure to restrict or re-validate HTTP redirects. By default, standard HTTP clients automatically follow 3xx redirect status codes, such as 301 Moved Permanently or 302 Found. The underlying client transparently processes the Location header provided in the remote server's response and initiates a secondary request to the new destination.

Because the anti-SSRF validation logic only inspects the initial input string, it remains blind to any subsequent destinations introduced during the redirect chain. An attacker can supply a URL pointing to an external server they control. This server passes the initial validation but responds with a redirect pointing to a restricted internal IP address, circumventing the security control entirely.
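The redirect behavior itself can be observed in a purely local experiment. The sketch below is an assumption-laden stand-in: it uses `urllib.request` from the standard library instead of `requests` so that it runs without dependencies (both follow redirects by default), and one throwaway local server plays both the "external" host and the "internal" resource. The handler, paths, and function name are hypothetical.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Stand-in for both the external server and the 'internal' resource."""

    def do_GET(self):
        if self.path == "/start":
            # The URL that passed validation answers with a redirect.
            self.send_response(302)
            self.send_header("Location", "/internal")
            self.end_headers()
        else:
            body = b"internal-only content"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def fetch_following_redirects(path: str = "/start"):
    """Fetch `path` from a local server and report where the client ended up."""
    server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        requested = f"http://127.0.0.1:{server.server_port}{path}"
        # Empty ProxyHandler: ignore proxy environment variables for this local demo.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(requested, timeout=5) as response:
            return requested, response.geturl(), response.read().decode()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    requested, final, body = fetch_following_redirects()
    print("requested:", requested)
    print("final URL:", final)
    print("body:", body)
```

The caller only ever named the `/start` URL, yet the body it receives comes from `/internal`. Any validation applied to the requested URL alone never sees the final destination.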

Code Analysis

To understand the mechanical failure, we examine the sequence of operations in the vulnerable code path. The initial implementation performs a synchronous check on the URL string, verifying its host component against known restricted CIDR blocks. This logic correctly identifies and blocks explicit attempts to access addresses like 127.0.0.1 or 169.254.169.254.

# Conceptual representation of the vulnerable pattern
# (is_safe_url and split_text are placeholders for the package's own helpers)
import requests

def split_text_from_url(url: str):
    if not is_safe_url(url):
        raise ValueError("Unsafe URL")

    # Flaw: requests follows redirects by default, and no hop is re-validated
    response = requests.get(url)
    return split_text(response.text)

The patch introduced in Pull Request #35960 addresses this discrepancy by hardening the anti-SSRF mechanisms. The fix modifies the request execution strategy to explicitly control redirect behavior. It disables automatic redirects or implements a custom redirect handler that recursively validates each hop in the redirect chain before proceeding.

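# Conceptual representation of the patched pattern
# (is_safe_url and split_text are placeholders for the package's own helpers)
from urllib.parse import urljoin

import requests

MAX_REDIRECTS = 5  # illustrative cap, not taken from the actual patch

def split_text_from_url(url: str):
    for _ in range(MAX_REDIRECTS + 1):
        # Validate every hop, including the initial URL
        if not is_safe_url(url):
            raise ValueError("Unsafe URL")

        # Disable auto-redirects so each Location is checked before it is fetched
        response = requests.get(url, allow_redirects=False, timeout=10)
        if response.status_code not in (301, 302, 303, 307, 308):
            return split_text(response.text)

        location = response.headers.get("Location")
        if not location:
            raise ValueError("Redirect without a Location header")
        # Location may be relative; resolve it against the current URL
        url = urljoin(url, location)

    raise ValueError("Too many redirects")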

Exploitation

Impact Assessment

The primary impact of this vulnerability is the unauthorized disclosure of internal network configuration, local services, and sensitive credentials. The risk is most severe when the vulnerable application runs in a cloud environment such as AWS, Google Cloud Platform, or Microsoft Azure, where instance metadata services are exposed on well-known link-local addresses such as 169.254.169.254.

By targeting these metadata endpoints, an attacker can extract temporary Identity and Access Management (IAM) credentials, instance configuration details, and user-data scripts. If the IAM role attached to the instance is over-privileged, the attacker can use these credentials to pivot into the wider cloud environment and compromise additional infrastructure.

Beyond cloud metadata, the SSRF flaw lets an attacker map the internal network. They can probe local ports to identify services bound to 127.0.0.1 or scan adjacent internal subnets. Unauthenticated internal services, such as Redis databases, REST APIs, or administrative consoles, become reachable to the external attacker.

Remediation

The definitive remediation for this vulnerability is upgrading the langchain-text-splitters package to version 0.3.5 or later. This release incorporates the anti-SSRF hardening implemented in PR #35960. Development teams must audit their dependencies and verify that their Python environments execute the patched version.
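One way to verify the installed version from inside a Python environment is sketched below. The helper names are hypothetical, and the version parser is a deliberate simplification that looks only at the leading numeric `X.Y.Z` part; for strict PEP 440 comparison (pre-releases, post-releases), `packaging.version.Version` is the better tool.

```python
from importlib import metadata

FIXED_VERSION = (0, 3, 5)  # first patched release, per this advisory


def parse_version(version: str) -> tuple:
    """Parse the leading numeric 'X.Y.Z' part; suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version: str) -> bool:
    """Return True if the version string is at or above the fixed release."""
    return parse_version(version) >= FIXED_VERSION


def installed_is_patched(package: str = "langchain-text-splitters"):
    """Return True/False for the installed package, or None if it is not installed."""
    try:
        return is_patched(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("patched:", installed_is_patched())
```

Running this inside each deployed environment (container image, virtualenv, notebook kernel) matters more than checking a lock file, since the advisory concerns the version that actually executes.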

In scenarios where immediate patching is not feasible, organizations must implement defense-in-depth measures at the network level. Configuring strict egress filtering on the host or container running the LangChain application restricts the impact. The egress firewall must deny all outbound traffic to 169.254.169.254 and block traffic to internal subnets (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) unless explicitly required by business logic.

Applications accepting arbitrary URLs for processing should implement defense-in-depth at the application layer. Utilizing dedicated proxy services designed to fetch external content safely prevents direct interaction with untrusted remote servers. These proxies enforce strict routing policies, deny redirects automatically, and strip sensitive headers.
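The "deny redirects" policy of such a fetch service can be sketched with the standard library alone. This is an illustrative assumption about how a proxy might behave, not a description of any particular product; `RefuseRedirects` and `fetch_without_redirects` are hypothetical names.

```python
import urllib.error
import urllib.request


class RefuseRedirects(urllib.request.HTTPRedirectHandler):
    """Turn every 3xx redirect into an error instead of following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        raise urllib.error.HTTPError(
            req.full_url, code, f"redirect to {newurl!r} refused", headers, fp
        )


def fetch_without_redirects(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL directly; any redirect response raises HTTPError."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # connect directly, ignore proxy env vars
        RefuseRedirects,
    )
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

A caller would wrap `fetch_without_redirects` in a `try`/`except urllib.error.HTTPError` block and treat a 3xx code as a rejected request. Refusing redirects outright is stricter than re-validating each hop, and it may break legitimate sites that redirect (for example, from HTTP to HTTPS), so it suits deployments that prefer failing closed.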

Official Patches

LangChain: feat(core): harden anti-ssrf (Pull Request)

Technical Appendix

CVSS Score

6.5 / 10

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:N
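The 6.5 score can be reproduced from the vector string using the CVSS v3.1 base-score equations. The sketch below covers only Scope: Unchanged vectors, which is all this advisory needs; the function names are illustrative.

```python
import math

# CVSS v3.1 metric weights (Privileges Required values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged CVSS v3.1 vector."""
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise NotImplementedError("this sketch handles Scope: Unchanged only")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:N"))  # prints 6.5
```

For this vector the impact sub-score is about 2.51 and the exploitability sub-score about 3.89; their sum, 6.40, rounds up to 6.5.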

Affected Systems

langchain-text-splitters

Affected Versions Detail

Product	Affected Versions	Fixed Version
langchain-text-splitters (LangChain)	< 0.3.5	0.3.5

Attribute	Detail
CWE ID	CWE-918
Attack Vector	Network
CVSS Score	6.5
Impact	Confidentiality, Integrity
Exploit Status	Proof-of-Concept
KEV Status	Not Listed

MITRE ATT&CK Mapping

T1190: Exploit Public-Facing Application (Initial Access)

T1552.005: Cloud Instance Metadata API (Credential Access)

CWE-918

Server-Side Request Forgery (SSRF)

The web server receives a URL or similar request from an upstream component and retrieves the contents of this URL, but it does not sufficiently ensure that the request is being sent to the expected destination.

Vulnerability Timeline

2024-10-24: Vulnerability published

2024-10-24: Patch released in version 0.3.5
