Feb 21, 2026 · 5 min read
The `KnowledgeBaseWebReader` in LlamaIndex failed to increment or check recursion depth during web crawls. An attacker can supply a URL pointing to a page with circular links, causing the Python process to hit its recursion limit and crash (DoS). Fixed in `llama-index-readers-web` version 0.3.6.
In the race to feed Large Language Models (LLMs) with data, developers often overlook the basics of web crawling safety. CVE-2025-1752 is a stark reminder of this: a High-severity Denial of Service vulnerability in LlamaIndex's web reader component. By failing to track recursion depth during a crawl, the library allows attackers to trap the ingestion process in an infinite loop, leading to a stack exhaustion crash (`RecursionError`). This affects any application using `KnowledgeBaseWebReader` to ingest content from untrusted URLs.
We live in the age of RAG (Retrieval-Augmented Generation). Everyone and their dog is building pipelines to scrape the internet, chunk the text, embed it, and shove it into a vector database so their AI chatbot knows about the latest company policy or news article. LlamaIndex is one of the premier frameworks for this orchestration. It's the shovel we use to feed the beast.
But here's the thing about shovels: if you aren't careful, you can whack yourself in the face. Specifically, the `KnowledgeBaseWebReader` component—designed to crawl knowledge bases and documentation sites—had a fatal flaw in how it walked the web. It assumed that web pages are trees. They aren't. The web is a graph, and graphs have cycles.
This vulnerability isn't complex memory corruption. It's not a buffer overflow in C. It's a logic error in Python that turns a simple URL ingestion task into a process-killing weapon. If you are allowing users to submit URLs for your AI to 'learn' from, you just handed them a kill switch for your worker nodes.
The vulnerability (CWE-674: Uncontrolled Recursion) lies in the `get_article_urls` method. The developers intended to limit the crawl depth; they even included a parameter named `max_depth` in the function signature. It looks safe on paper. You see `max_depth=100` and think, 'Ah, good, it won't crawl the entire internet.'
But here is the punchline: they never actually checked or incremented a depth counter. They passed the static `max_depth` value down to every child call, but never passed the current depth.
Imagine telling a runner, 'Stop running after 10 miles,' but never giving them a watch or mile markers. They just keep running until they collapse. In computer science terms, this is the difference between a `while` loop with a broken condition and a recursive function without a base case. The Python interpreter, thankfully, has a fail-safe: `sys.getrecursionlimit()` (usually 1000). When the code hits that wall, it doesn't just stop gracefully; it raises a `RecursionError` and crashes the program.
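You can watch that fail-safe fire in a few lines. A minimal sketch (the function and its `limit_miles` parameter are illustrative, not LlamaIndex code; they mirror the "limit carried but never checked" pattern):

```python
import sys

# Python's guard rail: recursion deeper than this raises RecursionError.
print(sys.getrecursionlimit())  # commonly 1000

def run(limit_miles=10):
    # The runner with no watch: the limit is passed along
    # but never consulted, so this never terminates on its own.
    return run(limit_miles)

try:
    run()
except RecursionError as exc:
    print(f"collapsed: {exc}")
```

The process survives here only because we catch the exception ourselves; an unguarded caller simply dies.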
Let's look at the code. It is almost comical how close they were to getting it right, yet how completely they missed it.
The Vulnerable Code (Simplified):

```python
def get_article_urls(self, browser, root_url, current_url, max_depth=100):
    # ... scraping logic ...
    for link in links:
        # CRITICAL FAIL: max_depth is passed, but no 'current_depth' is tracked.
        # The function has no idea how deep it is.
        article_urls.extend(
            self.get_article_urls(browser, root_url, link, max_depth)
        )
```

Do you see it? `max_depth` is a constant. If I call this with `max_depth=10`, the child calls it with `max_depth=10`. The grandchild calls it with `max_depth=10`. There is no depth counter increasing (0, 1, 2...).
The Fix (Commit 3c65db2):

```python
# Now we track 'depth' and increment it.
def get_article_urls(self, browser, root_url, current_url, max_depth=100, depth=0):
    # The base case! Finally!
    if depth >= max_depth:
        return []
    # ... scraping logic ...
    for link in links:
        article_urls.extend(
            # Increment depth!
            self.get_article_urls(browser, root_url, link, max_depth, depth + 1)
        )
```

The fix is standard CS 101: add a `depth` argument, default it to 0, check `if depth >= max_depth`, and recurse with `depth + 1`.
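You can confirm the patched pattern terminates on a cycle with a standalone sketch. The `PAGES` dict stands in for real scraping, and the tiny `max_depth` is just to keep the output short; this is not the library's code:

```python
# Two pages that link to each other: a cycle, not a tree.
PAGES = {"/alpha": ["/omega"], "/omega": ["/alpha"]}

def get_article_urls(url, max_depth=5, depth=0):
    # The patched pattern: check depth, recurse with depth + 1.
    if depth >= max_depth:
        return []
    urls = [url]
    for link in PAGES[url]:
        urls.extend(get_article_urls(link, max_depth, depth + 1))
    return urls

print(get_article_urls("/alpha"))
# ['/alpha', '/omega', '/alpha', '/omega', '/alpha']
```

Note the duplicates: depth limiting stops the crash, but it does not deduplicate pages that the crawl revisits.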
To exploit this, we don't need advanced fuzzing or heap spraying. We just need two HTML files. Let's create a deadly trap for the `KnowledgeBaseWebReader`.
Step 1: The Trap
We set up a simple Flask server hosting two pages: /alpha and /omega.
- `/alpha` contains a link to `/omega`.
- `/omega` contains a link to `/alpha`.

Step 2: The Trigger

We feed `http://attacker.com/alpha` to the LlamaIndex reader.
The Execution Flow:
1. `get_article_urls(..., url='/alpha')` runs.
2. It finds the link to `/omega` and calls `get_article_urls(..., url='/omega')`.
3. That page links back to `/alpha`, so it calls `get_article_urls(..., url='/alpha')`.
4. Back to `/omega`...

Since the vulnerable code has no `visited` list (a set of URLs already processed) AND no depth limit enforcement, this ping-pong continues instantly until the Python stack explodes.
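If you want to stand the trap up without any dependencies, the two circular pages can be served from the standard library alone (the handler and `serve` helper are illustrative; a two-route Flask app works identically):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Each page links to the other: the smallest possible cycle.
LINKS = {"/alpha": "/omega", "/omega": "/alpha"}

class TrapHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = LINKS.get(self.path)
        if target is None:
            self.send_response(404)
            self.end_headers()
            return
        body = f'<html><body><a href="{target}">next</a></body></html>'.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def serve(port=8000):
    # Blocks forever, serving the /alpha <-> /omega loop.
    HTTPServer(("127.0.0.1", port), TrapHandler).serve_forever()
```

Point an unpatched `KnowledgeBaseWebReader` at `/alpha` on this server and the crawl never finds a leaf page to stop on.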
This crashes the thread. If you are running this in a Celery worker or a synchronous API handler, that worker is dead. If you don't have robust supervisor processes, your service starts dropping requests.
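Until you can upgrade, a cheap worker-side mitigation is to treat `RecursionError` as a per-URL failure instead of a process death. A minimal sketch (`safe_ingest` is a hypothetical wrapper, not a LlamaIndex API):

```python
def safe_ingest(ingest_fn, url):
    # Run one ingestion task; a crawl that blows the stack
    # fails that single URL instead of killing the worker.
    try:
        return ingest_fn(url)
    except RecursionError:
        # Log and move on; the worker stays alive.
        print(f"ingestion of {url} exceeded recursion limit, skipping")
        return None
```

Pair this with hard time limits (Celery's time limits, request timeouts): guarding against stack exhaustion does nothing about a crawl that is merely slow.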
The immediate fix is to upgrade `llama-index-readers-web` to version 0.3.6 or later, which introduces the depth-tracking logic shown in the code section above.
> [!NOTE]
> Researcher's Note: Even with the patch, the crawler doesn't seem to implement a visited set (deduplication) based on the diff analysis. While the depth parameter prevents infinite recursion crashes, a site with a massive number of links within the max_depth range could still cause performance degradation (a 'wide' traversal DoS rather than a 'deep' one). However, the crash vector is resolved.
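Closing that remaining gap is straightforward: carry a visited set alongside the depth counter. A sketch under the same assumptions (`fetch_links` is a stand-in for the scraping step; this is not the library's code):

```python
def crawl(url, fetch_links, max_depth=100, depth=0, visited=None):
    # The depth check stops 'deep' traversals;
    # the visited set stops 'wide' revisiting of the same pages.
    if visited is None:
        visited = set()
    if depth >= max_depth or url in visited:
        return []
    visited.add(url)
    urls = [url]
    for link in fetch_links(url):
        urls.extend(crawl(link, fetch_links, max_depth, depth + 1, visited))
    return urls
```

On the two-page /alpha ↔ /omega trap, this returns each URL exactly once instead of ping-ponging until the limit.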
Defensive Strategy:
- Upgrade: `pip install --upgrade llama-index-readers-web`
- Validate before you ingest: check the `Content-Length` or structure of the target page before fully committing resources.
- Enforce hard timeouts: `func_timeout` or Celery's time limits are your friends here.

CVSS Vector: `CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H`

| Product | Affected Versions | Fixed Version |
|---|---|---|
| `llama-index-readers-web` (LlamaIndex) | < 0.3.6 | 0.3.6 |
| Attribute | Detail |
|---|---|
| CWE ID | CWE-674 (Uncontrolled Recursion) |
| CVSS v3.0 | 7.5 (High) |
| Attack Vector | Network |
| Impact | Denial of Service (DoS) |
| Exploit Status | POC Available |
| Patch Date | 2025-02-27 |
MITRE's description of CWE-674 sums it up: the product does not properly control the amount of recursion that takes place, consuming excessive resources, such as memory or the program stack.