CVEReports

Automated vulnerability intelligence platform. Comprehensive reports for high-severity CVEs generated by AI.

© 2026 CVEReports. All rights reserved.

Made with love by Amit Schendel & Alon Barad



CVE-2026-27628
CVSS 7.5 · EPSS 0.04%

The Ouroboros Document: Infinite Loops in pypdf

Alon Barad
Software Engineer

Feb 25, 2026 · 5 min read

PoC Available

Executive Summary (TL;DR)

pypdf < 6.7.2 fails to track visited offsets when following the `/Prev` chain of PDF cross-reference tables. A malicious PDF whose `/Prev` pointer leads back to an already-parsed offset traps the parser in an infinite loop, pinning a CPU core until the process is killed.

A high-severity Denial of Service (DoS) vulnerability exists in the `pypdf` library, a widely used tool for PDF manipulation in the Python ecosystem. By crafting a PDF with a circular cross-reference (xref) chain, an attacker can trap the parser in an infinite loop. The result is immediate 100% utilization of a CPU core and a hung process, potentially taking down document processing pipelines, web services, or serverless functions.

The Hook: Parsing the Unparsable

PDFs are not documents; they are containers of sorrow. The PDF specification is a sprawling, decades-old beast that supports features most people have never heard of, including incremental updates. When you edit a PDF, the software doesn't necessarily rewrite the whole file. Instead, it appends a new 'body' to the end, containing the changed objects and a new 'cross-reference' (xref) section.

To read the file, a parser starts at the end (the trailer) and works its way backward, following pointers to previous versions of the document. This is handled by the /Prev key in the trailer dictionary, which points to the byte offset of the previous xref table.
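As a rough illustration of that entry point, the sketch below pulls the last startxref value out of the tail of a file. The function name find_last_startxref and the 1024-byte tail window are assumptions made for this example; this is not how pypdf itself locates the keyword.

```python
import re
from typing import Optional


def find_last_startxref(data: bytes) -> Optional[int]:
    """Return the byte offset named by the last `startxref` keyword, or None."""
    # Assumption: the keyword sits within the last 1024 bytes of the file.
    tail = data[-1024:]
    matches = re.findall(rb"startxref\s+(\d+)", tail)
    return int(matches[-1]) if matches else None


# A truncated stand-in for a real file; only the trailer matters here.
sample = (
    b"%PDF-1.4\n"
    b"trailer\n<< /Size 10 /Root 1 0 R /Prev 400 >>\n"
    b"startxref\n1000\n%%EOF\n"
)
print(find_last_startxref(sample))  # 1000
```

The returned offset is where traversal begins; every further hop comes from a /Prev value inside the file, which is attacker-controlled input.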

pypdf, one of the most popular Python libraries for handling PDFs, is tasked with traversing this chain to build a complete map of the document. It’s a standard linked-list traversal problem. But as any computer science freshman knows, if you traverse a linked list without checking for cycles, you are one bad pointer away from eternity. That is exactly what happened here.

The Flaw: Trusting the Trail

The vulnerability lies in the _read_xref_tables_and_trailers method within pypdf/_reader.py. This function is responsible for reconstructing the document structure by hopping from the most recent xref table to the oldest one via the /Prev attribute.

The logic was deceptively simple: start at the startxref offset, parse the table, look for a /Prev key, update startxref to that new value, and repeat until startxref is None. It looks standard, but it lacks a critical defensive programming concept: distrust.

The code blindly assumed that the chain of /Prev pointers would eventually terminate or at least move linearly backward through the file. It did not account for a malicious PDF where the trailer at offset 1000 points to a trailer at offset 500, which in turn points back to offset 1000. Once the parser enters this loop, it spins forever, repeatedly re-parsing and re-caching the same objects, consuming 100% of the CPU core until the process is killed or the universe ends.
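The 1000 → 500 → 1000 scenario can be reproduced with a toy model in which a dictionary stands in for the file: each key is an xref offset and each value is the /Prev value found in its trailer. The names here are hypothetical, and a max_steps cap is added only so the demonstration terminates; the vulnerable loop had no such cap.

```python
from typing import Dict, List, Optional

# Hypothetical toy model: xref offset -> /Prev value in that section's trailer.
CIRCULAR_CHAIN: Dict[int, Optional[int]] = {1000: 500, 500: 1000}


def follow_prev_unchecked(
    prev_of: Dict[int, Optional[int]], startxref: Optional[int], max_steps: int
) -> List[int]:
    """Follow /Prev pointers with no cycle check, stopping after max_steps hops."""
    visited: List[int] = []
    while startxref is not None and len(visited) < max_steps:
        visited.append(startxref)
        startxref = prev_of.get(startxref)
    return visited


trace = follow_prev_unchecked(CIRCULAR_CHAIN, 1000, max_steps=6)
print(trace)  # [1000, 500, 1000, 500, 1000, 500]
```

The trace fills whatever step budget it is given, which is the toy equivalent of a parser that never returns.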

The Code: The Smoking Gun

Let's look at the vulnerable code. It's a textbook example of an infinite loop waiting to happen. The variable startxref is the control condition, but there is no history of where we've been.

# Vulnerable implementation in pypdf/_reader.py
 
def _read_xref_tables_and_trailers(self, stream, startxref):
    # ... initialization ...
    while startxref is not None:
        # The parser seeks to the offset blindly
        stream.seek(startxref, 0)
        
        # ... complex parsing logic ...
        
        # The parser reads the next link in the chain
        # If this points to an offset we just visited, we are doomed.
        startxref = trailer.get("/Prev")

The fix, introduced in version 6.7.2, is elegant and standard: keep a set of visited offsets. If we see an offset again, we know we are in a loop, and we bail out.

# Patched implementation (lines added by the fix are the visited-offset checks)

def _read_xref_tables_and_trailers(self, stream, startxref):
    # ... initialization ...
    visited_xref_offsets: set[int] = set()

    while startxref is not None:
        # Check if we've been here before
        if startxref in visited_xref_offsets:
            logger_warning(
                f"Circular xref chain detected at offset {startxref}, stopping",
                __name__,
            )
            break

        visited_xref_offsets.add(startxref)

        stream.seek(startxref, 0)
        # ... parsing logic ...

This simple addition of visited_xref_offsets completely neutralizes the attack. It transforms an infinite loop into a logged warning.
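The same idea can be exercised outside pypdf. The following is a minimal, self-contained sketch of visited-set cycle detection over a toy mapping of xref offset to /Prev value; follow_prev_checked and the dictionary model are illustrative assumptions, not the library's code.

```python
import logging
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


def follow_prev_checked(
    prev_of: Dict[int, Optional[int]], startxref: Optional[int]
) -> List[int]:
    """Follow /Prev pointers, stopping at the first offset seen twice."""
    order: List[int] = []
    seen = set()
    while startxref is not None:
        if startxref in seen:
            logger.warning(
                "Circular xref chain detected at offset %d, stopping", startxref
            )
            break
        seen.add(startxref)
        order.append(startxref)
        startxref = prev_of.get(startxref)
    return order


print(follow_prev_checked({1000: 500, 500: 1000}, 1000))  # [1000, 500]
print(follow_prev_checked({550: 550}, 550))               # [550]
```

Because each offset can enter the set only once, the loop runs at most once per distinct offset, so the work is bounded by the number of xref sections actually present.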

The Exploit: Crafting the Ouroboros

Exploiting this is trivial if you understand raw PDF syntax. We don't need complex heap massaging or ROP chains; we just need a text editor. A PDF trailer usually looks like this:

trailer
<< /Size 10 /Root 1 0 R /Prev 400 >>
startxref
1000
%%EOF

In a valid file, the /Prev 400 implies there is another cross-reference table at byte offset 400. To exploit pypdf, we construct a file where the cross-reference table points to itself, or points to a second table that points back to the first.

Here is the logic flow of the attack:

  1. Define a valid PDF structure (Header, Body, Xref).
  2. Calculate the byte offset of the xref keyword (let's say it's at offset 550).
  3. In the trailer dictionary, set /Prev 550.
  4. The startxref value also points to 550, so the parser reads the table at 550, finds /Prev 550, sets startxref to 550, and loops. And loops. And loops.

The PoC provided in the repository does exactly this using Python formatting to inject the calculated offsets dynamically. It creates a 'self-referential' PDF that is syntactically valid enough to trick the parser into the loop but logically broken.
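From the defender's side, the same flow can be walked with a crude pre-flight check. The sketch below is a heuristic under stated assumptions: it uses regular expressions rather than a real PDF parser, it handles only classic trailer dictionaries (not cross-reference streams), and the helper names are made up for this example. The tiny sample it builds mirrors the self-referential layout described above.

```python
import re
from typing import Optional


def last_startxref(data: bytes) -> Optional[int]:
    """Offset named by the final `startxref` keyword, or None."""
    matches = re.findall(rb"startxref\s+(\d+)", data)
    return int(matches[-1]) if matches else None


def prev_after(data: bytes, offset: int) -> Optional[int]:
    """First /Prev value between `offset` and the next `startxref` keyword."""
    end = data.find(b"startxref", offset)
    segment = data[offset:end if end != -1 else len(data)]
    match = re.search(rb"/Prev\s+(\d+)", segment)
    return int(match.group(1)) if match else None


def has_circular_prev_chain(data: bytes) -> bool:
    """True if following /Prev pointers revisits an offset."""
    seen = set()
    offset = last_startxref(data)
    while offset is not None:
        if offset in seen:
            return True
        seen.add(offset)
        offset = prev_after(data, offset)
    return False


header = b"%PDF-1.4\n"
xref_offset = len(header)  # the xref keyword starts right after the header
body = b"xref\n0 1\n0000000000 65535 f \ntrailer\n<< /Size 1%s >>\nstartxref\n%d\n%%%%EOF\n"
self_referential = header + body % (b" /Prev %d" % xref_offset, xref_offset)
well_formed = header + body % (b"", xref_offset)

print(has_circular_prev_chain(self_referential))  # True
print(has_circular_prev_chain(well_formed))       # False
```

A check like this is no substitute for upgrading, since a file can be malformed in ways a regex will not see, but it shows how little structure the attack needs.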

The Impact: Death by Parsing

You might think, "It's just an infinite loop, who cares?" In the age of automated document processing, you should care deeply. Imagine a backend service that accepts user-uploaded resumes or invoices. This service likely spins up a worker process (or an AWS Lambda) to parse the PDF, extract text, or render a thumbnail.

If an attacker uploads a 5 KB malformed PDF:

  1. Resource Exhaustion: The worker process hits 100% CPU immediately.
  2. Thread Starvation: If the application is synchronous (common in simple Flask/Django apps), that worker is dead. It can't handle other requests.
  3. Bill Shock: If this is running in a serverless environment (Lambda/Cloud Run) with a high timeout (e.g., 15 minutes), you are paying for max compute for the full duration of the timeout, for every single malicious request.
  4. DoS: Send 50 of these files concurrently, and you can lock up an entire fleet of workers with almost zero bandwidth cost.
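One way to contain this class of failure, independent of any specific bug, is to parse untrusted files in a child process with a hard time budget. The sketch below assumes pypdf is installed in the same interpreter for the COUNT_PAGES child program; run_isolated and the timeout values are illustrative choices, not a recommendation from the advisory.

```python
import subprocess
import sys
from typing import List, Optional

# Child program: counts pages with pypdf (assumed to be installed).
COUNT_PAGES = (
    "import sys\n"
    "from pypdf import PdfReader\n"
    "print(len(PdfReader(sys.argv[1]).pages))\n"
)


def run_isolated(code: str, args: List[str], timeout_s: float) -> Optional[str]:
    """Run `code` in a fresh interpreter; return stdout, or None on timeout/failure."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code, *args],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing keeps spinning.
        return None
    return proc.stdout.strip() if proc.returncode == 0 else None


# Example call (the path is a placeholder):
# pages = run_isolated(COUNT_PAGES, ["upload.pdf"], timeout_s=10.0)
```

With this shape, a looping parse costs one child process for at most timeout_s seconds instead of a worker for the life of the request.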

Official Patches

GitHub: Commit 0fbd959 fixing the infinite loop


Technical Appendix

CVSS Score: 7.5 / 10
CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
EPSS Probability: 0.04%
Top 88% most exploited

Affected Systems

  • pypdf < 6.7.2
  • Python applications processing untrusted PDFs
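A quick way to check an environment against that boundary is to compare the installed release with 6.7.2. This is a minimal sketch: parse_release and is_affected are hypothetical helpers, and pre-release or local version suffixes are ignored (so "6.7.2rc1" is treated as 6.7.2), which may not match every packaging scheme.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

FIXED_RELEASE = (6, 7, 2)  # first release containing the fix, per this report


def parse_release(version_string: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from a version string; patch defaults to 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version_string: str) -> bool:
    """True if the release is older than the fixed one."""
    return parse_release(version_string) < FIXED_RELEASE


try:
    installed = version("pypdf")
    print(installed, "affected" if is_affected(installed) else "not affected")
except PackageNotFoundError:
    print("pypdf is not installed")
```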

Affected Versions Detail

Product | Affected Versions | Fixed Version
pypdf (py-pdf) | < 6.7.2 | 6.7.2

Attribute | Detail
CWE | CWE-835 (Infinite Loop)
CVSS v3.1 | 7.5 (High)
Attack Vector | Network (via file upload)
Impact | Denial of Service (CPU Exhaustion)
Exploit Status | PoC Available
EPSS Score | 0.04%

MITRE ATT&CK Mapping

T1499.003: Endpoint Denial of Service: OS Resource Exhaustion (Tactic: Impact)
CWE-835: Loop with Unreachable Exit Condition ('Infinite Loop')

Known Exploits & Detection

GitHub: Original issue report containing the circular reference PoC

Vulnerability Timeline

  • 2026-02-21: Issue Reported on GitHub
  • 2026-02-21: Fix Committed
  • 2026-02-22: pypdf v6.7.2 Released
  • 2026-02-25: CVE Published

