Executive Summary (TL;DR)

The `pypdf` library (< 6.6.2) fails to detect cycles when parsing PDF outlines (bookmarks). An attacker can craft a malicious PDF where bookmark A points to bookmark B, and bookmark B points back to A, causing the parser to enter an infinite loop. This effectively hangs the application, consuming all available CPU resources.

To understand the flaw, we have to look at how PDF outlines are structured. Internally, a PDF outline is a collection of dictionary objects. Each object points to its neighbors using keys like /First (the first child), /Last (the last child), /Next (the next sibling), and /Parent (the parent node).

The vulnerability lies in pypdf/_doc_common.py, specifically in the _get_outline method. This function is responsible for traversing this structure to build a Python-friendly list of bookmarks. It uses a while True loop to iterate through siblings via the /Next key and recursive calls to handle children via /First.

The fatal mistake? The code assumed that following /Next pointers would eventually lead to a null or missing value, terminating the loop. It did not track which nodes it had already visited. If a malicious actor hand-edits a PDF so that NodeB's /Next pointer goes back to NodeA, the parser happily obliges, running in circles forever.

This is a textbook Denial of Service (DoS) via resource exhaustion. It’s stateless, requires no authentication, and can be triggered by simply asking the server to "read the bookmarks" of a tiny, 1KB PDF file.


| Product | Affected Versions | Fixed Version |
| --- | --- | --- |
| pypdf (py-pdf) | < 6.6.2 | 6.6.2 |

| Attribute | Detail |
| --- | --- |
| Vulnerability ID | CVE-2026-24688 |
| CWE ID | CWE-835 |
| Type | Infinite Loop / DoS |
| CVSS | 7.5 (High) |
| Attack Vector | Network (File Upload) |
| Patch Date | 2026-01-26 |


Ouroboros in the Outline: Infinite Loops in pypdf (CVE-2026-24688)

Amit Schendel

Senior Security Researcher

Jan 27, 2026 · 6 min read

PoC Available

Executive Summary (TL;DR)

A Denial of Service (DoS) vulnerability in the popular `pypdf` library allows attackers to trigger an infinite loop by crafting a PDF with cyclic outline references. This results in 100% CPU utilization and application hangs.

The Hook: A Snake Eating Its Own Tail

PDF parsing is a thankless job. You are essentially writing code to interpret a file format that is less of a document and more of a serialized hallucination of Adobe engineers from the 1990s. One of the most common pitfalls in parsing complex, hierarchical data structures is assuming that a tree is actually a tree.

In CVE-2026-24688, we look at pypdf, a wildly popular pure-Python library used for everything from splitting pages to extracting text. The vulnerability here isn't a buffer overflow or a remote code execution via pickle serialization. It's a logic error—a classic infinite loop caused by trusting the input.

The specific component at fault is the Outline parser. In PDF terminology, 'Outlines' are what users see as Bookmarks. They are a navigational aid. But to a parser, they are a linked list of dictionary objects. And whenever you have a linked list provided by untrusted user input, you have to ask yourself one question: 'What happens if this list is actually a circle?'

For pypdf versions prior to 6.6.2, the answer to that question was 'I will run until the heat death of the universe or until the sysadmin kills the process.'

The Flaw: Trusting the Linked List

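In a healthy file, each outline item links to its neighbors through /First, /Last, /Next, and /Parent, and every /Next chain ends at an item with no /Next key.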
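
As a rough illustration, a healthy outline can be modeled with plain Python dicts standing in for PDF dictionary objects. The titles and the `walk` helper here are hypothetical and are not pypdf code; the only point is that every /Next chain ends, so even a naive traversal terminates.

```python
# Toy model: plain dicts stand in for PDF outline dictionaries.
# Titles and layout are made up for illustration.
section_1_1 = {"/Title": "1.1 Scope"}
chapter_2 = {"/Title": "2 Methods"}  # no /Next: the sibling chain ends here
chapter_1 = {
    "/Title": "1 Intro",
    "/First": section_1_1,
    "/Last": section_1_1,
    "/Next": chapter_2,
}
root = {"/Type": "/Outlines", "/First": chapter_1, "/Last": chapter_2}

# Back-links, as a real outline would carry them.
section_1_1["/Parent"] = chapter_1
chapter_1["/Parent"] = root
chapter_2["/Parent"] = root
chapter_2["/Prev"] = chapter_1


def walk(node):
    """Naive traversal: siblings via /Next, children via /First."""
    titles = []
    while node is not None:
        titles.append(node["/Title"])
        if "/First" in node:
            titles.append(walk(node["/First"]))
        node = node.get("/Next")
    return titles


print(walk(root["/First"]))  # ['1 Intro', ['1.1 Scope'], '2 Methods']
```

The traversal never checks for repeats, which is safe only as long as the file keeps its promise that the chain ends.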

The Code: The Smoking Gun

Let's look at the vulnerable code in pypdf/_doc_common.py. I've stripped it down to the essentials to highlight the logic flaw.

# VULNERABLE CODE (< 6.6.2)
def _get_outline(self, node, outline):
    while True:
        # 1. Process the current node
        outline_obj = self._build_outline_item(node)
        outline.append(outline_obj)
        
        # 2. Check for children (Recursion)
        if "/First" in node:
            sub_outline = []
            self._get_outline(node["/First"], sub_outline)
            outline.append(sub_outline)
            
        # 3. Move to next sibling
        if "/Next" in node:
            node = node["/Next"] # <--- THE TRAP
        else:
            break

Notice the node = node["/Next"] line inside a while True loop. There is absolutely no guardrail here. If node["/Next"] is the node itself, node never changes, and the loop spins tight. If it points to a previous node, it loops wide.

The fix, introduced in version 6.6.2, is elegant in its simplicity. It introduces a visited set that tracks the memory IDs of the processed objects. If we see an ID we've already processed in the current chain, we bail out.

# PATCHED CODE (6.6.2+)
def _get_outline(self, node, outline, visited=None):
    if visited is None:
        visited = set()
        
    while True:
        # 1. CYCLE DETECTION
        node_id = id(node)
        if node_id in visited:
            logger.warning(f"Detected cycle in outline for {node}")
            break
        visited.add(node_id)
 
        # ... processing code ...
 
        if "/First" in node:
            # Pass a COPY of visited to children to allow valid DAGs
            # but prevent cycles down the tree
            self._get_outline(..., visited=visited.copy())
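
To see the idea end to end, here is a minimal, self-contained sketch of the same visited-set technique on plain dicts. It is not pypdf's code: `walk_safe` and the toy nodes are hypothetical, and the sketch mirrors only the approach described above (record the `id()` of each node, hand a copy of the set to child traversals).

```python
def walk_safe(node, visited=None):
    """Collect titles, ending a sibling chain as soon as a node repeats."""
    if visited is None:
        visited = set()
    titles = []
    while node is not None:
        if id(node) in visited:
            break  # cycle: this node was already seen on the current path
        visited.add(id(node))
        titles.append(node["/Title"])
        if "/First" in node:
            # Children get a copy, so siblings of the parent are unaffected
            # by whatever the child traversal marks as visited.
            titles.append(walk_safe(node["/First"], visited.copy()))
        node = node.get("/Next")
    return titles


# Two bookmarks whose /Next pointers form a loop.
a = {"/Title": "Bookmark A"}
b = {"/Title": "Bookmark B", "/Next": a}
a["/Next"] = b

print(walk_safe(a))  # ['Bookmark A', 'Bookmark B']
```

Using `id()` works here because the toy nodes stay alive for the whole traversal; a real parser would need an identity key that is stable across repeated lookups of the same object.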

The Exploit: Crafting the Ouroboros

Exploiting this doesn't require complex heap spraying or ROP chains. It requires a text editor. PDFs are partially ASCII, and their structure is defined in plain text blocks.

Here is how an attacker constructs a "bomb":

1. Create a standard PDF. Any "Hello World" PDF will do.
2. Locate the Outline dictionary. It usually looks like << /Type /Outlines ... >>.
3. Inject a Cycle. We create two objects, 5 and 6, and link them together eternally.

5 0 obj
<<
  /Title (Bookmark A)
  /Parent 4 0 R
  /Next 6 0 R  % Points to B
>>
endobj
 
6 0 obj
<<
  /Title (Bookmark B)
  /Parent 4 0 R
  /Prev 5 0 R
  /Next 5 0 R  % Points back to A!
>>
endobj

When pypdf hits Object 5, it follows /Next to Object 6. It processes Object 6, then follows /Next back to Object 5. Repeat ad infinitum.
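
The loop can be reproduced without a PDF parser. The sketch below models objects 5 and 6 as entries in a hypothetical object table keyed by object number, then follows /Next with a step cap so that the demonstration itself terminates. `follow_next` and `max_steps` are illustrative names and are not part of pypdf.

```python
# Hypothetical object table: object number -> dictionary, with indirect
# references such as "6 0 R" reduced to plain object numbers.
objects = {
    5: {"/Title": "Bookmark A", "/Parent": 4, "/Next": 6},
    6: {"/Title": "Bookmark B", "/Parent": 4, "/Prev": 5, "/Next": 5},
}


def follow_next(start, max_steps):
    """Follow /Next from `start`, giving up after `max_steps` nodes."""
    seen = []
    num = start
    while num is not None and len(seen) < max_steps:
        seen.append(objects[num]["/Title"])
        num = objects[num].get("/Next")
    # Second value is True when the cap, not the end of the chain, stopped us.
    return seen, num is not None


titles, capped = follow_next(5, max_steps=6)
print(titles)  # A, B, A, B, A, B
print(capped)  # True
```

However large `max_steps` is made, the walk is always stopped by the cap; the chain itself never ends.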

The impact is immediate. If this runs in a web worker (e.g., an "Upload your Resume" feature that extracts text or checks page counts), that worker thread hangs at 100% CPU. If you upload 10 of these files, you take down 10 workers. It is a highly asymmetric attack: trivial to generate, expensive to mitigate without the patch.

The Fix: Mitigation & Defense

The remediation is straightforward: Update pypdf to version 6.6.2 or later.

However, this vulnerability highlights a broader issue in handling complex file formats. If you are processing files from untrusted sources, relying solely on library patches is often a game of whack-a-mole.

Defense in Depth Strategies:

- Timeouts: Never let a file parsing job run indefinitely. Wrap your parsing logic in a timeout (e.g., the third-party func_timeout package, or task-level timeouts in Celery/RQ). If parsing a 2MB PDF takes more than 10 seconds, kill it.
- Resource Limits: Run your parser in a containerized environment (Docker/Kubernetes) with strict CPU and RAM limits. This prevents a single hung process from starving the entire host.
- Input Validation: While you can't easily detect cycles without parsing, you can validate file headers and enforce maximum recursion depths if you are wrapping the library yourself.
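
One way to get a hard timeout using only the standard library is to run the parse in a child interpreter and let `subprocess.run` kill it when the deadline passes. The sketch below is an assumption-laden example, not a drop-in solution: `run_isolated` and the `WORKER` script are hypothetical names, and the worker simply reads `PdfReader(...).outline` to trigger outline parsing.

```python
import subprocess
import sys

# Hypothetical worker script: parses the outline of the PDF given as argv[1].
WORKER = """
import sys
from pypdf import PdfReader
print(len(PdfReader(sys.argv[1]).outline))
"""


def run_isolated(code, args=(), timeout_s=10.0):
    """Run Python source in a child process.

    Returns the child's stdout, or None if it timed out or exited non-zero.
    On timeout, subprocess.run kills the child before raising.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code, *args],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout if proc.returncode == 0 else None


# A stand-in for a hung parser: the child spins forever and is killed.
print(run_isolated("while True: pass", timeout_s=1.0))  # None
```

In a real service the call would look like `run_isolated(WORKER, [path])`. Because the work happens in a separate process, a spinning parser can actually be terminated, which is not something an in-process thread timeout can guarantee for a CPU-bound loop.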