PDF cross-reference streams employ a width array (the /W entry) to define the byte lengths of integer fields within the stream data. The stream dictionary also specifies a /Size entry, declaring the total number of entries in the table. The pypdf parser uses these values to iterate over the stream data and extract object offsets.
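
As a rough illustration of how the /W widths drive parsing, the sketch below splits raw stream bytes into fixed-width, big-endian integer fields. The function name decode_xref_entries and the sample bytes are hypothetical and chosen only for this example; this is not pypdf's actual implementation.

```python
from typing import Iterator


def decode_xref_entries(data: bytes, widths: list[int]) -> Iterator[tuple[int, ...]]:
    """Yield one tuple of integers per complete fixed-width entry in data."""
    entry_len = sum(widths)
    if entry_len <= 0:
        return  # nothing can be decoded from zero-width entries
    # Only whole entries are decoded; a trailing partial entry is ignored.
    for start in range(0, len(data) - entry_len + 1, entry_len):
        fields = []
        pos = start
        for width in widths:
            # Each field is stored as a big-endian unsigned integer.
            # Simplification: a zero-width field decodes as 0 here, whereas
            # the PDF specification assigns per-field default values.
            fields.append(int.from_bytes(data[pos:pos + width], "big"))
            pos += width
        yield tuple(fields)


# /W [1 2 1]: one byte for the type, two for the offset, one for the generation.
sample = bytes([1, 0x00, 0x0F, 0, 1, 0x01, 0x2C, 0])
entries = list(decode_xref_entries(sample, [1, 2, 1]))
```

Note that this decoder is driven by the length of the data rather than by a declared count, so it cannot iterate more often than the bytes allow.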

Prior to version 6.10.1, the parser trusted the /Size parameter without corroborating it against the actual byte length of the stream data. A mismatch between the declared size and the physical data length triggered unbounded parsing loops. The parser relied entirely on the attacker-controlled dictionary values.

When an attacker supplies an artificially large /Size value, the parser iterates excessively (CWE-834). The implementation attempts to read and process entries that do not physically exist in the stream, so memory and CPU consumption grow in proportion to the attacker-chosen integer rather than to the size of the file.



GHSA-JJ6C-8H6C-HPPX: Uncontrolled Resource Consumption in pypdf via Malformed PDF Streams

Amit Schendel

Senior Security Researcher

Apr 15, 2026

Executive Summary (TL;DR)

pypdf versions prior to 6.10.1 are vulnerable to Denial of Service (DoS) due to inadequate validation of xref and object stream sizes, allowing crafted PDFs to trigger unbounded resource consumption.

The pypdf library prior to version 6.10.1 contains a moderate-severity vulnerability related to the handling of cross-reference (xref) and object streams. The library fails to adequately validate the sizes of these streams against supplied metadata, leading to excessive iteration and uncontrolled resource consumption when parsing maliciously crafted PDF documents.

Vulnerability Overview

The pypdf library is a widely deployed Python package used for PDF manipulation and data extraction. Version 6.10.1 patches a vulnerability identified as GHSA-JJ6C-8H6C-HPPX, which involves uncontrolled resource consumption (CWE-400). Applications parsing untrusted PDF files are exposed to Denial of Service (DoS) attacks.

The flaw resides in the processing of cross-reference (xref) and object streams. Modern PDF files use these structures to compress and index internal objects efficiently. The vulnerability occurs because the library does not adequately validate the metadata size parameters of these streams against their physical content.

Exploitation results in extreme CPU utilization or memory exhaustion. An attacker requires no authentication, only the ability to supply a crafted PDF file to the target application. Any application that accepts user-uploaded PDFs is therefore directly exposed.

Root Cause Analysis

Code Analysis

The patch implemented in Pull Request #3733 introduces dynamic validation for stream sizes. The maintainer added logic to compute the maximum possible number of valid entries based on the physical length of the stream data. This bounds the iteration loop strictly to the available data.

The vulnerable implementation relied on the size variable derived directly from the PDF dictionary without verification. The patched version calculates a max_size value (named max_elements in the equivalent pattern below) by integer-dividing the actual stream length by the sum of the width parameters. Because the loop count is clamped to that value, the iteration cannot run past the end of the actual byte stream.

# Vulnerable pattern equivalent
size = stream_dict.get("/Size", 0)
widths = stream_dict.get("/W", [])
# Blind iteration based on attacker-supplied size
for i in range(size):
    process_stream_entry(stream_data, widths)

# Patched pattern equivalent (PR #3733)
size = stream_dict.get("/Size", 0)
widths = stream_dict.get("/W", [])
total_width = sum(widths)
if total_width > 0:
    # Physical maximum: number of whole entries that fit in the available bytes
    max_elements = len(stream_data) // total_width
    # Never iterate past the data that actually exists
    safe_size = min(size, max_elements)
    for i in range(safe_size):
        process_stream_entry(stream_data, widths)

This modification ensures high backward compatibility. Valid PDFs with slightly malformed metadata continue to process up to the limit of their actual data. Maliciously inflated metadata fails to trigger the resource exhaustion condition.
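
The compatibility claim can be checked with a small, self-contained sketch of the clamping rule. The helper bounded_entry_count and the numbers used here are hypothetical and only mirror the pattern described in this section; they are not taken from the pypdf source.

```python
def bounded_entry_count(declared_size: int, data_len: int, widths: list[int]) -> int:
    """Return how many entries can safely be read from data_len bytes."""
    entry_len = sum(widths)
    if entry_len <= 0 or declared_size <= 0:
        return 0
    # The declared count is honoured only up to what the bytes can hold.
    return min(declared_size, data_len // entry_len)


widths = [1, 2, 1]   # 4 bytes per entry
data_len = 40        # room for exactly 10 entries

honest = bounded_entry_count(10, data_len, widths)
slightly_off = bounded_entry_count(12, data_len, widths)
inflated = bounded_entry_count(2_000_000_000, data_len, widths)
```

Under these assumptions, a correct /Size, a slightly overstated /Size, and a grossly inflated /Size all resolve to the ten entries that actually exist, while a smaller declared value is still respected.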

Exploitation Mechanics

Exploitation requires constructing a minimal PDF file containing a malformed xref or object stream. The attacker modifies the stream dictionary to include a standard /W array and an exceptionally large /Size integer. The stream data itself remains minimal to bypass common application-level file size limits.

When the application receives the file, it invokes the pypdf.PdfReader class. The parsing engine reaches the malformed stream and attempts to parse the declared number of objects. The process enters an extended execution loop, blocking the executing thread immediately.

In Python environments using asynchronous I/O or single-threaded event loops, this blocking operation halts all concurrent request processing. A single malicious file submission degrades the performance of the entire application instance. No further exploitation steps are necessary.

Impact Assessment

The direct consequence of this vulnerability is an application-level Denial of Service. CPU-bound parsing operations consume 100% of the allocated processing core. Memory allocation scales with the iteration count, frequently triggering the operating system's Out-Of-Memory (OOM) killer.

Containerized deployments are affected as well. When the parser exceeds the container's memory limit, the container is OOM-killed, and an orchestrator such as Kubernetes restarts it. If the malicious payload is re-read from a queue after every restart, the service enters a crash loop and the disruption persists.

Data extraction services, invoice processing pipelines, and web applications generating previews are primary targets. The vulnerability does not compromise data confidentiality or system integrity. The attack solely disrupts availability, rendering the service inoperable for legitimate users.

Remediation and Mitigation

Organizations must upgrade the pypdf dependency to version 6.10.1 or later. This release contains the dynamic validation logic required to safely parse complex stream structures. The update addresses both xref and object stream parsing vectors completely.
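
A deployment check for the fixed version might look like the following sketch. The helper names are hypothetical, and the parser is a deliberate simplification: it compares only the leading numeric release segment and ignores pre-release or local version suffixes. Comparing versions as tuples of integers avoids the string-comparison trap in which "6.9.9" sorts after "6.10.1".

```python
from importlib import metadata
from typing import Optional

FIXED = (6, 10, 1)  # first pypdf release containing the fix, per this advisory


def parse_release(version: str) -> tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '6.10.1' -> (6, 10, 1)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            break  # stop at the first non-numeric suffix such as 'rc1'
    return tuple(parts)


def is_patched(version: str) -> bool:
    """True if the release segment is at least FIXED."""
    release = parse_release(version)
    padded = release + (0,) * (len(FIXED) - len(release))
    return padded >= FIXED


def installed_pypdf_is_patched() -> Optional[bool]:
    """Check the installed pypdf distribution; None if it is not installed."""
    try:
        return is_patched(metadata.version("pypdf"))
    except metadata.PackageNotFoundError:
        return None
```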

Development teams unable to deploy the patch immediately should implement strict operational resource limits. Execute PDF parsing tasks in isolated, short-lived subprocesses rather than the main application thread. Apply strict memory limits and aggressive execution timeouts to these subprocesses using the resource module or container constraints.
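
One way to apply this advice is sketched below, assuming a POSIX system. The helper run_isolated and the worker snippet COUNT_PAGES are hypothetical names introduced for this example, and the limit values are placeholders to tune per workload. The child interpreter gets a CPU-time cap and an address-space cap through the resource module, plus a wall-clock timeout enforced by the parent.

```python
import resource  # POSIX only; RLIMIT_AS enforcement varies by platform
import subprocess
import sys


def _cap(kind: int, value: int) -> None:
    """Lower the soft limit for one resource, never exceeding the hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, hard))


def run_isolated(child_code: str, argument: str, *, wall_timeout: float = 10.0,
                 cpu_seconds: int = 5, memory_bytes: int = 1 << 30) -> tuple[bool, str]:
    """Run child_code in a fresh interpreter; return (succeeded, stdout)."""

    def apply_limits() -> None:
        # Executed in the child process just before the interpreter starts.
        _cap(resource.RLIMIT_CPU, cpu_seconds)
        _cap(resource.RLIMIT_AS, memory_bytes)

    try:
        proc = subprocess.run(
            [sys.executable, "-c", child_code, argument],
            capture_output=True, text=True,
            timeout=wall_timeout, preexec_fn=apply_limits,
        )
    except subprocess.TimeoutExpired:
        return False, ""  # subprocess.run has already killed the child
    return proc.returncode == 0, proc.stdout


# Hypothetical worker: counts pages with pypdf inside the limited child.
COUNT_PAGES = (
    "import sys\n"
    "from pypdf import PdfReader\n"
    "print(len(PdfReader(sys.argv[1]).pages))\n"
)
```

A caller would invoke run_isolated(COUNT_PAGES, path_to_upload) and treat a False result as a rejected file. Because the limits apply to the child only, a runaway parse cannot exhaust the parent application.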

Applications must avoid processing untrusted PDF files synchronously within web request handlers. Defer parsing operations to background task queues with automated failure recovery mechanisms. This architecture prevents a single malformed file from impacting the primary service API.
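
For services built on asyncio, a minimal sketch of keeping the parse off the event loop is shown here. It is not a full task queue: parse_in_child and demo are hypothetical helpers, and the busy loop passed to the child merely stands in for a parser stuck on a crafted file.

```python
import asyncio
import sys
from typing import Optional


async def parse_in_child(child_code: str, argument: str, timeout: float) -> Optional[str]:
    """Run child_code in a separate interpreter without blocking the event loop."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", child_code, argument,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.DEVNULL,
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()        # stop the runaway parser
        await proc.wait()  # reap the child so it does not linger as a zombie
        return None
    return stdout.decode() if proc.returncode == 0 else None


async def demo() -> tuple[Optional[str], int]:
    """Show that other coroutines keep running while a child spins."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.05)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    # "while True: pass" stands in for a parser stuck on a crafted file.
    result = await parse_in_child("while True: pass", "unused.pdf", timeout=0.5)
    beat.cancel()
    return result, ticks
```

Running asyncio.run(demo()) returns None for the timed-out parse while the heartbeat counter keeps advancing, which is the property a request handler needs: the stuck job fails alone instead of stalling every other request.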

Official Patches

py-pdf: Fix Pull Request #3733

py-pdf: GitHub Release v6.10.1

Technical Appendix

CVSS Score

5.5 / 10

CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H
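
The 5.5 score can be reproduced from the vector with the CVSS v3.1 base-score equations. The following is a reduced sketch that handles only Scope: Unchanged vectors; the metric weights and the rounding rule follow the CVSS v3.1 specification, while the function names are chosen for this example.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in CVSS v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][parts[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    return roundup(min(impact + exploitability, 10))


score = base_score("CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H")
```

For this vector the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.55 × 0.77 × 0.85 × 0.62 ≈ 1.83; their sum of about 5.43 rounds up to 5.5.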

Affected Systems

Python web applications accepting PDF uploads
Automated document processing pipelines
Data extraction and indexing services
Serverless functions analyzing document content

Affected Versions Detail

Product	Affected Versions	Fixed Version
pypdf (py-pdf)	< 6.10.1	6.10.1

Attribute	Detail
Vulnerability Type	Uncontrolled Resource Consumption
CWE IDs	CWE-400, CWE-834
Attack Vector	Local / Remote via File Upload
Impact	Denial of Service (DoS)
Authentication Required	None
Affected Component	pypdf xref and object stream parser

MITRE ATT&CK Mapping

Technique	Name	Tactic
T1499	Endpoint Denial of Service	Impact
T1499.004	Endpoint Denial of Service: Application or System Exploitation	Impact

CWE-400

Uncontrolled Resource Consumption

The software does not properly control the allocation and maintenance of a limited resource, thereby enabling an attacker to influence the amount of resources consumed.

Vulnerability Timeline

2025-02-01: Vulnerability identified and fixed by maintainer stefan6419846 via PR #3733

2025-02-01: pypdf Version 6.10.1 released containing the fix

2025-02-01: Public advisory GHSA-JJ6C-8H6C-HPPX published
