CVEReports

Automated vulnerability intelligence platform. Comprehensive reports for high-severity CVEs generated by AI.






GHSA-JJ6C-8H6C-HPPX: Uncontrolled Resource Consumption in pypdf via Malformed PDF Streams

Amit Schendel
Senior Security Researcher

Apr 15, 2026 · 5 min read

PoC Available

Executive Summary (TL;DR)

pypdf versions prior to 6.10.1 are vulnerable to Denial of Service (DoS) due to inadequate validation of xref and object stream sizes, allowing crafted PDFs to trigger unbounded resource consumption.

The pypdf library prior to version 6.10.1 contains a moderate-severity vulnerability related to the handling of cross-reference (xref) and object streams. The library fails to adequately validate the sizes of these streams against supplied metadata, leading to excessive iteration and uncontrolled resource consumption when parsing maliciously crafted PDF documents.

Vulnerability Overview

The pypdf library is a widely deployed Python package used for PDF manipulation and data extraction. Version 6.10.1 patches a vulnerability identified as GHSA-JJ6C-8H6C-HPPX, which involves uncontrolled resource consumption (CWE-400). Applications parsing untrusted PDF files are exposed to Denial of Service (DoS) attacks.

The flaw resides in the processing of cross-reference (xref) and object streams. Modern PDF files use these structures to compress and index internal objects efficiently. The vulnerability occurs because the library does not adequately validate the metadata size parameters of these streams against their physical content.

Exploitation results in extreme CPU utilization or memory exhaustion. An attacker requires no authentication, only the ability to supply a crafted PDF file to the target application. Any application that parses user-uploaded PDFs is therefore directly exposed.

Root Cause Analysis

PDF cross-reference streams employ a width array (the /W entry) to define the byte lengths of integer fields within the stream data. The stream dictionary also specifies a /Size entry, declaring the total number of entries in the table. The pypdf parser uses these values to iterate over the stream data and extract object offsets.

Prior to version 6.10.1, the parser trusted the /Size parameter without corroborating it against the actual byte length of the stream data. A mismatch between the declared size and the physical data length triggered unbounded parsing loops. The parser relied entirely on the attacker-controlled dictionary values.

When an attacker supplies an artificially large /Size value, the parser enters an excessive iteration state (CWE-834). The implementation attempts to read and process entries that do not physically exist in the stream. This operation consumes memory and CPU cycles in proportion to the inflated integer value.
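To make the mismatch concrete, the following minimal sketch compares a declared /Size with the number of entries the stream data can actually hold. The function name physical_entry_capacity and the sample values are illustrative assumptions, not part of pypdf.

```python
def physical_entry_capacity(stream_data: bytes, widths: list[int]) -> int:
    """Return how many fixed-width entries fit in the decoded stream data."""
    entry_width = sum(widths)
    if entry_width <= 0:
        return 0  # a zero-width entry layout cannot describe any real data
    return len(stream_data) // entry_width


# Hypothetical crafted stream: 40 bytes of data, /W [1 2 1], inflated /Size
stream_data = bytes(40)
widths = [1, 2, 1]
declared_size = 1_000_000_000

capacity = physical_entry_capacity(stream_data, widths)
excess = declared_size - capacity
print(f"declared={declared_size} capacity={capacity} excess={excess}")
```

In this example the data holds 10 entries, so every iteration beyond the tenth works on bytes that do not exist.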

Code Analysis

The patch implemented in Pull Request #3733 introduces dynamic validation for stream sizes. The maintainer added logic to compute the maximum possible number of valid entries based on the physical length of the stream data. This bounds the iteration loop strictly to the available data.

The vulnerable implementation used the size value taken directly from the PDF dictionary without verification. The patched version computes an upper bound (max_elements in the equivalent pattern below) by integer-dividing the actual stream length by the sum of the width parameters. Clamping the loop to that bound ensures the iteration cannot run past the end of the actual byte stream.

# Vulnerable pattern (illustrative equivalent, not the literal pypdf source)
size = stream_dict.get("/Size", 0)
widths = stream_dict.get("/W", [])
# Iteration count comes straight from the attacker-supplied /Size
for i in range(size):
    process_stream_entry(stream_data, widths)

# Patched pattern (illustrative equivalent of PR #3733)
size = stream_dict.get("/Size", 0)
widths = stream_dict.get("/W", [])
total_width = sum(widths)
if total_width > 0:
    # Largest number of entries the stream data can physically hold
    max_elements = len(stream_data) // total_width
    # Never iterate past the available data
    safe_size = min(size, max_elements)
    for i in range(safe_size):
        process_stream_entry(stream_data, widths)

This modification preserves backward compatibility. PDFs whose metadata is slightly inaccurate are still processed up to the limit of their actual data, while maliciously inflated metadata no longer triggers the resource exhaustion condition.
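The bounded iteration can also be sketched as a small, self-contained decoder. This is a simplified illustration under stated assumptions, not the pypdf implementation: the name iter_xref_entries is hypothetical, fields are read as big-endian unsigned integers, and special handling such as default values for zero-width fields is omitted.

```python
from typing import Iterator


def iter_xref_entries(
    stream_data: bytes, widths: list[int], declared_size: int
) -> Iterator[tuple[int, ...]]:
    """Yield decoded entries, never reading past the available bytes."""
    entry_width = sum(widths)
    if entry_width <= 0:
        return  # nothing can be decoded from zero-width fields
    # Clamp the attacker-controlled count to what the data can hold
    count = min(max(declared_size, 0), len(stream_data) // entry_width)
    for index in range(count):
        offset = index * entry_width
        fields = []
        for width in widths:
            # Each field is a big-endian unsigned integer of `width` bytes
            fields.append(int.from_bytes(stream_data[offset:offset + width], "big"))
            offset += width
        yield tuple(fields)


# Two real entries, but the dictionary claims one billion
data = bytes([1, 0, 16, 0, 1, 0, 32, 0])
entries = list(iter_xref_entries(data, [1, 2, 1], 1_000_000_000))
print(entries)
```

The loop stops after the two entries that are physically present, regardless of the declared count.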

Exploitation Mechanics

Exploitation requires constructing a minimal PDF file containing a malformed xref or object stream. The attacker modifies the stream dictionary to include a standard /W array and an exceptionally large /Size integer. The stream data itself remains minimal to bypass common application-level file size limits.

When the application receives the file, it invokes the pypdf.PdfReader class. The parsing engine reaches the malformed stream and attempts to parse the declared number of objects. The process enters an extended execution loop, blocking the executing thread immediately.

In Python environments using asynchronous I/O or single-threaded event loops, this blocking operation halts all concurrent request processing. A single malicious file submission degrades the performance of the entire application instance. No further exploitation steps are necessary.

Impact Assessment

The direct consequence of this vulnerability is an application-level Denial of Service. CPU-bound parsing operations consume 100% of the allocated processing core. Memory allocation scales with the iteration count, frequently triggering the operating system's Out-Of-Memory (OOM) killer.

Containerized deployments face repeated container termination. The kernel OOM killer or an orchestrator such as Kubernetes terminates the affected container when it exceeds its memory limit. The orchestrator then restarts it, leading to severe service disruption if the malicious payload is continually reprocessed from a queue.

Data extraction services, invoice processing pipelines, and web applications generating previews are primary targets. The vulnerability does not compromise data confidentiality or system integrity. The attack solely disrupts availability, rendering the service inoperable for legitimate users.

Remediation and Mitigation

Organizations must upgrade the pypdf dependency to version 6.10.1 or later. This release contains the dynamic validation logic required to safely parse complex stream structures. The update addresses both xref and object stream parsing vectors completely.
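A deployment or startup check can confirm that the installed release is at or above the fixed version. The sketch below uses only the standard library; the helper names are illustrative, and the simplified parser ignores pre-release suffixes (for example, it would treat a hypothetical "6.10.1rc1" as patched), so treat it as an assumption-laden starting point rather than a full version comparator.

```python
from importlib import metadata
from typing import Optional

FIXED_VERSION = (6, 10, 1)


def parse_release(version: str) -> tuple[int, ...]:
    """Parse the leading numeric 'X.Y.Z' part of a version string."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if char not in "0123456789":
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_patched(version: str) -> bool:
    """Return True if the version is at or above the fixed release."""
    return parse_release(version) >= FIXED_VERSION


def installed_pypdf_is_patched() -> Optional[bool]:
    """Return None when pypdf is not installed in this environment."""
    try:
        return is_patched(metadata.version("pypdf"))
    except metadata.PackageNotFoundError:
        return None


print(installed_pypdf_is_patched())
```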

Development teams unable to deploy the patch immediately should implement strict operational resource limits. Execute PDF parsing tasks in isolated, short-lived subprocesses rather than the main application thread. Apply strict memory limits and aggressive execution timeouts to these subprocesses using the resource module or container constraints.
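One way to apply these limits is sketched below, assuming a POSIX system where the resource module and the preexec_fn argument of subprocess.run are available. The helper run_isolated, the worker script, and the default limits are illustrative choices, not recommendations from the advisory.

```python
import resource
import subprocess
import sys

# Hypothetical worker: opens the PDF and prints its page count
WORKER_CODE = """
import sys
from pypdf import PdfReader
print(len(PdfReader(sys.argv[1]).pages))
"""


def _limit_memory(max_bytes: int):
    def apply() -> None:
        # Lower the child's soft address-space limit without raising the hard one
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
    return apply


def run_isolated(code, args, timeout_s=10.0, max_bytes=512 * 1024 * 1024):
    """Run code in a child interpreter; return stdout, or None on failure."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code, *args],
            capture_output=True, text=True, timeout=timeout_s,
            preexec_fn=_limit_memory(max_bytes),
        )
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child when the timeout expires
    return result.stdout if result.returncode == 0 else None
```

A caller would invoke run_isolated(WORKER_CODE, ["upload.pdf"]) and treat a None result as a rejected file. Note that preexec_fn is not safe in multi-threaded parent processes, so a dedicated worker process or container-level limits may be preferable there.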

Applications must avoid processing untrusted PDF files synchronously within web request handlers. Defer parsing operations to background task queues with automated failure recovery mechanisms. This architecture prevents a single malformed file from impacting the primary service API.
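The failure-recovery part matters because a file that crashes the parser must not be retried indefinitely. The following sketch shows a queue consumer that caps attempts per file and quarantines repeat offenders. The names Job and drain are hypothetical, and the parse callable is assumed to be an isolated, time-limited parser that raises on failure.

```python
import queue
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    path: str
    attempts: int = 0


def drain(jobs: "queue.Queue[Job]", parse: Callable[[str], object],
          max_attempts: int = 2) -> tuple[list[str], list[str]]:
    """Process queued jobs; quarantine files that keep failing."""
    done: list[str] = []
    quarantined: list[str] = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return done, quarantined
        job.attempts += 1
        try:
            parse(job.path)
            done.append(job.path)
        except Exception:
            if job.attempts >= max_attempts:
                quarantined.append(job.path)  # stop retrying a poison file
            else:
                jobs.put(job)  # give the file another attempt later
```

Production task queues usually provide an equivalent retry cap or dead-letter mechanism; the point of the sketch is only that the cap exists and is small.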

Official Patches

  • py-pdf: Fix Pull Request #3733
  • py-pdf: GitHub Release v6.10.1

Technical Appendix

CVSS Score
5.5 / 10
CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H

Affected Systems

  • Python web applications accepting PDF uploads
  • Automated document processing pipelines
  • Data extraction and indexing services
  • Serverless functions analyzing document content

Affected Versions Detail

Product | Affected Versions | Fixed Version
pypdf (py-pdf) | < 6.10.1 | 6.10.1

Attribute | Detail
Vulnerability Type | Uncontrolled Resource Consumption
CWE IDs | CWE-400, CWE-834
Attack Vector | Local / Remote via File Upload
Impact | Denial of Service (DoS)
Authentication Required | None
Affected Component | pypdf xref and object stream parser

MITRE ATT&CK Mapping

  • T1499: Endpoint Denial of Service (Tactic: Impact)
  • T1499.004: Endpoint Denial of Service: Application or System Exploitation (Tactic: Impact)

CWE-400: Uncontrolled Resource Consumption

The software does not properly control the allocation and maintenance of a limited resource, thereby enabling an attacker to influence the amount of resources consumed.

Vulnerability Timeline

  • 2025-02-01: Vulnerability identified and fixed by maintainer stefan6419846 via PR #3733
  • 2025-02-01: pypdf version 6.10.1 released containing the fix
  • 2025-02-01: Public advisory GHSA-JJ6C-8H6C-HPPX published

