Apr 15, 2026
pypdf versions prior to 6.10.1 are vulnerable to Denial of Service (DoS) due to inadequate validation of xref and object stream sizes, allowing crafted PDFs to trigger unbounded resource consumption.
The pypdf library prior to version 6.10.1 contains a moderate-severity vulnerability related to the handling of cross-reference (xref) and object streams. The library fails to adequately validate the sizes of these streams against supplied metadata, leading to excessive iteration and uncontrolled resource consumption when parsing maliciously crafted PDF documents.
The pypdf library is a widely deployed Python package used for PDF manipulation and data extraction. Version 6.10.1 patches a vulnerability identified as GHSA-JJ6C-8H6C-HPPX, which involves uncontrolled resource consumption (CWE-400). Applications parsing untrusted PDF files are exposed to Denial of Service (DoS) attacks.
The flaw resides in the processing of cross-reference (xref) and object streams. Modern PDF files use these structures to compress and index internal objects efficiently. The vulnerability occurs because the library does not adequately validate the metadata size parameters of these streams against their physical content.
Exploitation results in extreme CPU utilization or memory exhaustion. An attacker requires no authentication, only the ability to supply a crafted PDF file to the target application. Any application that accepts user-uploaded PDFs and parses them with an affected version is therefore directly exposed.
PDF cross-reference streams employ a width array (the /W entry) to define the byte lengths of integer fields within the stream data. The stream dictionary also specifies a /Size entry, declaring the total number of entries in the table. The pypdf parser uses these values to iterate over the stream data and extract object offsets.
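To make the role of /W concrete, the following is a minimal sketch of how fixed-width rows can be decoded from cross-reference stream bytes. The function name `decode_xref_entries` is hypothetical and is not pypdf's internal API. The sketch assumes big-endian fields and skips the default values that the PDF specification assigns to zero-width fields.

```python
from typing import List, Tuple


def decode_xref_entries(data: bytes, widths: List[int]) -> List[Tuple[int, ...]]:
    """Split raw xref stream bytes into rows of big-endian integer fields.

    Illustrative helper only; not part of pypdf.
    """
    row_len = sum(widths)
    if row_len <= 0:
        # Degenerate /W array: there is no way to delimit rows.
        return []
    entries = []
    # Only whole rows are decoded; trailing partial bytes are ignored.
    for start in range(0, len(data) - row_len + 1, row_len):
        pos = start
        fields = []
        for width in widths:
            fields.append(int.from_bytes(data[pos:pos + width], "big"))
            pos += width
        entries.append(tuple(fields))
    return entries


# Two rows with /W [1 2 1]: (type, offset, generation)
raw = bytes([1, 0x00, 0x10, 0, 1, 0x01, 0x2C, 0])
entries = decode_xref_entries(raw, [1, 2, 1])
print(entries)  # [(1, 16, 0), (1, 300, 0)]
```

Note that this sketch derives the row count from the data length alone. The number of rows a stream can physically hold is fixed by its byte length and the /W widths, independent of whatever /Size declares.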
Prior to version 6.10.1, the parser trusted the /Size parameter without corroborating it against the actual byte length of the stream data. A mismatch between the declared size and the physical data length triggered unbounded parsing loops. The parser relied entirely on the attacker-controlled dictionary values.
When an attacker supplies an artificially large /Size value, the parser performs excessive iteration (CWE-834). The implementation attempts to read and process entries that do not physically exist in the stream. The memory and CPU cost of this loop grows in proportion to the inflated integer value rather than to the size of the file.
The patch implemented in Pull Request #3733 introduces dynamic validation for stream sizes. The maintainer added logic to compute the maximum possible number of valid entries based on the physical length of the stream data. This bounds the iteration loop strictly to the available data.
The vulnerable implementation relied on the size variable derived directly from the PDF dictionary without verification. The patched version calculates a max_size value (named max_elements in the equivalent pattern shown here) using integer division of the actual stream length by the sum of the width parameters. Because each entry occupies exactly that many bytes, the iteration cannot exceed the bounds of the actual byte stream.
```python
# Vulnerable pattern equivalent
size = stream_dict.get("/Size")
widths = stream_dict.get("/W")

# Blind iteration based on attacker-supplied size
for i in range(size):
    process_stream_entry(stream_data, widths)
```

```python
# Patched pattern equivalent (PR #3733)
size = stream_dict.get("/Size")
widths = stream_dict.get("/W")
total_width = sum(widths)

if total_width > 0:
    # Calculate physical maximum elements
    max_elements = len(stream_data) // total_width
    # Enforce strict bound
    safe_size = min(size, max_elements)
    for i in range(safe_size):
        process_stream_entry(stream_data, widths)
```

This modification ensures high backward compatibility. Valid PDFs with slightly malformed metadata continue to process up to the limit of their actual data. Maliciously inflated metadata fails to trigger the resource exhaustion condition.
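The bound can be checked with a small worked example. The sketch below is a self-contained illustration of the arithmetic, not the code from the patch. The helper name `bounded_entry_count` and the sample numbers are assumptions chosen for the demonstration.

```python
from typing import List


def bounded_entry_count(declared_size: int, widths: List[int], data_len: int) -> int:
    """Clamp a declared entry count to what the stream bytes can actually hold.

    Illustrative helper only; the name and signature are hypothetical.
    """
    row_len = sum(widths)
    if row_len <= 0 or declared_size <= 0:
        return 0
    # Each entry occupies row_len bytes, so data_len // row_len is a hard ceiling.
    return min(declared_size, data_len // row_len)


# A 40-byte stream with /W [1 2 1] holds at most 40 // 4 = 10 entries.
print(bounded_entry_count(2_000_000_000, [1, 2, 1], 40))  # 10
# An honest /Size below the ceiling passes through unchanged.
print(bounded_entry_count(8, [1, 2, 1], 40))  # 8
```

With the clamp in place, a declared /Size of two billion costs the same ten iterations as an honest file of the same length.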
Exploitation requires constructing a minimal PDF file containing a malformed xref or object stream. The attacker modifies the stream dictionary to include a standard /W array and an exceptionally large /Size integer. The stream data itself remains minimal to bypass common application-level file size limits.
When the application receives the file, it passes it to pypdf.PdfReader. The parsing engine reaches the malformed stream and attempts to parse the declared number of objects. The process enters an extended execution loop that blocks the executing thread until the loop finishes or the process is killed.
In Python environments using asynchronous I/O or single-threaded event loops, this blocking operation halts all concurrent request processing. A single malicious file submission degrades the performance of the entire application instance. No further exploitation steps are necessary.
The direct consequence of this vulnerability is an application-level Denial of Service. CPU-bound parsing operations consume 100% of the allocated processing core. Memory allocation scales with the iteration count, frequently triggering the operating system's Out-Of-Memory (OOM) killer.
Applications running in containers face rapid container termination. Kubernetes or the Docker runtime kills the affected container when it exceeds its memory limit. The orchestrator then restarts it, leading to severe service disruption if the malicious payload is continually reprocessed from a queue.
Data extraction services, invoice processing pipelines, and web applications generating previews are primary targets. The vulnerability does not compromise data confidentiality or system integrity. The attack solely disrupts availability, rendering the service inoperable for legitimate users.
Organizations must upgrade the pypdf dependency to version 6.10.1 or later. This release contains the dynamic validation logic required to safely parse complex stream structures. The update addresses both xref and object stream parsing vectors completely.
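A deployment check can confirm that the installed release is at or above the fixed version. The following is a minimal sketch that compares only the numeric release components. The helper names are hypothetical, and pre-release suffixes such as `rc1` are ignored, so a dedicated version-parsing library is a better fit where those matter.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

FIXED_VERSION = (6, 10, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. '6.10.0' -> (6, 10, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions so that '6.10' compares as (6, 10, 0).
    return parts + (0,) * (3 - len(parts))


def is_patched(installed: str) -> bool:
    """Return True when the version string is 6.10.1 or later."""
    return parse_version(installed) >= FIXED_VERSION


def installed_pypdf_is_patched() -> Optional[bool]:
    """Check the pypdf distribution in this environment; None if not installed."""
    try:
        return is_patched(version("pypdf"))
    except PackageNotFoundError:
        return None


print(is_patched("6.10.0"))  # False
print(is_patched("6.10.1"))  # True
```

Calling `installed_pypdf_is_patched()` at service startup or in a CI step gives an early signal when an environment still resolves to an affected release.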
Development teams unable to deploy the patch immediately should implement strict operational resource limits. Execute PDF parsing tasks in isolated, short-lived subprocesses rather than the main application thread. Apply strict memory limits and aggressive execution timeouts to these subprocesses using the resource module or container constraints.
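One way to apply these limits is sketched below, assuming a POSIX system where `resource.RLIMIT_AS` is enforced (Linux; other platforms may behave differently). The function name `run_limited`, the default limits, and the idea of passing the worker as a source string are all assumptions for illustration. In a real service the child would run a script that opens the PDF with pypdf.

```python
import resource
import subprocess
import sys
from typing import Tuple


def run_limited(code: str, timeout_s: float = 5.0,
                mem_bytes: int = 1024 ** 3) -> Tuple[str, str]:
    """Run Python source in a child process with a memory cap and a timeout.

    Illustrative helper only. Returns (status, stdout) where status is
    'ok', 'failed', or 'timeout'.
    """

    def apply_limits() -> None:
        # Runs in the child just before exec. Only the soft limit is lowered,
        # and never above an existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        limit = mem_bytes if hard == resource.RLIM_INFINITY else min(mem_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,        # the child is killed when this expires
            preexec_fn=apply_limits,  # not safe in multi-threaded parents
        )
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    status = "ok" if result.returncode == 0 else "failed"
    return (status, result.stdout.strip())


if __name__ == "__main__":
    print(run_limited("print(1 + 1)"))
    print(run_limited("while True: pass", timeout_s=1.0))
```

The design choice here is that a runaway parse costs one disposable child process rather than the application itself. Container-level memory and CPU limits achieve the same effect at a coarser granularity.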
Applications must avoid processing untrusted PDF files synchronously within web request handlers. Defer parsing operations to background task queues with automated failure recovery mechanisms. This architecture prevents a single malformed file from impacting the primary service API.
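The same isolation principle applies inside an asynchronous service. The sketch below substitutes a child process launched through `asyncio` for a full background task queue, which keeps the example self-contained. The function name `parse_out_of_process` and the worker source strings are hypothetical. The heartbeat task exists only to show that the event loop keeps running while the worker is stuck.

```python
import asyncio
import sys
from typing import Tuple


async def parse_out_of_process(code: str, timeout_s: float = 5.0) -> str:
    """Run a worker in a child process without blocking the event loop.

    Illustrative helper only. Returns 'ok', 'failed', or 'timeout'.
    """
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.DEVNULL,
    )
    try:
        await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Kill the runaway worker and reap it so no zombie process remains.
        proc.kill()
        await proc.wait()
        return "timeout"
    return "ok" if proc.returncode == 0 else "failed"


async def demo() -> Tuple[str, int]:
    ticks = 0

    async def heartbeat() -> None:
        # Stands in for other requests served by the same event loop.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.05)

    hb = asyncio.create_task(heartbeat())
    status = await parse_out_of_process("while True: pass", timeout_s=0.5)
    hb.cancel()
    return status, ticks


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A production setup would typically hand the file to a queue worker instead, but the property being illustrated is the same: the request handler awaits a result and never executes the parser on its own thread.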
CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H

| Product | Affected Versions | Fixed Version |
|---|---|---|
| pypdf py-pdf | < 6.10.1 | 6.10.1 |

| Attribute | Detail |
|---|---|
| Vulnerability Type | Uncontrolled Resource Consumption |
| CWE IDs | CWE-400, CWE-834 |
| Attack Vector | Local / Remote via File Upload |
| Impact | Denial of Service (DoS) |
| Authentication Required | None |
| Affected Component | pypdf xref and object stream parser |

CWE-400 (Uncontrolled Resource Consumption): The software does not properly control the allocation and maintenance of a limited resource, thereby enabling an attacker to influence the amount of resources consumed.