Feb 28, 2026
pypdf versions before 6.7.4 contain a vulnerability in the RunLengthDecode filter that allows for unbounded memory allocation. By crafting a PDF with a malformed RLE stream, an attacker can crash the host application via OOM. The fix in version 6.7.4 introduces strict output size limits.
A resource exhaustion vulnerability exists in the pypdf library versions prior to 6.7.4, specifically within the RunLengthDecode filter implementation. The flaw allows attackers to trigger an infinite loop or excessive memory allocation via crafted PDF streams, leading to Denial of Service (DoS) through Out-Of-Memory (OOM) conditions. This issue affects automated PDF processing pipelines where untrusted files are parsed without strict resource limits.
The pypdf library is a widely used pure-Python PDF toolkit capable of splitting, merging, cropping, and transforming PDF files. It includes various filters to handle the decompression of data streams embedded within PDF documents. One such filter is RunLengthDecode, which implements a simple compression algorithm defined in the ISO 32000-1 specification.
CVE-2026-28351 identifies a flaw in how pypdf handles RunLengthDecode streams. The library failed to impose an upper bound on the size of the decompressed output. This omission allows a specially crafted input stream, often very small, to expand into a disproportionately large amount of data in memory. This class of vulnerability is known as a "decompression bomb" or "zip bomb."
When a vulnerable application processes a malicious PDF, the pypdf decoder attempts to allocate memory for the expanded data until system resources are exhausted. This results in an Out-Of-Memory (OOM) crash, rendering the application or the entire host service unavailable. The vulnerability is tracked as CWE-400 (Uncontrolled Resource Consumption).
The root cause lies in the algorithmic implementation of the Run-Length Encoding (RLE) decoder in pypdf/filters.py. The RLE format uses a control byte n to determine how to process subsequent data. If n is between 129 and 255, the decoder repeats the next byte 257 - n times. This allows for compression of repeated data.
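To make the control-byte rules concrete, here is a minimal, standalone sketch of a RunLengthDecode decoder. It is not pypdf's code: the function name `rle_decode` is made up for illustration, and the sketch deliberately has no output limit. The two cases not spelled out above follow the ISO 32000-1 filter definition: a control byte of 0 to 127 copies the next n + 1 bytes literally, and 128 marks End-of-Data.

```python
def rle_decode(data: bytes) -> bytes:
    """Illustrative RunLengthDecode decoder (hypothetical helper, no size limit)."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        i += 1
        if n == 128:
            # End-of-Data marker: stop decoding
            break
        if n < 128:
            # Literal run: copy the next n + 1 bytes unchanged
            out += data[i : i + n + 1]
            i += n + 1
        else:
            # Repeat run: emit the next byte 257 - n times
            if i >= len(data):
                break  # truncated stream, nothing left to repeat
            out += data[i : i + 1] * (257 - n)
            i += 1
    return bytes(out)


# Three literal bytes, then "X" repeated 257 - 254 = 3 times, then EOD
print(rle_decode(b"\x02abc\xfeX\x80"))  # b'abcXXX'
```

Only the repeat branch amplifies the input: two bytes in, up to 128 bytes out.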
In affected versions, the decode function iterates through the input stream inside a while loop that continues until an End-of-Data (EOD) marker is reached. Crucially, the loop appends decoded bytes to a list (lst) without checking the cumulative size of that list against a safety threshold.
An attacker can exploit this by providing a stream consisting of repeated instructions to duplicate bytes. For example, the sequence 0x81 (decimal 129) followed by a single byte triggers the decoder to output 128 copies of that byte. By chaining these sequences, a small malicious payload can force the interpreter to construct a byte string gigabytes in size, exceeding the available RAM of the process.
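The arithmetic behind that amplification can be checked in a few lines. The helper name `payload_size_for` is an assumption for illustration. It only counts bytes and builds no payload.

```python
RUN_OUTPUT = 257 - 0x81  # 128 bytes of output per repeat run
RUN_INPUT = 2            # one control byte plus the byte to repeat


def payload_size_for(target_output: int) -> int:
    """Input bytes needed to decode to at least target_output bytes (hypothetical helper)."""
    runs = -(-target_output // RUN_OUTPUT)  # ceiling division
    return runs * RUN_INPUT + 1             # +1 for the End-of-Data byte


print(RUN_OUTPUT // RUN_INPUT)      # 64, the expansion ratio
print(payload_size_for(2**30))      # 16777217 bytes (about 16 MiB) for 1 GiB of output
```

Under these assumptions, a stream of roughly 16 MiB is enough to request a gibibyte of decoded data.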
The vulnerability is evident in the RunLengthDecode.decode static method. Below is the comparison between the vulnerable logic and the remediated code in version 6.7.4.
Vulnerable Code (pypdf < 6.7.4)
The loop purely follows the input instructions without any guardrails on len(lst) or the total bytes generated.
```python
# pypdf/filters.py
@staticmethod
def decode(data: bytes, parameters: Optional[Dict[str, Any]] = None) -> bytes:
    # ... (initialization)
    while True:
        # ... (read length byte)
        if length < 128:
            # Copy literal bytes
            lst.append(data[index : index + length + 1])
            index += length + 1
        elif length > 128:
            # Repeat next byte (257 - length) times
            length = 257 - length
            lst.append(bytes((data[index],)) * length)  # Unbounded allocation
            index += 1
        # ...
    return b"".join(lst)
```

Patched Code (pypdf >= 6.7.4)
The fix introduces a constant RUN_LENGTH_MAX_OUTPUT_LENGTH (defaulting to 75MB) and tracks the total_length during iteration. If the decoded size exceeds this limit, the operation is aborted.
```python
# pypdf/filters.py
# Security constant added
RUN_LENGTH_MAX_OUTPUT_LENGTH = 75_000_000

@staticmethod
def decode(data: bytes, parameters: Optional[Dict[str, Any]] = None) -> bytes:
    # ... (initialization)
    total_length = 0  # Accumulator for safety check
    while True:
        # ... (parsing logic)
        # Update the accumulator
        total_length += length
        # Enforce the limit
        if total_length > RUN_LENGTH_MAX_OUTPUT_LENGTH:
            raise LimitReachedError("Limit reached while decompressing.")
        # ... (append logic)
    return b"".join(lst)
```

Exploitation requires the attacker to submit a PDF file where an internal object (typically an image or a content stream) uses the RunLengthDecode filter. This is a standard PDF feature, so the presence of the filter itself is not suspicious.
The attacker constructs a stream payload designed to maximize the expansion ratio. In RLE, the byte 0x81 (decimal 129) is optimal for this purpose, as it commands the decoder to repeat the subsequent byte 128 times (calculated as 257 - 129).
Proof of Concept Logic:
```python
from pypdf.filters import RunLengthDecode

# 1. Create a payload where every 2 bytes of input become 128 bytes of output.
#    Expansion ratio: 64:1
runs = 1_000_000
encoded_payload = (b"\x81A" * runs) + b"\x80"

# 2. Input size: ~2 MB
# 3. Target output size: 128 MB (128 bytes * 1,000,000)
# Vulnerable versions decode the full 128 MB with no limit (increase `runs`
# to exhaust memory); fixed versions raise LimitReachedError.
RunLengthDecode.decode(encoded_payload)
```

In a real-world attack, multiple such streams can be chained or nested to consume memory rapidly. Since pypdf is often used in web backends to process user-uploaded documents (e.g., for resizing, metadata extraction, or OCR preprocessing), a single malicious upload can crash the worker process handling the request.
The primary impact of CVE-2026-28351 is Denial of Service (DoS). The vulnerability allows for the exhaustion of system memory resources (RAM).
Operational Impact:
Severity Metrics:
AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:L

The vulnerability is patched in pypdf version 6.7.4. The fix involves a hardcoded safety limit for the RunLengthDecode filter.
Remediation Steps:
1. Check your dependencies (pip freeze or poetry show --tree) for pypdf versions below 6.7.4.
2. Upgrade pypdf: pip install --upgrade pypdf

Mitigation / Workarounds:
If an immediate upgrade is not feasible, you can monkey-patch the RunLengthDecode.decode method in your application initialization code to include the length check, as shown in the patch analysis. Alternatively, implement strict file size limits on uploaded PDFs, although this is an imperfect defense as the malicious PDF can be small (high compression ratio).
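One way to sketch such a guard is shown below. It differs from the upstream patch: instead of counting bytes while decoding, it pre-scans the control bytes to compute an upper bound on the decoded size and refuses to decode when that bound exceeds the limit. The names `rle_decoded_size`, `guard_rle`, and `DecodedSizeError` are hypothetical, and the limit value simply mirrors the 75 MB figure described in the patch analysis.

```python
from functools import wraps
from typing import Callable

MAX_OUTPUT = 75_000_000  # assumption: same figure as the limit described for 6.7.4


class DecodedSizeError(ValueError):
    """Hypothetical error raised when a stream would decode past the limit."""


def rle_decoded_size(data: bytes) -> int:
    """Upper bound on the decoded size of an RLE stream, without allocating output."""
    total = 0
    i = 0
    while i < len(data):
        n = data[i]
        if n == 128:           # End-of-Data marker
            break
        if n < 128:            # literal run of n + 1 bytes
            total += n + 1
            i += n + 2
        else:                  # repeat run of 257 - n bytes
            total += 257 - n
            i += 2
    return total


def guard_rle(decode: Callable[..., bytes], limit: int = MAX_OUTPUT) -> Callable[..., bytes]:
    """Wrap a decode function so oversized streams are rejected before decoding."""
    @wraps(decode)
    def guarded(data: bytes, *args, **kwargs) -> bytes:
        if rle_decoded_size(data) > limit:
            raise DecodedSizeError("RunLengthDecode output would exceed the limit")
        return decode(data, *args, **kwargs)
    return guarded
```

With pypdf installed, the wrapper could presumably be applied once at startup as `RunLengthDecode.decode = staticmethod(guard_rle(RunLengthDecode.decode))`. This assumes the filter is invoked through the class attribute and receives the raw stream bytes as its first argument, which should be verified against the installed version. Upgrading remains the more reliable fix.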
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:L/SC:N/SI:N/SA:N

| Product | Affected Versions | Fixed Version |
|---|---|---|
| pypdf (py-pdf) | < 6.7.4 | 6.7.4 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-400 |
| CVSS v4.0 | 6.9 |
| Attack Vector | Network |
| Impact | Denial of Service (DoS) |
| Exploit Status | PoC Available |
| Fix Version | 6.7.4 |

CWE-400 (Uncontrolled Resource Consumption): The software does not properly control the allocation and maintenance of a limited resource, thereby enabling an actor to influence the amount of resources consumed, eventually leading to the exhaustion of available resources.