Feb 26, 2026 · 5 min read
LangGraph's checkpointing system enabled `pickle_fallback=True` by default. Attackers who can write to the cache (Redis/Filesystem) can inject malicious pickle payloads. When the application reads the cache, the payload executes, resulting in RCE.
LangGraph, the brain behind many stateful LLM agents, contained a critical deserialization vulnerability in its caching layer. By defaulting to Python's insecure `pickle` module for fallback serialization, the library opened a backdoor for attackers with write access to the cache backend (like Redis) to execute arbitrary code on the application server. It turns out that trusting serialized data from your cache is just as dangerous as trusting user input directly.
LangGraph is the cool kid on the block for building stateful, multi-actor applications with LLMs. It manages the "memory" of your AI agents, allowing them to pause, resume, and loop through complex tasks. To make this performant, developers often use caching. You don't want your expensive LLM to re-think the same thought twice, right?
So, LangGraph provides a BaseCache interface. It sits there, quietly storing node states and results in backends like Redis, Postgres, or the filesystem. It’s the perfect optimization. But as with all optimizations in software engineering, there is a trade-off. In this case, the trade-off was security for convenience.
The library's developers wanted to ensure that anything you threw at the cache could be stored. JSON is great, but it can't handle complex Python objects. The solution? Allow the cache to fall back to Python's built-in pickle serialization if JSON fails. If you know anything about Python security, you know that pickle is essentially eval() in a trench coat.
The vulnerability (CWE-502) lies in the `langgraph-checkpoint` package. Specifically, the `BaseCache` class initialized its serializer with a fatal configuration default: the `JsonPlusSerializer` was told to use `pickle_fallback=True`.
Here is the logic: the application reads data from the cache. The serializer attempts to parse it as JSON or msgpack. If that fails, or if the data carries the magic bytes of a pickle stream, it passes the bytes to `pickle.loads()`. This is a classic "Insecure Deserialization" flaw.
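That fallback pattern can be sketched in a few lines. To be clear, this is a minimal illustration of the pattern described above, not LangGraph's actual code; the function name `loads_with_fallback` and the magic-byte check are assumptions for illustration:

```python
import json
import pickle

def loads_with_fallback(data: bytes):
    # Hypothetical sketch of the vulnerable pattern -- NOT LangGraph's code.
    # Pickle protocol 2+ streams start with the 0x80 magic byte.
    if data[:1] == b"\x80":
        return pickle.loads(data)  # DANGEROUS: runs attacker-controlled payloads
    try:
        return json.loads(data)
    except ValueError:
        # Fallback branch: anything that isn't valid JSON gets unpickled
        return pickle.loads(data)
```

A benign JSON document round-trips through the safe branch, but any bytes an attacker plants in the cache that fail JSON parsing flow straight into `pickle.loads()`.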
Why is this bad? Because pickle allows object reconstruction to trigger arbitrary code execution during the unpickling process. It doesn't wait for you to call a method on the object; the mere act of loading it triggers the payload. If an attacker can poison the cache—say, by compromising a shared Redis instance or writing to a shared file—they can turn that cache read into a Remote Code Execution (RCE) event.
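You can demonstrate that load-time execution with a completely benign callable (a toy example; any callable that pickle can resolve by name works the same way):

```python
import pickle

class Proof:
    def __reduce__(self):
        # Tell pickle to reconstruct this object by CALLING len("pwned")
        return (len, ("pwned",))

obj = pickle.loads(pickle.dumps(Proof()))
# obj is not a Proof instance at all -- it is 5, the result of the call
# that pickle made on our behalf during loading. We never invoked anything.
```

Swap `len` for `os.system` and the "reconstructed object" is whatever your shell command returns.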
Let's look at the diff. It’s almost comical how small the change is versus how massive the implication is. The vulnerable code lived in libs/checkpoint/langgraph/cache/base/__init__.py.
The Vulnerable Code (`< 4.0.0`):

```python
class BaseCache(ABC, Generic[ValueT]):
    """Base class for a cache."""

    # The road to hell is paved with good intentions (and defaults)
    serde: SerializerProtocol = JsonPlusSerializer(pickle_fallback=True)
```

That `True` flag is the culprit. It tells the serializer: "If you don't understand these bytes, just execute them as Python code." It's the equivalent of a bouncer letting someone into a club just because they're speaking a language the bouncer doesn't understand.
The Fix (`>= 4.0.0`):

```python
class BaseCache(ABC, Generic[ValueT]):
    """Base class for a cache."""

    # Door slammed shut.
    serde: SerializerProtocol = JsonPlusSerializer(pickle_fallback=False)
```

The fix was simply to invert the boolean. Now, if the serializer encounters data it can't handle with JSON/msgpack, it raises an error instead of executing it.
This is a post-compromise or infrastructure-escalation exploit. You can't hit this directly from the public internet unless the cache is also exposed (which, honestly, happens more often than it should with Redis).
The Attack Chain:

1. Craft the payload. The attacker builds a malicious pickle using the `__reduce__` method:

   ```python
   import os
   import pickle

   class RCE:
       def __reduce__(self):
           # The classic reverse shell or command execution
           return (os.system, ("id > /tmp/pwned",))

   payload = pickle.dumps(RCE())
   ```

2. Poison the cache. The attacker plants the `payload` bytes in Redis (or whichever backend the cache uses).
3. Wait for a read. When the application reads that entry, `pickle.loads()` detonates the payload. The application server executes the command with the privileges of the LangGraph process.

Why should you care if someone needs Redis access to exploit this? Because in modern microservices architectures, we often treat the cache as "internal" and therefore "safe." We might have strict firewall rules for the app server, but leave Redis wide open within the VPC.
This vulnerability turns a data-layer compromise into an application-layer compromise. If an attacker can write to your cache, they no longer just see your data—they own your execution flow.
Impacts include remote code execution on the application server, full loss of confidentiality, integrity, and availability (all rated High in the CVSS vector below), and a pivot path from the cache tier into the application tier.
The remediation is straightforward, but it requires action. The LangChain AI team released patched versions that disable the pickle fallback by default.
Immediate Steps:

- Upgrade `langgraph-checkpoint` to version 4.0.0 or higher.
- Upgrade `langgraph` to version 1.0.6 or higher.
- If you use `langgraph-checkpoint-postgres` or `langgraph-checkpoint-sqlite`, update those to their latest versions (3.0.3 and 3.0.2 respectively).

If you absolutely must use pickle (and please, ask yourself why), you now have to explicitly opt in by passing your own serializer configuration. But for 99% of users, the default safe behavior is what you want.
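If you truly must unpickle data you mostly trust, the standard library's own documentation suggests constraining what an unpickler may resolve. Here is a sketch of that allowlist approach; the `RestrictedUnpickler` name and the specific allowlist are illustrative and have nothing to do with LangGraph's API:

```python
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    # Only globals on this allowlist may be resolved during unpickling.
    ALLOWED = {("builtins", "complex"), ("builtins", "set")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

A payload that tries to resolve `os.system` now fails with `UnpicklingError` before anything executes. Even so, treat this as a last resort: an allowlisted class with a dangerous `__setstate__` can still bite you.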
Lesson Learned: Never implement a "fallback" that lowers security standards. Fail secure, not convenient.
CVSS Vector: `CVSS:3.1/AV:N/AC:H/PR:H/UI:N/S:U/C:H/I:H/A:H`

| Product | Affected Versions | Fixed Version |
|---|---|---|
| `langgraph-checkpoint` (langchain-ai) | < 4.0.0 | 4.0.0 |
| `langgraph` (langchain-ai) | < 1.0.6 | 1.0.6 |
| Attribute | Detail |
|---|---|
| CWE ID | CWE-502 (Deserialization of Untrusted Data) |
| CVSS v3.1 | 6.6 (Medium) |
| Vector | AV:N/AC:H/PR:H/UI:N/S:U/C:H/I:H/A:H |
| Attack Vector | Network (via Cache Backend) |
| Privileges Required | High (Write access to cache) |
| KEV Status | Not Listed |
The application deserializes untrusted data without sufficient verification, which can result in the execution of arbitrary code.