picklescan, a tool designed to detect malicious Python pickles, failed to block the C-based `_operator` module. By using `_operator.attrgetter` instead of the blocked `operator.attrgetter`, attackers can slip a payload past the scanner and achieve full remote code execution (RCE) on any system that loads the pickle after the scan reports it clean.
A critical bypass in the picklescan security scanner allowing remote code execution via the Python C-implementation module '_operator'.
In the world of Python security, the phrase "secure pickle" is usually the punchline to a bad joke. The pickle module is essentially a stack-based virtual machine that, by design, allows arbitrary code execution during deserialization. It is not a data format; it is a program.
Enter picklescan. This tool took on the Herculean task of trying to make pickles safe by statically analyzing the bytecode stream before loading it. It acts as a bouncer at the club, checking the ID of every function trying to get into memory. If it sees os.system or subprocess.Popen, it throws them out.
But here is the problem with blocklists (or "denylists"): you have to know every possible way to be malicious. If you miss one alias, one obscure built-in, or one C-extension, the bouncer steps aside, and the attacker walks right in. That is exactly what happened here. The developers blocked the front door, but they didn't realize the house had a back door labeled _operator.
To understand this bypass, you have to understand a quirk of Python's standard library. Many Python modules have a pure-Python implementation and a faster C-language implementation for performance. The operator module provides functional equivalents to standard operators (like +, -, or attribute access).
picklescan correctly identified that operator.attrgetter is dangerous. Why? Because attrgetter can be used to grab methods off objects. If you can import the os module, you can use attrgetter("system") to fetch the system function and run shell commands.
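A minimal sketch of why `attrgetter` counts as dangerous: it turns a plain string into an attribute lookup, so the name `system` never has to appear as an import. Nothing is executed here; the example only resolves the function object and compares it.

```python
import operator
import os

# attrgetter("system") builds a callable equivalent to: lambda obj: obj.system
fetch_system = operator.attrgetter("system")

# Applying it to the os module returns the function object. It is NOT called.
resolved = fetch_system(os)

# The resolved object is the real os.system function.
is_same_function = resolved is os.system
print(is_same_function)
```

From a scanner's point of view, the only imports involved are `attrgetter` and whatever produced the module object; `os.system` itself is never named as a global.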
So, picklescan added operator to its _unsafe_globals denylist. Case closed, right? Wrong.
Python often exposes the C-implementation directly via an underscore prefix. While operator was blocked, _operator (the C-backed version) was not. They do the exact same thing. It is like banning "Robert" from your server but allowing "Bob" to have root access. The logic flaw wasn't in the scanning engine itself, but in the dictionary of known threats.
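The "Robert" and "Bob" relationship is easy to check from a Python prompt. The sketch below assumes CPython, where the `operator` module re-exports the names defined by the `_operator` C module; other interpreters may differ, so the identity check is printed rather than relied on.

```python
import operator
import _operator
import os

# On CPython these are expected to be the very same object,
# because operator.py re-exports the names from _operator.
same_object = operator.attrgetter is _operator.attrgetter
print("same object:", same_object)

# Either way, both names behave identically. os.sep is used as a harmless attribute.
via_public = operator.attrgetter("sep")(os)
via_c_module = _operator.attrgetter("sep")(os)
print(via_public == via_c_module)
```

A denylist keyed on the string `"operator"` therefore blocks one spelling of the import while leaving an equivalent one untouched.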
The vulnerability lived entirely in src/picklescan/scanner.py. The scanner maintains a dictionary called _unsafe_globals. This is the "No Fly List" of module and function names that a pickle is not allowed to import.
Here is what the code looked like before the fix. Notice it explicitly blocks operator but is blissfully unaware of its C-based sibling:
```python
# OLD CODE (Vulnerable)
_unsafe_globals = {
    "operator": {
        "attrgetter",
        "itemgetter",
        "methodcaller",
    },
    # ... other modules ...
}
```

The fix, applied in commit f2dea43e0c838e09ace1e62994143254b51de927, was pitifully simple but absolutely necessary. They just had to tell the bouncer about "Bob":
```python
# PATCHED CODE (Fixed)
_unsafe_globals = {
    "operator": {
        "attrgetter",
        "itemgetter",
        "methodcaller",
    },
    "_operator": {  # <--- The Fix
        "attrgetter",
        "itemgetter",
        "methodcaller",
    },
    # ...
}
```

> [!NOTE]
> This highlights the inherent fragility of denylisting. The moment a new Python version introduces a new alias for a dangerous function, this scanner becomes obsolete instantly.
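To make the gap concrete, here is a small sketch of a string-keyed denylist lookup. The helper `is_flagged` and the two dictionary names are hypothetical and only mirror the shape of `_unsafe_globals` shown above; this is not picklescan's actual matching code.

```python
DANGEROUS_NAMES = {"attrgetter", "itemgetter", "methodcaller"}

# Shape of the denylist before the fix: only the public module name is listed.
OLD_DENYLIST = {"operator": set(DANGEROUS_NAMES)}

# Shape after the fix: the C module name is listed as well.
PATCHED_DENYLIST = {
    "operator": set(DANGEROUS_NAMES),
    "_operator": set(DANGEROUS_NAMES),
}


def is_flagged(module: str, name: str, denylist: dict) -> bool:
    """Hypothetical helper: exact-match lookup of (module, name) in a denylist."""
    return name in denylist.get(module, set())


print(is_flagged("operator", "attrgetter", OLD_DENYLIST))
print(is_flagged("_operator", "attrgetter", OLD_DENYLIST))
print(is_flagged("_operator", "attrgetter", PATCHED_DENYLIST))
```

Because the lookup is an exact string match on the module name, any alias that is not listed passes straight through, which is the whole bug.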
Let's construct the murder weapon. A pickle stream is just a sequence of opcodes. To exploit this, we need to construct a chain (gadget) that leads to os.system without ever explicitly typing os.system in a way the scanner recognizes.
Here is the attack chain:
1. os: We use builtins.__import__ to load the os module. This puts the module object on the stack.
2. _operator: We load the unblocked _operator module.
3. attrgetter: We resolve _operator.attrgetter. The scanner allows this because _operator isn't in the bad list.
4. system: We call attrgetter("system") and apply it to the os module object we loaded in step 1. This returns the actual os.system function.
5. Command: We call the resolved function with a shell command (id, whoami, or a reverse shell).

The raw pickle bytecode looks something like this:
```python
# The bypass opcode sequence.
# The bypass is the GLOBAL opcode on the `c_operator` line.
opcode = b'''cbuiltins
__import__
(Vos
tRp0
0c_operator
attrgetter
(Vsystem
tR(g0
tR(Vecho "You have been pwned"
tR.'''
```

When picklescan looks at this, it parses the tokens. It sees builtins.__import__ (which it might allow for innocuous modules) and _operator.attrgetter. Since _operator triggers no alarms, the payload passes validation. When pickle.loads() is finally called by a victim application that believes the data is safe, the shell command executes immediately.
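For readers who have not seen static pickle scanning before, the following sketch lists the imports in a pickle stream without loading it, using the standard library's `pickletools.genops`. It is a simplified illustration, not picklescan's implementation: the function name `list_globals` is an assumption, and it only handles the text-based `GLOBAL` and `INST` opcodes (newer protocols use `STACK_GLOBAL`, which needs extra bookkeeping). The sample bytes are a harmless stream that references `_operator.attrgetter` with the `sep` attribute, and they are never unpickled.

```python
import pickletools


def list_globals(data: bytes) -> set:
    """Return the (module, name) pairs imported via GLOBAL/INST opcodes."""
    found = set()
    for opcode, arg, _position in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports the argument as "module name" in one string.
            module, name = arg.split(" ", 1)
            found.add((module, name))
    return found


# Harmless sample stream; it is parsed here, never loaded.
sample = b"c_operator\nattrgetter\n(Vsep\ntR."
imports = list_globals(sample)
print(imports)
```

The scanner's verdict depends entirely on what it does with pairs like these. If `("_operator", "attrgetter")` is not on the denylist, the stream is reported as clean.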
This vulnerability is particularly nasty because of where picklescan is used. It is primarily deployed in Machine Learning (ML) pipelines. ML models (like those from PyTorch) are often distributed as pickle files.
Hubs like Hugging Face or corporate model repositories rely on scanners to ensure that user-uploaded models aren't actually Trojan horses. A developer using picklescan < 0.0.34 believes they have sanitized the input. They will happily deserialize a model file provided by a third party, thinking their shield is up.
Successful exploitation means immediate Remote Code Execution (RCE) on the server processing the model. In an ML context, this usually means access to heavy GPU instances, proprietary training data, and potential lateral movement into the corporate cloud environment. It turns a data science workflow into a foothold for ransomware.
The immediate remediation is to upgrade picklescan to version 0.0.34. This version includes the updated blocklist covering _operator.
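A quick way to confirm an environment is on a fixed release is to compare the installed version against 0.0.34. The helper names below are hypothetical, and the parser is a deliberately simple sketch that assumes plain dotted numeric versions such as `0.0.34`.

```python
from importlib import metadata

FIXED_VERSION = (0, 0, 34)


def parse_version(text: str) -> tuple:
    """Parse a plain dotted numeric version such as '0.0.34'."""
    return tuple(int(part) for part in text.split("."))


def is_patched(version_text: str) -> bool:
    """True if the given version is at or above the fixed release."""
    return parse_version(version_text) >= FIXED_VERSION


def installed_picklescan_is_patched():
    """Return True/False for the installed picklescan, or None if unknown."""
    try:
        return is_patched(metadata.version("picklescan"))
    except (metadata.PackageNotFoundError, ValueError):
        # Not installed, or a version string this simple parser cannot read.
        return None


print(installed_picklescan_is_patched())
```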
However, the deeper lesson here is about architecture. If you are relying on a Python script to scan a pickle file to determine if it is safe to execute, you have already lost. The complexity of the Python runtime means there will almost always be another gadget, another alias, or another memory corruption trick to bypass the scanner.
The Real Fix: Stop using pickles for untrusted data. Use JSON, Safetensors, or ONNX. If you absolutely must use pickles, do not rely on static scanning. Execute them inside a disposable, network-isolated sandbox (like a microVM or a restricted container) where an RCE doesn't matter.
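For cases where a pickle really has to be loaded, the Python documentation describes a different technique from denylisting: overriding `Unpickler.find_class` so that only explicitly allowed globals can be imported. The sketch below illustrates that idea. It is not something picklescan does, the allowlist contents are placeholder assumptions, and it is not a substitute for the sandboxing advice above.

```python
import io
import pickle

# Placeholder allowlist: an application would list only what it needs.
ALLOWED_GLOBALS = {("builtins", "range"), ("builtins", "complex")}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not named in ALLOWED_GLOBALS."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowed")


def restricted_loads(data: bytes):
    """Hypothetical helper mirroring pickle.loads with the allowlist applied."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


# Plain data needs no globals and loads normally.
plain = restricted_loads(pickle.dumps({"weights": [1, 2, 3]}))

# A stream that imports _operator.attrgetter is rejected before anything runs.
blocked = False
try:
    restricted_loads(b"c_operator\nattrgetter\n(Vsep\ntR.")
except pickle.UnpicklingError:
    blocked = True

print(plain, blocked)
```

The design difference is the default: an allowlist rejects an unknown alias such as `_operator` without anyone having to know it exists, whereas a denylist accepts it until someone adds the entry.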
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

| Product | Affected Versions | Fixed Version |
|---|---|---|
| picklescan (mmaitre314) | < 0.0.34 | 0.0.34 |
| Attribute | Detail |
|---|---|
| CWE | CWE-502 (Deserialization of Untrusted Data) |
| CVSS | 9.8 (Critical) |
| Attack Vector | Network / Local (via File) |
| Impact | Remote Code Execution (RCE) |
| Key Component | _operator (Python C-Module) |
| Exploit Status | PoC Available |
The application deserializes untrusted data without sufficiently verifying the resulting data will be valid.