Heavy Weights: Crushing PyTorch's 'Secure' Loader via Heap Corruption
Jan 27, 2026
Executive Summary (TL;DR)
The `weights_only=True` flag in PyTorch was supposed to be the silver bullet against Pickle RCE. However, a logic flaw in the underlying C++ unpickler allows attackers to use the `SETITEM` opcode on non-container objects. This causes Type Confusion on the heap, allowing a malicious model file to corrupt memory and execute arbitrary code, even when the user explicitly requests 'safe' loading.
A critical heap corruption vulnerability in PyTorch's restricted unpickler allows attackers to bypass the `weights_only=True` security flag, turning safe model loading into arbitrary code execution.
The False Sense of Security
For years, the Python security community has screamed one mantra until our throats were sore: Do not unpickle untrusted data. Pickle is not a serialization format; it is a stack-based virtual machine that executes instructions. Giving someone a pickle file is like handing them a loaded gun and hoping they don't pull the trigger.
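To make the "virtual machine" claim concrete, here is a minimal sketch that uses only the standard library to list the instructions inside an ordinary pickle. Nothing is executed; `pickletools.genops` only decodes the stream. The one-item dict and the choice of protocol 2 are arbitrary choices for illustration.

```python
import pickle
import pickletools

# Serialize a one-item dict; protocol 2 keeps the opcode stream short.
data = pickle.dumps({"a": 1}, protocol=2)

# genops yields (opcode, argument, byte position) without executing anything.
opcode_names = [op.name for op, _arg, _pos in pickletools.genops(data)]
print(opcode_names)
```

Even this trivial payload is a small program: it creates an empty dict, pushes a key and a value, and then issues `SETITEM` to store them.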
PyTorch, realizing that the Machine Learning ecosystem is built entirely on people downloading random .pt files from Hugging Face, introduced weights_only=True. This flag was supposed to be the bouncer at the club. It uses a restricted unpickler that only allows a whitelist of classes (like torch.Tensor and basic primitives) and blocks the dangerous GLOBAL opcodes used to import os.system.
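The allowlist idea can be sketched with the standard library's documented `Unpickler.find_class` hook. This is not PyTorch's implementation; the `ALLOWED_GLOBALS` set and the helper names below are hypothetical and chosen only to show the general shape of a restricted loader.

```python
import collections
import io
import pickle

# Hypothetical allowlist for illustration; PyTorch's real list is different.
ALLOWED_GLOBALS = {
    ("collections", "OrderedDict"),
}

class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks to import a global by name.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowed")

def restricted_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()

# Plain containers and allowlisted classes load normally.
safe = pickle.dumps(collections.OrderedDict(weight=[1.0, 2.0]))
print(restricted_loads(safe))
```

A stream that references anything outside the allowlist, such as `builtins.eval`, is rejected before the object is ever resolved. Note that this kind of filter only governs which globals may be imported; it says nothing about whether the remaining opcodes are applied to objects of the right type.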
It was a beautiful dream. We thought we were safe. But CVE-2026-24747 is the wake-up call that reminds us: if you build a security boundary in C++, you better make sure you validate your pointers. This vulnerability proves that even a 'neutered' pickle machine can still be weaponized if the interpreter gets confused about what it's holding.
The Flaw: Identity Crisis on the Heap
To understand this bug, you have to look at torch/csrc/jit/serialization/unpickler.cpp. The unpickler is a state machine that reads opcodes from a stream and manipulates a stack of IValue objects. When the unpickler encounters a SETITEM opcode, it assumes the developer is behaving normally—assigning a value to a dictionary.
Here is the logic flaw: The unpickler checked what was being put into the container, but it didn't rigorously check if the container was actually a container. It blindly trusted that if the pickle stream invoked SETITEM, the object currently sitting on top of the stack was a Dict or List.
> [!NOTE]
> The Bug Class: This is a classic Type Confusion vulnerability (CWE-843). The software reads a piece of memory acting as Object A (e.g., a Tensor) but treats it as Object B (e.g., a Dictionary).
When the C++ code executes the set operation on a non-container object (like a SWALR scheduler instance or a raw Tensor), it calculates a memory offset where it thinks the dictionary buckets are. Since the object isn't a dictionary, that write operation lands somewhere else entirely, potentially overwriting a vtable, a function pointer, or unrelated object metadata. It's the memory equivalent of trying to file a document in a cabinet, but the cabinet is actually a wood chipper.
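Python cannot corrupt its own heap this way, but the mechanics can be modeled with raw bytes. The sketch below is a toy: the 16-byte "tensor" and "dict" layouts, the tags, and the offsets are all invented for illustration and do not reflect PyTorch's real object layouts.

```python
import struct

# Hypothetical 16-byte layouts, invented for this illustration only:
#   "tensor": [type_tag: u64][data_ptr: u64]
#   "dict":   [type_tag: u64][slot0:    u64]
TAG_TENSOR, TAG_DICT = 1, 2
SLOT_OFFSET = 8  # where the dict code expects its first value slot

def make_tensor(data_ptr: int) -> bytearray:
    return bytearray(struct.pack("<QQ", TAG_TENSOR, data_ptr))

def unchecked_setitem(obj: bytearray, value: int) -> None:
    # Assumes obj is a dict and writes into "slot0" without checking the tag.
    struct.pack_into("<Q", obj, SLOT_OFFSET, value)

def read_data_ptr(obj: bytearray) -> int:
    return struct.unpack_from("<Q", obj, 8)[0]

tensor = make_tensor(data_ptr=0x7F0000001000)
unchecked_setitem(tensor, 0x4141414141414141)
print(hex(read_data_ptr(tensor)))  # the "pointer" now holds the written value
```

The write was meant for a dictionary slot, but because the bytes actually belong to the toy tensor, it silently replaces the field the tensor code will later treat as a pointer.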
The Smoking Gun: SWALR's Bad Habits
The vulnerability was exposed by an innocent bystander: the SWALR (Stochastic Weight Averaging Learning Rate) scheduler. The developers of this component made a classic mistake: they tried to serialize a Python function (self.anneal_func) inside the checkpoint.
When weights_only=True attempted to load this, it usually just threw a fit. In this case, however, the internal state representation of SWALR combined with the restricted unpickler created a scenario where valid objects were on the stack, but the SETITEM logic was invoked in a context the C++ engine wasn't prepared for.
Here is the patch that fixed the trigger in torch/optim/swa_utils.py (Commit 954dc5183ee9205cbe79876ad05dd2d9ae752139):
    # BEFORE: Blindly pickling everything
    # def state_dict(self):
    #     return {key: value for key, value in self.__dict__.items()}

    # AFTER: Explicitly sanitizing the dangerous 'anneal_func'
    def state_dict(self):
        state = self.__dict__.copy()
        # Remove the callable function from the state
        state.pop('anneal_func', None)
        # Store the strategy as a string instead
        state['_anneal_strategy'] = self._anneal_strategy
        return state

While this Python patch stops SWALR from accidentally triggering the bug, the real fix had to happen in the C++ unpickler to validate opcodes properly. If they only fixed the Python code, attackers could still manually craft a pickle stream to replicate the SWALR state configuration and trigger the crash.
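The pattern behind the Python-side patch, keeping callables out of the saved state and rebuilding them from a string on load, can be sketched with a self-contained toy class. `ToyScheduler`, its `_STRATEGIES` table, and `load_state_dict` as written here are hypothetical stand-ins, not the actual SWALR code.

```python
import json
import math

class ToyScheduler:
    """Hypothetical stand-in for a scheduler; not PyTorch's SWALR."""

    _STRATEGIES = {
        "cos": lambda t: (1 - math.cos(math.pi * t)) / 2,
        "linear": lambda t: t,
    }

    def __init__(self, anneal_strategy: str = "cos", anneal_epochs: int = 10):
        self._anneal_strategy = anneal_strategy
        self.anneal_epochs = anneal_epochs
        self.anneal_func = self._STRATEGIES[anneal_strategy]

    def state_dict(self) -> dict:
        # Copy everything except the callable, which is rebuilt on load.
        return {k: v for k, v in self.__dict__.items() if k != "anneal_func"}

    def load_state_dict(self, state: dict) -> None:
        self.__dict__.update(state)
        self.anneal_func = self._STRATEGIES[self._anneal_strategy]

sched = ToyScheduler("linear", anneal_epochs=5)
state = sched.state_dict()
print(json.dumps(state))  # plain data only, so no callable is serialized

restored = ToyScheduler()
restored.load_state_dict(state)
```

Because the saved state contains only strings and numbers, a restricted loader never has to decide what to do with a function object.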
Exploitation: Crafting the Poisoned Pickle
To exploit this, we don't need a valid neural network. We need a handcrafted pickle stream. The goal is to confuse the unpickler into writing 8 bytes of our choosing to an arbitrary memory address relative to a heap object.
The Attack Chain:
- Preparation: We start a pickle stream. We can legitimately load a `torch.Tensor` because `weights_only=True` allows it.
- The Bait: We push a `Tensor` onto the stack. This is our 'victim' object.
- The Switch: We push a malicious integer (the payload) onto the stack.
- The Trigger: We emit the `SETITEM` opcode.
The unpickler looks at the stack. It sees [Tensor, Payload]. It executes SETITEM. The C++ code interprets the Tensor as a Dict. It attempts to hash the key (which might be missing or defaulted) and write the value.
If the attacker aligns the heap correctly (Heap Feng Shui), this out-of-bounds write can overwrite the Tensor's internal data pointer or its C++ vtable. Once we control the instruction pointer, we bypass the 'No-Code-Execution' promise of the restricted loader entirely.
The Impact: Trust No One
This vulnerability is particularly nasty because it targets the specific mechanism designed to enable trust. Security teams often allow .pt files through firewalls or into air-gapped training environments under the condition that weights_only=True is enforced.
With CVE-2026-24747, that condition is moot. An attacker can upload a model to a public repository (like Hugging Face or CivitAI) that looks like a valid PyTorch checkpoint. When a data scientist downloads it to fine-tune a model locally, the exploit fires immediately upon loading.
The impact ranges from crashing the training cluster (Denial of Service) to full Remote Code Execution (RCE) on the GPU cluster head node. Considering these nodes often have access to massive datasets and proprietary algorithms, the confidentiality loss is catastrophic.
Mitigation: Hardening the Stack
The only viable mitigation is to upgrade PyTorch to version 2.10.0 or later. The fix involves adding strict type checking in unpickler.cpp before processing modification opcodes. If SETITEM is called, the engine must verify the target is actually a mutable container.
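The shape of that check can be illustrated with a toy stack machine in Python. This is a sketch of the general idea only: the opcode names mirror pickle's, but `run`, `ToyTensor`, and `OpcodeError` are hypothetical and bear no relation to the actual code in unpickler.cpp.

```python
class ToyTensor:
    """Hypothetical non-container value, standing in for a tensor."""

class OpcodeError(Exception):
    pass

def run(program):
    """Execute a list of (opcode, argument) pairs on a value stack."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "EMPTY_DICT":
            stack.append({})
        elif op == "SETITEM":
            if len(stack) < 3:
                raise OpcodeError("SETITEM needs a target, a key and a value")
            value = stack.pop()
            key = stack.pop()
            target = stack[-1]
            # The crucial check: refuse to treat a non-dict as a dict.
            if not isinstance(target, dict):
                raise OpcodeError(
                    f"SETITEM on non-container {type(target).__name__}"
                )
            target[key] = value
        else:
            raise OpcodeError(f"unknown opcode {op!r}")
    return stack[-1] if stack else None

good = [("EMPTY_DICT", None), ("PUSH", "lr"), ("PUSH", 0.1), ("SETITEM", None)]
print(run(good))
```

Feeding the same machine a program that pushes a `ToyTensor` and then issues `SETITEM` raises `OpcodeError` instead of writing anywhere, which is the behavior a validating interpreter should have.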
Immediate Actions:
- Patch: `pip install "torch>=2.10.0"` (quote the requirement so the shell does not treat `>` as a redirect).
- Audit: Scan your model storage. If you have `weights_only=True` enabled, do not assume existing files are safe.
- Safer Formats: Whenever possible, prefer SafeTensors over Pickle-based PyTorch files. SafeTensors is a pure data format with no executable stack machine, rendering this entire class of bugs impossible.
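As a quick aid for the patch step, the sketch below compares a version string against the fixed release named in this article. `parse_release` and `is_patched` are hypothetical helpers. They compare numerically, because plain string comparison would wrongly rank "2.9" above "2.10". They also ignore pre-release and local suffixes, so a development build of 2.10.0 would be reported as patched.

```python
import re

FIXED_VERSION = (2, 10, 0)  # fixed release as stated in this article

def parse_release(version: str) -> tuple:
    """Extract the leading numeric release from strings like '2.9.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (e.g. '2.11') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())

def is_patched(version: str) -> bool:
    return parse_release(version) >= FIXED_VERSION

# With PyTorch installed, one could pass torch.__version__ here instead.
for candidate in ("2.9.1+cu121", "2.10.0", "2.11"):
    print(candidate, is_patched(candidate))
```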
Remember: In the world of serialization, parsing is just coding with someone else's bugs.
Technical Appendix
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
Affected Systems
Affected Versions Detail
| Product | Affected Versions | Fixed Version |
|---|---|---|
| PyTorch (Meta) | < 2.10.0 | 2.10.0 |
| Attribute | Detail |
|---|---|
| CWE ID | CWE-843 (Type Confusion) |
| Attack Vector | Network / File (Context Dependent) |
| CVSS | 8.8 (High) |
| Impact | Remote Code Execution (RCE) |
| Trigger | Opcode 'SETITEM' on non-container |
| Component | torch/csrc/jit/serialization/unpickler.cpp |
MITRE ATT&CK Mapping
The program accesses a resource using an incompatible type, which triggers a logical error because the resource does not have the expected properties.