Feb 25, 2026 · 6 min read
The `zae-limiter` library (< v0.10.1) creates a single DynamoDB partition key per entity. Because a single DynamoDB partition is physically limited to ~1,000 Write Capacity Units (WCU) per second, an attacker can knock out the rate-limiting service for a specific user (and potentially neighbors on the same shard) simply by flooding it with high-frequency requests.
A deep dive into an architectural flaw in the `zae-limiter` library where the promise of 'infinite scale' collides with the hard reality of DynamoDB physical partition limits. By funneling all rate-limiting state for a single entity into one partition key, the library inadvertently created a 'hot partition' bottleneck. This allows attackers to trigger a denial of service (DoS) simply by exceeding 1,000 write units per second, turning the rate limiter—the very tool designed to prevent floods—into the point of failure.
We love Amazon DynamoDB. It's the serverless darling of the cloud world, promising near-infinite scalability. You just throw JSON at it, and Jeff Bezos catches it, right? Well, not exactly. Beneath the marketing fluff of 'seamless scaling' lies a physical reality: data has to live on actual hard drives. DynamoDB organizes this data into partitions, and these partitions have hard, physical limits. Specifically, a single partition can only handle about 1,000 Write Capacity Units (WCU) per second.
Enter zae-limiter, a Python library designed to handle rate limiting using the Token Bucket algorithm. It stores the state of your buckets (how many tokens you have left) in DynamoDB. The developers, likely assuming DynamoDB handles all the heavy lifting, made a classic architectural blunder: they designed their schema based on logical purity rather than physical constraints.
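For context, here is a minimal in-memory token bucket. It is a sketch of the algorithm itself, not zae-limiter's DynamoDB-backed implementation; the class and parameter names are illustrative:

```python
import time

class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added back per second
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost  # consume a token: request allowed
            return True
        return False  # bucket empty: request rate limited

# e.g. bursts of up to 100 requests, refilling 10 tokens per second
bucket = TokenBucket(capacity=100, refill_rate=10)
print(bucket.allow())  # True until the bucket drains
```

zae-limiter keeps this same state in DynamoDB instead of memory, which is exactly what drags DynamoDB's physical partition limits into the picture.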
In versions prior to 0.10.1, the library maps all rate-limiting operations for a specific entity—say, a user ID or an API key—to a single Partition Key (PK). This creates what we in the database world call a 'Hot Partition.' It’s like building a stadium with 50,000 seats but only opening one turnstile. It doesn't matter how big the stadium is; if everyone tries to enter at once, people are going to get crushed against the gate.
To understand why this breaks, you have to look at how DynamoDB hashes data. When you write an item, DynamoDB hashes the Partition Key to determine which physical server holds that data. In the vulnerable version of zae-limiter, the schema looked something like this:
```
PK = {namespace}/ENTITY#{entity_id}
```
This is clean. It's logical. It allows you to query everything about user_123 easily. But it is also a death sentence for high-throughput applications. Every time user_123 makes a request, zae-limiter performs a write operation (UpdateItem or TransactWriteItems) to decrement the token count. This consumes WCU.
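To see why one key means one physical home, here is a toy model of hash-based routing. DynamoDB's internal partitioning scheme is not exposed, so `hashlib.md5` and the fixed partition count below are illustrative stand-ins:

```python
import hashlib

# Toy model of hash-based partition routing. DynamoDB's real scheme is
# internal; md5 and a fixed partition count are stand-ins for illustration.
def partition_for(pk: str, num_partitions: int = 16) -> int:
    digest = hashlib.md5(pk.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# Every request for user_123 routes to the same physical partition:
print(partition_for("prod/ENTITY#user_123"))  # same index every time
print(partition_for("prod/ENTITY#user_123"))  # ...no matter how often you ask
```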
If user_123—or an attacker impersonating them—pushes traffic beyond 1,000 requests per second, they saturate the physical partition dedicated to that key. DynamoDB doesn't just queue these requests; it rejects them with a ProvisionedThroughputExceededException. The application catches this exception and, usually, fails closed (denying the request) or fails open (ignoring the rate limit). In zae-limiter's case, this results in a targeted Denial of Service. The irony is palpable: the traffic meant to be rate-limited actually destroys the rate-limiter itself.
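In boto3 terms, the hot write path looks roughly like this. This is a sketch, not zae-limiter's actual code; the table name, key schema, and attribute names are assumptions:

```python
import boto3
from botocore.exceptions import ClientError

# "rate_limits" and the PK/SK schema are placeholders for illustration.
table = boto3.resource("dynamodb").Table("rate_limits")

def try_consume(namespace: str, entity_id: str) -> bool:
    """Conditionally decrement the token count for one entity's bucket."""
    pk = f"{namespace}/ENTITY#{entity_id}"
    try:
        table.update_item(
            Key={"PK": pk, "SK": "BUCKET"},
            UpdateExpression="SET tokens = tokens - :one",
            ConditionExpression="tokens >= :one",
            ExpressionAttributeValues={":one": 1},
        )
        return True  # token consumed, request allowed
    except ClientError as err:
        code = err.response["Error"]["Code"]
        if code == "ConditionalCheckFailedException":
            return False  # bucket empty: rate limit enforced
        # ProvisionedThroughputExceededException lands here once the
        # partition saturates; the limiter itself is now the outage.
        raise
```

Note that even a denied request is a write: DynamoDB still charges WCU for failed conditional writes, so every single call, allowed or not, pounds the same partition key.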
Let's look at the architectural diff. The vulnerability isn't a buffer overflow or a missing semicolon; it's a schema design flaw. The fix involved moving from a static, entity-based key to a dynamic, sharded key system.
The Vulnerable Design (Logical View):
All buckets for an entity lived under one roof. If you hit the limit, you hit the wall.
```python
# pseudo-code representation of the PK generation
def get_pk(namespace, entity_id):
    # ALL traffic for this entity hits this single hash
    return f"{namespace}/ENTITY#{entity_id}"
```

The Fix (v0.10.1 - Pre-Shard Buckets):
The developers introduced 'Adaptive Sharding'. Instead of one bucket, the state is split across multiple 'shards'. Clients pick a random shard to write to. If a shard is exhausted or throttled, they try another. This spreads the WCU load across multiple physical partitions.
```python
# The fixed approach distributes load
def get_pk(namespace, entity_id, resource, shard_id):
    # Now we have N shards to write to
    return f"{namespace}/BUCKET#{entity_id}#{resource}#{shard_id}"

# Proactive Sharding Logic
if current_wcu_usage > 0.80 * limit:
    double_shard_count()  # 1 -> 2 -> 4 -> 8
```

This change is massive. It introduces an internal WCU tracker bucket that decrements 1,000 millitokens per write. If this internal tracker nears exhaustion, a background Lambda aggregator (triggered by DynamoDB Streams) proactively doubles the shard count, essentially creating new lanes on the highway before traffic jams occur.
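On the client side, the random-shard-with-fallback behavior described above might look like this (hypothetical helper names; the library's internals may differ):

```python
import random

def try_decrement(pk: str) -> bool:
    """Placeholder for the conditional UpdateItem sketched earlier."""
    raise NotImplementedError

def consume_token(namespace: str, entity_id: str, resource: str,
                  shard_count: int) -> bool:
    # Visit shards in a random order so concurrent writers spread their
    # WCU across multiple physical partitions.
    shards = list(range(shard_count))
    random.shuffle(shards)
    for shard_id in shards:
        pk = f"{namespace}/BUCKET#{entity_id}#{resource}#{shard_id}"
        if try_decrement(pk):
            return True  # found a shard with tokens left
    return False  # every shard empty: request is rate limited
```

The randomized visit order matters: if every client tried shard 0 first, shard 0 would simply become the new hot partition.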
Exploiting this is trivial if you have a valid API key or user session. You don't need fancy shellcode; you just need curl or a Python script.
The Attack Chain:
1. Identify an endpoint protected by zae-limiter. You can often tell by headers like X-RateLimit-Limit.
2. Hammer it with cheap concurrent requests until the backing partition throttles:

```python
import requests
from concurrent.futures import ThreadPoolExecutor

def hammer():
    while True:
        try:
            # Send requests faster than 1/ms
            requests.get("https://target-api.com/resource")
        except Exception:
            pass

# Ignite the threads
with ThreadPoolExecutor(max_workers=50) as executor:
    for _ in range(50):
        executor.submit(hammer)
```

The Result: The backend DynamoDB partition glows red hot. The application starts throwing 503 Service Unavailable or internal server errors as the ProvisionedThroughputExceededException bubbles up. Valid traffic for that user is dropped. Even worse, due to how DynamoDB colocates partitions, this creates a "Noisy Neighbor" effect. Unrelated data that happens to live on the same physical storage node might also see increased latency or throttling.
You might think, "So what? The spammer gets blocked. Working as intended." Not quite. A Denial of Service vulnerability in a security control is a critical failure. If the rate limiter is the first line of defense, knocking it over can leave the gates wide open (fail-open) or permanently shut (fail-closed).
If the system fails open (catches the DB exception and lets the request through), the attacker has successfully bypassed the rate limit entirely, opening the door for scraping, brute-forcing, or further resource exhaustion of the application servers.
If the system fails closed (the default behavior for zae-limiter), the attacker has successfully DoS'd the entity. In a multi-tenant SaaS environment, if an attacker targets a shared tenant ID (like a company-wide API key), they can take down operations for an entire organization. Furthermore, the sheer volume of UpdateItem retries can inflate the victim's AWS bill, since retried writes that eventually succeed still consume paid write units.
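The difference between the two failure modes usually comes down to a single branch in the exception handler. Here is an illustrative sketch; `allow_request`, `limiter`, and `limiter.acquire` are hypothetical names, not zae-limiter's actual API:

```python
from botocore.exceptions import ClientError

FAIL_OPEN = False  # True: bypass the limit on DB errors; False: deny requests

def allow_request(user_id: str, limiter) -> bool:
    # limiter.acquire is a stand-in for the library's token check.
    try:
        return limiter.acquire(user_id)
    except ClientError as err:
        if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
            # Hot partition: the rate limiter itself is down. Failing open
            # lets the attacker through; failing closed DoS'es the victim.
            return FAIL_OPEN
        raise
```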
The fix is strictly architectural. You cannot configuration-tune your way out of a hot partition if your schema is fundamentally designed to hot-spot.
Immediate Action: Upgrade to zae-limiter v0.10.1 or higher (`pip install --upgrade 'zae-limiter>=0.10.1'`). This version introduces the "Pre-Shard Buckets" architecture.
How it works: the library pre-creates multiple shard keys per bucket and spreads writes across them, while the internal WCU tracker and the Streams-triggered Lambda aggregator double the shard count before any single partition saturates (see the sharding logic above).

Monitoring: watch CloudWatch for throttled writes on the limiter's table, specifically ProvisionedThroughputExceededException errors and the WriteThrottleEvents metric. If you see this metric spiking even after the patch, your shard_count might not be scaling fast enough, or you are hitting global table limits.
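Below is a minimal CloudWatch check for that signal, assuming the limiter's table is named `rate_limits` (a placeholder; substitute your own table name):

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

# Sum write throttle events on the limiter's table over the last hour.
now = datetime.now(timezone.utc)
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="WriteThrottleEvents",
    Dimensions=[{"Name": "TableName", "Value": "rate_limits"}],  # placeholder
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,  # 5-minute buckets
    Statistics=["Sum"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"].isoformat(), int(point["Sum"]))
```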
CVSS Vector: `CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:L`

| Product | Affected Versions | Fixed Version |
|---|---|---|
| zae-limiter (zeroae) | < 0.10.1 | 0.10.1 |
| Attribute | Detail |
|---|---|
| CWE ID | CWE-770 (Allocation of Resources Without Limits or Throttling) |
| Attack Vector | Network |
| CVSS | 4.3 (Medium) |
| Impact | Denial of Service |
| Exploit Status | PoC Available |
| Architecture | Serverless / DynamoDB |