The Polyglot Parser: Smuggling Requests with Devanagari Digits
Jan 6, 2026·5 min read
Executive Summary (TL;DR)
Python's `re` module is Unicode-aware by default, meaning `\d` matches more than just `0-9`. Aiohttp failed to restrict its `Range` header parser to ASCII, allowing attackers to use characters like Devanagari digits to sneak payloads past strict upstream proxies while the backend treats them as valid numbers. Fixed in version 3.13.3.
A regex flaw in aiohttp allows Unicode digits to bypass HTTP validation, enabling potential request smuggling and cache poisoning via parser differentials.
The Hook: When "Digit" Means Everything
We love Python. It's the language that holds your hand, handles your memory, and whispers sweet nothings about Unicode support into your ear. But in the ruthless world of HTTP protocol parsing, Python's helpfulness is a loaded gun pointed at your foot.
HTTP is a protocol born in an era where ASCII was king and bytes were bytes. However, modern frameworks like aiohttp live in Python 3's "everything is a string" utopia. This CVE is a classic collision of these two worlds.
Specifically, it targets the Range header. This header tells the server, "I only want bytes 500-1000 of that cat video." Simple, right? But what happens when the definition of a "number" differs between the grumpy reverse proxy sitting at the edge of your network and the cheerful Python backend application? You get a Parser Differential—the bread and butter of HTTP desynchronization attacks.
The Flaw: The Curse of the Magic Regex
The vulnerability lies in how aiohttp defined "digits" when parsing the Range header. The developers used the standard Python regex shorthand \d to capture the start and end bytes. In many regex engines (JavaScript, Go's `regexp`, PCRE without its Unicode mode), \d means [0-9]. In Python 3, however, \d applied to a `str` pattern means "Any character in the Unicode category 'Nd' (Number, Decimal Digit)".
This includes the standard 0-9, but it also includes the Arabic-Indic digits (٠-٩), Devanagari digits (०-९), and Fullwidth digits (０-９).
The flaw is deceptively simple: The developer wrote r"^bytes=(\d*)-(\d*)$" assuming it would only catch ASCII integers. They forgot that Python is a polyglot. This means a header like Range: bytes=4-५ (where ५ is the Devanagari digit 5) matches the regex perfectly. Crucially, Python's int() function is also helpful enough to convert int("५") directly to the integer 5.
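To make the claim concrete, here is a minimal sketch that checks one "five" from each script mentioned above. The `SAMPLES` table and the `describe_digit` helper are names invented for this illustration; only `re`, `unicodedata`, and `int()` come from the standard library.

```python
import re
import unicodedata

# One digit "five" per script; the dict and helper are illustrative only.
SAMPLES = {
    "ASCII": "5",
    "Arabic-Indic": "\u0665",
    "Devanagari": "\u096b",
    "Fullwidth": "\uff15",
}


def describe_digit(ch: str):
    """Return (Unicode category, does \\d match?, int() value) for one character."""
    matches = re.fullmatch(r"\d", ch) is not None
    return unicodedata.category(ch), matches, int(ch)


if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(name, repr(ch), describe_digit(ch))
```

Every sample falls into category `Nd`, matches the default `\d`, and converts to the integer 5, which is exactly the combination the Range parser relied on.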
The Code: The Smoking Gun
Let's look at the diff from aiohttp/web_request.py. It’s a one-line change, but it highlights a systemic misunderstanding of Python's regex engine in security-critical contexts.
Here is the vulnerable code logic compared to the patch:
```python
# VULNERABLE (Pre-3.13.3)
# The pattern uses default Python 3 behavior (Unicode)
pattern = r"^bytes=(\d*)-(\d*)$"
start, end = re.findall(pattern, rng)[0]

# PATCHED (Fixed in commit c7b7a04)
# The pattern is forced to use ASCII rules only
pattern = r"^bytes=(\d*)-(\d*)$"
start, end = re.findall(pattern, rng, re.ASCII)[0]
```
By adding the `re.ASCII` flag, the meaning of `\d` reverts to the safe, boring `[0-9]` that RFC 9110 expects. Without this flag, the parser was accepting hundreds of different characters as valid numbers.
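The effect of the flag can be reproduced outside aiohttp. The sketch below reuses the pattern quoted above; `extract_range` is a hypothetical wrapper written for this example, not a function from the library, and it returns `None` on a non-match instead of indexing into an empty list.

```python
import re

PATTERN = r"^bytes=(\d*)-(\d*)$"


def extract_range(rng: str, flags: int = 0):
    """Hypothetical helper: return the (start, end) strings, or None if no match."""
    matches = re.findall(PATTERN, rng, flags)
    return matches[0] if matches else None


if __name__ == "__main__":
    header = "bytes=4-\u096b"  # the end value is the Devanagari digit five
    print("default flags:", extract_range(header))
    print("re.ASCII:     ", extract_range(header, re.ASCII))
```

With default flags the Devanagari digit is captured as the end value; with `re.ASCII` the same header no longer matches at all.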
The Exploit: Speaking in Tongues
This vulnerability enables a Parser Differential. To exploit this, we need an architecture with two layers: a strict upstream proxy (like Nginx, AWS WAF, or HAProxy) and the loose aiohttp backend.
Here is the attack flow:
- The Setup: The attacker sends a request with `Range: bytes=0-५`.
- The Bypass: The upstream WAF parses the `Range` header. It sees a non-ASCII character (५). It likely concludes, "This is not a valid Range header per RFC," and ignores it, or treats it as an opaque string. It passes the request through without applying range-based caching rules or restrictions.
- The Interpretation: The request lands on `aiohttp`. The regex matches. `int("५")` becomes `5`. The backend serves the partial content (bytes 0-5).
[!WARNING] Cache Poisoning Risk: If the proxy ignores the Range header, it might expect a full `200 OK` response. When `aiohttp` returns a `206 Partial Content` (a logically valid response from the backend's point of view), the proxy might get confused about the content length or cache a partial file as if it were the complete resource.
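The attack flow can be modeled in a few lines. This is a toy simulation under stated assumptions: `strict_proxy_range` is an invented stand-in for an ASCII-only intermediary and does not reflect how Nginx, HAProxy, or any WAF is actually implemented, while `loose_backend_range` mimics the pre-patch combination of Unicode-aware `\d` and `int()`.

```python
import re


def strict_proxy_range(value: str):
    """Assumed intermediary behavior: ASCII digits only, otherwise ignore the header."""
    m = re.fullmatch(r"bytes=([0-9]*)-([0-9]*)", value)
    return m.groups() if m else None


def loose_backend_range(value: str):
    """Pre-patch style parsing: Unicode-aware digit class followed by int()."""
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", value)
    if m is None:
        return None
    start, end = m.groups()
    return (int(start) if start else None, int(end) if end else None)


def is_differential(value: str) -> bool:
    """True when the proxy ignores a header that the backend still honors."""
    return strict_proxy_range(value) is None and loose_backend_range(value) is not None


if __name__ == "__main__":
    for header in ("bytes=0-5", "bytes=0-\u096b", "garbage"):
        print(repr(header), "differential:", is_differential(header))
```

Only the header carrying the Devanagari digit is flagged: the stand-in proxy treats it as absent, while the loose parser resolves it to the range 0 to 5.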
The Impact: Why Should We Panic?
Let's be realistic: this isn't RCE. You aren't going to pop a shell by sending Devanagari digits (unless you are attacking a very strange system). That's why the CVSS score is a lowly 2.7.
However, in the hands of a creative attacker, this is a useful primitive. It allows for:
- WAF Evasion: If a WAF has rules to block specific byte ranges (e.g., to prevent scraping large files), this bypasses them.
- Cache Poisoning: As described in the exploit section, desynchronizing the state between the cache and the backend is dangerous.
- Logic Errors: Any application logic relying on `Range` headers to restrict access to specific parts of a file could be circumvented if the validation logic is performed upstream by a component that doesn't speak Unicode.
This is a classic "Death by a Thousand Cuts" vulnerability—not fatal on its own, but devastating when chained.
The Fix: Remediation
The fix is straightforward: Update aiohttp to version 3.13.3 or later.
For Python developers, this is a teachable moment. If you are parsing network protocols, file formats, or anything that relies on strict standards (like RFCs), strictly avoid default regex behavior.
Best Practices:
- Always use `re.ASCII` (or the inline flag `(?a)`) when using `\d`, `\w`, or `\s` on protocol data.
- Alternatively, be explicit: write `[0-9]` instead of `\d`.
- Audit your codebase for `int()` calls on untrusted input. Python's `int()` is surprisingly permissive with whitespace and Unicode digits.
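The practices above might be combined as follows. This is a sketch, not aiohttp's actual fix: `parse_ascii_int` and `parse_range_strict` are hypothetical helpers, and the pattern uses the inline `(?a)` flag so the ASCII restriction travels with the regex itself.

```python
import re

# (?a) is the inline form of re.ASCII, so \d here means [0-9] only.
RANGE_RE = re.compile(r"(?a)bytes=(\d*)-(\d*)")


def parse_ascii_int(text: str) -> int:
    """Hypothetical guard: reject anything int() would accept beyond plain ASCII digits."""
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not an ASCII integer: {text!r}")
    return int(text)


def parse_range_strict(value: str):
    """Return (start, end) as ints or None for an omitted bound; raise ValueError otherwise."""
    m = RANGE_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"invalid Range header: {value!r}")
    start, end = m.groups()
    return (
        parse_ascii_int(start) if start else None,
        parse_ascii_int(end) if end else None,
    )


if __name__ == "__main__":
    print(parse_range_strict("bytes=0-5"))
    try:
        parse_range_strict("bytes=0-\u096b")
    except ValueError as exc:
        print("rejected:", exc)
```

The `isascii()` plus `isdigit()` guard is redundant once the regex is ASCII-only, but it keeps `int()` safe if the helper is later reused on input that did not pass through the pattern (for example, strings with surrounding whitespace or underscores, which `int()` accepts).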
Technical Appendix
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:L/VA:N/SC:N/SI:N/SA:N/E:U
Affected Systems
Affected Versions Detail
| Product | Affected Versions | Fixed Version |
|---|---|---|
| aiohttp (aio-libs) | <= 3.13.2 | 3.13.3 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-444 |
| Attack Vector | Network |
| CVSS v4.0 | 2.7 (Low) |
| Impact | Low (Integrity) |
| Exploit Status | PoC Available |
| Pattern | Regex Mismatch / Parser Differential |
MITRE ATT&CK Mapping
The web application or API relies on an HTTP intermediary to perform security checks, but the intermediary interprets the request differently than the backend, allowing a bypass.