Mar 5, 2026 · 5 min read
The xgrammar library (< 0.1.32) is vulnerable to a remote Denial of Service via stack exhaustion. By submitting a grammar with deeply nested parentheses, an attacker can drive unbounded recursion in the C++ parsing logic until the stack overflows, crashing the application.
xgrammar, a library used for structured generation in Large Language Model (LLM) pipelines, contains a critical denial of service vulnerability in its EBNF parser. The issue stems from uncontrolled recursion during the parsing of nested grammar structures. An attacker can supply a crafted grammar string with excessive nesting (e.g., thousands of parentheses), causing the recursive descent parser to consume all available stack memory. This results in a segmentation fault (SIGSEGV) that crashes the host process.
The xgrammar library is a core component for enforcing structured output formats (such as JSON schemas or specific EBNF grammars) in LLM inference pipelines. It is designed to be portable and efficient, often running within high-performance inference servers. The vulnerability, tracked as CVE-2026-25048, resides in the C++ core of the library, specifically within the EBNFParser class. This component is responsible for tokenizing and parsing grammar definitions provided by users.
The flaw is a classic stack exhaustion issue caused by uncontrolled recursion (CWE-674). When the parser encounters nested structures, such as parenthetical groups in an EBNF grammar, it processes them using a recursive descent algorithm. Prior to version 0.1.32, the implementation lacked a mechanism to limit the depth of this recursion. Consequently, input containing deeply nested elements forces the allocation of stack frames beyond the operating system's limits, leading to an immediate process crash.
This vulnerability is particularly significant in multi-tenant LLM environments. If a service allows end-users to supply custom grammar constraints (a common feature for guiding model output), a single malicious request can terminate the inference service, disrupting availability for all users.
The root cause lies in the recursive invocation chain within cpp/grammar_parser.cc. The EBNFParser class parses grammar rules by breaking them down into elements, terms, sequences, and choices. The parsing logic for handling parentheses created a cycle that did not enforce a termination condition based on depth.
The specific recursion cycle involves the following function calls:
1. ParseElement: Encounters a TokenType::LParen (left parenthesis).
2. ParseChoices: Called to handle the content within the parentheses.
3. ParseSequence: Called by ParseChoices to handle a sequence of terms.
4. ParseTerm: Called by ParseSequence.
5. ParseElement: Called again by ParseTerm to handle the next nested element.

Each iteration of this cycle pushes a new frame onto the call stack. In standard Linux environments, the default stack size is often limited (e.g., 8MB). A grammar string containing 30,000 nested parentheses requires stack space significantly exceeding this limit. Since the parser did not track nest_layer_guard_ or check against a max_nest_layer_ threshold, the execution continues until the stack pointer exceeds the mapped memory region, triggering a segmentation fault.
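To make the stack budget concrete, the following back-of-the-envelope sketch divides the stack size by the number of frames the cycle would need. It assumes 8 MiB for the "8MB" figure and four stack frames per nesting level (one per distinct function in the cycle above). Real frame sizes depend on the compiler, optimization level, and inlining, so treat the result as an illustration rather than a measurement.

```python
# Back-of-the-envelope stack budget for the recursion cycle described above.
# Assumptions (not measurements): 8 MiB stack, 4 frames per nesting level.
STACK_BYTES = 8 * 1024 * 1024   # default stack size cited in the text
NESTING_DEPTH = 30_000          # nesting depth cited in the text
FRAMES_PER_LEVEL = 4            # ParseElement, ParseChoices, ParseSequence, ParseTerm


def max_frame_bytes(stack_bytes: int, depth: int, frames_per_level: int) -> float:
    """Largest average frame size that would still let `depth` levels fit."""
    return stack_bytes / (depth * frames_per_level)


budget = max_frame_bytes(STACK_BYTES, NESTING_DEPTH, FRAMES_PER_LEVEL)
print(f"average frame budget: {budget:.1f} bytes")  # prints 69.9 bytes
```

Under these assumptions the parser would survive only if every frame averaged under roughly 70 bytes. A frame that holds a return address, saved registers, and locals such as the parsed choices is likely to be larger than that, which is consistent with the crash described above.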
The remediation for this vulnerability involved introducing a depth-tracking mechanism to the EBNFParser. The maintainers added a nest_layer_guard_ integer to track current depth and a max_nest_layer_ constant (defaulting to 1000) to enforce a safety limit.
The following code comparison illustrates the vulnerability and the fix in cpp/grammar_parser.cc:
Vulnerable Code (Prior to 0.1.32):
Note the absence of depth checks before the recursive call to ParseChoices.
// Inside ParseElement()
if (Peek().type == TokenType::LParen) {
  Consume(TokenType::LParen);
  // UNCHECKED RECURSION
  auto choices = ParseChoices();
  Consume(TokenType::RParen);
  // ... processing ...
}

Patched Code (Version 0.1.32):
The fix introduces a guard that increments on entry and decrements on exit. If the limit is reached, a specific error is raised instead of crashing.
// Inside ParseElement()
if (Peek().type == TokenType::LParen) {
  // FIX: Increment depth counter
  nest_layer_guard_++;
  // FIX: Check against maximum limit (1000)
  if (nest_layer_guard_ > max_nest_layer_) {
    ReportParseError("Nest layer exceeded the maximum limit", -1);
  }
  Consume(TokenType::LParen);
  auto choices = ParseChoices();
  Consume(TokenType::RParen);
  // FIX: Decrement depth counter
  nest_layer_guard_--;
  // ... processing ...
}

This change ensures that any attempt to nest beyond 1,000 layers results in a controlled parse error rather than an uncontrolled stack-overflow crash.
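The same guard pattern can be tried out in isolation. The sketch below is a toy recursive descent parser in Python, not the xgrammar implementation: the grammar (the letter `a`, `|`, and parentheses), the `ToyParser` class, and the `ParseError` type are all hypothetical, and the limit is scaled down from 1000 to 100 so that the four-function cycle stays well under CPython's default recursion limit.

```python
class ParseError(Exception):
    """Raised for malformed input or excessive nesting (hypothetical type)."""


class ToyParser:
    """Toy grammar: element ::= 'a' | '(' choices ')'. Illustration only."""

    def __init__(self, text: str, max_nest_layer: int = 100):
        self.text = text
        self.pos = 0
        self.nest_layer = 0                   # plays the role of nest_layer_guard_
        self.max_nest_layer = max_nest_layer  # plays the role of max_nest_layer_

    def peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def consume(self, expected: str) -> None:
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r} at offset {self.pos}")
        self.pos += 1

    def parse_choices(self):
        branches = [self.parse_sequence()]
        while self.peek() == "|":
            self.consume("|")
            branches.append(self.parse_sequence())
        return branches

    def parse_sequence(self):
        terms = []
        while self.peek() in ("a", "("):
            terms.append(self.parse_term())
        return terms

    def parse_term(self):
        return self.parse_element()

    def parse_element(self):
        if self.peek() == "(":
            self.nest_layer += 1              # entering one more nesting level
            try:
                if self.nest_layer > self.max_nest_layer:
                    raise ParseError("nest layer exceeded the maximum limit")
                self.consume("(")
                inner = self.parse_choices()  # recursion is now bounded
                self.consume(")")
            finally:
                self.nest_layer -= 1          # always restore the counter
            return inner
        self.consume("a")
        return "a"


def parse(text: str, max_nest_layer: int = 100):
    parser = ToyParser(text, max_nest_layer)
    result = parser.parse_choices()
    if parser.pos != len(text):
        raise ParseError(f"unexpected input at offset {parser.pos}")
    return result
```

With the guard in place, `parse("(" * 5000 + "a")` raises `ParseError` once the depth passes 100 instead of recursing 5,000 levels deep. The `try`/`finally` is a choice made for this sketch so that the counter is restored even when an error propagates. The C++ patch shown above decrements the counter on the normal path only.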
Exploiting this vulnerability is trivial and requires no authentication if the target application exposes the grammar compilation feature to unprivileged users. The attack vector is strictly data-driven: the attacker constructs a string where the nesting depth exceeds the stack capacity.
The Proof of Concept (PoC) utilizes the Python bindings of xgrammar to pass a malicious string to the underlying C++ parser. The payload consists of a sequence of opening parentheses ( followed by a closing term.
import xgrammar as xgr

# 0. Setup (omitted in the original PoC)
# `compiler` is assumed to be an existing xgr.GrammarCompiler instance,
# built from the tokenizer info of whichever model the service hosts.

# 1. Payload Construction
# 30,000 layers is sufficient to crash a standard 8MB stack.
payload = '(' * 30000 + 'a'

# 2. Grammar Definition
grammar = f"root ::= {payload}"

# 3. Trigger
# The compilation step invokes the vulnerable C++ ParseElement function.
# Result: Segmentation fault (core dumped)
compiler.compile_grammar(grammar)

In a real-world scenario, an attacker would submit this grammar via an API endpoint designed to accept JSON schemas or EBNF definitions for guiding LLM output. Upon processing the request, the worker process hosting the model would crash immediately. If the service lacks robust supervisor processes or restart logic, the service remains down. Even with auto-restart, continuous submission of this payload causes a persistent denial of service loop.
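Because the attack is purely data-driven, a service that accepts user-supplied EBNF could also measure nesting depth before the string ever reaches the parser. The sketch below is one way to do that under stated assumptions: `max_paren_depth` is a hypothetical helper, not part of xgrammar, and it assumes that parentheses inside double-quoted literals and `[...]` character classes are not structural. The 1000 threshold simply mirrors the default that the fix introduced.

```python
MAX_NEST_DEPTH = 1000  # mirrors the patched default; adjust as needed


def max_paren_depth(grammar: str) -> int:
    """Return the deepest parenthesis nesting found in an EBNF string.

    Hypothetical pre-check: skips double-quoted literals and [...] character
    classes (with backslash escapes) so that a literal "(" is not counted.
    """
    depth = 0
    deepest = 0
    i = 0
    n = len(grammar)
    while i < n:
        ch = grammar[i]
        if ch == '"' or ch == "[":
            closer = '"' if ch == '"' else "]"
            i += 1
            while i < n and grammar[i] != closer:
                if grammar[i] == "\\":
                    i += 1  # skip the escaped character
                i += 1
        elif ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth = max(depth - 1, 0)
        i += 1  # step past the current character (or the closing delimiter)
    return deepest


payload = "root ::= " + "(" * 30000 + "a"
too_deep = max_paren_depth(payload) > MAX_NEST_DEPTH
print("reject" if too_deep else "accept")
```

The scan is iterative, so the check cannot overflow the stack itself. It is a stopgap for deployments that cannot upgrade immediately and does not replace moving to 0.1.32.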
The primary impact of CVE-2026-25048 is Availability. The vulnerability allows for a highly asymmetric attack: a text payload of a few tens of kilobytes (the 30,000-character PoC string is about 30 KB) can crash a server process that may be managing gigabytes of GPU memory and state.
Severity Factors:
While this vulnerability does not inherently allow for Remote Code Execution (RCE), stack overflows in C++ can theoretically be chained with other memory corruption primitives. However, because the process faults when the stack grows into its guard page, it is purely a Denial of Service vector in this context. For organizations running inference-as-a-service, this represents a significant stability risk.
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:H/SC:N/SI:N/SA:N

| Product | Affected Versions | Fixed Version |
|---|---|---|
| xgrammar mlc-ai | < 0.1.32 | 0.1.32 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-674 |
| Vulnerability Type | Stack Exhaustion |
| CVSS v4.0 | 8.7 (High) |
| Attack Vector | Network |
| Attack Complexity | Low |
| Impact | Denial of Service |
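CWE-674 (Uncontrolled Recursion): the software does not properly control the amount of recursion that takes place, consuming excessive resources such as allocated memory or the program stack. In this case the recursion terminates for ordinary input, but its depth is attacker-controlled and unbounded, which leads to stack exhaustion.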