Feb 20, 2026
User-supplied XML entity names are passed directly into `new RegExp()`. Attackers can define an entity named `l.`, which creates a regex that matches `&lt;`, allowing them to overwrite the less-than entity with malicious HTML tags (XSS).
A critical regex injection vulnerability exists in the `fast-xml-parser` library (versions 4.1.3 to <5.3.5). The parser constructs regular expressions dynamically from untrusted DOCTYPE entity names without proper escaping. This allows attackers to define malicious entities that 'shadow' built-in XML entities like `&lt;` or `&amp;`. By replacing these safe entities with arbitrary content, attackers can bypass entity encoding and achieve Cross-Site Scripting (XSS) in downstream applications relying on the parser's output.
XML parsing is a thankless job. It’s verbose, complicated, and prone to 'Billion Laughs' attacks. So, when a library like fast-xml-parser comes along promising high performance and low overhead, developers flock to it like moths to a flame. It is a staple in the Node.js ecosystem, used to translate the ancient language of XML into the modern comfort of JSON.
But speed often comes at the cost of correctness—or in this case, sanity. In an effort to handle custom XML entities defined in DOCTYPE blocks (those <!ENTITY ...> declarations), the library developers made a classic, fatal error: they trusted the input. Specifically, they trusted that an entity name would just be a name.
This vulnerability isn't your standard buffer overflow or logic error. It is a Regex Injection. The parser takes strings from the XML document and compiles them directly into executable Regular Expressions. If you know anything about security, you know that new RegExp(userInput) is the software equivalent of handing a loaded gun to a toddler.
Here is the fundamental disconnect: The XML specification allows periods (.) in entity names. Regular Expressions use the period (.) as a wildcard that matches any single character (except newlines).
When fast-xml-parser encounters a DOCTYPE definition like <!ENTITY my.entity "value">, it needs a way to find and replace usages of that entity later in the document. To do this, it dynamically generates a Global Regular Expression. The logic effectively boils down to:
const regex = new RegExp('&' + entityName + ';', 'g');
See the problem? If an attacker defines an entity named l., the generated regex becomes /&l.;/g.
In the world of regex, `/&l.;/` doesn't just match the literal string `&l.;`. It matches `&la;`, `&lb;`, `&l!;`, and, crucially, `&lt;`. Since `&lt;` is the standard XML entity for the less-than character (`<`), this creates a collision. The parser doesn't treat `&lt;` as a protected keyword; it just sees text that matches the attacker's wildcard regex.
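The collision is easy to confirm in isolation. The snippet below is a minimal sketch that builds the pattern the way the article describes and checks a few candidate strings; `buildEntityRegex` and `matchedBy` are illustrative names, not functions from the library.

```javascript
// Build the pattern with no escaping of the entity name, as described above.
function buildEntityRegex(entityName) {
  return new RegExp("&" + entityName + ";", "g");
}

// Return the candidate strings that the generated pattern matches.
function matchedBy(entityName, candidates) {
  // String.prototype.match returns null when a global regex finds nothing.
  return candidates.filter(
    (text) => text.match(buildEntityRegex(entityName)) !== null
  );
}

const candidates = ["&l.;", "&la;", "&l!;", "&lt;", "&gt;", "&amp;"];
const hits = matchedBy("l.", candidates);
console.log(hits); // [ '&l.;', '&la;', '&l!;', '&lt;' ]
```

Only references that start with `&l` and have exactly one more character before the semicolon are caught, which is why `&lt;` is hit while `&gt;` and `&amp;` are not.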
Let's look at the crime scene in src/xmlparser/DocTypeReader.js. This is where the parser reads the DTD and builds its substitution map.
Vulnerable Code (< 5.3.5):
// Inside the entity parsing loop
entities[entityName] = {
    // The fatal flaw: passing entityName directly to RegExp constructor
    regx : RegExp(`&${entityName};`, "g"),
    val: val
};

It is shockingly simple. There is no sanitization, no escaping, and no RegExp.escape() (which only landed in the language with ES2025 and is missing from older runtimes, but that's no excuse).
When the parser later iterates through the document to replace entities, it loops through this entities object. If the attacker's malicious regex runs before the built-in handlers (or if it simply shadows them by nature of the replacement logic), the built-in safety mechanisms are bypassed.
The logic essentially says: "Find anything looking like `&l<any_char>;` and replace it with the attacker's string." Since `&lt;` fits that description, it gets clobbered.
To exploit this, we don't need memory corruption. We just need to define an entity that, when turned into a regex, matches a target we want to overwrite. The most valuable target in an XML context is `&lt;` because it represents the opening of a tag.
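To make the clobbering concrete, here is a simplified model of the replacement pass. It is a sketch under assumptions: `builtInEntities`, `defineDocTypeEntity`, and `replaceEntities` are hypothetical stand-ins, and the ordering (DOCTYPE entities first, built-ins second) is assumed for illustration rather than taken from the library's source.

```javascript
// Hypothetical stand-in for the parser's built-in entity table.
const builtInEntities = [
  { regx: /&lt;/g, val: "<" },
  { regx: /&gt;/g, val: ">" },
  { regx: /&amp;/g, val: "&" },
];

// Mirrors the vulnerable pattern: the name goes into RegExp unescaped.
function defineDocTypeEntity(table, name, val) {
  table[name] = { regx: new RegExp(`&${name};`, "g"), val };
}

function replaceEntities(text, docTypeEntities) {
  // Assumption: DOCTYPE-defined entities are applied before the built-ins.
  for (const name of Object.keys(docTypeEntities)) {
    const { regx, val } = docTypeEntities[name];
    // A function replacer inserts val verbatim (no "$&"-style expansion).
    text = text.replace(regx, () => val);
  }
  for (const { regx, val } of builtInEntities) {
    text = text.replace(regx, () => val);
  }
  return text;
}

const docTypeEntities = {};
defineDocTypeEntity(docTypeEntities, "l.", "<img src=x onerror=alert('Pwned')>");

const output = replaceEntities("Hello &lt;b&gt;World&lt;/b&gt;", docTypeEntities);
console.log(output);
```

In this model every `&lt;` is consumed by the attacker's pattern before the built-in table gets a chance to run, while `&gt;` still decodes normally.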
The Attack Chain:
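1. Declare a DOCTYPE entity named `l.`. The value of this entity will be our malicious HTML/JS.
2. Make sure the document body contains the standard entity `&lt;`.
3. The parser compiles the entity name into `/&l.;/g`. It scans the text `&lt;`. The regex matches. The parser swaps `&lt;` for our payload.

Proof of Concept: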
<?xml version="1.0"?>
<!DOCTYPE pwn [
  <!-- The entity name 'l.' becomes regex /&l.;/ which matches '&lt;' -->
  <!ENTITY l. "<img src=x onerror=alert('Pwned')>">
]>
<root>
  <!-- The parser sees '&lt;', matches it against /&l.;/, and injects the tag -->
  <data>Hello &lt;b&gt;World&lt;/b&gt;</data>
</root>

Result:
Instead of rendering safe text like Hello <b>World</b>, the application renders:
Hello <img src=x onerror=alert('Pwned')>b>World<...
The browser sees the <img> tag and executes the JavaScript.
The maintainers released version 5.3.5 to address this. Let's look at their solution. Instead of refactoring to avoid Regex for entity replacement (the ideal solution), they opted for a blacklist/escape approach.
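For contrast, the regex-free route could look something like the sketch below. `replaceEntityLiteral` is a hypothetical helper written for this article, not part of fast-xml-parser; it treats the entity reference as plain text, so no character in the name can act as a metacharacter.

```javascript
// Hypothetical helper: replace every literal "&name;" reference
// without compiling the entity name into a regular expression.
function replaceEntityLiteral(text, entityName, value) {
  const reference = `&${entityName};`;
  // split() with a string separator does plain substring matching, so "."
  // is just a dot, and join() inserts the value verbatim.
  return text.split(reference).join(value);
}

const input = "A &l.; B &lt; C";
const result = replaceEntityLiteral(input, "l.", "[payload]");
console.log(result); // "A [payload] B &lt; C"
```

With literal matching, an entity named `l.` can only ever replace `&l.;`, and the built-in `&lt;` is left for its own handler.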
The Patch in 5.3.5:
// The attempt to sanitize the entity name
const escaped = entityName.replace(/[.\-+*:]/g, '\\.');
const regx = new RegExp(`&${escaped};`, "g");

> [!WARNING]
> Researcher Note: This patch is brittle.

1. It replaces `.` with `\.`. That's good. But it also replaces `-`, `+`, `*`, and `:` with `\.`. This effectively changes the entity name in the regex. If I have an entity `my-ent`, the regex becomes `/&my\.ent;/`, which matches `&my.ent;` rather than `&my-ent;`. This breaks exact matching for those characters.
2. It leaves other regex metacharacters unescaped: `|`, `?`, `^`, `$`, `(`, `)`, `[`, `]`, `{`, `}`.

Re-exploitation Potential:
An attacker might try to use the pipe `|` (OR operator). If we define an entity named `lt|amp`, the regex becomes `/&lt|amp;/`. That pattern no longer means one exact entity (it looks for `&lt` OR `amp;`), and a clever attacker might find other combinations of unescaped characters (like the optional `?` or classes `[]`) to construct regexes that still shadow built-in entities. The "fix" is more of a band-aid than a cure.
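Both complaints can be checked against the escaping expression quoted in the patch. The sketch below compares it with a conventional escape-every-metacharacter helper; `patchStyleEscape`, `fullEscape`, and `matches` are illustrative names, and whether the parser accepts a name such as `lx?t` in a real DOCTYPE is an assumption that is not verified here.

```javascript
// The escaping expression as quoted in the 5.3.5 snippet.
function patchStyleEscape(entityName) {
  return entityName.replace(/[.\-+*:]/g, "\\.");
}

// A conventional helper that escapes each regex metacharacter as itself
// (an assumption for comparison, not the library's code).
function fullEscape(entityName) {
  return entityName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Does the pattern built from the escaped name match the given text?
function matches(escapeFn, entityName, text) {
  return new RegExp(`&${escapeFn(entityName)};`).test(text);
}

// 1. "-" is rewritten to "\.", so the entity no longer matches its own name.
console.log(matches(patchStyleEscape, "my-ent", "&my-ent;")); // false
console.log(matches(patchStyleEscape, "my-ent", "&my.ent;")); // true

// 2. "?" is left alone, so "x?" becomes optional and the pattern hits "&lt;".
console.log(matches(patchStyleEscape, "lx?t", "&lt;")); // true

// Escaping every metacharacter restores exact matching in both cases.
console.log(matches(fullEscape, "my-ent", "&my-ent;")); // true
console.log(matches(fullEscape, "lx?t", "&lt;")); // false
```

The comparison only exercises the regex construction; it says nothing about which names the DOCTYPE reader will actually let through.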
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:L/I:H/A:N

| Product | Affected Versions | Fixed Version |
|---|---|---|
| fast-xml-parser (NaturalIntelligence) | >= 4.1.3, < 5.3.5 | 5.3.5 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-185 (Incorrect Regular Expression) |
| CVSS Score | 9.3 (Critical) |
| Attack Vector | Network (AV:N) |
| Exploit Status | PoC Available |
| Impact | XSS / Integrity Compromise |
| Patch Quality | Partial / Brittle |