Apr 23, 2026
justhtml versions up to 1.16.0 contain multiple sanitization bypasses, mXSS vectors via foreign namespaces, and DOM cycle DoS vulnerabilities. Developers must upgrade to 1.17.0 to secure sanitization pipelines and programmatic DOM interfaces.
The `justhtml` library (versions <= 1.16.0) is vulnerable to multiple security flaws, including cross-site scripting (XSS), mutation XSS (mXSS), CSS injection, and denial-of-service (DoS). These vulnerabilities arise from improper handling of foreign namespaces, incomplete DOM serialization constraints, and a lack of cycle detection in programmatic DOM node manipulation.
The justhtml Python library provides HTML DOM manipulation and sanitization capabilities. Versions up to 1.16.0 contain a set of logic flaws in the parsing, sanitization, and serialization implementations. These vulnerabilities expose applications relying on justhtml to client-side attacks and resource exhaustion. The flaws manifest primarily when applications use custom sanitization policies or construct DOM trees programmatically.
The initial attack vector involves sanitization bypasses related to foreign namespaces. justhtml allows developers to define custom policies to preserve SVG and MathML markup. The sanitization engine failed to properly neutralize active HTML integration points within these namespaces. Elements such as <foreignObject> or <annotation-xml> could encapsulate arbitrary HTML, including script tags, which the sanitizer would incorrectly treat as inert data.
Simultaneously, the library suffered from a structural vulnerability in its programmatic DOM handling interface. The serialization engine did not enforce appropriate escaping or boundary checks on text nodes inserted into specific element contexts. This allowed for contextual breakouts, where data intended to remain within a <script>, <style>, or comment block could terminate the block and inject executable markup into the surrounding document.
Finally, the library lacked structural validation during programmatic DOM manipulation. Developers or attackers influencing the DOM structure could create circular references among nodes. When the engine attempted to serialize or sanitize these cyclic structures, it entered an infinite loop, resulting in a denial-of-service condition via resource exhaustion.
The root cause of the foreign namespace sanitization bypass stems from an incomplete implementation of the HTML5 parsing specification regarding active integration points. When the justhtml sanitizer processed SVG or MathML subtrees under a permissive custom policy, it treated the descendants as governed by the respective foreign namespace rules. However, tags like <foreignObject> in SVG explicitly transition the parser back into the HTML namespace. The sanitizer failed to register this context switch, allowing malicious HTML payloads within the <foreignObject> tag to pass through unfiltered.
In the context of the programmatic serialization breakout, the vulnerability is classified as Improper Neutralization of Input During Web Page Generation (CWE-79). When a developer programmatically added text content to a <script> or <style> element, justhtml serialized the output literally. If the text content contained the sequence </script>, the browser parsing the resulting HTML would terminate the script block prematurely. Any subsequent text would be parsed as standard HTML, enabling the execution of arbitrary elements injected by the attacker.
Similarly, the comment serialization logic failed to encode or reject the --> sequence. Programmatically generated Comment() nodes containing this sequence allowed attackers to prematurely close the comment context. This structural assumption failure meant the serialization process trusted the internal state of the DOM tree without verifying that the text content adhered to the lexical boundaries of the serialized output format.
The denial-of-service vulnerability (CWE-674: Uncontrolled Recursion) is rooted in the absence of graph cycle detection. justhtml represents the DOM as a tree of objects in memory. The node appending functions did not verify whether the prospective child node was an ancestor of the parent node. Creating a structure where Node A is a child of Node B, and Node B is a child of Node A, formed a circular dependency. Recursive operations, such as to_html() and sanitize_dom(), traversed this cyclic graph indefinitely.
The remediation in version 1.17.0 spans several targeted commits, one for each distinct vulnerability class. Commit 7efd8240f72b9bca303c9a488e04e94a95940ff4 addressed the foreign namespace bypass by introducing explicit checks for HTML integration points. The updated sanitization logic now correctly identifies when the parsing context re-enters the HTML namespace and applies the baseline HTML sanitization rules to the descendants of those elements.
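The idea of re-applying HTML rules below an integration point can be illustrated with a small, self-contained sketch. It does not use justhtml's API or reproduce the patch: the `Element` class, the `ALLOWED` sets, and both function names are hypothetical. The integration point list follows the HTML parsing specification (SVG `foreignObject`, `desc`, `title`, and MathML `annotation-xml` with an HTML encoding).

```python
from dataclasses import dataclass, field

HTML_NS, SVG_NS, MATHML_NS = "html", "svg", "mathml"

# SVG elements whose children are parsed as HTML again.
SVG_HTML_INTEGRATION_POINTS = {"foreignobject", "desc", "title"}

# Hypothetical per-context allowlists for this toy policy.
ALLOWED = {
    HTML_NS: {"div", "p", "b", "svg", "math"},
    SVG_NS: {"svg", "g", "circle", "foreignobject"},
    MATHML_NS: {"math", "semantics", "mi", "annotation-xml"},
}


@dataclass
class Element:
    name: str
    children: list["Element"] = field(default_factory=list)
    attrs: dict[str, str] = field(default_factory=dict)


def is_html_integration_point(node: Element, context: str) -> bool:
    name = node.name.lower()
    if context == SVG_NS:
        return name in SVG_HTML_INTEGRATION_POINTS
    if context == MATHML_NS and name == "annotation-xml":
        encoding = node.attrs.get("encoding", "").lower()
        return encoding in {"text/html", "application/xhtml+xml"}
    return False


def sanitize(node: Element, context: str = HTML_NS):
    """Return a filtered copy of node, or None if the node is not allowed."""
    name = node.name.lower()
    # A foreign root element switches the context for itself and its subtree.
    if context == HTML_NS and name == "svg":
        context = SVG_NS
    elif context == HTML_NS and name == "math":
        context = MATHML_NS
    if name not in ALLOWED[context]:
        return None
    # Children of an integration point are checked against the HTML rules.
    child_context = HTML_NS if is_html_integration_point(node, context) else context
    kept = []
    for child in node.children:
        cleaned = sanitize(child, child_context)
        if cleaned is not None:
            kept.append(cleaned)
    return Element(node.name, kept, dict(node.attrs))
```

With this policy, a `script` child of `foreignObject` is evaluated against the HTML allowlist and dropped, while an allowed `p` sibling survives. A sanitizer that kept the SVG context for those children would have no HTML rule to apply to them.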
To resolve the serialization breakouts, commit a1268460bf3161855131463b74ec0e709ddb8ba9 implemented rawtext serialization checks. The patch modifies the to_html() rendering logic for <script> and <style> elements to scan the inner text for closing tags. If a context-breaking sequence is detected, the library now either raises an exception or applies appropriate encoding, preventing the premature termination of the element.
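A rawtext check of this kind might look roughly like the following sketch. It is not the code from the commit: `serialize_rawtext` and `UnsafeRawTextError` are hypothetical names, and the sketch takes the "raise an exception" option rather than encoding. It is deliberately conservative, rejecting any case-insensitive occurrence of `</script` or `</style` inside the matching element.

```python
class UnsafeRawTextError(ValueError):
    """Raised when text would terminate its enclosing raw text element."""


RAWTEXT_ELEMENTS = ("script", "style")


def serialize_rawtext(tag: str, text: str) -> str:
    """Serialize a <script> or <style> element, refusing context-breaking text."""
    tag = tag.lower()
    if tag not in RAWTEXT_ELEMENTS:
        raise ValueError(f"not a raw text element: {tag!r}")
    # Browsers match the end tag name case-insensitively, so compare in lowercase.
    if f"</{tag}" in text.lower():
        raise UnsafeRawTextError(f"text contains a closing sequence for <{tag}>")
    return f"<{tag}>{text}</{tag}>"
```

Rejecting the prefix `</script` rather than the full `</script>` also covers variants such as `</script >` and `</SCRIPT/>`, at the cost of refusing a few harmless strings.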
Commit 56d438433f7e6619a8b874a121177e5866552025 applied a similar fix for the Comment() nodes. The serialization routine now validates the comment content against the --> sequence and the <!-- sequence, ensuring that the resulting HTML comment remains well-formed and contained.
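As an illustration only, comment validation along these lines could be sketched as below. The function and exception names are hypothetical, and the rule set is taken from the HTML syntax rules for comments (which also forbid a leading `>` or `->`, the sequence `--!>`, and a trailing `<!-`), so it may be stricter than what the commit actually checks.

```python
class UnsafeCommentError(ValueError):
    """Raised when comment data cannot be serialized as a well-formed comment."""


FORBIDDEN_SEQUENCES = ("<!--", "-->", "--!>")


def serialize_comment(data: str) -> str:
    """Serialize comment data, refusing text that would break out of the comment."""
    if data.startswith((">", "->")):
        raise UnsafeCommentError("comment data must not start with '>' or '->'")
    for seq in FORBIDDEN_SEQUENCES:
        if seq in data:
            raise UnsafeCommentError(f"comment data contains {seq!r}")
    if data.endswith("<!-"):
        raise UnsafeCommentError("comment data must not end with '<!-'")
    return f"<!--{data}-->"
```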
The cycle detection mechanism was introduced in commit 559ae89ff344df7b7b8805812f947421f54dbc9b. The patch alters the core Node class hierarchy. Before a node is appended as a child, the implementation traverses the parent chain up to the root. If the prospective child is encountered during this traversal, the library raises an exception, preventing the formation of the circular reference. This structural validation keeps the in-memory representation acyclic, which guarantees that recursive operations terminate.
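The parent-chain walk described here can be sketched in a few lines. This is a toy `Node` class written for illustration, not justhtml's; `append_child` and `CycleError` are hypothetical names.

```python
class CycleError(ValueError):
    """Raised when an append would make a node its own ancestor."""


class Node:
    def __init__(self, name: str):
        self.name = name
        self.parent = None
        self.children = []

    def append_child(self, child: "Node") -> None:
        # Walk from the new parent up to the root. If the prospective child
        # shows up on that path, appending it would close a cycle.
        ancestor = self
        while ancestor is not None:
            if ancestor is child:
                raise CycleError(f"{child.name!r} is an ancestor of {self.name!r}")
            ancestor = ancestor.parent
        # Detach the child from its previous parent so it has exactly one.
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)
```

The check costs one pass over the ancestors per append, so it is proportional to the depth of the tree rather than its size.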
Exploiting the foreign namespace sanitization bypass requires an application configuration that permits SVG or MathML elements. An attacker submits a payload containing a <svg> element with a <foreignObject> child. Inside the <foreignObject>, the attacker embeds standard HTML XSS payloads, such as <iframe onload=alert(1)> or <script>alert(1)</script>. Because the custom policy instructs the sanitizer to preserve SVG content, the entire structure is passed through. When the victim's browser renders the output, the <foreignObject> triggers an HTML parsing context, and the embedded payload executes.
The programmatic DOM breakout requires an application pattern where user-controlled input is placed into a script, style, or comment node via the justhtml programmatic API. For example, if an application populates a script variable using script_node.text = user_input, the attacker provides the input "; </script><img src=x onerror=alert(1)>. The library serializes this directly into the document. The browser processes </script>, closes the script block, and interprets the subsequent <img> tag as standard HTML, leading to code execution.
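The breakout can be reproduced without justhtml. The sketch below uses a deliberately naive serializer (`naive_script_html`, a hypothetical stand-in for unescaped rawtext serialization) and Python's standard `html.parser` as a rough proxy for how a parser tokenizes the result. It is not a browser, but it ends a script block at `</script>` in the same way.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Records the name of every start tag the parser recognises."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def naive_script_html(text: str) -> str:
    # Stand-in for a serializer that writes script text literally.
    return f"<script>{text}</script>"


def started_tags(markup: str) -> list:
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    user_input = '"; </script><img src=x onerror=alert(1)>'
    markup = naive_script_html('var name = "' + user_input + '";')
    print(markup)
    # If the breakout works, "img" appears as an element outside the script.
    print(started_tags(markup))
```

With benign input the parser sees a single `script` element. With the payload, the first `</script>` closes the block and the `img` tag is tokenized as markup.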
Triggering the denial-of-service vulnerability depends on an application exposing control over the DOM hierarchy to untrusted users. If an application parses user input to build a DOM and allows restructuring or referencing of nodes, the attacker crafts a payload that commands the application to set a parent node as a child of one of its own descendants. Once the cycle is established in memory, any subsequent call by the application to serialize the DOM or run the sanitization routine will hang the process, consume available CPU, and eventually crash due to recursion depth limits or memory exhaustion.
The vulnerabilities present a high integrity risk to applications utilizing justhtml for input sanitization or output generation. The primary impact is Cross-Site Scripting (XSS). Successful exploitation allows an attacker to execute arbitrary JavaScript within the context of the victim's session. This execution grants the attacker the ability to steal session cookies, capture user input, perform actions on behalf of the user, and modify the presentation of the web application.
The mutation XSS (mXSS) vectors compound the risk by bypassing secondary defensive layers. Payloads that appear benign to intermediate security controls, such as Web Application Firewalls (WAFs) or naive string filters, mutate into active executable content only when processed by the browser's DOM parser. This discrepancy between the sanitization engine's understanding of the markup and the browser's actual rendering behavior makes detection and mitigation challenging.
The DOM cycle vulnerability introduces a severe availability impact. Applications processing untrusted data to build programmatic DOM structures are susceptible to denial-of-service attacks. A single crafted request can tie up a thread or worker process until it fails on recursion limits or memory exhaustion. In high-concurrency environments, repeated exploitation can rapidly deplete a server's processing capacity, leading to service unavailability for all users.
The definitive remediation for all identified vulnerabilities is to upgrade the justhtml package to version 1.17.0 or later. This release contains comprehensive fixes for the integration point sanitization, text serialization breakouts, and DOM cycle detection. Applications using poetry, pip, or other Python package managers should specify justhtml >= 1.17.0 in their dependency manifests.
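Beyond pinning the dependency, a deployment can check at startup that the installed release is patched. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the version parsing is simplified (it compares numeric release segments only and treats a pre-release such as `1.17.0rc1` like `1.17.0`).

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

MINIMUM_SAFE = "1.17.0"


def release_tuple(v: str) -> tuple:
    """Turn '1.17.0' into (1, 17, 0), stopping at the first non-numeric segment."""
    parts = []
    for piece in v.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_patched(installed: str, minimum: str = MINIMUM_SAFE) -> bool:
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.17' and '1.17.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_justhtml_is_patched():
    """Return True/False for the installed package, or None if it is absent."""
    try:
        return is_patched(version("justhtml"))
    except PackageNotFoundError:
        return None
```

Comparing integer tuples instead of strings matters here: as strings, `"1.9.0"` sorts after `"1.17.0"`.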
In environments where immediate patching is not feasible, administrators and developers must implement configuration workarounds. Custom sanitization policies that explicitly preserve the SVG and MathML namespaces should be removed or disabled. Reverting to the default sanitization policy, which strips these foreign namespaces entirely, neutralizes the integration point bypass and the associated mXSS vectors. Furthermore, policies preserving <style> tags should be restricted to fully trusted input sources.
For applications utilizing the programmatic DOM generation APIs, strict input validation must be enforced before assigning text to nodes. Any data injected into <script>, <style>, or Comment() nodes must be scanned for context-breaking sequences such as </script>, </style>, and -->. Inputs containing these sequences must be rejected or appropriately encoded. Finally, applications constructing complex DOM trees should implement manual tree traversal checks to ensure directed acyclic graph integrity before calling the vulnerable to_html() or sanitize_dom() functions.
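The manual traversal check mentioned above could be sketched as follows. It is iterative, so it does not itself hit Python's recursion limit on a cyclic structure. The `SimpleNode` class and the assumption that each node exposes a `children` list are illustrative and may not match the attribute names of the real node objects.

```python
class SimpleNode:
    """Toy node with a 'children' list (an assumed attribute name)."""

    def __init__(self, name: str):
        self.name = name
        self.children = []


_END = object()  # sentinel marking an exhausted child iterator


def has_cycle(root) -> bool:
    """Return True if a node is reachable from itself via 'children' links."""
    on_path = {id(root)}  # nodes on the current root-to-node path
    done = set()          # nodes whose whole subtree was already explored
    stack = [(root, iter(root.children))]
    while stack:
        node, remaining = stack[-1]
        child = next(remaining, _END)
        if child is _END:
            # Subtree finished: leave the path and remember the result.
            on_path.discard(id(node))
            done.add(id(node))
            stack.pop()
        elif id(child) in on_path:
            return True  # back edge to an ancestor
        elif id(child) not in done:
            on_path.add(id(child))
            stack.append((child, iter(child.children)))
    return False
```

An application would call `has_cycle(root)` and refuse to serialize or sanitize when it returns `True`. Note that a node shared by two parents is not a cycle and is not reported by this check.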
CVSS:4.0/AV:N/AC:L/AT:P/PR:N/UI:P/VC:N/VI:H/VA:N/SC:N/SI:L/SA:N

| Product | Affected Versions | Fixed Version |
|---|---|---|
| justhtml EmilStenstrom | <= 1.16.0 | 1.17.0 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-79, CWE-674 |
| Attack Vector | Network |
| CVSS Score | 6.0 (Medium) |
| Impact | Integrity (High), Availability (High - DoS via Resource Exhaustion) |
| Exploit Status | No known weaponized exploits in the wild |
| Affected Component | Sanitization engine, DOM serialization, Node hierarchy |
The software does not adequately neutralize context-breaking sequences or properly manage foreign namespace transitions, leading to XSS. Additionally, uncontrolled recursive node relationships enable Denial of Service.