Mar 19, 2026
JustHTML versions prior to 1.12.0 fail to escape angle brackets during Markdown serialization. Entity-encoded HTML inputs safely parsed by the DOM are emitted as raw HTML in the Markdown output, leading to XSS if rendered downstream.
A sanitizer bypass vulnerability in the JustHTML Python library allows for Cross-Site Scripting (XSS) when safe, entity-encoded HTML input is improperly serialized into raw HTML tags during Markdown generation.
The JustHTML package for Python contains a sanitizer bypass vulnerability affecting the to_markdown() serialization method. This function is responsible for converting parsed HTML document structures into Markdown formatted text. In versions prior to 1.12.0, this conversion process fails to apply necessary character escaping to text nodes.
When JustHTML parses an input document, it decodes HTML entities into literal text nodes within the Document Object Model (DOM). For example, an encoded string like &lt;script&gt; is stored internally as a text node containing the literal characters <script>. This behavior is standard and safe within the context of the DOM.
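The decode-then-store behavior can be illustrated with Python's standard library alone. The sketch below does not call JustHTML. It uses `html.unescape` and `html.escape` as stand-ins for the parser's entity decoding and for an HTML serializer's re-encoding.

```python
import html

encoded = "&lt;script&gt;alert(1)&lt;/script&gt;"

# Parsing: entity references become literal characters in the text node.
text_node = html.unescape(encoded)

# HTML serialization: the literal characters are re-encoded, so the
# round trip ends with inert text rather than markup.
reencoded = html.escape(text_node, quote=False)

print(text_node)   # <script>alert(1)</script>
print(reencoded)   # &lt;script&gt;alert(1)&lt;/script&gt;
```

A serializer that writes `text_node` out without the second step hands the literal angle brackets to whatever consumes its output.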
The vulnerability manifests during the serialization phase. While the to_html() method correctly re-encodes these characters into safe entities, the to_markdown() method omits this step. It explicitly preserves angle brackets (< and >), resulting in the emission of raw HTML tags into the output Markdown. If a downstream processor renders this Markdown back into HTML without secondary sanitization, the application is vulnerable to Cross-Site Scripting (XSS).
The root cause of this vulnerability is an incomplete escaping routine in the Markdown serialization logic. The to_markdown() method is designed to escape Markdown-specific metacharacters to prevent layout disruption, but it lacks specific handling for HTML-significant characters within text nodes.
During document parsing, JustHTML handles entities and literal text elements appropriately. Content within elements such as <title>, <textarea>, <noscript>, and <plaintext> is tokenized as text rather than as markup. The parser also decodes entity references found in standard text blocks, correctly treating them as benign data rather than structural markup.
However, the to_markdown() function iterates over these text nodes and serializes their literal contents directly into the output stream. Because the serialization logic does not escape < and >, a text node containing <script> is written out verbatim as <script>. This undoes the protection that entity encoding provided, turning an originally benign input into an active payload.
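The gap can be sketched with two hypothetical text-node escapers. Neither function is taken from JustHTML. The function names, the metacharacter set, and the choice to also encode `&` are assumptions made for illustration.

```python
import re

# Hypothetical Markdown metacharacter set; the library's real list may differ.
_MD_META = re.compile(r"([\\`*_\[\]#])")


def escape_markdown_only(text: str) -> str:
    """Escape Markdown metacharacters but leave < and > untouched."""
    return _MD_META.sub(r"\\\1", text)


def escape_markdown_and_html(text: str) -> str:
    """Encode HTML-significant characters first, then Markdown metacharacters."""
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return _MD_META.sub(r"\\\1", text)


node = "<img src=x onerror=alert(1)> *bold*"
flawed = escape_markdown_only(node)
hardened = escape_markdown_and_html(node)
print(flawed)
print(hardened)
```

The first function mirrors the flawed behavior described above: the asterisks are neutralized, but the tag survives intact. The second shows one way a serializer could close the gap.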
The JustHTML library filters actual HTML elements like <script> or <style> by default, provided html_passthrough=True is not set. This protection mechanism creates a false sense of security, as developers assume the resulting Markdown is entirely stripped of active HTML content.
The flaw targets the precise boundary between DOM representation and string serialization. When text nodes are derived from entity-decoded input or extracted from literal text elements, their content bypasses the structural element filters. The to_markdown() routine processes these nodes strictly as text, applying only Markdown-specific escaping rules.
The concrete security impact occurs when the generated Markdown is consumed by a downstream Markdown-to-HTML renderer. Many standard Markdown renderers permit inline raw HTML by default. When the unsanitized JustHTML output is fed into such a renderer, the injected tags reach the victim's browser as live markup, and attacker-controlled JavaScript executes in the context of the application's origin. The attacker needs no authentication, but the victim must view the rendered content.
Exploiting this vulnerability requires the attacker to supply input containing specific HTML entities. The application must process this input using JustHTML and subsequently export it using the to_markdown() method.
The following Python proof-of-concept demonstrates the injection technique. The input begins as safe HTML containing encoded entities. The entities are successfully decoded into the DOM, but the to_markdown() serialization fails to re-encode them, outputting a raw image tag.
from justhtml import JustHTML
# Input with encoded entities that is safe for HTML output
input_html = "<p>&lt;img src=x onerror=alert(1)&gt;</p>"
doc = JustHTML(input_html, fragment=True)
print("Safe HTML Output (Escaped):")
print(doc.to_html())
# Output: <p>&lt;img src=x onerror=alert(1)&gt;</p>
print("\nUnsafe Markdown Output (Tag Injection):")
print(doc.to_markdown())
# Output: <img src=x onerror=alert(1)>

This demonstration highlights the sanitizer bypass. The angle brackets are preserved in the Markdown string, creating a valid HTML element. When a downstream renderer passes this element through and a web browser loads it, the onerror event handler triggers, executing the payload.
To remediate this vulnerability, administrators and developers must upgrade the justhtml package to version 1.12.0 or later. This update modifies the to_markdown() method to properly encode HTML-significant characters within text nodes during the serialization process.
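A deployment check for the fixed release might look like the sketch below. It assumes the distribution is installed under the name `justhtml` and that version strings are plain dotted integers. The helper names are hypothetical, and `packaging.version.Version` is a more robust choice when pre-release tags are possible.

```python
from importlib import metadata
from typing import Optional, Tuple

FIXED_VERSION = (1, 12, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted release such as '1.12.0', padding to three components."""
    parts = [int(part) for part in version.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version: str) -> bool:
    """True when the given version is 1.12.0 or later."""
    return parse_version(version) >= FIXED_VERSION


def installed_justhtml_is_patched() -> Optional[bool]:
    """Check the installed package; None means it is not installed."""
    try:
        return is_patched(metadata.version("justhtml"))
    except metadata.PackageNotFoundError:
        return None
```

Comparing integer tuples rather than raw strings matters here, because a string comparison would rank "1.9.0" above "1.12.0".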
In environments where immediate patching is not feasible, developers must implement secondary defense mechanisms. Any Markdown produced by vulnerable versions of to_markdown() should be processed by a robust, security-conscious Markdown-to-HTML renderer. This renderer must be explicitly configured to strip or encode raw HTML tags by default.
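Where the renderer cannot be reconfigured, one stopgap is to encode opening angle brackets in the Markdown before it is rendered. The helper below is a hypothetical sketch, not part of JustHTML. It trades fidelity for safety: autolinks written as `<https://...>` stop working, and code spans that contain `<` will display the entity text instead.

```python
def neutralize_raw_html(markdown_text: str) -> str:
    """Encode every '<' so a downstream renderer cannot see a tag opener.

    '>' is left alone because it also introduces Markdown blockquotes,
    and a tag cannot start without '<'.
    """
    return markdown_text.replace("<", "&lt;")


unsafe = "<img src=x onerror=alert(1)>"
safe = neutralize_raw_html(unsafe)
print(safe)  # &lt;img src=x onerror=alert(1)>
```

This is a blunt filter intended only to bridge the gap until the upgrade is deployed.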
Additionally, development teams should audit the application's input handling pipeline. Identifying all sources that feed into the JustHTML parser and mapping where the serialized Markdown is consumed will ensure that the entire data flow is protected against XSS. Strict input validation rejecting unnecessary entity-encoded payloads provides an additional layer of defense.
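As an aid to such an audit, previously generated Markdown can be scanned for tag-like sequences. The sketch below is a heuristic with a hypothetical function name and an assumed pattern. It flags anything shaped like an HTML start or end tag while ignoring autolinks and bare comparison operators, and it is not a substitute for a real sanitizer.

```python
import re
from typing import List

# A tag name followed by optional attributes; autolinks such as
# <https://example.com> fail because ':' cannot follow the tag name.
_TAG_LIKE = re.compile(r"</?[A-Za-z][A-Za-z0-9-]*(?:\s[^<>]*)?/?>")


def find_raw_tags(markdown_text: str) -> List[str]:
    """Return every substring that looks like a raw HTML tag."""
    return _TAG_LIKE.findall(markdown_text)


hits = find_raw_tags("intro <img src=x onerror=alert(1)> outro </script>")
print(hits)  # ['<img src=x onerror=alert(1)>', '</script>']
```

Any hit in stored output is a candidate for manual review, since legitimate Markdown may also contain intentional inline HTML.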
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:N/VI:N/VA:N/SC:L/SI:L/SA:N

| Product | Affected Versions | Fixed Version |
|---|---|---|
| justhtml (EmilStenstrom) | < 1.12.0 | 1.12.0 |
| Attribute | Detail |
|---|---|
| Vulnerability Class | Sanitizer Bypass / Cross-Site Scripting (XSS) |
| CWE ID | CWE-79 |
| Attack Vector | Network |
| Authentication Required | None |
| CVSS v4.0 Score | 5.1 (Moderate) |
| Exploit Maturity | Proof-of-Concept |
| Affected Component | JustHTML.to_markdown() |
CWE-79 (Improper Neutralization of Input During Web Page Generation): The software does not neutralize or incorrectly neutralizes user-controllable input before it is placed in output that is used as a web page that is served to other users.