CVEReports

Automated vulnerability intelligence platform. Comprehensive reports for high-severity CVEs generated by AI.

© 2026 CVEReports. All rights reserved.

Made with love by Amit Schendel & Alon Barad



GHSA-3RCM-VJRC-P45J (CVSS 5.1)

GHSA-3rcm-vjrc-p45j: JustHTML Sanitizer Bypass in Markdown Serialization

Amit Schendel
Senior Security Researcher

Mar 19, 2026 · 5 min read

PoC Available

Executive Summary (TL;DR)

JustHTML versions prior to 1.12.0 fail to escape angle brackets during Markdown serialization. Entity-encoded HTML inputs safely parsed by the DOM are emitted as raw HTML in the Markdown output, leading to XSS if rendered downstream.

A sanitizer bypass vulnerability in the JustHTML Python library allows for Cross-Site Scripting (XSS) when safe, entity-encoded HTML input is improperly serialized into raw HTML tags during Markdown generation.

Vulnerability Overview

The JustHTML package for Python contains a sanitizer bypass vulnerability affecting the to_markdown() serialization method. This function is responsible for converting parsed HTML document structures into Markdown formatted text. In versions prior to 1.12.0, this conversion process fails to apply necessary character escaping to text nodes.

When JustHTML parses an input document, it safely decodes HTML entities into literal text nodes within the Document Object Model (DOM). For example, an encoded string like &lt;script&gt; is stored internally as a text node containing the literal characters <script>. This behavior is standard and safe within the context of the DOM.
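The decoding step can be illustrated without JustHTML at all. The sketch below uses Python's standard-library HTMLParser as a stand-in (an assumption for illustration; it is not JustHTML's parser) to show that entity-encoded input reaches the text callback as literal angle-bracket characters while never being treated as an element.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Records the start tags and text nodes reported by the parser."""

    def __init__(self):
        # convert_charrefs=True (the default) decodes entities inside text.
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


collector = TextCollector()
collector.feed("<p>&lt;script&gt;</p>")
collector.close()

decoded_text = "".join(collector.text)
print(collector.tags)  # only the real <p> element is seen as markup
print(decoded_text)    # the entity-encoded text, now literal characters
```

The text is harmless at this point because it is data, not structure. Whether it stays harmless depends entirely on how it is written back out.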

The vulnerability manifests during the serialization phase. While the to_html() method correctly re-encodes these characters into safe entities, the to_markdown() method omits this step. It explicitly preserves angle brackets (< and >), resulting in the emission of raw HTML tags into the output Markdown. If a downstream processor renders this Markdown back into HTML without secondary sanitization, the application is vulnerable to Cross-Site Scripting (XSS).

Root Cause Analysis

The root cause of this vulnerability is an incomplete escaping routine in the Markdown serialization logic. The to_markdown() method is designed to escape Markdown-specific metacharacters to prevent layout disruption, but it lacks specific handling for HTML-significant characters within text nodes.

During document parsing, JustHTML handles entities and literal text elements appropriately. Content within elements such as <title>, <textarea>, <noscript>, and <plaintext> is parsed in raw-text states, so it enters the DOM as text rather than markup. The parser also decodes entity references found in standard text blocks, correctly treating them as benign data rather than structural markup.

However, the to_markdown() function iterates over these text nodes and serializes their literal contents directly into the output stream. Because the serialization logic does not escape < and >, a text node containing the characters <script> is written out unchanged as <script>. This effectively undoes the protection that entity encoding provided, turning an originally benign input into an active payload.
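The difference between the two serialization paths can be sketched as follows. This is not JustHTML's source code: the function names and the set of Markdown metacharacters are hypothetical, and the exact encoding chosen by the 1.12.0 fix is an assumption. The sketch only illustrates the class of bug described above.

```python
import html
import re

# Hypothetical set of Markdown metacharacters, for illustration only.
_MARKDOWN_META = re.compile(r"([\\`*_\[\]#])")


def serialize_text_as_html(text: str) -> str:
    """HTML path: re-encode &, < and > so text stays text."""
    return html.escape(text, quote=False)


def serialize_text_as_markdown_unpatched(text: str) -> str:
    """Flawed Markdown path: escapes Markdown syntax, ignores < and >."""
    return _MARKDOWN_META.sub(r"\\\1", text)


def serialize_text_as_markdown_escaped(text: str) -> str:
    """Markdown path that also encodes HTML-significant characters."""
    return html.escape(_MARKDOWN_META.sub(r"\\\1", text), quote=False)


node_text = "<img src=x onerror=alert(1)>"
print(serialize_text_as_html(node_text))
print(serialize_text_as_markdown_unpatched(node_text))
print(serialize_text_as_markdown_escaped(node_text))
```

In the unpatched variant the text node passes through untouched, because none of its characters are Markdown metacharacters. The escaped variant treats the Markdown output as something that may later be interpreted as HTML and encodes accordingly.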

Serialization Mechanics & Impact

The JustHTML library filters actual HTML elements like <script> or <style> by default, provided html_passthrough=True is not set. This protection mechanism creates a false sense of security, as developers assume the resulting Markdown is entirely stripped of active HTML content.

The flaw targets the precise boundary between DOM representation and string serialization. When text nodes are derived from entity-decoded input or extracted from literal text elements, their content bypasses the structural element filters. The to_markdown() routine processes these nodes strictly as text, applying only Markdown-specific escaping rules.

The concrete security impact occurs when the generated Markdown is consumed by a downstream Markdown-to-HTML renderer. Many standard Markdown renderers permit inline raw HTML by default. When the unsanitized JustHTML output is fed into such a renderer, the injected tags pass through into the final page and are interpreted by the victim's browser. The result is arbitrary JavaScript execution in the context of the application's origin (XSS), not code execution on the server.

Exploitation and Proof of Concept

Exploiting this vulnerability requires the attacker to supply input containing specific HTML entities. The application must process this input using JustHTML and subsequently export it using the to_markdown() method.

The following Python proof-of-concept demonstrates the injection technique. The input begins as safe HTML containing encoded entities. The entities are successfully decoded into the DOM, but the to_markdown() serialization fails to re-encode them, outputting a raw image tag.

from justhtml import JustHTML
 
# Input with encoded entities that is safe for HTML output
input_html = "<p>&lt;img src=x onerror=alert(1)&gt;</p>"
doc = JustHTML(input_html, fragment=True)
 
print("Safe HTML Output (Escaped):")
print(doc.to_html())
# Output: <p>&lt;img src=x onerror=alert(1)&gt;</p>
 
print("\nUnsafe Markdown Output (Tag Injection):")
print(doc.to_markdown())
# Output: <img src=x onerror=alert(1)>

This demonstration highlights the sanitizer bypass. The angle brackets are preserved in the Markdown string, creating a valid HTML element. When rendered by a web browser, the onerror event handler triggers, executing the payload.

Mitigation & Remediation

To remediate this vulnerability, administrators and developers must upgrade the justhtml package to version 1.12.0 or later. This update modifies the to_markdown() method to properly encode HTML-significant characters within text nodes during the serialization process.
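A deployment check along these lines can confirm whether an environment is still on an affected release. This is a minimal sketch: the helper names are hypothetical, the distribution name "justhtml" is taken from the advisory, and the parser only handles plain dotted versions (pre-release suffixes such as "rc1" are ignored, which a dedicated version library would handle better).

```python
from importlib import metadata

FIXED_VERSION = (1, 12, 0)


def parse_version(version: str) -> tuple:
    """Turn '1.11.3' into (1, 11, 3); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '1.12' to three components.
    return tuple(parts) + (0,) * (3 - len(parts))


def is_patched(version: str) -> bool:
    """True when the version is 1.12.0 or later."""
    return parse_version(version) >= FIXED_VERSION


def installed_justhtml_is_patched():
    """Return True/False for the installed package, or None if absent."""
    try:
        return is_patched(metadata.version("justhtml"))
    except metadata.PackageNotFoundError:
        return None


print(installed_justhtml_is_patched())
```

Running this in CI or at application start-up gives an early signal if a pinned or transitive dependency pulls in a release older than 1.12.0.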

In environments where immediate patching is not feasible, developers must implement secondary defense mechanisms. Any Markdown produced by vulnerable versions of to_markdown() should be processed by a robust, security-conscious Markdown-to-HTML renderer. This renderer must be explicitly configured to strip or encode raw HTML tags by default.
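Where the downstream renderer cannot be reconfigured either, one possible stopgap is to neutralize the Markdown string itself before it leaves the application. The helper below is a hypothetical sketch, not part of JustHTML. It assumes the downstream renderer decodes entity references in text, as CommonMark-style renderers do, and it has side effects: autolinks written with angle brackets stop working, and a literal "&lt;" will be displayed inside code spans.

```python
def neutralize_raw_html(markdown_text: str) -> str:
    """Replace every '<' so that no HTML tag can open in the Markdown text.

    '>' is left alone because it also serves as the blockquote marker.
    """
    return markdown_text.replace("<", "&lt;")


vulnerable_output = "<img src=x onerror=alert(1)>"
safe_output = neutralize_raw_html(vulnerable_output)
print(safe_output)
```

This trades some Markdown fidelity for safety and is only reasonable as a temporary measure until the upgrade to 1.12.0 is in place.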

Additionally, development teams should audit the application's input handling pipeline. Identifying all sources that feed into the JustHTML parser and mapping where the serialized Markdown is consumed will ensure that the entire data flow is protected against XSS. Strict input validation rejecting unnecessary entity-encoded payloads provides an additional layer of defense.
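As part of such an audit, Markdown that was already generated and stored by a vulnerable version can be scanned for tag-like sequences. The following is a rough heuristic with hypothetical names, not a sanitizer: it will flag legitimate angle-bracket autolinks as well, so matches should be reviewed rather than acted on automatically.

```python
import re

# Matches an opening or closing tag, or the start of an HTML comment.
_TAG_LIKE = re.compile(r"</?[A-Za-z][^<>]*>|<!--")


def find_raw_html(markdown_text: str) -> list:
    """Return every tag-like sequence found in a Markdown string."""
    return _TAG_LIKE.findall(markdown_text)


print(find_raw_html("<img src=x onerror=alert(1)>"))
print(find_raw_html("1 < 2 and 3 > 2"))
```

Records that produce matches are candidates for regeneration with a patched version of the library.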

Official Patches

EmilStenstrom: GitHub Security Advisory

Technical Appendix

CVSS Score
5.1 / 10
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:N/VI:N/VA:N/SC:L/SI:L/SA:N

Affected Systems

JustHTML (PyPI Package)

Affected Versions Detail

Product: justhtml (EmilStenstrom)
Affected Versions: < 1.12.0
Fixed Version: 1.12.0

Vulnerability Class: Sanitizer Bypass / Cross-Site Scripting (XSS)
CWE ID: CWE-79
Attack Vector: Network
Authentication Required: None
CVSS v4.0 Score: 5.1 (Moderate)
Exploit Maturity: Proof-of-Concept
Affected Component: JustHTML.to_markdown()

MITRE ATT&CK Mapping

T1190: Exploit Public-Facing Application (Initial Access)
T1059.007: Command and Scripting Interpreter: JavaScript (Execution)

CWE-79: Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

The software does not neutralize or incorrectly neutralizes user-controllable input before it is placed in output that is used as a web page that is served to other users.

Vulnerability Timeline

2026-03-18: Advisory published via GitHub Security Advisory (GHSA)
2026-03-18: Version 1.12.0 released on PyPI to address the vulnerability

