Mar 24, 2026
justhtml < 1.13.0 fails to dynamically size backtick fences when serializing <pre> tags to Markdown, enabling XSS through code block breakouts.
The Python library `justhtml` versions prior to 1.13.0 suffer from a Cross-Site Scripting (XSS) vulnerability due to improper handling of HTML `<pre>` elements during Markdown serialization. This flaw permits attackers to break out of generated Markdown code blocks and execute arbitrary JavaScript when the output is processed by downstream Markdown renderers.
The justhtml package is a Python library designed for handling and serializing HTML content into Markdown formats. It processes raw HTML elements and translates them into corresponding Markdown syntax for downstream rendering. This library is commonly utilized in content management systems, static site generators, and web applications that require automated Markdown conversion.
A security vulnerability exists in versions of justhtml prior to 1.13.0, categorized under CWE-79 (Cross-site Scripting) and CWE-74 (Injection). The vulnerability specifically manifests during the serialization of HTML <pre> elements. The serialization engine fails to properly sanitize or encapsulate nested code block delimiters supplied within the raw input.
This flaw permits an attacker to inject arbitrary HTML payloads that survive the Markdown conversion process. When the resulting Markdown is subsequently parsed by standard engines like CommonMark or GitHub Flavored Markdown (GFM), the injected payload is rendered as raw HTML. This results in Cross-Site Scripting (XSS) if the downstream application serves this content to end users without secondary sanitization.
The root cause of this vulnerability lies in the to_markdown() method within the justhtml serialization engine, specifically located in the src/justhtml/node.py file. When the engine encounters an HTML <pre> element, it attempts to convert it into a Markdown fenced code block. The implementation utilized a static, hardcoded three-backtick sequence to delineate the start and end of this block.
The serialization logic did not inspect the inner text of the <pre> element for matching backtick sequences. If the user-supplied content contained three or more consecutive backticks, the generated Markdown document would contain nested or premature closing delimiters. The parser systematically prepended and appended the hardcoded fence around the raw input without validating the structural integrity of the output.
The CommonMark specification states that a fenced code block ends at the first closing fence that uses the same character as the opening fence and is at least as long. A line consisting of three or more backticks inside the <pre> content therefore closes the hardcoded three-backtick fence prematurely. Any content following that line within the original <pre> tag is then treated as ordinary Markdown or raw HTML by downstream renderers.
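The termination rule can be made concrete with a small scanner. The sketch below is not justhtml code: `naive_fence` is a hypothetical stand-in for the pre-1.13.0 behavior described above, and `split_fenced` is a simplified reading of the CommonMark fence rules (backtick fences only, no tilde fences, no container blocks).

```python
import re

# Simplified fence pattern: up to three spaces of indentation, a run of three
# or more backticks, then whatever else is on the line.
_FENCE = re.compile(r"^ {0,3}(`{3,})(.*)$")


def naive_fence(text: str) -> str:
    """Hypothetical stand-in for the vulnerable behavior: a fixed fence."""
    return "```\n" + text + "\n```"


def split_fenced(markdown: str) -> list[tuple[str, str]]:
    """Label each line as 'fence', 'code', or 'markdown'."""
    labeled = []
    open_len = 0  # 0 means we are currently outside any fenced block
    for line in markdown.split("\n"):
        match = _FENCE.match(line)
        if open_len == 0:
            # An opening fence may carry an info string, but that string
            # must not contain backticks.
            if match and "`" not in match.group(2):
                open_len = len(match.group(1))
                labeled.append(("fence", line))
            else:
                labeled.append(("markdown", line))
        elif match and len(match.group(1)) >= open_len and not match.group(2).strip():
            # A closing fence is at least as long as the opener and is
            # followed by nothing but whitespace.
            open_len = 0
            labeled.append(("fence", line))
        else:
            labeled.append(("code", line))
    return labeled


payload = "```\n<img src=x onerror=alert(1)>"
for kind, line in split_fenced(naive_fence(payload)):
    print(f"{kind:8} {line}")
```

With the fixed fence, the `<img>` line is labeled `markdown` rather than `code`, which is the breakout. Wrapping the same payload in a four-backtick fence keeps every content line labeled `code`.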
Prior to version 1.13.0, the justhtml codebase handled <pre> tag serialization by wrapping the element's text content in a fixed three-backtick fence without inspecting it. This approach failed to account for adversarial input designed to break the structural boundaries of the generated Markdown document. The lack of dynamic fence sizing created a direct injection vector for subsequent processing stages.
The patch introduced in commit f35f8f723c713bd8f912d86e9ec6881275ff5af9 remediates this issue by implementing a dynamic backtick fence calculation. A new function, _markdown_backtick_fence, scans the input string to determine the longest contiguous run of backtick characters. It then returns a fence that is one character longer than that run, and never shorter than the caller-supplied minimum.
This defensive programming approach ensures that the opening and closing fences always encapsulate the inner content, regardless of how many backticks the attacker provides. The relevant patched logic is shown in the code snippet below. Because the fence is always longer than any backtick run in the content, no sequence inside the content can satisfy the closing-fence condition.

    # Updated logic in src/justhtml/node.py
    def _markdown_backtick_fence(s: str | None, *, minimum: int) -> str:
        if s is None:
            s = ""
        longest = 0
        run = 0
        for ch in s:
            if ch == "`":
                run += 1
                if run > longest:
                    longest = run
            else:
                run = 0
        return "`" * max(minimum, longest + 1)

The following architecture diagram illustrates the parser flow and where the dynamic calculation mitigates the breakout path. This visual representation clarifies the state transitions during serialization. The parser strictly enforces the boundary markers before yielding the final document.
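To see how the fence length responds to different inputs, the calculation can be restated compactly. This is a hedged sketch rather than the library's code: `backtick_fence` and `fence_code` are hypothetical names, the regex formulation is a substitute for the loop shown above, and the default minimum of three is an assumption.

```python
import re


def backtick_fence(text: str, minimum: int = 3) -> str:
    """Return a backtick fence longer than any backtick run in ``text``.

    Hypothetical regex-based equivalent of the character loop; the name and
    the default minimum are illustrative, not part of the justhtml API.
    """
    runs = re.findall(r"`+", text or "")
    longest = max((len(run) for run in runs), default=0)
    return "`" * max(minimum, longest + 1)


def fence_code(text: str) -> str:
    """Wrap ``text`` in a fence that its own content cannot close."""
    fence = backtick_fence(text)
    return f"{fence}\n{text}\n{fence}"


for sample in ["plain text", "inline `code`", "```\n<img src=x onerror=alert(1)>"]:
    print(len(backtick_fence(sample)), repr(sample))
```

Content with no backtick run of three or more keeps the ordinary three-backtick fence, so typical output is unchanged. Only content that could close the fence causes it to grow.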
Exploitation requires the attacker to submit a crafted HTML string containing a <pre> element to an application utilizing a vulnerable version of justhtml. The payload must contain a sequence of backticks sufficient to close the hardcoded Markdown fence, followed by the actual XSS payload. No authentication is inherently required unless the target application enforces access controls on the input mechanism.
The provided proof-of-concept demonstrates the attack mechanics against the vulnerable serialization logic. The attacker submits an HTML payload whose <pre> content begins with a line of three literal backticks. The JustHTML.to_markdown() method serializes this input, placing the attacker's backticks directly after its own opening fence.

    from justhtml import JustHTML

    vulnerable_html = "<pre>```\n<img src=x onerror=alert(1)></pre>"
    doc = JustHTML(vulnerable_html, fragment=True)
    print(doc.to_markdown())
    # Output:
    # ```
    # ```
    # <img src=x onerror=alert(1)>
    # ```

This execution results in an empty code block, immediately followed by the raw <img src=x onerror=alert(1)> tag on a new line. Downstream Markdown parsers interpret this raw HTML tag verbatim, executing the embedded JavaScript. This completely bypasses intended sanitization filters that operate under the assumption that the justhtml output is purely text-based Markdown.
The successful exploitation of this vulnerability results in arbitrary JavaScript execution within the context of the victim's browser session. This manifests as a classic Stored or Reflected XSS scenario, contingent upon how the application handles the serialized Markdown output. The primary impact is isolated to the client side, but the consequences for affected users remain severe.
An attacker leveraging this flaw can steal active session cookies, capture anti-CSRF tokens, or perform unauthorized administrative actions on behalf of the victim. If the targeted victim holds elevated privileges within the application, the attacker can systematically pivot this client-side execution into broader system compromise or sensitive data exfiltration.
The vulnerability severity is classified as High due to the low attack complexity and the complete lack of required privileges to trigger the payload in typical deployment architectures. Applications that rely exclusively on justhtml for sanitization before passing data to a Markdown renderer are completely exposed to this structural injection vector.
The implemented patch in version 1.13.0 effectively neutralizes the primary code block breakout vector by ensuring dynamic fence sizing. Security engineers must analyze the broader serialization engine for similar structural vulnerabilities. Markdown relies heavily on specific delimiter sequences, such as asterisks for emphasis or hashes for headers, which present similar architectural challenges if not handled dynamically.
An examination of the remaining justhtml codebase indicates that other block-level elements do not currently require the same dynamic delimiter calculation. Elements like <blockquote> or <ul> do not use closing fences in the same structural manner as <pre>. The current patch logic is tightly localized to the to_markdown() backtick enumeration phase and completely resolves the identified CWE-79 vector.
Furthermore, the patch assumes the downstream Markdown parser perfectly adheres to the CommonMark specification regarding backtick fence length termination. If a specific downstream implementation contains parsing discrepancies or edge cases regarding fence termination limits, highly specific variant breakouts might occur. Developers must pair the upgraded justhtml library with a strictly compliant, standardized Markdown renderer to prevent cross-parser exploitation scenarios.
The primary and most effective remediation strategy is to immediately upgrade the justhtml library to version 1.13.0 or later. This version contains the dynamic fence calculation logic, which ensures that the fence around <pre> content is always longer than any backtick run inside it. With pip, the upgrade is `pip install --upgrade "justhtml>=1.13.0"`; the quotes keep the shell from treating `>` as an output redirection.
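To confirm whether a given environment still runs an affected release, the installed version can be compared against 1.13.0. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the parser deliberately ignores pre-release and local version suffixes.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

FIXED = (1, 13, 0)  # first release with dynamic fence sizing, per the advisory


def parse_release(raw: str) -> tuple[int, ...]:
    """Parse the leading numeric components of a version string.

    Deliberately simple: suffixes such as "rc1" are dropped, so use a full
    PEP 440 parser where pre-releases matter.
    """
    parts = []
    for piece in raw.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_vulnerable(raw: str) -> bool:
    """True when the version string is older than the fixed release."""
    release = parse_release(raw)
    padded = release + (0,) * (len(FIXED) - len(release))
    return padded < FIXED


def installed_justhtml_is_vulnerable() -> Optional[bool]:
    """Return None when justhtml is not installed in this environment."""
    try:
        return is_vulnerable(version("justhtml"))
    except PackageNotFoundError:
        return None


print(installed_justhtml_is_vulnerable())
```

A check like this could run in CI or at application startup. It complements the dependency pin and does not replace it.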
In environments where immediate patching is not technically feasible, developers must implement secondary, post-processing sanitization controls. Because justhtml emits Markdown, the sanitizer belongs after the Markdown renderer: render the Markdown to HTML, then pass that HTML through a robust, dedicated HTML sanitizer, such as bleach, before inserting it into the DOM. Alternatively, downstream Markdown renderers can be configured to disallow raw HTML entirely.
Security teams should systematically audit their codebases to identify all instances where untrusted user input is passed to JustHTML.to_markdown(). Implementing strict Content Security Policy (CSP) headers provides a critical defense-in-depth layer, inherently restricting the execution of inline scripts and sharply mitigating the impact of any successful XSS injections.
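As one way to apply the CSP recommendation, the header can be attached at the WSGI layer. This is a sketch under stated assumptions: `CSPMiddleware` is a hypothetical class, the application is assumed to be WSGI-based, and the policy string is illustrative and would need tailoring to the application's real script and asset sources.

```python
# Illustrative policy only. Omitting 'unsafe-inline' from script-src is what
# blocks inline scripts and inline event handlers such as onerror.
CSP_POLICY = "default-src 'self'; script-src 'self'; object-src 'none'; base-uri 'none'"


class CSPMiddleware:
    """Hypothetical WSGI middleware that adds a CSP header to every response."""

    def __init__(self, app, policy: str = CSP_POLICY):
        self.app = app
        self.policy = policy

    def __call__(self, environ, start_response):
        def start_with_csp(status, headers, exc_info=None):
            # Drop any existing CSP header so exactly one policy is sent.
            kept = [
                (name, value)
                for name, value in headers
                if name.lower() != "content-security-policy"
            ]
            kept.append(("Content-Security-Policy", self.policy))
            return start_response(status, kept, exc_info)

        return self.app(environ, start_with_csp)
```

Wrapping an existing application would look like `application = CSPMiddleware(application)`. Replacing any policy set by the application is a design choice of this sketch; some deployments may prefer to leave an existing header untouched.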
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

| Product | Affected Versions | Fixed Version |
|---|---|---|
| justhtml (EmilStenstrom) | < 1.13.0 | 1.13.0 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-79, CWE-74 |
| Attack Vector | Network |
| CVSS v3.1 Score | 7.5 (High) |
| Impact | Arbitrary JavaScript Execution |
| Exploit Status | Proof of Concept Available |
| KEV Status | Not Listed |
| Affected Component | justhtml.to_markdown() |
| Remediation | Upgrade to >= 1.13.0 |
The software does not neutralize or incorrectly neutralizes user-controllable input before it is placed in output that is used as a web page that is served to other users.