CVEReports

Automated vulnerability intelligence platform. Comprehensive reports for high-severity CVEs generated by AI.



GHSA-8RFP-98V4-MMR6: Protocol-Filtering Bypass via Unicode Obfuscation in Mozilla Bleach

Amit Schendel
Senior Security Researcher

Jun 16, 2026 · 7 min read

Executive Summary (TL;DR)

Mozilla Bleach versions up to and including 6.3.0 fail to detect disallowed URL schemes when the scheme prefix contains invisible Unicode characters or other code points above U+00A0. This lets blocked protocols such as 'javascript:' pass the sanitizer, creating stored Cross-Site Scripting (XSS) risk in downstream environments that normalize or strip Unicode data.

Mozilla Bleach is an open-source HTML sanitizing library for Python. Versions up to and including 6.3.0 contain an incomplete filtering implementation in the URI validation logic ('sanitize_uri_value'). This logic fails to detect disallowed protocols, such as 'javascript:', if they contain Unicode invisible characters, whitespace characters, or characters with a code point greater than U+00A0. While standard-compliant web browsers do not directly execute invalid URI schemes containing these non-standard characters, downstream systems that normalize Unicode text by stripping invisible or non-ASCII characters can unintentionally reactivate the 'javascript:' prefix, causing Cross-Site Scripting (XSS). Additionally, this behavior violates Bleach's core sanitization contract by outputting URIs that bypass protocol allowlists configured by the caller.

Vulnerability Overview

Mozilla Bleach is a Python-based HTML sanitization library that parses, cleans, and filters HTML fragments from untrusted inputs. It operates as a primary defense boundary in web applications, validating allowed tags, attributes, and URI schemes to prevent Cross-Site Scripting (XSS) and data injection vulnerabilities.

Under normal execution, when developers enable anchor tags (<a>) and link attributes (href), Bleach checks the value of the href attribute. It guarantees that the parsed scheme matches an explicit allowlist of safe protocols, such as http, https, or mailto. Any URI containing a disallowed scheme, such as javascript:, is stripped or nullified.

The vulnerability tracked as GHSA-8RFP-98V4-MMR6 resides in the preprocessing loop of the URI validation logic. When processing attributes, Bleach fails to identify blocked protocols if they contain specific high-range Unicode code points, invisible characters, or non-standard whitespace. Consequently, the invalid URI bypasses validation checks and is preserved in the output, which violates Bleach's security guarantees and compromises downstream rendering components.

Root Cause Analysis

The underlying technical flaw lies within the sanitize_uri_value function inside the bleach/sanitizer.py component. Prior to passing a URI string to Python's standard library parser (urllib.parse.urlparse), Bleach performs an initial cleanup operation to strip backticks, control characters, and standard whitespace.

This cleanup uses the regular expression re.sub(r"[`\000-\040\177-\240\s]+", "", normalized_uri). The character class covers the backtick, the control characters and space (\000-\040), the range \177-\240 (U+007F through U+00A0), and whitespace matched by \s. It does not cover invisible formatting characters above U+00A0, such as the Zero-Width Space (\u200b), the Byte-Order Mark (\ufeff), or the Soft Hyphen (\u00ad).
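The coverage gap can be checked directly against that character class. The following is a minimal sketch that applies only this one substitution to a handful of sample characters; the helper name survives_cleanup is hypothetical and is not part of Bleach.

```python
import re

# Character class taken from the vulnerable cleanup step.
CLEANUP = re.compile(r"[`\000-\040\177-\240\s]+")


def survives_cleanup(ch: str) -> bool:
    """Return True if the character is still present after the cleanup regex."""
    return CLEANUP.sub("", ch) == ch


samples = {
    "backtick": "`",
    "tab": "\t",
    "no-break space (U+00A0)": "\u00a0",
    "soft hyphen (U+00AD)": "\u00ad",
    "zero-width space (U+200B)": "\u200b",
    "byte-order mark (U+FEFF)": "\ufeff",
}

for label, ch in samples.items():
    status = "survives" if survives_cleanup(ch) else "removed"
    print(f"{label}: {status}")
```

The first three samples fall inside the character class and are removed; the last three sit above U+00A0, are not whitespace, and pass through untouched.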

When an attacker constructs a link containing an invisible Unicode character inside the protocol scheme name (e.g., javascript\u200b:alert(1)), the character is not removed during the regex cleanup stage. The resulting string is subsequently passed to urllib.parse.urlparse. Because the string contains a non-ASCII character within the scheme sequence, urlparse fails to recognize javascript\u200b as a valid scheme under RFC 3986 rules and parses the value as a relative path.

Since urlparse does not extract javascript as the scheme, Bleach treats the input as a safe, relative path. The sanitizer lets the obfuscated href attribute pass through unchanged, so the dangerous payload remains in the sanitized output.
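The parsing behaviour can be reproduced with the standard library alone. This sketch only calls urllib.parse.urlparse; it does not import Bleach, and the variable names are illustrative.

```python
from urllib.parse import urlparse

plain = urlparse("javascript:alert(1)")
obfuscated = urlparse("javascript\u200b:alert(1)")

# A clean scheme is recognised and split off from the rest of the URI.
print(repr(plain.scheme))       # 'javascript'

# With a zero-width space before the colon, no scheme is extracted and the
# whole string ends up in the path component.
print(repr(obfuscated.scheme))  # ''
print(repr(obfuscated.path))
```

An empty scheme is what leads a protocol check built on urlparse to treat the value as a relative reference.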

Code Analysis and Patch Verification

In version 6.3.0 and prior, the preprocessing code in bleach/sanitizer.py is configured as follows:

# Vulnerable Code: bleach/sanitizer.py (<= v6.3.0)
def sanitize_uri_value(self, value, allowed_protocols):
    # Convert HTML entities to raw characters
    normalized_uri = html5lib_shim.convert_entities(value)
 
    # Remove backtick, space, and control characters (ASCII only)
    normalized_uri = re.sub(r"[`\000-\040\177-\240\s]+", "", normalized_uri)
 
    # Remove REPLACEMENT characters
    normalized_uri = normalized_uri.replace("\ufffd", "")
 
    # Lowercase for pattern matching
    normalized_uri = normalized_uri.lower()

The vulnerability was remediated in commit 7c4867c32344d1c961107fae62240a6f0dc680dc by removing the specific replacement of the \ufffd character and introducing a regular expression that strips all non-ASCII characters from the URI prior to the validation phase:

# Patched Code: bleach/sanitizer.py (v6.4.0)
def sanitize_uri_value(self, value, allowed_protocols):
    # Convert HTML entities to raw characters
    normalized_uri = html5lib_shim.convert_entities(value)
 
    # Strip backtick, whitespace, and control characters
    normalized_uri = re.sub(r"[`\000-\040\177-\240\s]+", "", normalized_uri)
 
    # Strip non-ASCII characters so that urlparse can parse the url into
    # components correctly. This drops invisible and whitespace unicode
    # characters among other things.
    normalized_uri = re.sub(r"[^\x00-\x7f]", "", normalized_uri)
 
    # Lowercase value to make matching easier
    normalized_uri = normalized_uri.lower()

This remediation ensures that every non-ASCII character, including invisible formatting characters, is removed before scheme parsing. An input of javascript\u200b:alert(1) is reduced to javascript:alert(1) during preprocessing. urlparse then parses the collapsed string as the javascript scheme, which Bleach flags as a blocked protocol and correctly strips from the tag.
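To compare the two preprocessing paths side by side, the sketch below reimplements only the regex steps from the two listings and then asks urlparse for the scheme. The function names normalize_vulnerable and normalize_patched are hypothetical, and the entity-conversion step (html5lib_shim.convert_entities) is omitted because it is internal to Bleach; treat this as an approximation, not the library's code.

```python
import re
from urllib.parse import urlparse

# Cleanup character class shared by both versions.
CLEANUP = r"[`\000-\040\177-\240\s]+"


def normalize_vulnerable(uri: str) -> str:
    """Regex steps as shown for <= 6.3.0 (entity conversion omitted)."""
    uri = re.sub(CLEANUP, "", uri)
    uri = uri.replace("\ufffd", "")
    return uri.lower()


def normalize_patched(uri: str) -> str:
    """Regex steps as shown for 6.4.0 (entity conversion omitted)."""
    uri = re.sub(CLEANUP, "", uri)
    uri = re.sub(r"[^\x00-\x7f]", "", uri)  # drop every non-ASCII character
    return uri.lower()


payload = "JavaScript\u200b:alert(1)"
before = urlparse(normalize_vulnerable(payload)).scheme
after = urlparse(normalize_patched(payload)).scheme
print(repr(before), repr(after))
```

With the older steps the scheme comes back empty; with the added non-ASCII strip the same payload yields the javascript scheme, which an allowlist check can then reject.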

While this fix closes the Unicode-based bypass, it also strips the non-ASCII characters used by legitimate internationalized domain names (IDNs) and paths. The maintainers accepted this functional limitation as a necessary trade-off for security enforcement in the final release of the project.
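The trade-off is easy to see by applying the new substitution to a URL that contains non-ASCII characters. The sketch below uses a made-up host and path under the reserved .example domain; it shows only what the regex does to the string, not how Bleach uses the result afterwards.

```python
import re

# Hypothetical IDN-style URL: "münchen" host and "straße" path.
idn_url = "https://m\u00fcnchen.example/stra\u00dfe"

# Same substitution as the patched preprocessing step.
stripped = re.sub(r"[^\x00-\x7f]", "", idn_url)
print(stripped)
```

The non-ASCII letters are dropped outright rather than transliterated, so the stripped string no longer names the same host or path.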

Exploitation Methodology

Exploiting this vulnerability requires a multi-tier environment where Bleach's output is processed by a downstream component before rendering. Standard browsers do not execute schemes containing raw invisible Unicode characters directly, as they interpret them as relative paths or unrecognized schemes.

The vulnerability is triggered when the database, template formatter, or downstream backend processor normalizes the Unicode input (e.g., stripping non-ASCII characters, applying compatibility decompositions, or running character cleanup scripts). If a backend normalizes the string javascript\u200b:alert(document.cookie) by removing non-ASCII elements, the invisible character is discarded, resulting in the executable payload javascript:alert(document.cookie).

A Python proof of concept demonstrates how Bleach preserves the payload while downstream normalization collapses it into a functional exploit vector:

import bleach
import re
 
# Configure sanitizer allowlists
allowed_tags = ['a']
allowed_attrs = {'a': ['href']}
allowed_protocols = ['http', 'https']
 
# Construct input payload with Zero-Width Space (\u200b)
raw_input = '<a href="javascript\u200b:alert(document.cookie)">Target Link</a>'
 
# Stage 1: Bleach sanitization (v6.3.0)
sanitized_output = bleach.clean(raw_input, tags=allowed_tags, attributes=allowed_attrs, protocols=allowed_protocols)
print("Sanitized Output:", repr(sanitized_output))
# Result: '<a href="javascript\u200b:alert(document.cookie)">Target Link</a>'
 
# Stage 2: Downstream Normalization (dropping non-ASCII characters)
final_html = re.sub(r"[^\x00-\x7f]", "", sanitized_output)
print("Final Rendered HTML:", repr(final_html))
# Result: '<a href="javascript:alert(document.cookie)">Target Link</a>'

Impact & Security Risk Assessment

The CVSS v3.1 base score is 0.0 because the vector records no direct confidentiality, integrity, or availability impact: in a typical web browser environment the payload stays inert until a downstream component rewrites it. In applications that perform multi-stage processing, database character conversion, or Unicode normalization after sanitization, however, the issue represents a severe stored Cross-Site Scripting (XSS) pathway.
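The 0.0 figure follows from the vector string published with this advisory (CVSS:3.1/AV:N/AC:H/PR:N/UI:R/S:U/C:N/I:N/A:N). As a worked check, the sketch below applies the CVSS v3.1 base-score equations for an unchanged scope. The metric weights are assumed to match the CVSS v3.1 specification and should be verified against it; the function names are illustrative.

```python
import math

# CVSS v3.1 metric weights for the values used in this vector (assumed from the spec).
AV_NETWORK = 0.85
AC_HIGH = 0.44
PR_NONE = 0.85
UI_REQUIRED = 0.62
IMPACT_NONE = 0.0


def roundup(value: float) -> float:
    """Round up to one decimal place, following the spec's Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_unchanged_scope(av, ac, pr, ui, c, i, a) -> float:
    """Base score for Scope: Unchanged."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)      # impact sub-score input
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_unchanged_scope(
    AV_NETWORK, AC_HIGH, PR_NONE, UI_REQUIRED,
    IMPACT_NONE, IMPACT_NONE, IMPACT_NONE,
)
print(score)  # 0.0
```

Because all three impact metrics are None, the impact term is zero and the equations return 0.0 regardless of the exploitability metrics.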

An attacker who achieves script execution through this bypass runs code in the application's origin and can hijack user sessions, read confidential application data, extract session cookies, or perform arbitrary authenticated actions on behalf of the victim.

Furthermore, this vulnerability breaks the safety contract that Bleach offers to callers. Developers rely on a sanitizer to emit safe fragments regardless of the downstream processing architecture, so a value that slips past the protocol allowlist weakens every later stage that trusts Bleach's output.

Remediation and Migration Guidance

The immediate remediation is to upgrade to Mozilla Bleach version 6.4.0. This is the final release of the library and patches the vulnerability by stripping non-ASCII characters prior to scheme parsing.

pip install --upgrade bleach==6.4.0

Because Bleach is officially deprecated and has reached End-of-Life (EOL), organizations should plan a migration path to an actively maintained alternative. The recommended alternative is 'nh3', which provides Python bindings to the Rust-based 'ammonia' HTML sanitizer library. Ammonia enforces strict, RFC-compliant URI validation and is not subject to this Unicode parsing issue.

Where upgrading is not immediately possible, implement a pre-processing filter to strip Unicode invisible characters (such as \u200b, \ufeff, and \u00ad) before submitting strings to Bleach:

import re

# Invisible Unicode characters named in this report; the list is not exhaustive.
_INVISIBLE_CHARS = re.compile(r"[\u200b\ufeff\u00ad]")

def pre_sanitize(html_input):
    # Remove common invisible Unicode characters prior to Bleach processing
    return _INVISIBLE_CHARS.sub("", html_input)
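A narrower interim option is to reject suspicious href values instead of rewriting the whole input. The sketch below assumes Bleach's documented support for a callable of the form (tag, name, value) in the attributes allowlist; the function name href_is_safe and the ALLOWED_PROTOCOLS set are hypothetical. It applies the same two substitutions as the patched code before asking urlparse for the scheme, and it has not been validated against every Bleach release.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; mirror whatever is passed to Bleach as `protocols`.
ALLOWED_PROTOCOLS = {"http", "https"}


def href_is_safe(tag, name, value):
    """Return True only for href values whose collapsed form has an allowed scheme."""
    if name != "href":
        return False
    # Same cleanup and non-ASCII strip as the patched preprocessing step.
    collapsed = re.sub(r"[`\000-\040\177-\240\s]+", "", value)
    collapsed = re.sub(r"[^\x00-\x7f]", "", collapsed).lower()
    try:
        scheme = urlparse(collapsed).scheme
    except ValueError:
        # Unparseable values are rejected outright.
        return False
    # An empty scheme means a relative reference such as "/path" or "#anchor".
    return scheme == "" or scheme in ALLOWED_PROTOCOLS


print(href_is_safe("a", "href", "javascript\u200b:alert(1)"))
print(href_is_safe("a", "href", "https://example.com/"))
```

Under that assumption, the callable could be supplied as attributes={'a': href_is_safe} in place of a list-based allowlist. Because it judges the collapsed form of the value, it also rejects payloads that hide the scheme behind visible non-ASCII letters, which the three-character filter above does not cover.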

Deploy a robust Content Security Policy (CSP) header to mitigate potential script execution if an active scheme bypasses the filters. Ensure the policy prevents inline scripts and script execution via URI schemes:

Content-Security-Policy: default-src 'self'; script-src 'self';

Official Patches

Mozilla: Commit implementing non-ASCII character removal before validation

Technical Appendix

CVSS Score: 0.0 / 10
Vector: CVSS:3.1/AV:N/AC:H/PR:N/UI:R/S:U/C:N/I:N/A:N

Affected Systems

Mozilla Bleach <= 6.3.0

Affected Versions Detail

Product: bleach (Mozilla)
Affected Versions: <= 6.3.0
Fixed Version: 6.4.0

CWE ID: CWE-184 (Incomplete List of Disallowed Inputs)
Attack Vector: Network (AV:N)
CVSS v3.1 Score: 0.0 (no direct impact scored; exploitation depends on downstream processing)
Impact: Bypass of protocol validation filters / Secondary stored XSS
Exploit Status: Proof-of-Concept (PoC) available
KEV Status: Not listed in CISA KEV

MITRE ATT&CK Mapping

T1204.001: User Execution: Malicious Link (Execution)
T1036: Masquerading (Defense Evasion)

CWE-184: Incomplete List of Disallowed Inputs

The product or library uses a list of disallowed inputs but does not fully validate or normalize characters, allowing attackers to bypass protection mechanisms using alternative Unicode representations.

Known Exploits & Detection

Research Context: Demonstrates a Bleach v6.3.0 bypass that uses a Zero-Width Space in the protocol scheme to evade urlparse and trigger XSS on downstream normalization.

Vulnerability Timeline

2026-03-16: Fix commit developed and merged into the main development branch
2026-06-05: Bleach version 6.4.0 tagged and published, announcing EOL deprecation
2026-06-16: GHSA-8RFP-98V4-MMR6 advisory published
