Jun 16, 2026 · 6 min read
A ReDoS vulnerability in Bleach's email linkifier allows remote attackers to cause severe CPU exhaustion by submitting a 30KB payload of repeating dot-atom sequences, resulting in thread starvation and denial of service.
An uncontrolled resource consumption vulnerability exists in the Python package Bleach when it parses text to linkify email addresses. When `parse_email=True` is enabled, specially crafted payloads that lack an '@' symbol force the regular expression engine into a quadratic-time scan. This causes immediate CPU exhaustion and blocks application server worker processes.
The Python bleach package provides HTML sanitization and linkification utilities commonly used to parse user-submitted text and render safe HTML content. One key feature is the linkify module, which converts plain-text URLs and email addresses into clickable HTML anchor tags. When email linkification is enabled, the library locates email addresses with a regular expression produced by a dedicated builder function and converts each match into an anchor tag.
This vulnerability belongs to the Inefficient Regular Expression Complexity class (CWE-1333), also categorized under Uncontrolled Resource Consumption (CWE-400). The attack surface is exposed whenever an application accepts untrusted text inputs and processes them using bleach.linkify() with the parse_email=True parameter enabled.
Because the underlying regular expression engine executes without an explicit timeout, input length boundaries, or linear-time pre-filtering, an attacker can construct input sequences that exploit the pattern matching logic. The resulting CPU exhaustion can degrade application performance, consume all available server worker threads, and trigger a denial of service condition.
The vulnerability resides in the build_email_re function inside bleach/linkifier.py, which constructs the regular expression used to scan text tokens. The function utilizes a complex pattern to match the local-part (the section before the @ symbol) of email addresses. This pattern is structured around a sequence of valid characters followed by optional repetitions of a dot and additional characters.
The dot-atom sub-pattern in the compiled regular expression is `` [-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)* ``. It requires that each period (`.`) be followed by at least one valid local-part character. The engine scans the input token sequentially, attempting to match the full expression at each position.
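To see this rule in isolation, the sketch below compiles only the dot-atom sub-pattern with Python's standard `re` module, outside of bleach and without the doubled `{{}}` braces that bleach needs for `str.format`, and checks a few strings against it. `DOT_ATOM` and `is_dot_atom` are names chosen for this illustration only.

```python
import re

# Dot-atom local-part, copied from bleach's pattern; the repeated group is
# made non-capturing here because only the overall match matters.
DOT_ATOM = re.compile(
    r"[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(?:\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*",
    re.IGNORECASE,
)


def is_dot_atom(s: str) -> bool:
    """Return True if the whole string is a valid dot-atom local-part."""
    return DOT_ATOM.fullmatch(s) is not None


# Every "." must be followed by at least one local-part character,
# so "a..b", "a." and ".a" are rejected.
for candidate in ["a.b.c", "user.name+tag", "a..b", "a.", ".a"]:
    print(f"{candidate!r:18} {is_dot_atom(candidate)}")
```

Because a run like `a.a.a...a` is itself a perfectly valid dot-atom of any length, the engine has no reason to stop consuming it until it reaches the end of the input.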
When the input contains a long repeating sequence such as a.a.a.a.a... but lacks the mandatory @ symbol and domain component, every match attempt is doomed, yet the engine cannot discover this cheaply. Starting at the first a, it greedily matches the dot-atom pattern all the way to the end of the input. Only then does it look for the @ symbol; finding none, it backtracks through each `.a` repetition, retrying @ at every boundary, before the attempt at that index finally fails.
The search does not stop there. The engine advances its scan pointer to the next starting position and repeats the entire process. Positions holding a period fail immediately, because a period cannot begin a local-part, but every position holding an a launches a fresh scan to the end of the string. For an input of length $N$, the first scan processes $N$ characters, the second $N-2$, the third $N-4$, and so on. The total work is therefore proportional to $O(N^2)$, and CPU time accumulates rapidly.
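As a worked check of the $O(N^2)$ claim, take the proof-of-concept payload used later in this article ("a." repeated 15,000 times plus a final a, so $N = 30{,}001$). The expensive attempts start at the 15,001 even positions, and their forward scans have lengths $N, N-2, \dots, 1$, which are the odd numbers from $N$ down to 1:

$$
\sum_{k=0}^{(N-1)/2} (N - 2k) \;=\; \left(\frac{N+1}{2}\right)^{2} \;\approx\; \frac{N^{2}}{4},
\qquad
N = 30001 \;\Rightarrow\; 15001^{2} = 225{,}030{,}001 \approx 2.25 \times 10^{8}.
$$

This counts only the forward pass of each attempt; backtracking through each run adds work of the same order, so a single 30 KB request costs on the order of $10^8$ elementary matching steps.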
The vulnerable code path is initiated during tokenization within the LinkifyFilter.handle_email_addresses method. When iterating over text tokens, if the token type is identified as "Characters", the library executes self.email_re.finditer(text) to locate matching instances.
```python
# Vulnerable implementation in bleach/linkifier.py
# (TLDS is bleach's built-in list of top-level domains, defined elsewhere in bleach)
import re


def build_email_re(tlds=TLDS):
    # Braces are doubled ({{ }}) because the pattern is passed through str.format()
    return re.compile(
        r"""(?<!//)
        (([-!#$%&'*+/=?^_`{{}}|~0-9A-Z]+
            (\.[-!#$%&'*+/=?^_`{{}}|~0-9A-Z]+)*  # Dot-atom local-part
        |^"([\001-\010\013\014\016-\037!#-\[\]-\177]
            |\\[\001-\011\013\014\016-\177])*"  # Quoted-string local-part
        )@(?:[A-Z0-9](?:[A-Z0-9-]{{0,61}}[A-Z0-9])?\.)+(?:{0}))
        """.format(
            "|".join(tlds)
        ),
        re.IGNORECASE | re.MULTILINE | re.VERBOSE,
    )
```

The matching loop runs inside the token handler method:
```python
# Excerpt from LinkifyFilter in bleach/linkifier.py
def handle_email_addresses(self, src_iter):
    """Handle email addresses in character tokens"""
    for token in src_iter:
        if token["type"] == "Characters":
            text = token["data"]
            new_tokens = []
            end = 0

            # This call triggers the O(N^2) evaluation loop
            for match in self.email_re.finditer(text):
                # Process the matches...
                ...
```

Python's `re` module is a backtracking engine, and it does not check up front whether the mandatory `@` occurs anywhere in the string. `finditer` therefore tries a match at every starting position in turn. At each position where a local-part character begins, the engine consumes the remaining dot-atom run to the end of the input, fails to find `@`, backtracks through the run, and only then moves on to the next position. Nothing learned in one failed attempt is reused in the next, so each attempt re-traverses almost the entire remaining string.
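The restart behavior can be made explicit. The sketch below re-implements the scanning loop of `finditer` by hand with `Pattern.match(text, pos)`, using a simplified stand-in for bleach's email regex: the dot-atom local part, `@`, domain labels, and a tiny hypothetical TLD list instead of bleach's full list (the quoted-string branch and the lookbehind are omitted). It illustrates the search strategy only and is not bleach code; `EMAIL_LIKE` and `scan_like_finditer` are names invented here.

```python
import re

# Simplified stand-in for bleach's email regex: dot-atom local part, a literal
# "@", then domain labels and a tiny, hypothetical TLD list.
EMAIL_LIKE = re.compile(
    r"[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(?:\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*"
    r"@(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:com|org|net)",
    re.IGNORECASE,
)


def scan_like_finditer(pattern: re.Pattern, text: str):
    """Yield the same matches as pattern.finditer(text) for this pattern.

    A failed attempt at `pos` can cost up to len(text) - pos steps, and
    nothing learned there is reused when the loop moves on to pos + 1.
    """
    pos = 0
    while pos <= len(text):
        m = pattern.match(text, pos)  # anchored attempt at this position
        if m:
            yield m
            pos = m.end()  # resume after the match (this pattern never matches empty)
        else:
            pos += 1  # no match here: restart one character later


sample = "write to a.b@example.com or c@d.org"
print([m.group() for m in scan_like_finditer(EMAIL_LIKE, sample)])
# ['a.b@example.com', 'c@d.org']
```

On ordinary text most attempts fail within a character or two. On an `a.a.a...a` payload, every attempt that starts on an `a` runs to the end of the string before failing, which is where the quadratic cost comes from.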
Exploiting this vulnerability does not require authentication if the target application processes user-supplied text on a public endpoint. An attacker needs to submit a long text string consisting of repeating local-part character groups separated by periods, intentionally omitting the @ character. A payload size of approximately 30,000 bytes is sufficient to cause measurable thread blocking.
```python
import time

import bleach

# Construct the exploit payload (30,001 bytes): "a.a.a...a" with no "@"
payload = ("a." * 15000) + "a"

print("Executing linkify parsing...")
start = time.perf_counter()
# Triggers the quadratic scanning behavior
bleach.linkify(payload, parse_email=True)
print(f"Execution completed in {time.perf_counter() - start:.4f} seconds")
```

On a single core, this script drives CPU utilization to 100 percent for approximately 8.7 seconds. In a production deployment with a fixed pool of workers, such as Gunicorn or uWSGI web workers or Celery task workers, sending multiple concurrent requests containing this payload will exhaust all available workers. While the workers are occupied evaluating the regex, the application will fail to respond to any incoming legitimate traffic.
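If installing bleach is not convenient, the quadratic growth can still be observed with a stdlib-only sketch. It times `finditer` on the same `a.a.a...a` payload shape at doubling sizes, using a simplified stand-in for bleach's email regex (an assumption: bleach's real pattern also has a quoted-string branch and a full TLD list, neither of which this payload gets far enough to exercise). `EMAIL_LIKE`, `make_payload` and `time_scan` are illustrative names. Absolute times depend on the machine; the point is that each doubling of the input should take roughly four times as long.

```python
import re
import time

# Simplified stand-in for bleach's email regex (dot-atom local part + "@" +
# domain); the TLD list is a small hypothetical placeholder.
EMAIL_LIKE = re.compile(
    r"[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(?:\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*"
    r"@(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:com|org|net)",
    re.IGNORECASE,
)


def make_payload(pairs: int) -> str:
    """Build 'a.a.a...a' with `pairs` repetitions of 'a.' and no '@'."""
    return ("a." * pairs) + "a"


def time_scan(text: str) -> tuple[int, float]:
    """Run finditer to completion; return (match count, elapsed seconds)."""
    start = time.perf_counter()
    count = sum(1 for _ in EMAIL_LIKE.finditer(text))
    return count, time.perf_counter() - start


if __name__ == "__main__":
    for pairs in (500, 1000, 2000, 4000):
        count, elapsed = time_scan(make_payload(pairs))
        print(f"{2 * pairs + 1:>6} bytes: {count} matches, {elapsed:.4f} s")
```

Keeping the sizes small makes the experiment safe to run locally; scaling `pairs` up to 15,000 reproduces the 30,001-byte payload from the proof of concept.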
The security impact is limited to Denial of Service (DoS). The vulnerability does not allow remote code execution, data exfiltration, or unauthorized privilege escalation. However, because many web frameworks deploy a limited number of synchronous worker processes, a sustained stream of relatively small (about 30 KB) payloads can cause a prolonged service outage.
The CVSS v3.1 base score is assessed at 4.3 (Medium), with the vector string CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:L. This reflects that the vulnerability is remotely exploitable, has low attack complexity, requires low privileges, requires no user interaction, and has a low but distinct impact on application availability.
Because the bleach package is officially deprecated by its maintainers, no official patches or security releases are planned. Consequently, the vulnerability is likely to remain present in systems that continue to use the package without manual mitigation.
To mitigate this vulnerability, developers can implement several programmatic workarounds. The most direct approach is to disable the parse_email argument. If email address parsing is not a core functional requirement of your application, ensure that parse_email is set to False.
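If email parsing is required, a linear-time ($O(N)$) pre-filter can skip the expensive regex for payloads like the one above. Because an email address must contain an @ character, checking for its presence with Python's `in` operator, which costs a single pass over the string, prevents the regular expression engine from running on inputs that cannot contain an address. This check is not a complete fix, however: it only asks whether an @ occurs anywhere in the text, so an attacker can defeat it by adding a single @ that does not complete an address (for example, `("a." * 15000) + "a,@"`), which still forces the quadratic scan. It should therefore be combined with the length limits described below.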
```python
import bleach


def safe_linkify(text, parse_email=True):
    # If parse_email is True but no '@' symbol is present,
    # bypass email linkification to prevent CPU exhaustion.
    # Note: this only screens out inputs that contain no '@' at all.
    if parse_email and "@" not in text:
        return bleach.linkify(text, parse_email=False)
    return bleach.linkify(text, parse_email=parse_email)
```

Additionally, applications should enforce strict length boundaries on all incoming user-submitted text fields. Limiting input fields to a maximum of 2,000 characters prevents attackers from submitting the large strings necessary to trigger prolonged CPU stalls.
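A length limit is most effective when it is enforced before any sanitization or linkification runs. The sketch below uses hypothetical helpers (`MAX_CHARS`, `InputTooLong`, `enforce_max_length` and `estimated_worst_case_seconds` are illustrative names, not part of bleach or any web framework); the 2,000-character cap mirrors the figure above. Extrapolating quadratically from the roughly 8.7 seconds measured for 30,001 bytes, a 2,000-character input would cost on the order of 40 ms in the worst case. That figure is an estimate, not a measurement.

```python
# Hypothetical helpers for illustration; not part of bleach or any web framework.
MAX_CHARS = 2000  # cap taken from the guidance above; tune per field


class InputTooLong(ValueError):
    """Raised when user-submitted text exceeds the configured length limit."""


def enforce_max_length(text: str, max_chars: int = MAX_CHARS) -> str:
    """Return `text` unchanged if it fits, otherwise raise InputTooLong.

    Call this before bleach.linkify() so oversized payloads are rejected
    before they ever reach the quadratic regex scan.
    """
    if len(text) > max_chars:
        raise InputTooLong(f"input is {len(text)} characters; limit is {max_chars}")
    return text


def estimated_worst_case_seconds(
    n_chars: int, ref_chars: int = 30001, ref_seconds: float = 8.7
) -> float:
    """Quadratic extrapolation from a single reference timing (an estimate only)."""
    return ref_seconds * (n_chars / ref_chars) ** 2


print(enforce_max_length("Contact me at user@example.com"))
print(f"{estimated_worst_case_seconds(MAX_CHARS) * 1000:.0f} ms")  # 39 ms
```

Rejecting oversized input, rather than truncating it, keeps the behavior predictable. Truncation would also bound the cost, but it can silently cut an address or URL in half.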
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:L

| Product | Affected Versions | Fixed Version |
|---|---|---|
| bleach (Mozilla) | <= 6.3.0 | None (Deprecated) |
| Attribute | Detail |
|---|---|
| CWE ID | CWE-1333 |
| Attack Vector | Network |
| CVSS Score | 4.3 |
| Impact | Denial of Service (CPU Exhaustion) |
| Exploit Status | Proof of Concept Available |
| KEV Status | Not Listed |
The regular expression engine can be forced into an inefficient execution path when evaluating inputs, leading to high CPU usage.