Mar 18, 2026 · 5 min read
Direct use of Loofah's `allowed_uri?` method fails to properly sanitize HTML-encoded control characters in URIs. Attackers can exploit this by passing payloads like `java&#13;script:alert(1)`. The payload bypasses validation and achieves XSS when browsers decode and render the string. Default `Loofah.sanitize()` calls are not affected.
The Loofah Ruby gem version 2.25.0 contains an improper URI validation vulnerability in the `Loofah::HTML5::Scrub.allowed_uri?` helper method. An attacker can bypass protocol validation by using HTML-encoded control characters, leading to Cross-Site Scripting (XSS) when the validated URI is rendered in a browser.
The Loofah Ruby gem is widely used in the Ruby ecosystem for HTML/XML document sanitization and manipulation. It serves as a core dependency for frameworks like Ruby on Rails to prevent Cross-Site Scripting (XSS) attacks. Loofah exposes various helper methods to assist developers in writing custom sanitizers or validating specific inputs.
One such helper is Loofah::HTML5::Scrub.allowed_uri?, which determines whether a given URI string employs an approved protocol scheme, such as http, https, or mailto. This method is intended to prevent the injection of dangerous schemes like javascript: or vbscript: into anchor tags or image sources.
Version 2.25.0 introduces a logic flaw in this helper method regarding the sequence of sanitization operations. The vulnerability specifically affects applications that invoke Loofah::HTML5::Scrub.allowed_uri? directly on user-controlled input prior to embedding that input into DOM attributes.
It is critical to note that the default sanitization paths, such as Loofah.sanitize() and scrub!, remain unaffected by this vulnerability. The underlying Nokogiri XML/HTML parser handles HTML entity decoding before Loofah evaluates the URI protocol, neutralizing the attack vector in default configurations.
The root cause of GHSA-46fp-8f5p-pf2m lies in the improper sequencing of sanitization steps within the Loofah::HTML5::Scrub.allowed_uri? method. Effective URI validation requires removing control characters that browsers typically ignore when parsing URIs, as these characters can obfuscate malicious protocol schemes.
The vulnerable implementation strips literal control characters (such as null bytes, tabs, carriage returns, and line feeds) before it decodes HTML entities. This order of operations creates a bypass condition. If an attacker encodes a control character as an HTML entity, the initial stripping phase fails to detect it.
For example, the carriage return character (`\r`) can be encoded as the numeric character reference `&#13;`. When `allowed_uri?` processes a string containing `&#13;`, the literal control-character removal step ignores the entity. The method then checks the apparent protocol scheme against its allowlist.
The method does not recognize the split string as carrying a disallowed scheme, so it returns `true` and marks the URI as safe. The flaw manifests fully when the application inserts the "safe" string into an HTML attribute. Browsers decode HTML entities within attribute values before the URL is interpreted, which reconstitutes the obfuscated protocol.
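The ordering problem can be reproduced with a few lines of plain Ruby. The sketch below is a simplified model of the sequence described above, not Loofah's source. The helper names are hypothetical, "control characters" is assumed to mean C0 controls plus DEL, and only numeric character references are decoded. Stripping before decoding leaves a carriage return that hides the scheme. Decoding first exposes `javascript`.

```ruby
# Simplified model of the ordering flaw; NOT Loofah's actual source code.
# Assumptions: "control characters" = C0 controls plus DEL, and only numeric
# character references (&#13; / &#x0D;) are decoded.
CONTROL_CHARS = /[\x00-\x1F\x7F]/
NUMERIC_REF   = /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/
SCHEME        = /\A([a-zA-Z][a-zA-Z0-9+\-.]*):/

def strip_controls(str)
  str.gsub(CONTROL_CHARS, "")
end

# Replaces decimal or hex numeric references with the character they encode.
def decode_numeric_refs(str)
  str.gsub(NUMERIC_REF) { ($1 ? $1.to_i(16) : $2.to_i).chr(Encoding::UTF_8) }
end

# Returns the lowercased scheme, or nil if the string has no scheme prefix.
def scheme_of(str)
  m = SCHEME.match(str)
  m && m[1].downcase
end

payload = "java&#13;script:alert(1)"

vulnerable_view = decode_numeric_refs(strip_controls(payload)) # strip, then decode
fixed_view      = strip_controls(decode_numeric_refs(payload)) # decode, then strip

p vulnerable_view            # => "java\rscript:alert(1)"
p scheme_of(vulnerable_view) # => nil (no scheme, so nothing for an allowlist to reject)
p fixed_view                 # => "javascript:alert(1)"
p scheme_of(fixed_view)      # => "javascript" (rejected by an http/https/mailto allowlist)
```

The model never consults an allowlist. An allowlist can only reject a scheme that the scheme check can actually see.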
Understanding the attack requires examining how browsers handle malformed URIs compared to strict parser logic. Browsers parse URL schemes permissively. Before the scheme is read, the WHATWG URL Standard's basic URL parser strips leading and trailing C0 control characters and spaces. It also removes every ASCII tab or newline (U+0009, U+000A, U+000D) anywhere in the input. The standard defines this behavior to maintain compatibility with legacy web content.
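The browser side can be sketched as follows, assuming only the two clean-up steps named above matter here. `browser_normalize` and `browser_scheme` are hypothetical names, and this is nowhere near a complete WHATWG URL parser. The third call shows the limit of the rule: an embedded space is not removed. Splitting a scheme in the middle therefore depends on tab, LF, or CR specifically.

```ruby
# Minimal subset of the WHATWG URL parser's input clean-up, for illustration
# only; helper names are hypothetical and this is not a full URL parser.
LEADING_TRAILING_C0_OR_SPACE = /\A[\x00-\x20]+|[\x00-\x20]+\z/
ASCII_TAB_OR_NEWLINE         = /[\t\n\r]/
URL_SCHEME                   = /\A([a-zA-Z][a-zA-Z0-9+\-.]*):/

def browser_normalize(input)
  # Step 1: trim leading/trailing C0 controls and spaces.
  trimmed = input.gsub(LEADING_TRAILING_C0_OR_SPACE, "")
  # Step 2: remove every tab, LF, and CR, wherever it appears.
  trimmed.gsub(ASCII_TAB_OR_NEWLINE, "")
end

# Lowercased scheme as the sketch above would see it, or nil if none.
def browser_scheme(input)
  m = URL_SCHEME.match(browser_normalize(input))
  m && m[1].downcase
end

p browser_scheme("java\rscript:alert(1)")     # => "javascript"
p browser_scheme(" \tJava\nScript:alert(1)")  # => "javascript"
p browser_scheme("java script:alert(1)")      # => nil (an inner space is not removed)
```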
An attacker crafts a payload utilizing this discrepancy:
`java&#13;script:alert(1)`
When this payload is passed directly to the vulnerable `allowed_uri?` method, the control-character stripping step finds no literal control characters and leaves the string unchanged. The embedded `&#13;` splits the scheme, so the string does not begin with `javascript:`. The method therefore does not identify a disallowed scheme and flags the input as a valid, permitted URI.
The application proceeds to render the string into the Document Object Model (DOM):
`<a href="java&#13;script:alert(1)">Click Me</a>`
Upon rendering, the browser's HTML parser processes the `href` attribute and decodes the numeric character reference back into a literal carriage return (`\r`). When the victim follows the link, the browser's URL parser evaluates `java\rscript:alert(1)`. Adhering to the WHATWG specification, it discards the carriage return, recognizes the `javascript:` scheme, and executes the trailing payload in the origin's context.
Exploiting this vulnerability results in Cross-Site Scripting (XSS): an attacker can execute arbitrary JavaScript within the security context of the victim's browser session. The vulnerability carries a CVSS v4.0 score of 5.3 (Medium). The vector assigns no impact to the vulnerable system itself (VC:N/VI:N/VA:N). It assigns low confidentiality and integrity impact to the subsequent system, the victim's browser (SC:L/SI:L).
The CVSS vector CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:N/SC:L/SI:L/SA:N/E:U indicates a network-based attack vector with low complexity and no privileges required. While the XSS payload ultimately requires user interaction (or an automated trigger) to execute in the browser, the validation bypass itself occurs autonomously on the server side without user involvement.
Successful execution allows the attacker to perform actions on behalf of the victim. This includes accessing sensitive session tokens stored in localStorage or sessionStorage, reading CSRF tokens to forge state-changing requests, or defacing the application interface. The severity is bounded by the permissions of the authenticated user interacting with the malicious link.
The scope of impact is strictly limited to applications that deviate from Loofah's standard sanitization routines. Codebases relying solely on Loofah.sanitize() or Rails' default sanitize helper are natively protected against this specific bypass technique.
The primary remediation for GHSA-46fp-8f5p-pf2m is upgrading the loofah gem to version 2.25.1. The maintainers have patched the vulnerability by correcting the sanitization sequence within the allowed_uri? method. The fix ensures that HTML entities are properly decoded before literal control characters are stripped and the protocol scheme is evaluated.
Developers must audit their codebases to identify direct invocations of Loofah::HTML5::Scrub.allowed_uri?. Tools like grep or advanced static analysis (SAST) solutions can locate instances where this helper is used to validate user input prior to manual DOM insertion.
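A plain-Ruby sketch like the following can serve as a starting point for that audit. It lists every line that mentions `allowed_uri?`. The method name `find_allowed_uri_calls`, the scanned extensions, and the skipped directories are assumptions to adjust per codebase. It is similar in spirit to `grep -rn 'allowed_uri?' app lib`. It finds text matches, not verified call sites, so each hit still needs manual review.

```ruby
require "find"

# Hypothetical audit helper (not part of Loofah): lists every line in the
# given tree that mentions allowed_uri?, so each call site can be reviewed.
ALLOWED_URI_PATTERN = /\ballowed_uri\?/
SCANNED_EXTENSIONS  = %w[.rb .erb .haml .slim .rake].freeze
SKIPPED_DIRS        = %w[.git vendor node_modules tmp log].freeze

def find_allowed_uri_calls(root)
  hits = []
  Find.find(root) do |path|
    if File.directory?(path)
      # Skip dependency/build directories (e.g. a vendored copy of Loofah itself).
      Find.prune if path != root && SKIPPED_DIRS.include?(File.basename(path))
      next
    end
    next unless SCANNED_EXTENSIONS.include?(File.extname(path))

    File.foreach(path).with_index(1) do |line, lineno|
      line = line.scrub # tolerate invalid byte sequences
      hits << "#{path}:#{lineno}: #{line.strip}" if line.match?(ALLOWED_URI_PATTERN)
    end
  end
  hits
end
```

For example, `puts find_allowed_uri_calls("app")` prints one `path:line: source` entry per match.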
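For environments where an immediate upgrade is not feasible, developers should add a pre-processing step that decodes HTML entities before passing strings to `allowed_uri?`. That decoder needs to follow the HTML5 character-reference rules that browsers use. Simpler decoders may miss forms such as the named references `&Tab;` and `&NewLine;`. Alternatively, routing the input through the standard `Loofah.sanitize()` pipeline lets Nokogiri's parser decode entities before the URI is checked, neutralizing the threat natively.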
As a defense-in-depth measure, organizations should implement a strict Content Security Policy (CSP). A policy that omits `'unsafe-inline'` and restricts script execution to trusted sources causes CSP-enforcing browsers to block the injected `javascript:` URL. This limits the impact of the validation bypass even on unpatched versions.
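In a Rails application, the built-in `config.content_security_policy` DSL is the usual place to set this. For other Rack stacks, a framework-agnostic sketch might look like the middleware below. The class name and the exact directive list are assumptions, not a vetted production policy. The relevant property is that `script-src` omits `'unsafe-inline'`, which is what makes CSP-enforcing browsers refuse to run `javascript:` URLs.

```ruby
# Hypothetical Rack middleware (no gems needed: a Rack app is any object that
# responds to #call(env) and returns [status, headers, body]).
class StrictCspMiddleware
  # script-src deliberately omits 'unsafe-inline', so javascript: URLs are
  # blocked by browsers that enforce CSP.
  POLICY = [
    "default-src 'self'",
    "script-src 'self'",
    "object-src 'none'",
    "base-uri 'self'"
  ].join("; ").freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    headers = headers.dup
    # Keep any policy the app already set (Rack 3 uses lowercase header names).
    headers["content-security-policy"] ||= POLICY
    [status, headers, body]
  end
end
```

In a `config.ru`, this would be wired in with `use StrictCspMiddleware` ahead of the application.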
CVSS vector: `CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:N/SC:L/SI:L/SA:N/E:U`

| Product | Affected Versions | Fixed Version |
|---|---|---|
| Loofah (flavorjones) | 2.25.0 | 2.25.1 |

| Attribute | Detail |
|---|---|
| Vulnerability Class | Improper URI Validation / Filter Bypass |
| CWE ID | CWE-79 / CWE-116 |
| Attack Vector | Network (AV:N) |
| CVSS v4.0 Score | 5.3 (Medium) |
| Exploit Status | Unproven / Theoretical PoC Available |
| Affected Component | Loofah::HTML5::Scrub.allowed_uri? |
| CISA KEV Status | Not Listed |

CWE-79: Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')