CVEReports
CVEReports

Automated vulnerability intelligence platform. Comprehensive reports for high-severity CVEs generated by AI.

Product

  • Home
  • Sitemap
  • RSS Feed

Company

  • About
  • Contact
  • Privacy Policy
  • Terms of Service

© 2026 CVEReports. All rights reserved.

Made with love by Amit Schendel & Alon Barad



GHSA-8JR5-V98P-W75M

GHSA-8JR5-V98P-W75M: Perception Desynchronization via Unnormalized EXIF Orientation and PNG Transparency in vLLM

Alon Barad
Alon Barad
Software Engineer

Jun 17, 2026·8 min read·2 visits

Executive Summary (TL;DR)

vLLM failed to normalize image EXIF orientation and PNG transparency metadata. This causes Vision-Language Models to see a different image (e.g., rotated or with visible high-contrast text) than what is visually shown to human moderators, enabling silent prompt injections and safety bypasses.

A critical preprocessing mismatch exists in vLLM's multimodal image pipeline before commit cf1c90672404548aa3bc51f92c4745576a65ee26. The vulnerability occurs because the engine loads user-submitted images and passes them to underlying Vision-Language Models (VLMs) without normalizing their EXIF orientation metadata or fully resolving complex transparency structures. This gap creates a perception desynchronization vulnerability where the physical pixel grid processed by the AI model differs significantly from how the image is visually rendered to human moderators or frontend applications. Attackers can exploit this mismatch to perform silent prompt injections, bypass safety moderation systems, or execute adversarial jailbreaks.

Vulnerability Overview

The integration of vision capabilities into Large Language Models (LLMs) introduces a new class of input processing vulnerabilities. In multimodal pipelines, the engine must ingest, decode, and normalize diverse image formats before feeding them to the underlying neural network. This ingestion path constitutes a critical security boundary, particularly when the system relies on human-in-the-loop validation or upstream automated classifiers.

GHSA-8JR5-V98P-W75M identifies a perception desynchronization vulnerability in the vLLM preprocessing engine. The flaw lies in the handling of image metadata and transparency structures, specifically EXIF orientation and PNG transparency channels. When vLLM processes user-supplied images, it generates a pixel representation that differs fundamentally from the image rendered to human moderators or web frontends.

This discrepancy creates an interpretation conflict classified under CWE-1156 and CWE-436. Attackers can leverage this conflict to hide adversarial payloads or bypass visual safety filters. The vulnerability affects all integrations where vLLM is deployed as the backend inference engine for multimodal applications.

Root Cause Analysis

The root cause of this vulnerability lies in the default handling of image metadata and alpha channels by the Python Imaging Library (Pillow) within vLLM's preprocessing code. The vulnerability manifests through two primary vectors: EXIF orientation desynchronization and unnormalized transparency layers.

In the first vector, images captured by physical cameras or generated by specific tools contain Exchangeable Image File Format (EXIF) metadata. This metadata includes an Orientation tag (tag ID 0x0112) that defines the correct viewing angle. Standard rendering engines, such as modern web browsers or image viewers, parse this tag and rotate the pixel grid dynamically before rendering. Prior to the fix, vLLM omitted this normalization step, passing the raw, unrotated pixel grid to the vision encoder. This caused the model to process spatial features differently from how human reviewers saw them.

In the second vector, the processing engine failed to normalize transparency indicators across diverse image modes. Transparent PNG and GIF files use alpha channels or transparency tables to mask background pixels. While vLLM previously handled explicit 'RGBA' mode images, it neglected other transparency-carrying modes. These modes include Palette-based images ('P') with a 'tRNS' chunk, Grayscale with Alpha ('LA'), and standard RGB images containing 'tRNS' metadata chunks.

When converting these unhandled formats to 'RGB', Pillow's default behavior drops transparency and maps transparent pixels to solid black or default palette index values. This behavior allows attackers to execute 'Looming' or 'AlphaDog' style prompt injections. An attacker can write malicious text in white on a transparent background. To a user or moderator on a white web interface, the text is invisible. To the vLLM engine, the background becomes black, rendering the high-contrast white text highly legible to the model.

Code Analysis

The vulnerability was located in vllm/multimodal/image.py and vllm/multimodal/media/image.py. In vulnerable versions, the image conversion routine directly called the native PIL .convert() method without inspecting the image's internal metadata for orientation or extended transparency attributes.

# Vulnerable Implementation
def load_bytes(self, data: bytes) -> MediaWithBytes[Image.Image]:
    try:
        # Lazily opens the image bytes
        image = Image.open(BytesIO(data))
        # Loads the image without checking EXIF Orientation (0x0112)
        image.load()
        # Direct conversion to RGB potentially discards transparency chunks
        image = image.convert("RGB")

The patch introduces the normalize_image helper, which uses PIL's ImageOps.exif_transpose to physically rewrite the pixel matrix according to the embedded EXIF orientation tag. This guarantees that the physical matrix processed by the vision encoder matches the visual representation rendered in standard web interfaces.

# Patched Implementation
def normalize_image(image: Image.Image) -> Image.Image:
    """Normalize EXIF orientation so the pixel data matches visual display."""
    with contextlib.suppress(Exception):
        image = ImageOps.exif_transpose(image)
    return image

Additionally, the patch addresses the transparency vulnerability by implementing a robust detection function _has_transparency and updating the conversion routine to perform alpha-compositing over a solid white canvas prior to RGB conversion.

def _has_transparency(image: Image.Image) -> bool:
    """Detect whether an image carries transparency data."""
    if image.mode in ("RGBA", "LA", "PA"):
        return True
    return "transparency" in getattr(image, "info", {})
 
def convert_image_mode(
    image: Image.Image,
    to_mode: str,
    background_color: tuple[int, int, int] | list[int] = (255, 255, 255),
) -> Image.Image:
    if image.mode == to_mode:
        return image
 
    # If converting to RGB and transparency is detected, perform alpha-compositing
    if to_mode == "RGB" and _has_transparency(image):
        if image.mode != "RGBA":
            image = image.convert("RGBA")
        return rgba_to_rgb(image, background_color)
 
    return image.convert(to_mode)

A critical technical limitation remains in the patch. The use of contextlib.suppress(Exception) in normalize_image ignores errors thrown during EXIF parsing. If an attacker crafts an image with a corrupt EXIF header that crashes Pillow's parser but is successfully recovered and rendered by a target web browser, the perception desynchronization vulnerability can still be achieved.

Exploitation

An attacker can exploit this vulnerability through two principal vectors to bypass visual content filters or insert hidden text instructions. The first vector leverages the alpha channel desynchronization to conduct invisible prompt injections, while the second uses EXIF orientation parameters to scramble input features for classifiers.

In the prompt injection scenario, the attacker generates a PNG image in Palette ('P') mode. The background of this image is set to palette index 1, which is marked as transparent using the tRNS metadata chunk. The attacker then writes text instructions in white (RGB: 255, 255, 255) over this background. When displayed in a typical browser UI with a white background, the white text is unreadable against the white canvas. However, when processed by vLLM, the transparent index resolves to black, making the white text highly visible to the Vision-Language Model.

In the classification bypass scenario, the attacker takes an offensive image, rotates the raw pixel canvas by 180 degrees, and embeds an EXIF orientation metadata value of 3. A human moderator or front-end classifier that respects EXIF tags will rotate the image back to upright and flag or review it based on its content. If the backend inference pipeline does not normalize EXIF metadata, vLLM processes the raw, upside-down image. The spatial encoding of the vision model (such as the patch embeddings in a Vision Transformer) is scrambled relative to the downstream text generator, preventing the model from recognizing safety-violating concepts.

Impact Assessment

The impact of perception desynchronization in multimodal environments is high, especially for systems automating decisions based on visual content. Because Vision-Language Models are increasingly deployed to automate administrative, moderating, or security-critical tasks, the ability to feed different data to the model than what humans approve represents a substantial security bypass vector.

If the VLM is used for document extraction (e.g., parsing invoices or legal contracts), an attacker can inject hidden instructions to modify financial amounts or extract system variables. Since the human auditor only sees a clean, benign document in the PDF viewer, the exploit runs silently in the background. The model processes the raw text extracted from the unnormalized pixels, leading to unauthorized operations or data exfiltration.

Furthermore, safety filters relying on visual classifiers can be completely bypassed. By manipulating the EXIF orientation metadata, offensive, copyrighted, or sensitive material can be ingested into the training or inference pipeline without triggering spatial detection heuristics. This vulnerability shows that the security boundary of LLM-based applications extends deep into standard media preprocessing utilities.

Remediation

To remediate this vulnerability, organizations must upgrade vLLM to a version containing the official fix integrated in Pull Request #44974 and Commit cf1c90672404548aa3bc51f92c4745576a65ee26.

For systems where immediate updates are not feasible due to compatibility constraints, legacy pipelines must implement custom preprocessing wrappers. These wrappers must normalize EXIF orientation using Pillow's ImageOps.exif_transpose and flatten any transparency channels prior to submitting the images to the inference engine. The following Python middleware example demonstrates how to secure an image input stream:

from io import BytesIO
from PIL import Image, ImageOps
 
def secure_preprocess_image(raw_bytes: bytes) -> Image.Image:
    image = Image.open(BytesIO(raw_bytes))
    
    # Normalize EXIF orientation
    try:
        image = ImageOps.exif_transpose(image)
    except Exception as e:
        # Handle or log parsing errors instead of silently suppressing them
        raise ValueError("Malformed EXIF header") from e
    
    # Normalize transparency by compositing over white canvas
    if image.mode in ("RGBA", "LA", "PA") or "transparency" in image.info:
        image = image.convert("RGBA")
        canvas = Image.new("RGBA", image.size, (255, 255, 255, 255))
        image = Image.alpha_composite(canvas, image).convert("RGB")
    else:
        image = image.convert("RGB")
        
    return image

Security teams should also audit their frontends to ensure that the rendering of images matches the background color used for alpha blending in the backend (RGB: 255, 255, 255). Alignment between frontend display constraints and backend model preprocessing is critical to preventing interpretation conflicts.

Official Patches

vllm-projectOfficial Pull Request resolving image preprocessing bugs

Fix Analysis (1)

Technical Appendix

CVSS Score
8.6/ 10
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:N

Affected Systems

vllm

Affected Versions Detail

Product
Affected Versions
Fixed Version
vllm
vllm-project
< commit cf1c90672404548aa3bc51f92c4745576a65ee26commit cf1c90672404548aa3bc51f92c4745576a65ee26
AttributeDetail
CWE IDCWE-1156 / CWE-436
Attack VectorNetwork
CVSS8.6
ImpactPerception Desynchronization / Security Bypass
Exploit StatusPoC Available
KEV StatusNot Listed

MITRE ATT&CK Mapping

T1036Masquerading
Defense Evasion
T1566Phishing
Initial Access
T1204User Execution
Execution
CWE-1156
Identification of Entity with Multiple Interpretations

The application identifies or resolves an entity with multiple potential interpretations, leading to an interpretation conflict or perception desynchronization.

Known Exploits & Detection

GitHub Security AdvisoryProof of Concept validation code demonstrating transparency and EXIF manipulation using Pillow.

References & Sources

  • [1]GHSA-8JR5-V98P-W75M Security Advisory
  • [2]vLLM Pull Request #44974
  • [3]vLLM Bug Fix Commit

Attack Flow Diagram

Press enter or space to select a node. You can then use the arrow keys to move the node around. Press delete to remove it and escape to cancel.
Press enter or space to select an edge. You can then press delete to remove it or escape to cancel.

More Reports

•about 4 hours ago•GHSA-664H-GPGQ-H6XX
5.4

GHSA-664h-gpgq-h6xx: Privilege Escalation via Broken Authorization in n8n Evaluation Test Runs Controller

An incorrect authorization vulnerability exists in the open-source workflow automation platform n8n within the Evaluation Test Runs Controller. In deployments utilizing Advanced Permissions, an authenticated user assigned a low-privilege project:viewer role can bypass configured permission policies. This allows the unauthorized user to execute, terminate, or delete workflow evaluation test runs by exploiting misconfigured API scope validations that map read-only scopes to mutating endpoints.

Amit Schendel
Amit Schendel
4 views•6 min read
•about 11 hours ago•GHSA-JWM3-QCFW-C5PP
5.1

GHSA-jwm3-qcfw-c5pp: Security Bypass in n8n Python Code Node AST Validator

An authenticated security-bypass vulnerability in n8n allows users with workflow creation or modification privileges to bypass the Python AST security validator. By circumventing AST validation logic, attackers can execute arbitrary statements, access the task executor's root module namespace, and disclose sensitive host environment variables on self-hosted instances.

Amit Schendel
Amit Schendel
7 views•6 min read
•about 11 hours ago•GHSA-H3JJ-5F3V-3685
6.4

GHSA-H3JJ-5F3V-3685: Public API Execution Retry Authorization Bypass in n8n

An incorrect authorization vulnerability in the Public API of n8n allows authenticated users with read-only permissions to bypass access control boundaries. By invoking the execution retry endpoint, an unauthorized user can trigger workflow executions, effectively escalating their privileges from workflow:read to workflow:execute.

Amit Schendel
Amit Schendel
6 views•5 min read
•about 17 hours ago•GHSA-M3Q2-P4FW-W38M
2.3

GHSA-M3Q2-P4FW-W38M: Cross-Site Scripting (XSS) via Unsafe innerHTML Assignment in Nuxt <NoScript> Component

A low-severity Cross-Site Scripting (XSS) vulnerability in Nuxt's globally registered <NoScript> head component allows unauthenticated attackers to execute arbitrary JavaScript. By injecting dynamic, untrusted data into <NoScript> slots, standard Vue HTML escaping is bypassed because the component processes slot text nodes and assigns them directly to the target element's innerHTML property instead of textContent. In modern browsers with scripting enabled, this raw injection can implicitly close the <noscript> tag, triggering script execution.

Amit Schendel
Amit Schendel
5 views•8 min read
•about 18 hours ago•CVE-2026-49993
5.7

CVE-2026-49993: Proprietary Source Code Exfiltration via Incomplete Same-Origin Verification in Nuxt Dev Servers

CVE-2026-49993 identifies an incomplete same-origin check validation mechanism in @nuxt/webpack-builder and @nuxt/rspack-builder dev server middleware. When the local development server is bound to a non-loopback address, cross-origin attackers can bypass verification checks by suppressing browser headers, leading to unauthorized retrieval and exfiltration of compiled source code chunks.

Amit Schendel
Amit Schendel
8 views•4 min read
•about 19 hours ago•GHSA-69QJ-PVH9-C5WG
7.5

GHSA-69QJ-PVH9-C5WG: Command Injection in yt-dlp `--exec` Option

An OS command injection vulnerability in yt-dlp before 2026.06.09 allows unauthenticated remote attackers to execute arbitrary shell commands via crafted media metadata when a user processes media using the --exec post-processing parameter with unsafe string interpolation conversions.

Alon Barad
Alon Barad
10 views•7 min read