The root cause of this vulnerability lies in the default handling of image metadata and alpha channels by the Python Imaging Library (Pillow) within vLLM's preprocessing code. The vulnerability manifests through two primary vectors: EXIF orientation desynchronization and unnormalized transparency layers.

In the first vector, images captured by digital cameras or produced by some editing tools often contain Exchangeable Image File Format (EXIF) metadata. This metadata includes an Orientation tag (tag ID 0x0112) that defines the correct viewing angle. Standard rendering engines, such as modern web browsers and image viewers, parse this tag and rotate or mirror the pixel grid before rendering. Prior to the fix, vLLM omitted this normalization step and passed the raw, unrotated pixel grid to the vision encoder. As a result, the model could receive an image rotated or mirrored relative to what human reviewers saw.
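The mismatch can be shown with Pillow alone. The sketch below writes a JPEG whose stored pixel grid is 40x20 and tags it with Orientation 6 (display rotated 90 degrees clockwise). The helper name make_rotated_jpeg and the image sizes are illustrative assumptions, not part of vLLM.

```python
from io import BytesIO

from PIL import Image, ImageOps

ORIENTATION_TAG = 0x0112  # EXIF Orientation


def make_rotated_jpeg(width: int, height: int, orientation: int) -> bytes:
    """Hypothetical helper: JPEG with a width x height pixel grid and an EXIF Orientation tag."""
    exif = Image.Exif()
    exif[ORIENTATION_TAG] = orientation
    buf = BytesIO()
    Image.new("RGB", (width, height), (200, 30, 30)).save(
        buf, format="JPEG", exif=exif.tobytes()
    )
    return buf.getvalue()


data = make_rotated_jpeg(40, 20, orientation=6)  # 6 = rotate 90 degrees clockwise for display
raw = Image.open(BytesIO(data))
print("orientation tag:", raw.getexif().get(ORIENTATION_TAG))
print("raw pixel grid:", raw.size)  # what a non-normalizing loader hands to the encoder
print("as displayed:", ImageOps.exif_transpose(raw).size)  # what an EXIF-aware viewer shows
```

A loader that skips normalization passes the 40x20 grid on, while ImageOps.exif_transpose yields the 20x40 image an EXIF-aware viewer displays.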

In the second vector, the processing engine failed to normalize transparency indicators across diverse image modes. Transparent PNG and GIF files use alpha channels or transparency tables to mask background pixels. While vLLM previously handled explicit 'RGBA' mode images, it neglected other transparency-carrying modes. These modes include Palette-based images ('P') with a 'tRNS' chunk, Grayscale with Alpha ('LA'), and standard RGB images containing 'tRNS' metadata chunks.
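To make the coverage gap concrete, the sketch below builds one in-memory image per transparency-carrying mode and compares an RGBA-only check (a hypothetical approximation of the pre-fix behavior described above) with a broadened check that mirrors the patched _has_transparency logic. Setting info["transparency"] by hand is an assumption standing in for what Pillow populates when it decodes a PNG tRNS chunk.

```python
from PIL import Image


def rgba_only_check(image: Image.Image) -> bool:
    """Hypothetical approximation of the pre-fix behavior: only explicit RGBA counts."""
    return image.mode == "RGBA"


def broadened_check(image: Image.Image) -> bool:
    """Mirrors the patched logic: alpha-bearing modes or a tRNS-derived info entry."""
    return image.mode in ("RGBA", "LA", "PA") or "transparency" in image.info


def make_samples() -> dict[str, Image.Image]:
    samples = {
        "RGBA": Image.new("RGBA", (4, 4)),
        "LA": Image.new("LA", (4, 4)),
        "P + tRNS": Image.new("P", (4, 4)),
        "RGB + tRNS": Image.new("RGB", (4, 4)),
        "RGB (opaque)": Image.new("RGB", (4, 4)),
    }
    # Stand-ins for decoded tRNS chunks: a palette index for P, a color tuple for RGB.
    samples["P + tRNS"].info["transparency"] = 0
    samples["RGB + tRNS"].info["transparency"] = (0, 0, 0)
    return samples


for name, img in make_samples().items():
    print(f"{name:13} RGBA-only={rgba_only_check(img)!s:5} broadened={broadened_check(img)}")
```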

When these unhandled formats are converted to 'RGB', Pillow discards the transparency information: each transparent pixel takes on the color value stored underneath it (its palette entry in 'P' mode, or its raw luminance or RGB value otherwise), which is frequently solid black. Attackers can abuse this behavior for 'Looming' or 'AlphaDog' style prompt injections by writing malicious text in white on a transparent background. On a white web interface, the text is invisible to users and moderators. In the image vLLM produces, however, the background becomes black, leaving the white text high-contrast and highly legible to the model.
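A minimal sketch of the effect, using an 'LA' image (one of the modes the pre-fix code did not special-case). A filled rectangle stands in for rendered text; the sizes and coordinates are arbitrary assumptions.

```python
from PIL import Image

SIZE = (64, 32)
TEXT_BOX = (8, 8, 56, 24)  # filled rectangle standing in for rendered glyphs

# Grayscale+alpha image: white, opaque "text" on a fully transparent background.
# Transparent pixels still store a luminance value underneath; here it is 0 (black).
lum = Image.new("L", SIZE, 0)
alpha = Image.new("L", SIZE, 0)
lum.paste(255, TEXT_BOX)
alpha.paste(255, TEXT_BOX)
img = Image.merge("LA", (lum, alpha))

# Naive path: convert() drops the alpha band and exposes the stored value.
naive = img.convert("RGB")

# Normalized path: composite over the opaque white a typical UI paints behind images.
white = Image.new("RGBA", SIZE, (255, 255, 255, 255))
flattened = Image.alpha_composite(white, img.convert("RGBA")).convert("RGB")

for label, out in (("naive", naive), ("flattened", flattened)):
    print(label, "background:", out.getpixel((0, 0)), "text:", out.getpixel((20, 16)))
```

The naive conversion yields white "text" on black, while the flattened version is white on white, matching what a viewer on a white page shows.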



GHSA-8JR5-V98P-W75M: Perception Desynchronization via Unnormalized EXIF Orientation and PNG Transparency in vLLM

Alon Barad

Software Engineer

Jun 17, 2026 · 8 min read

Executive Summary (TL;DR)

vLLM failed to normalize image EXIF orientation and PNG transparency metadata. This causes Vision-Language Models to see a different image (e.g., rotated or with visible high-contrast text) than what is visually shown to human moderators, enabling silent prompt injections and safety bypasses.

A critical preprocessing mismatch exists in vLLM's multimodal image pipeline before commit cf1c90672404548aa3bc51f92c4745576a65ee26. The vulnerability occurs because the engine loads user-submitted images and passes them to underlying Vision-Language Models (VLMs) without normalizing their EXIF orientation metadata or fully resolving complex transparency structures. This gap creates a perception desynchronization vulnerability where the physical pixel grid processed by the AI model differs significantly from how the image is visually rendered to human moderators or frontend applications. Attackers can exploit this mismatch to perform silent prompt injections, bypass safety moderation systems, or execute adversarial jailbreaks.


Vulnerability Overview

The integration of vision capabilities into Large Language Models (LLMs) introduces a new class of input processing vulnerabilities. In multimodal pipelines, the engine must ingest, decode, and normalize diverse image formats before feeding them to the underlying neural network. This ingestion path constitutes a critical security boundary, particularly when the system relies on human-in-the-loop validation or upstream automated classifiers.

GHSA-8JR5-V98P-W75M identifies a perception desynchronization vulnerability in the vLLM preprocessing engine. The flaw lies in the handling of image metadata and transparency structures, specifically EXIF orientation and PNG transparency channels. When vLLM processes user-supplied images, it generates a pixel representation that differs fundamentally from the image rendered to human moderators or web frontends.

This discrepancy creates an interpretation conflict classified under CWE-1156 and CWE-436. Attackers can leverage this conflict to hide adversarial payloads or bypass visual safety filters. The vulnerability affects all integrations where vLLM is deployed as the backend inference engine for multimodal applications.

Root Cause Analysis

Code Analysis

The vulnerability was located in vllm/multimodal/image.py and vllm/multimodal/media/image.py. In vulnerable versions, the image conversion routine directly called the native PIL .convert() method without inspecting the image's internal metadata for orientation or extended transparency attributes.

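# Vulnerable Implementation (excerpt; MediaWithBytes is a vLLM-internal type)
from io import BytesIO

from PIL import Image

def load_bytes(self, data: bytes) -> MediaWithBytes[Image.Image]:
    try:
        # Lazily opens the image bytes
        image = Image.open(BytesIO(data))
        # Loads the image without checking EXIF Orientation (0x0112)
        image.load()
        # Direct conversion to RGB potentially discards transparency chunks
        image = image.convert("RGB")
        # ... (remainder of the method, including its except clause, omitted in this excerpt)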

The patch introduces the normalize_image helper, which uses PIL's ImageOps.exif_transpose to physically rewrite the pixel matrix according to the embedded EXIF orientation tag. This is intended to ensure that the pixel matrix processed by the vision encoder matches the visual representation rendered in standard web interfaces.

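# Patched Implementation
import contextlib

from PIL import Image, ImageOps

def normalize_image(image: Image.Image) -> Image.Image:
    """Normalize EXIF orientation so the pixel data matches visual display."""
    # Any exception raised while parsing EXIF is swallowed; the image is then returned unrotated
    with contextlib.suppress(Exception):
        image = ImageOps.exif_transpose(image)
    return image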

Additionally, the patch addresses the transparency vulnerability by implementing a robust detection function _has_transparency and updating the conversion routine to perform alpha-compositing over a solid white canvas prior to RGB conversion.

from PIL import Image

def _has_transparency(image: Image.Image) -> bool:
    """Detect whether an image carries transparency data."""
    if image.mode in ("RGBA", "LA", "PA"):
        return True
    return "transparency" in getattr(image, "info", {})

def convert_image_mode(
    image: Image.Image,
    to_mode: str,
    background_color: tuple[int, int, int] | list[int] = (255, 255, 255),
) -> Image.Image:
    if image.mode == to_mode:
        return image

    # If converting to RGB and transparency is detected, perform alpha-compositing
    if to_mode == "RGB" and _has_transparency(image):
        if image.mode != "RGBA":
            # Promote P/LA/PA/RGB+tRNS to RGBA so transparency becomes an explicit alpha band
            image = image.convert("RGBA")
        # rgba_to_rgb is a helper defined elsewhere in vLLM (not shown in this excerpt)
        return rgba_to_rgb(image, background_color)

    return image.convert(to_mode)
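The excerpt calls rgba_to_rgb without showing it. The sketch below is a hypothetical stand-in, assuming the helper flattens an RGBA image onto a solid background color; it is not copied from vLLM's source. It is exercised on an RGB image whose black pixels are declared transparent through a tRNS-style info entry, the third mode listed in the root cause.

```python
from __future__ import annotations

from PIL import Image


def rgba_to_rgb(
    image: Image.Image,
    background_color: tuple[int, int, int] | list[int] = (255, 255, 255),
) -> Image.Image:
    """Hypothetical stand-in: flatten an RGBA image onto a solid background color."""
    if image.mode != "RGBA":
        raise ValueError(f"expected an RGBA image, got {image.mode!r}")
    canvas = Image.new("RGB", image.size, tuple(background_color))
    # The alpha band is the paste mask: alpha 0 keeps the background, alpha 255
    # copies the source pixel, and intermediate values blend the two.
    canvas.paste(image, mask=image.getchannel("A"))
    return canvas


# RGB image whose pure-black pixels are declared transparent (tRNS-style entry).
src = Image.new("RGB", (8, 8), (0, 0, 0))
src.putpixel((4, 4), (255, 255, 255))
src.info["transparency"] = (0, 0, 0)

# RGB -> RGBA honors info["transparency"], making matching pixels fully transparent.
rgba = src.convert("RGBA")
print(rgba_to_rgb(rgba).getpixel((0, 0)))  # composited onto white
print(rgba_to_rgb(rgba, (128, 128, 128)).getpixel((0, 0)))  # custom background
```

Passing a different background_color changes what transparent regions become, which is why the frontend background and the backend compositing color need to agree.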

A notable limitation remains in the patch. normalize_image wraps exif_transpose in contextlib.suppress(Exception), so any error raised while parsing EXIF data is silently ignored and the image continues through the pipeline unrotated. If an attacker crafts an image whose EXIF block makes Pillow's parser raise an exception, but which a target web browser still parses and renders with the intended orientation, the perception desynchronization can still be achieved.

Exploitation

An attacker can exploit this vulnerability through two principal vectors to bypass visual content filters or insert hidden text instructions. The first vector leverages the alpha channel desynchronization to conduct invisible prompt injections, while the second uses EXIF orientation parameters to scramble input features for classifiers.

In the prompt injection scenario, the attacker generates a PNG image in Palette ('P') mode. The background of this image uses palette index 1, which the tRNS metadata chunk marks as transparent. The attacker then writes text instructions in white (RGB: 255, 255, 255) over this background. In a typical browser UI with a white page background, the page shows through the transparent pixels, so the white text is unreadable against the white canvas. When vLLM processes the image, however, the transparency is dropped and index 1 resolves to its underlying palette color; if the attacker sets that entry to black, the white text becomes highly visible to the Vision-Language Model.
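The palette scenario can be reproduced with Pillow alone. The sketch below is an illustration with arbitrary sizes and a filled rectangle in place of real text: it writes a 'P' mode PNG whose index 1 (black) is marked transparent through a tRNS chunk, reloads it, and compares a plain convert("RGB") with compositing over white.

```python
from io import BytesIO

from PIL import Image

SIZE = (64, 32)
TEXT_BOX = (8, 8, 56, 24)  # filled rectangle standing in for the injected text

# Palette: index 0 = white (text), index 1 = black (background); other entries unused.
img = Image.new("P", SIZE, 1)
img.putpalette([255, 255, 255, 0, 0, 0] + [0, 0, 0] * 254)
img.paste(0, TEXT_BOX)

# Saving with transparency=1 makes Pillow write a tRNS chunk marking index 1 transparent.
buf = BytesIO()
img.save(buf, format="PNG", transparency=1)
reloaded = Image.open(BytesIO(buf.getvalue()))
print("mode:", reloaded.mode, "transparent index:", reloaded.info.get("transparency"))

# Pre-fix style conversion: the transparent index falls back to its palette color (black).
naive = reloaded.convert("RGB")

# Flattened over white, approximating what a browser on a white page shows.
white = Image.new("RGBA", SIZE, (255, 255, 255, 255))
flattened = Image.alpha_composite(white, reloaded.convert("RGBA")).convert("RGB")

print("naive background:", naive.getpixel((0, 0)))
print("flattened background:", flattened.getpixel((0, 0)))
```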

In the classification bypass scenario, the attacker takes an offensive image, rotates the raw pixel canvas by 180 degrees, and embeds an EXIF Orientation value of 3 (rotate 180 degrees for display). A human moderator or front-end classifier that respects EXIF tags will rotate the image back to upright and review or flag it based on its actual content. If the backend inference pipeline does not normalize EXIF metadata, however, vLLM processes the raw, upside-down image. Because vision encoders (such as the patch embeddings of a Vision Transformer) are typically trained predominantly on upright images, the inverted input can degrade the model's recognition of the depicted content and reduce the likelihood that it identifies safety-violating concepts.
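The orientation trick can be sketched in a few lines. This assumes Pillow's PNG writer embeds the EXIF block (it writes an eXIf chunk) and uses a 4x4 image with a red marker pixel in place of real content, so the effect is easy to check; PNG is chosen so that pixel values survive exactly.

```python
from io import BytesIO

from PIL import Image, ImageOps

ORIENTATION_TAG = 0x0112
RED, BLUE = (255, 0, 0), (0, 0, 255)

# Upright content: a red marker in the top-left corner, everything else blue.
upright = Image.new("RGB", (4, 4), BLUE)
upright.putpixel((0, 0), RED)

# Attacker step: store the pixels rotated 180 degrees and tag Orientation = 3,
# so EXIF-aware viewers rotate them back to upright.
stored = upright.rotate(180)
exif = Image.Exif()
exif[ORIENTATION_TAG] = 3
buf = BytesIO()
stored.save(buf, format="PNG", exif=exif.tobytes())

received = Image.open(BytesIO(buf.getvalue()))
print("raw top-left:", received.getpixel((0, 0)))  # what a non-normalizing loader passes on
print("normalized top-left:", ImageOps.exif_transpose(received).getpixel((0, 0)))
```

Without normalization the encoder sees blue where the upright image has its red marker; exif_transpose moves the marker back to the top-left.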

Impact Assessment

The impact of perception desynchronization in multimodal environments is high, especially for systems automating decisions based on visual content. Because Vision-Language Models are increasingly deployed to automate administrative, moderating, or security-critical tasks, the ability to feed different data to the model than what humans approve represents a substantial security bypass vector.

If the VLM is used for document extraction (e.g., parsing invoices or legal contracts), an attacker can inject hidden instructions to modify financial amounts or extract system variables. Because the human auditor only sees a clean, benign document in their viewer, the exploit runs silently in the background: the model reads text from the unnormalized pixels that the auditor never sees, which can lead to unauthorized operations or data exfiltration.

Furthermore, safety filters relying on visual classifiers can be bypassed. By manipulating the EXIF orientation metadata, an attacker can get offensive, copyrighted, or sensitive material processed by the inference pipeline without triggering spatial detection heuristics. This vulnerability shows that the security boundary of LLM-based applications extends deep into standard media preprocessing utilities.

Remediation

To remediate this vulnerability, organizations must upgrade vLLM to a version containing the official fix integrated in Pull Request #44974 and Commit cf1c90672404548aa3bc51f92c4745576a65ee26.

For systems where immediate updates are not feasible due to compatibility constraints, legacy pipelines must implement custom preprocessing wrappers. These wrappers must normalize EXIF orientation using Pillow's ImageOps.exif_transpose and flatten any transparency channels prior to submitting the images to the inference engine. The following Python middleware example demonstrates how to secure an image input stream:

from io import BytesIO
from PIL import Image, ImageOps
 
def secure_preprocess_image(raw_bytes: bytes) -> Image.Image:
    image = Image.open(BytesIO(raw_bytes))
    
    # Normalize EXIF orientation
    try:
        image = ImageOps.exif_transpose(image)
    except Exception as e:
        # Handle or log parsing errors instead of silently suppressing them
        raise ValueError("Malformed EXIF header") from e
    
    # Normalize transparency by compositing over white canvas
    if image.mode in ("RGBA", "LA", "PA") or "transparency" in image.info:
        image = image.convert("RGBA")
        canvas = Image.new("RGBA", image.size, (255, 255, 255, 255))
        image = Image.alpha_composite(canvas, image).convert("RGB")
    else:
        image = image.convert("RGB")
        
    return image

Security teams should also audit their frontends to ensure that images are displayed over the same background color the backend uses for alpha blending (RGB: 255, 255, 255). Keeping frontend display and backend model preprocessing aligned is critical to preventing interpretation conflicts.

Official Patches

vllm-project: Official Pull Request resolving image preprocessing bugs


Technical Appendix

CVSS Score

8.6 / 10

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:N
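As a cross-check, the sketch below implements the CVSS v3.1 base-score equations (base metrics only, with the metric weights from the FIRST CVSS v3.1 specification). Evaluated on the vector exactly as printed above, it yields 8.1 rather than the 8.6 listed, which suggests that either the score or the vector string differs from the upstream advisory; the GHSA entry should be treated as authoritative. The function names are illustrative and not part of any official library.

```python
import math

# CVSS v3.1 base metric weights.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on Scope (Unchanged vs Changed).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.50},
}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    # "CVSS:3.1/AV:N/..." -> {"AV": "N", ...}; the first segment is the version prefix.
    m = dict(part.split(":") for part in vector.split("/")[1:])
    scope = m["S"]
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[m["C"]]) * (1 - cia[m["I"]]) * (1 - cia[m["A"]])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22
        * WEIGHTS["AV"][m["AV"]]
        * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[scope][m["PR"]]
        * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:N"))
```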

Affected Systems

vllm

Affected Versions Detail

Product	Affected Versions	Fixed Version
vllm vllm-project	< commit cf1c90672404548aa3bc51f92c4745576a65ee26	commit cf1c90672404548aa3bc51f92c4745576a65ee26

Attribute	Detail
CWE ID	CWE-1156 / CWE-436
Attack Vector	Network
CVSS	8.6
Impact	Perception Desynchronization / Security Bypass
Exploit Status	PoC Available
KEV Status	Not Listed

MITRE ATT&CK Mapping

CWE-1156

Identification of Entity with Multiple Interpretations

The application identifies or resolves an entity with multiple potential interpretations, leading to an interpretation conflict or perception desynchronization.

Known Exploits & Detection

GitHub Security Advisory: Proof of Concept validation code demonstrating transparency and EXIF manipulation using Pillow.
