CVEReports

GHSA-RR7J-V2Q5-CHGV · CVSS 5.3

GHSA-RR7J-V2Q5-CHGV: Streaming Token Redaction Bypass in LangSmith SDK

Alon Barad
Software Engineer

Apr 16, 2026 · 6 min read

No Known Exploit

Executive Summary (TL;DR)

A flaw in the LangSmith SDK's telemetry processing pipeline causes streaming token events to bypass `hide_outputs` redaction controls. Applications processing sensitive data via LLM streams transmit unredacted data to LangSmith servers despite active privacy settings.

The LangSmith SDK for both Python and JavaScript/TypeScript fails to apply output redaction controls to streaming token events. This oversight allows sensitive Large Language Model (LLM) outputs to bypass privacy configurations and transmit raw token data to the LangSmith backend, resulting in unintended data exposure.

Vulnerability Overview

The LangSmith SDK provides telemetry and observability features for applications utilizing Large Language Models (LLMs). The SDK operates by capturing execution traces, referred to as "runs," which encapsulate the inputs, outputs, and intermediate computational events of an LLM invocation. This data is serialized and transmitted to a centralized LangSmith backend for analysis and debugging.

Developers utilize SDK configuration options, specifically hideOutputs in JavaScript and hide_outputs in Python, to prevent sensitive data processed by the LLM from egressing to the remote service. These redaction flags instruct the SDK to scrub the contents of a trace before network transmission. This mechanism is critical for compliance with data privacy regulations when handling Personally Identifiable Information (PII) or API keys.

A structural logic flaw exists in the trace event processing pipeline that renders this protection ineffective during streaming operations. When an LLM application operates in streaming mode, it generates discrete token events asynchronously. The SDK's redaction mechanisms exclusively sanitize the primary inputs and outputs objects, leaving the intermediate stream events completely unprocessed.

Technical Root Cause Analysis

The LangSmith SDK tracks execution via structured run objects that bundle request metadata, inputs, outputs, and an array of granular trace events. When an application streams LLM responses back to a client, the SDK captures each chunk of the response and appends a new_token event to the run's events list. These events contain the raw, unredacted string values generated by the model.

During trace serialization, the SDK routes the run object through a preprocessing function designed to enforce data privacy controls. The functions responsible for this enforcement are _hide_run_outputs located in run_helpers.py for the Python SDK, and prepareRunCreateOrUpdateInputs located in traceable.ts for the JavaScript implementation. Both functions evaluate the state of the output hiding flags.

The core implementation error is an incomplete traversal of the trace data structure. The redaction logic inspects and sanitizes the top-level outputs dictionary but fails to iterate over the events list attached to the same run. As a result, the new_token payload bypasses the sanitization phase and is transmitted over the network in its original, cleartext form.
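To make the gap concrete, here is a minimal sketch of the data flow, not the SDK's actual code. The run shape (`outputs`, `events`, events named `new_token` carrying `kwargs.token`) follows the description in this article; the helper names `createRun`, `recordToken`, and `hideOutputsOnly` are hypothetical and exist only for illustration.

```javascript
// Hypothetical model of a run; field names follow the article, not the SDK source.
function createRun() {
  return { inputs: {}, outputs: {}, events: [] };
}

// Simulates what happens for each streamed chunk: a new_token event is appended.
function recordToken(run, token) {
  run.events.push({ name: "new_token", kwargs: { token } });
}

// Mirrors the incomplete redaction: only the top-level outputs are replaced,
// and the events array is copied through unchanged.
function hideOutputsOnly(run) {
  return { ...run, outputs: { redaction_message: "Outputs are hidden." } };
}

const run = createRun();
for (const chunk of ["SSN ", "123-45", "-6789"]) {
  recordToken(run, chunk);
}
run.outputs = { text: "SSN 123-45-6789" };

const payload = hideOutputsOnly(run);

// The outputs are hidden, yet the full text can be rebuilt from the events.
const leaked = payload.events.map((e) => e.kwargs.token).join("");
console.log(JSON.stringify(payload.outputs)); // {"redaction_message":"Outputs are hidden."}
console.log(leaked); // SSN 123-45-6789
```

Concatenating the token events reproduces the complete response, which is why redacting `outputs` alone offers no protection for streamed runs.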

Code Analysis and Mechanics

In the vulnerable JavaScript implementation located in traceable.ts (lines 997-1003 and 1044-1050), the SDK evaluates the privacy flags during trace preparation. The function explicitly targets the outputs property for modification, assigning a placeholder message to obscure the original data.

// Structural representation of vulnerable logic
if (this.hideOutputs) {
  run.outputs = { redaction_message: "Outputs are hidden." };
  // The run.events array is not processed here
}

The Python implementation in run_helpers.py (lines 1924 and 1996) mirrors this behavior exactly. The _hide_run_outputs function checks the run dictionary for an outputs key and redacts it, but it performs no inspection of the events key. This structural symmetry across both SDKs indicates a shared architectural oversight in how streaming trace data was modeled.

The remediation introduces an iterative inspection of the events array prior to trace serialization. If the redaction flags are enabled, the modified functions iterate through the events list, identifying any entry classified as a token generation event. The SDK then scrubs the specific key within the event object that holds the raw string value.

// Structural representation of patched logic
if (this.hideOutputs) {
  run.outputs = { redaction_message: "Outputs are hidden." };
  if (Array.isArray(run.events)) {
    // Scrub the token text on a copy so the caller's event objects are not mutated
    run.events = run.events.map((event) =>
      event.name === "new_token"
        ? { ...event, kwargs: { ...event.kwargs, token: "[REDACTED]" } }
        : event
    );
  }
}
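The same idea can be written as a standalone, side-effect-free function, which is easier to unit test. This is a sketch under the assumptions used in this article (events named `new_token` with the text in `kwargs.token`); the function name `redactRun`, the `"start"` event, and the sample run are hypothetical and are not taken from the SDK.

```javascript
const REDACTED = "[REDACTED]";

// Returns a redacted copy of the run; the input object is left untouched.
function redactRun(run, hideOutputs) {
  if (!hideOutputs) return run;
  const events = Array.isArray(run.events)
    ? run.events.map((event) =>
        event.name === "new_token"
          ? { ...event, kwargs: { ...event.kwargs, token: REDACTED } }
          : event
      )
    : run.events;
  return {
    ...run,
    outputs: { redaction_message: "Outputs are hidden." },
    events,
  };
}

// Hypothetical sample run with one non-token event and two token events.
const original = {
  outputs: { text: "secret answer" },
  events: [
    { name: "start", kwargs: {} },
    { name: "new_token", kwargs: { token: "secret " } },
    { name: "new_token", kwargs: { token: "answer" } },
  ],
};

const redacted = redactRun(original, true);
console.log(JSON.stringify(redacted.events));
```

Returning a copy rather than editing in place keeps the application's own view of the run intact, so only the payload destined for the network is scrubbed.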

Exploitation Methodology

This vulnerability manifests as a passive data leak triggered by normal application behavior rather than an active exploit requiring specific threat actor interaction. The exploitation prerequisite is a targeted application utilizing the LangSmith SDK with explicit privacy controls enabled and streaming output functionality engaged.

As the application processes user requests and streams responses from the LLM, the SDK silently generates the non-compliant telemetry payload. The application transmits this payload to the LangSmith backend over standard HTTPS connections. The leak occurs asynchronously in the background and is invisible to both the end user and the application developer.

An attacker does not exploit this issue by sending crafted inputs to the vulnerable application. Instead, an unauthorized actor or an insider threat leverages existing read access to the LangSmith project dashboard. By navigating to the execution trace of a supposedly redacted run and examining the "events" tab, the actor can view and exfiltrate the full, cleartext response stream.
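Teams that want to check whether past traces were affected could scan exported run data for this pattern. The following is only a sketch: it assumes runs are available as plain objects with the shape used in this article, and that redacted outputs carry a `redaction_message` key as in the structural snippets above. The helper `findLeakyRuns`, the `id` field, and the sample data are hypothetical; a real export may look different.

```javascript
// Hypothetical audit helper: returns runs whose outputs look redacted
// but whose new_token events still carry cleartext.
function findLeakyRuns(runs, placeholder = "[REDACTED]") {
  return runs.filter((run) => {
    const outputsHidden =
      typeof run.outputs === "object" &&
      run.outputs !== null &&
      "redaction_message" in run.outputs;
    const hasRawToken = (run.events ?? []).some(
      (e) =>
        e.name === "new_token" &&
        typeof e.kwargs?.token === "string" &&
        e.kwargs.token !== placeholder
    );
    return outputsHidden && hasRawToken;
  });
}

// Illustrative data only.
const sampleRuns = [
  { id: "a", outputs: { redaction_message: "Outputs are hidden." },
    events: [{ name: "new_token", kwargs: { token: "hello" } }] },
  { id: "b", outputs: { redaction_message: "Outputs are hidden." },
    events: [{ name: "new_token", kwargs: { token: "[REDACTED]" } }] },
  { id: "c", outputs: { text: "visible" },
    events: [{ name: "new_token", kwargs: { token: "visible" } }] },
  { id: "d", outputs: { redaction_message: "Outputs are hidden." } },
];

const leakyIds = findLeakyRuns(sampleRuns).map((r) => r.id);
console.log(leakyIds); // [ 'a' ]
```

Only run `a` is flagged: its outputs were hidden while a token event kept its original text. Run `c` is skipped because redaction was never requested for it.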

Impact Assessment

The primary impact of this vulnerability is the unintended exposure of sensitive information, classified under CWE-212 (Improper Removal of Sensitive Information Before Storage or Transfer) and CWE-200 (Exposure of Sensitive Information to an Unauthorized Actor). The severity depends directly on the classification level of the data that the affected application streams from the LLM.

Applications deployed in healthcare, finance, or enterprise environments frequently rely on output redaction to maintain compliance with regulatory frameworks such as HIPAA, GDPR, or PCI-DSS. This vulnerability can put such deployments out of compliance, because data that was expected to stay local or be sanitized is instead transmitted to a third-party SaaS environment.

The CVSS v3.1 score of 5.3 reflects the specific technical characteristics of the flaw. The attack vector is Network-based with Low complexity, requiring no specialized privileges or user interaction to trigger the leak. The impact is limited to Confidentiality, with zero direct risk to Integrity or Availability of the host system.
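The 5.3 figure can be reproduced from the vector listed in the Technical Appendix (CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:N/A:N). The sketch below applies the CVSS v3.1 base score formula for unchanged scope, using the numeric weights the specification assigns to these metric values. It covers only this vector and is not a general calculator; the names `weights`, `roundUp`, and `baseScoreScopeUnchanged` are chosen for this example.

```javascript
// Metric weights from the CVSS v3.1 specification for the values in this vector.
const weights = {
  AV_N: 0.85, // Attack Vector: Network
  AC_L: 0.77, // Attack Complexity: Low
  PR_N: 0.85, // Privileges Required: None
  UI_N: 0.85, // User Interaction: None
  C_L: 0.22,  // Confidentiality: Low
  I_N: 0,     // Integrity: None
  A_N: 0,     // Availability: None
};

// Roundup as defined in the CVSS v3.1 specification: the smallest value with
// one decimal place that is >= the input, computed on integers to avoid
// floating-point artifacts.
function roundUp(value) {
  const scaled = Math.round(value * 100000);
  return scaled % 10000 === 0
    ? scaled / 100000
    : (Math.floor(scaled / 10000) + 1) / 10;
}

// Base score formula for Scope: Unchanged.
function baseScoreScopeUnchanged(w) {
  const iss = 1 - (1 - w.C_L) * (1 - w.I_N) * (1 - w.A_N);
  const impact = 6.42 * iss;
  const exploitability = 8.22 * w.AV_N * w.AC_L * w.PR_N * w.UI_N;
  return impact <= 0 ? 0 : roundUp(Math.min(impact + exploitability, 10));
}

const score = baseScoreScopeUnchanged(weights);
console.log(score); // 5.3
```

The impact sub-score is about 1.41 and the exploitability sub-score about 3.89, so the sum of roughly 5.299 rounds up to 5.3. Most of the score comes from how easy the condition is to reach, not from the breadth of the impact.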

Remediation and Mitigation

The definitive resolution for this vulnerability is upgrading the LangSmith SDK to the patched versions where the event array iteration logic is correctly implemented. Python developers must update the langsmith PyPI package to version 0.7.31 or later. Node.js and TypeScript developers must update the langsmith npm package to version 0.5.19 or later.

Verification of the installed package versions should be performed via command-line dependency management tools. Administrators can run pip show langsmith for Python environments and npm list langsmith for Node.js environments to confirm successful application of the patch.
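When checking many environments, the version reported by those commands can be compared against the fixed versions programmatically. This is a minimal sketch: the thresholds come from this advisory, while the helper names `compareVersions` and `isPatched` are made up for the example. It handles plain dotted numeric versions only and does not parse pre-release tags.

```javascript
// Minimum patched versions as stated in this advisory.
const FIXED = { npm: "0.5.19", pypi: "0.7.31" };

// Compares dotted numeric versions (e.g. "0.5.19") segment by segment.
// Returns -1, 0, or 1. Pre-release suffixes are not handled.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// True when the installed version is at or above the fixed version.
function isPatched(ecosystem, installed) {
  return compareVersions(installed, FIXED[ecosystem]) >= 0;
}

console.log(isPatched("npm", "0.5.18"));  // false
console.log(isPatched("npm", "0.5.19"));  // true
console.log(isPatched("pypi", "0.7.4"));  // false
console.log(isPatched("pypi", "0.10.0")); // true
```

Segments are compared as numbers rather than strings, so `0.7.4` is correctly treated as older than `0.7.31`, and `0.10.0` as newer.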

If operational constraints prevent immediate package upgrades, developers can implement temporary application-level mitigations. Disabling streaming for specific LLM runs that process highly sensitive data eliminates the generation of new_token events. Without stream events, the SDK's existing top-level redaction logic successfully processes and sanitizes the final static outputs.

Technical Appendix

CVSS Score
5.3 / 10
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:N/A:N

Affected Systems

  • LangSmith SDK for Python
  • LangSmith SDK for JavaScript/TypeScript

Affected Versions Detail

| Product | Vendor | Affected Versions | Fixed Version |
|---|---|---|---|
| langsmith (npm) | LangChain | < 0.5.19 | 0.5.19 |
| langsmith (PyPI) | LangChain | < 0.7.31 | 0.7.31 |

| Attribute | Detail |
|---|---|
| Vulnerability ID | GHSA-RR7J-V2Q5-CHGV |
| CVSS Score | 5.3 (Medium) |
| Attack Vector | Network |
| CWE-ID | CWE-212 |
| Exploit Status | None (No Public PoC) |
| KEV Status | Not Listed |

MITRE ATT&CK Mapping

  • T1567: Exfiltration Over Web Service (Tactic: Exfiltration)
  • T1005: Data from Local System (Tactic: Collection)

CWE-212: Improper Removal of Sensitive Information Before Storage or Transfer

The software removes sensitive information from an object, resource, or stream before it is stored or transferred, but it does not remove all sensitive information.

