CVEReports

Automated vulnerability intelligence platform. Comprehensive reports for high-severity CVEs generated by AI.


GHSA-wvrh-2f4m-924v: Symlink-Following Arbitrary File Write in ChatterBot UbuntuCorpusTrainer

Amit Schendel
Senior Security Researcher

Jun 22, 2026 · 6 min read

Executive Summary (TL;DR)

The UbuntuCorpusTrainer component in ChatterBot is vulnerable to a local symlink-following attack that allows arbitrary file writes. Attackers can pre-plant symbolic links in predictable paths to redirect archive extraction, potentially overwriting critical user files or system configurations.

An insecure file extraction vulnerability exists in the UbuntuCorpusTrainer component of the ChatterBot package. Due to a combination of a predictable download path, a check-then-create directory pattern, and unvalidated symbolic link resolution during archive extraction, local attackers can write arbitrary files to restricted filesystem paths.

Vulnerability Overview

The Python package chatterbot provides machine-learning-based dialog engines. To support automated training, the package implements several trainer modules, including UbuntuCorpusTrainer. This trainer is designed to fetch, extract, and parse conversation logs from the Ubuntu Chat Corpus.

To manage downloaded corpora, the trainer uses a predictable subdirectory within the running user's home directory (~/ubuntu_data/ubuntu_dialogs). This location is the main attack surface for local manipulation. Because the software runs with the privileges of the executing user, any flaw in how it handles files under this path directly exposes that user's file write capabilities.

This specific security flaw is categorized as a Symlink-Following Arbitrary File Write. It arises from improper resolution of directory paths before extracting compressed file structures. This allows an unprivileged local attacker to hijack the extraction process and write files into sensitive directories that the victim user has permissions to modify.

Root Cause Analysis

The root cause of this vulnerability is weak directory validation combined with the symlink-following behavior of standard Python file APIs. The execution flow uses a check-then-create (TOCTOU) pattern: the trainer checks whether the extraction destination exists using os.path.exists() before calling os.makedirs(). Because os.path.exists() follows symbolic links, it returns True if a symlink exists at that path and points to a valid directory. Consequently, the application skips creating a new directory, leaves the symlink in place, and proceeds directly to extraction.
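The check-then-create behavior can be reproduced in isolation. The sketch below is a minimal illustration, not the trainer's code: the temporary directory layout and the function name demo_exists_follows_symlink are assumptions made for the demo, and it needs a platform that permits os.symlink().

```python
import os
import tempfile

def demo_exists_follows_symlink() -> dict:
    """Show how os.path.exists() treats a symlink that points at a real directory."""
    base = tempfile.mkdtemp(prefix="exists_demo_")
    target = os.path.join(base, "attacker_target")   # directory the link points to
    link = os.path.join(base, "ubuntu_dialogs")      # stand-in for self.data_path
    os.mkdir(target)
    os.symlink(target, link)

    created = False
    if not os.path.exists(link):      # follows the link, so this branch is skipped
        os.makedirs(link)
        created = True

    return {
        "exists": os.path.exists(link),    # True: the link target exists
        "islink": os.path.islink(link),    # True: the path itself is a symlink
        "makedirs_called": created,        # False: the guard never fired
    }

result = demo_exists_follows_symlink()
print(result)
```

The existence check alone cannot tell a real directory from a link to one; os.path.islink() or os.lstat() is needed for that distinction.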

When tar.extractall() executes, the library opens each output file beneath the extraction base directory. If that base directory is a symbolic link, the operating system transparently traverses the link, so the archive members are written to the attacker-defined target directory rather than the intended corpus directory.
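The effect of extracting into a symlinked base directory can be sketched with a tiny in-memory archive. Everything here is illustrative: the paths, the member name dialog.txt, and the helper extract_through_symlink are assumptions for the demo. Where the interpreter supports extraction filters, the sketch opts into the "data" filter, which vets archive members rather than the destination directory itself.

```python
import io
import os
import tarfile
import tempfile

def extract_through_symlink() -> dict:
    """Extract a one-file archive into a symlinked directory and report where it landed."""
    base = tempfile.mkdtemp(prefix="tar_demo_")
    target = os.path.join(base, "attacker_target")   # where the link points
    link = os.path.join(base, "ubuntu_dialogs")      # stand-in for the extraction path
    os.mkdir(target)
    os.symlink(target, link)

    # Build a tiny in-memory archive with a single harmless member.
    payload = b"hello\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="dialog.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)

    # Use the "data" extraction filter where the interpreter supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(fileobj=buf, mode="r") as tar:
        tar.extractall(link, **kwargs)

    return {
        "landed_in_target": os.path.isfile(os.path.join(target, "dialog.txt")),
        "link_still_symlink": os.path.islink(link),
    }

outcome = extract_through_symlink()
print(outcome)
```

The member name is entirely benign here; the redirection comes solely from the destination path being a link.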

Furthermore, the custom safety check implemented within the software (safe_extract) is flawed. The function attempts to validate that all extracted archive members reside within the boundary of the extraction directory. However, it normalizes paths with os.path.abspath(), which collapses '..' segments but never inspects or resolves symbolic links. The extraction directory and every member path are built on the same symlinked prefix, so the comparison passes while the link itself goes unexamined. The check guards against malicious member names only; it offers no protection when the destination is itself a pre-planted symbolic link.

Code Analysis

The insecure directory initialization path is configured within chatterbot/trainers.py as follows:

home_directory = os.path.expanduser('~')
self.data_directory = kwargs.get(
    'ubuntu_corpus_data_directory',
    os.path.join(home_directory, 'ubuntu_data')   # ~/ubuntu_data - predictable path
)
self.data_path = os.path.join(
    self.data_directory, 'ubuntu_dialogs'          # ~/ubuntu_data/ubuntu_dialogs
)

During execution, the extract function attempts to verify the existence of self.data_path using the following logic:

def extract(self, file_path: str):
    if not os.path.exists(self.data_path):   # follows symlink -> returns True -> skips makedirs
        os.makedirs(self.data_path)          # never reached if symlink exists

Because os.path.exists(self.data_path) follows the symlink and checks its target, it returns True whenever the target exists, so os.makedirs is never called and the link is left in place. The extraction then proceeds through safe_extract:

def safe_extract(tar, path='.', members=None, *, numeric_owner=False):
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):    # validates MEMBER names only
            raise Exception('Attempted Path Traversal in Tar File')
    tar.extractall(path, members, numeric_owner=numeric_owner)  # path is symlink -> writes to target

In this execution, is_within_directory performs absolute path matching. os.path.abspath(path) normalizes the path but does not resolve symbolic links, so when path is a symbolic link, both path and member_path keep the same link-based prefix. The validation therefore passes, and tar.extractall() follows the link when it writes each member.
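How a purely lexical containment check behaves with a symlinked base can be verified directly. The is_within_directory below is a hypothetical stand-in modeled on the common abspath-based pattern, not a copy of the ChatterBot helper; the directory names are likewise assumptions.

```python
import os
import tempfile

def is_within_directory(directory: str, target: str) -> bool:
    """Hypothetical lexical containment check based on os.path.abspath()."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory

base = tempfile.mkdtemp(prefix="check_demo_")
target = os.path.join(base, "attacker_target")
link = os.path.join(base, "ubuntu_dialogs")
os.mkdir(target)
os.symlink(target, link)

# A benign member name under the symlinked base passes the lexical check.
lexical_ok = is_within_directory(link, os.path.join(link, "dialog.txt"))

# A '..' member name is still rejected, which is all this check can detect.
traversal_blocked = not is_within_directory(link, os.path.join(link, "..", "escape.txt"))

# abspath() leaves the link in the path; realpath() resolves it to the target.
link_unresolved = os.path.abspath(link) != os.path.realpath(link)
resolves_to_target = os.path.realpath(link) == os.path.realpath(target)

print(lexical_ok, traversal_blocked, link_unresolved, resolves_to_target)
```

Comparing os.path.realpath() of the base against its os.path.abspath() is one way to notice the redirection before extraction starts.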

Exploitation Methodology

Exploitation of this vulnerability requires local access to the target system. An attacker must have permissions to write to the running user's home directory structure to pre-seed the predictable directory location.

To exploit the flaw, the attacker creates a symbolic link at ~/ubuntu_data/ubuntu_dialogs that points directly to a sensitive directory on the filesystem (e.g., a target configuration directory, workspace, or an execution path like /var/tmp). When the victim runs the UbuntuCorpusTrainer, the application extracts the compressed dataset. Instead of writing files to the user's local corpus directory, the application extracts the dataset files directly into the directory pointed to by the symbolic link.

import os
import tempfile
from pathlib import Path
from chatterbot.trainers import UbuntuCorpusTrainer

# Directory the planted symlink will point to
ATTACKER_TARGET = Path(tempfile.mkdtemp(prefix="pwned_"))

def main():
    # Pre-configure predictable folders
    test_base = Path(tempfile.mkdtemp(prefix="cb_exploit_"))
    data_dir = test_base / "ubuntu_data"
    data_path = data_dir / "ubuntu_dialogs"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Plant the symlink pointing to the attacker's target
    os.symlink(str(ATTACKER_TARGET), str(data_path))
    print(f"[*] Symlink planted: {data_path} -> {ATTACKER_TARGET}")

    # The application processes standard tar extractall routines...
    # (See full PoC source for compressed archive generation details)

if __name__ == "__main__":
    main()

Through this process, an attacker can overwrite critical files. If the victim executes the application with elevated privileges, the attacker can leverage this path redirection to modify system files, write cron jobs, or deposit shell scripts into system directories, facilitating local privilege escalation.

Threat and Impact Assessment

The impact of this vulnerability is localized but can result in complete system compromise depending on the user executing the script. If the chatterbot application runs within an administrative context or within a development workspace containing sensitive files, the arbitrary write primitive can be used to overwrite executable binaries, shared objects, or environment configuration files.

The official CVSS v3.1 score is 5.5 (Medium), with the vector string CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N. Although the official metrics rate the integrity impact as None, an arbitrary file write is by nature an integrity impact, so the vector arguably understates the practical risk. Availability remains unaffected unless critical system files are overwritten, causing service failure.
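For readers who want to check the arithmetic, the sketch below recomputes the base score from the vector using the CVSS v3.1 base equations. It is deliberately partial: it handles only Scope: Unchanged, and the function names are assumptions. The second vector, with I:H substituted, is a hypothetical re-scoring for comparison only, not an official rating.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}

def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0

def base_score(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged CVSS v3.1 vector."""
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))

official = base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N")
hypothetical = base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:N")
print(official, hypothetical)  # 5.5 7.1
```

Under these equations, rating integrity as High would move the score from Medium into the High band.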

Remediation and Mitigation Analysis

The primary remediation strategy is upgrading the chatterbot dependency to version 1.2.14 or higher, which includes path verification mechanisms during training initialization.

When upgrading is not immediately possible, you must implement manual path validation. This is achieved by ensuring that none of the target directory path elements resolve to symbolic links before proceeding with extraction operations. A robust check can be implemented using the following pattern:

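import os

def is_safe_directory(path: str) -> bool:
    # realpath() resolves every symlink in the path; abspath() only normalizes it
    real_path = os.path.realpath(path)
    normalized_path = os.path.abspath(path)
    # Any difference means at least one path component is a symbolic link
    return real_path == normalized_path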

If the resolved real path does not match the normalized path, the application must abort operations. Additionally, avoid running training routines as a privileged user to limit the impact of potential local path traversal and redirection bugs.
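A complementary approach is to create the directory outright and treat a pre-existing symlink as an error, rather than checking for existence first. The following is a minimal sketch under stated assumptions: prepare_extraction_dir is a hypothetical helper, not part of ChatterBot, and it inspects only the final path component, so parent directories such as ~/ubuntu_data would need the same treatment.

```python
import os
import stat
import tempfile

def prepare_extraction_dir(path: str) -> str:
    """Create `path` if missing; refuse to use it if it is a symlink or not a directory."""
    try:
        # mkdir fails if anything, including a symlink, already exists at `path`.
        os.mkdir(path, mode=0o700)
    except FileExistsError:
        info = os.lstat(path)  # lstat does not follow a symlink in the last component
        if stat.S_ISLNK(info.st_mode) or not stat.S_ISDIR(info.st_mode):
            raise RuntimeError(f"refusing to extract into symlink or non-directory: {path}")
    return path

base = tempfile.mkdtemp(prefix="safe_dir_demo_")

# A fresh path is created as a real directory, and reusing it is allowed.
fresh = prepare_extraction_dir(os.path.join(base, "fresh"))
reused = prepare_extraction_dir(fresh)

# A pre-planted symlink is rejected instead of being silently followed.
target = os.path.join(base, "attacker_target")
link = os.path.join(base, "ubuntu_dialogs")
os.mkdir(target)
os.symlink(target, link)
try:
    prepare_extraction_dir(link)
    rejected = False
except RuntimeError:
    rejected = True

print(fresh, rejected)
```

A window remains between this check and the later extraction if another user can write to the parent directory, so restrictive permissions on the parent still matter.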

Official Patches

gunthercox: ChatterBot Project Codebase

Technical Appendix

CVSS Score
5.5 / 10
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N

Affected Systems

  • Systems running chatterbot versions <= 1.2.13
  • Local multi-user environments executing machine learning training pipelines using UbuntuCorpusTrainer

Affected Versions Detail

Product                    Affected Versions    Fixed Version
chatterbot (gunthercox)    <= 1.2.13            1.2.14

Attribute          Detail
CWE ID             CWE-59: Improper Link Resolution Before File Access
Attack Vector      Local (AV:L)
CVSS v3.1 Score    5.5 (Medium)
Exploit Status     Proof of Concept Publicly Available
CISA KEV Status    Not Listed
Impact             Arbitrary File Write / Local Privilege Escalation

MITRE ATT&CK Mapping

  • T1036 Masquerading (Defense Evasion)
  • T1068 Exploitation for Privilege Escalation (Privilege Escalation)

CWE-59: Improper Link Resolution Before File Access ('Link Following')

The application attempts to access a file or directory based on a path, but does not properly resolve symbolic links first, allowing a local attacker to bypass access controls or write files to unauthorized locations.

Known Exploits & Detection

GitHub Security Advisory: Advisory containing the full, reproducible proof-of-concept Python script for the UbuntuCorpusTrainer symlink bypass.

References & Sources

  • [1] GitHub Security Advisory GHSA-wvrh-2f4m-924v
  • [2] ChatterBot GitHub Repository
