Jun 22, 2026 · 6 min read
The UbuntuCorpusTrainer component in ChatterBot is vulnerable to a local symlink-following attack that allows arbitrary file writes. Attackers can pre-plant symbolic links in predictable paths to redirect archive extraction, potentially overwriting critical user files or system configurations.
An insecure file extraction vulnerability exists in the UbuntuCorpusTrainer component of the ChatterBot package. Due to a combination of a predictable download path, a check-then-create directory pattern, and unvalidated symbolic link resolution during archive extraction, local attackers can write arbitrary files to restricted filesystem paths.
The Python package chatterbot provides machine-learning-based dialog engines. To support automated training, the package implements several trainer modules, including UbuntuCorpusTrainer. This trainer is designed to fetch, extract, and parse conversation logs from the Ubuntu Chat Corpus.
To manage downloaded corpora, the trainer uses a predictable subdirectory within the running user's home directory (~/ubuntu_data/ubuntu_dialogs). This location is the main attack surface: anyone who can create entries under that path before the trainer runs controls where extraction lands. Because the software runs with the privileges of the executing user, any flaw in how it handles files at this path directly exposes that user's file write capabilities.
This specific security flaw is categorized as a Symlink-Following Arbitrary File Write. It arises from improper resolution of directory paths before extracting compressed file structures. This allows an unprivileged local attacker to hijack the extraction process and write files into sensitive directories that the victim user has permissions to modify.
The root cause of this vulnerability lies in a weak implementation of the directory validation process combined with insecure extraction practices in standard Python libraries. The target execution flow uses a standard check-then-create (TOCTOU) logic block. Specifically, the trainer checks whether the extraction destination folder exists using os.path.exists() before calling os.makedirs(). Because os.path.exists() inherently follows symbolic links, it returns True if a symlink exists and points to a valid directory. Consequently, the application skips creating a new directory and proceeds directly to extraction.
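The difference between link-following and non-following checks can be seen in a small experiment. The sketch below is illustrative only: the directory names and the `describe_path` helper are hypothetical and are not part of ChatterBot. It plants a symlink in a temporary directory and reports how the standard `os.path` predicates see it.

```python
import os
import tempfile


def describe_path(path: str) -> dict:
    """Report how common existence checks see a path (hypothetical helper)."""
    return {
        "exists": os.path.exists(path),    # follows symlinks
        "isdir": os.path.isdir(path),      # follows symlinks
        "lexists": os.path.lexists(path),  # does not follow the final link
        "islink": os.path.islink(path),    # True only for the link itself
    }


def demo() -> dict:
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "attacker_target")  # stand-in for the redirect target
        link = os.path.join(base, "ubuntu_dialogs")     # stand-in for the predictable path
        os.mkdir(target)
        os.symlink(target, link)
        return describe_path(link)


if __name__ == "__main__":
    print(demo())
```

All four values come back `True` for the planted link, which is why an `os.path.exists()` guard alone cannot tell a real directory from a link to one; `os.path.islink()` is the predicate that distinguishes them.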
When tar.extractall() executes, the library resolves the extraction target base directory. If that base directory is a symbolic link, Python's file operations transparently traverse the link, writing the archive members to the attacker-defined target directory. No error or warning is raised, so the victim has no indication that the files landed outside the intended corpus directory.
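The redirection described above can be reproduced harmlessly inside a temporary directory. The following is a minimal sketch under stated assumptions: the archive, member name, and directory names are invented for illustration, and the code calls `tarfile` directly rather than going through ChatterBot. Newer Python releases may emit a DeprecationWarning about extraction filters when `extractall` is called this way.

```python
import io
import os
import tarfile
import tempfile


def build_archive(archive_path: str) -> None:
    """Write a tiny tar archive containing one harmless text member."""
    payload = b"hello from the archive\n"
    info = tarfile.TarInfo(name="dialogs/sample.txt")  # hypothetical member name
    info.size = len(payload)
    with tarfile.open(archive_path, "w") as tar:
        tar.addfile(info, io.BytesIO(payload))


def extract_through_link():
    """Return (file_landed_in_target, path_is_still_a_symlink)."""
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "redirect_target")
        link = os.path.join(base, "ubuntu_dialogs")
        archive = os.path.join(base, "corpus.tar")
        os.mkdir(target)
        os.symlink(target, link)
        build_archive(archive)
        with tarfile.open(archive) as tar:
            tar.extractall(link)  # base directory is a symlink
        landed = os.path.isfile(os.path.join(target, "dialogs", "sample.txt"))
        return landed, os.path.islink(link)


if __name__ == "__main__":
    print(extract_through_link())
```

On a typical POSIX system the member ends up under `redirect_target` while `ubuntu_dialogs` remains a symlink, mirroring the behavior the advisory describes.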
Furthermore, the custom safety check implemented within the software (safe_extract) is flawed. The function attempts to validate that all extracted archive members reside within the boundary of the extraction directory. However, it normalizes the extraction directory with os.path.abspath(), which operates on the path string only and does not resolve symbolic links, and it never checks whether the extraction directory itself is a link. Both the directory and each member path are therefore compared in terms of the pre-planted link, so every member appears to be contained. The check validates member names only and cannot detect that the base directory redirects to the attacker's target, rendering it completely ineffective against this attack.
The insecure directory initialization path is configured within chatterbot/trainers.py as follows:
```python
home_directory = os.path.expanduser('~')
self.data_directory = kwargs.get(
    'ubuntu_corpus_data_directory',
    os.path.join(home_directory, 'ubuntu_data')  # ~/ubuntu_data - predictable path
)
self.data_path = os.path.join(
    self.data_directory, 'ubuntu_dialogs'  # ~/ubuntu_data/ubuntu_dialogs
)
```

During execution, the extract function attempts to verify the existence of self.data_path using the following logic:
```python
def extract(self, file_path: str):
    if not os.path.exists(self.data_path):  # follows symlink -> returns True -> skips makedirs
        os.makedirs(self.data_path)  # never reached if symlink exists
```

Because os.path.exists(self.data_path) resolves the symlink and checks the target directory, it returns True whenever the target exists. The os.makedirs call is skipped and the planted link is left in place. The extraction then proceeds through safe_extract:
```python
def safe_extract(tar, path='.', members=None, *, numeric_owner=False):
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):  # validates MEMBER names only
            raise Exception('Attempted Path Traversal in Tar File')
    tar.extractall(path, members, numeric_owner=numeric_owner)  # path is symlink -> writes to target
```

In this execution, is_within_directory performs absolute path matching. os.path.abspath(path) normalizes the path string but does not resolve symbolic links, so when path is a symbolic link, both path and member_path remain expressed in terms of the link. Every member therefore appears to sit inside the extraction directory and the security validation passes. The link is only followed later, when extractall writes to disk.
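How `os.path.abspath` and `os.path.realpath` treat a symlinked directory is easy to verify empirically. The sketch below assumes an abspath-based containment helper, here named `lexical_within`; its body is an assumption written for illustration, since the source does not show the implementation of `is_within_directory`. All directory and member names are hypothetical.

```python
import os
import tempfile


def lexical_within(directory: str, target: str) -> bool:
    """abspath-based containment: compares path strings, never follows links."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def demo() -> dict:
    with tempfile.TemporaryDirectory() as base:
        base = os.path.realpath(base)  # the temp dir itself may sit behind a link
        target = os.path.join(base, "elsewhere")
        link = os.path.join(base, "ubuntu_dialogs")
        os.mkdir(target)
        os.symlink(target, link)
        member_path = os.path.join(link, "dialogs", "sample.txt")
        real_member = os.path.realpath(member_path)
        return {
            "abspath_keeps_link": os.path.abspath(link) == link,
            "realpath_follows_link": os.path.realpath(link) == target,
            "check_passes": lexical_within(link, member_path),
            "real_destination_under_link_path": real_member.startswith(link + os.sep),
        }


if __name__ == "__main__":
    print(demo())
```

The containment check passes even though the real destination lies outside the link's own location. A member-name check of this kind guards against `../` entries inside the archive, but it says nothing about where the base directory actually points.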
Exploitation of this vulnerability requires local access to the target system. An attacker must have permissions to write to the running user's home directory structure to pre-seed the predictable directory location.
To exploit the flaw, the attacker creates a symbolic link at ~/ubuntu_data/ubuntu_dialogs that points directly to a sensitive directory on the filesystem (e.g., a target configuration directory, workspace, or an execution path like /var/tmp). When the victim runs the UbuntuCorpusTrainer, the application extracts the compressed dataset. Instead of writing files to the user's local corpus directory, the application extracts the dataset files directly into the directory pointed to by the symbolic link.
```python
import os
import tempfile
from pathlib import Path

from chatterbot.trainers import UbuntuCorpusTrainer

ATTACKER_TARGET = Path(tempfile.mkdtemp(prefix="pwned_"))


def main():
    # Pre-configure predictable folders
    test_base = Path(tempfile.mkdtemp(prefix="cb_exploit_"))
    data_dir = test_base / "ubuntu_data"
    data_path = data_dir / "ubuntu_dialogs"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Plant the symlink pointing to the attacker's target
    os.symlink(str(ATTACKER_TARGET), str(data_path))
    print(f"[*] Symlink planted: {data_path} -> {ATTACKER_TARGET}")

    # The application processes standard tar extractall routines...
    # (See full PoC source for compressed archive generation details)
```

Through this process, an attacker can overwrite critical files. If the victim executes the application with elevated privileges, the attacker can leverage this path redirection to modify system files, write cron jobs, or deposit shell scripts into system directories, facilitating local privilege escalation.
The impact of this vulnerability is localized but can result in complete system compromise depending on the user executing the script. If the chatterbot application runs within an administrative context or within a development workspace containing sensitive files, the arbitrary write primitive can be used to overwrite executable binaries, shared objects, or environment configuration files.
The official CVSS v3.1 score is evaluated at 5.5, indicating a Medium severity. The CVSS vector string is CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N. Although the official metrics register the impact to system integrity as None, practical exploitation demonstrates that arbitrary files can be written to disk, which technically impacts integrity. Availability remains unaffected unless critical system files are overwritten, causing service failure.
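The 5.5 figure can be reproduced from the vector string. The sketch below applies the base score equations and metric weights from the CVSS v3.1 specification for the scope-unchanged case only; the function names are illustrative and the code is not an official calculator.

```python
import math

# Metric weights from the CVSS v3.1 specification (scope unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a scope-unchanged CVSS v3.1 vector."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles scope-unchanged vectors")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N"))  # 5.5
```

Swapping `I:N` for another integrity value in the same function is a quick way to see how the score would move if the integrity impact discussed above were counted.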
The primary remediation strategy is upgrading the chatterbot dependency to version 1.2.14 or higher, which includes path verification mechanisms during training initialization.
When upgrading is not immediately possible, you must implement manual path validation. This is achieved by ensuring that none of the target directory path elements resolve to symbolic links before proceeding with extraction operations. A robust check can be implemented using the following pattern:
```python
import os


def is_safe_directory(path: str) -> bool:
    real_path = os.path.realpath(path)        # resolves every symlink in the path
    normalized_path = os.path.abspath(path)   # normalizes without resolving links
    return real_path == normalized_path       # equal only if no component is a link
```

If the resolved real path does not match the normalized path, the application must abort operations. Additionally, avoid running training routines as a privileged user to limit the impact of potential local path traversal and redirection bugs.
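One limitation of comparing `realpath` with `abspath` is that it also rejects paths where a parent outside the application's control is legitimately symlinked (for example, a home directory that sits behind a link). A possible variant, sketched below under stated assumptions, trusts a root directory as-is and inspects only the components beneath it with `os.lstat`. The function names are hypothetical, and this is not the fix shipped in ChatterBot 1.2.14. It narrows the window for the attack but remains a check-then-use pattern, so it does not eliminate races against an attacker who can modify the path concurrently.

```python
import os
import stat


def find_symlinked_component(root: str, relative_path: str):
    """Return the first path below root that is a symlink, or None.

    root is trusted as-is; only components of relative_path are inspected.
    """
    current = root
    for part in os.path.normpath(relative_path).split(os.sep):
        if part in ("", "."):
            continue
        if part == "..":
            raise ValueError("relative_path must stay below root")
        current = os.path.join(current, part)
        try:
            mode = os.lstat(current).st_mode  # lstat never follows the final link
        except FileNotFoundError:
            return None  # nothing exists past this point, so nothing to follow
        if stat.S_ISLNK(mode):
            return current
    return None


def prepare_extraction_dir(root: str, relative_path: str) -> str:
    """Create root/relative_path, refusing if any component is a symlink."""
    offender = find_symlinked_component(root, relative_path)
    if offender is not None:
        raise RuntimeError(f"refusing to extract through symlink: {offender}")
    destination = os.path.join(root, relative_path)
    os.makedirs(destination, exist_ok=True)
    return destination
```

A caller might use `prepare_extraction_dir(os.path.expanduser('~'), 'ubuntu_data/ubuntu_dialogs')` before extracting and treat the `RuntimeError` as a signal to stop rather than to fall back to another location.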
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N

| Product | Affected Versions | Fixed Version |
|---|---|---|
| chatterbot (gunthercox) | <= 1.2.13 | 1.2.14 |

| Attribute | Detail |
|---|---|
| CWE ID | CWE-59: Improper Link Resolution Before File Access |
| Attack Vector | Local (AV:L) |
| CVSS v3.1 Score | 5.5 (Medium) |
| Exploit Status | Proof of Concept Publicly Available |
| CISA KEV Status | Not Listed |
| Impact | Arbitrary File Write / Local Privilege Escalation |
The application attempts to access a file or directory based on a path, but does not properly resolve symbolic links first, allowing a local attacker to bypass access controls or write files to unauthorized locations.