Mar 29, 2026
OpenCC versions <= 1.1.9 fail to validate the bounds of truncated UTF-8 strings, resulting in heap out-of-bounds reads that cause DoS or information disclosure. The issue is patched in version 1.2.0 via strict length clamping.
The OpenCC (Open Chinese Convert) library contains two independent heap-based out-of-bounds read vulnerabilities, tracked as GHSA-7FQQ-Q52P-2JJG and affecting all versions up to and including 1.1.9. Both flaws reside in the library's UTF-8 parsing utilities and manifest when the software processes malformed or truncated multi-byte character sequences. Exploitation results in denial-of-service conditions or the disclosure of adjacent heap memory.
The root cause of both vulnerabilities stems from insufficient bounds checking during UTF-8 sequence decoding. The library relies on a utility function to determine character length based entirely on the leading byte of a sequence. This function does not verify if the required subsequent bytes exist within the allocated buffer boundaries.
When an attacker supplies a truncated UTF-8 sequence, the parsing logic calculates a sequence length that exceeds the actual remaining buffer size. This incorrect length calculation propagates to downstream components, specifically the segmentation and conversion modules. The resulting out-of-bounds reads lead to either a denial-of-service condition or the disclosure of adjacent heap memory.
The vulnerability originates in the UTF8Util::NextCharLength(const char* pstr) function. This function determines the expected length of a UTF-8 character, which ranges from one to six bytes, by examining the bit pattern of the first byte. The implementation assumes that the input buffer contains a well-formed UTF-8 string and does not accept a bounds parameter to verify the actual remaining buffer length.
In the MaxMatchSegmentation component, the segmentation logic tracks remaining buffer bytes using an unsigned integer variable named length. When NextCharLength() returns a size greater than the actual remaining bytes, the code executes the subtraction length -= matchedLength;. This operation causes an integer underflow, wrapping the length variable around to a value near SIZE_MAX.
The Conversion component exhibits a different failure mode driven by the same underlying parsing logic. The Convert function iterates over the input string using a loop that increments a pointer by the calculated character length. When encountering a truncated character immediately preceding a null terminator, the pointer advances past the null terminator, entirely missing the loop's exit condition.
The vulnerable code in the Conversion component uses a standard for loop to iterate through the input string. The pointer pstr is incremented by the value returned from UTF8Util::NextCharLength without verifying whether the increment pushes the pointer beyond the string's null terminator.
```cpp
// Vulnerable implementation pattern
for (const char* pstr = phrase; *pstr != '\0';) {
  size_t matchedLength = UTF8Util::NextCharLength(pstr);
  // Process character...
  pstr += matchedLength;  // Vulnerable: increments past '\0'
}
```

The patch implemented in OpenCC version 1.2.0 introduces explicit boundary tracking. The fix calculates the end of the input string, phraseEnd, prior to entering the processing loops. During each iteration, the code calculates the exact number of bytes remaining in the buffer.
```cpp
// Patched implementation
const char* phraseEnd = phrase + strlen(phrase);
for (const char* pstr = phrase; pstr < phraseEnd;) {
  size_t remainingLength = phraseEnd - pstr;
  size_t matchedLength = UTF8Util::NextCharLength(pstr);
  if (matchedLength > remainingLength) {
    matchedLength = remainingLength;  // Bounds clamping applied
  }
  // Process character...
  pstr += matchedLength;
}
```

This clamping mechanism guarantees that matchedLength never exceeds the valid boundaries of the allocated buffer. Identical defensive bounds checking was added to the dictionary matching routines to prevent the integer underflow in the segmentation logic.
Exploitation of these vulnerabilities requires the ability to supply untrusted input strings to the OpenCC processing pipeline. An attacker constructs a payload ending with a malformed UTF-8 sequence, such as the bytes \xE5\xB9. The byte \xE5 indicates a three-byte UTF-8 sequence, but the payload only provides two bytes before the null terminator.
When the Conversion component processes this input, NextCharLength returns a length of three. The internal pointer, initially positioned at the \xE5 byte, increments by three bytes. This operation advances the pointer directly over the terminating null byte at index two, landing on adjacent heap memory.
Once the pointer bypasses the null terminator, the processing loop continues to read from the heap until it happens to encounter another null byte. The library processes this leaked heap data and appends it to the legitimate conversion output. The attacker receives this output, achieving information disclosure of sensitive data residing in adjacent heap allocations.
The exploitability and impact of the information disclosure vulnerability depend heavily on the structure and state of the application's heap allocator. When the pointer bypasses the null terminator, the adjacent memory blocks determine the contents of the leaked data. Heap allocation patterns are dictated by the underlying operating system and the specific memory allocator in use.
In heavily utilized server applications, the heap contains a dense mixture of data structures, including network request buffers, active database connection strings, and cryptographic session keys. Because the out-of-bounds read is sequential, the memory immediately following the vulnerable buffer is disclosed byte by byte until a null byte terminates the read.
Attackers manipulate the heap layout prior to exploitation to maximize the value of the disclosed data. By issuing a specific sequence of valid requests, an attacker forces the allocator to place sensitive data structures immediately adjacent to the buffer used for UTF-8 conversion. This technique transforms a random memory leak into a targeted data extraction primitive.
The mitigation introduced in OpenCC version 1.2.0 entirely eliminates this attack vector by constraining memory reads to the bounds of the original allocation. The explicit boundary calculation ensures that the pointer logic remains strictly within the intended buffer, regardless of the underlying heap topology or attacker-controlled allocation patterns.
The primary impact of the Conversion component vulnerability is the disclosure of sensitive heap memory. Because the read operation continues until an arbitrary null byte is encountered, the amount of leaked data depends entirely on the state of the heap at the time of exploitation. This memory frequently contains data from other user sessions, application configuration secrets, or internal memory pointers.
The disclosure of internal memory pointers facilitates the bypass of memory layout randomization protections, such as ASLR. This information serves as a critical primitive when chaining vulnerabilities to achieve remote code execution. The attacker effectively gains a reliable memory oracle by observing the converted string output.
The vulnerability in the MaxMatchSegmentation component results in a denial-of-service condition. The integer underflow causes the application to pass a massive length value to the dictionary matching routines. The subsequent out-of-bounds read rapidly encounters unmapped memory pages, triggering a segmentation fault and crashing the host process.
The authoritative remediation for this vulnerability is upgrading the OpenCC library to version 1.2.0 or later. This release contains the comprehensive bounds checking and length clamping logic required to safely process malformed UTF-8 sequences. Development teams must recompile statically linked applications to ensure the patched library is fully integrated.
If immediate upgrading is not feasible, organizations must implement strict input validation at the application boundary. All user-supplied strings must be verified as well-formed UTF-8 before being passed to any OpenCC API. Modern programming languages provide standard library functions to efficiently validate UTF-8 encoding prior to processing.
Security teams should also review the deployment architecture of applications utilizing OpenCC. Sandboxing the processing environment or running the conversion logic in an isolated microservice minimizes the impact of potential denial-of-service conditions. These architectural controls limit the scope of heap data accessible during an information disclosure event.
CVSS v3.1 vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:H

| Product | Affected Versions | Fixed Version |
|---|---|---|
| OpenCC (BYVoid) | <= 1.1.9 | 1.2.0 |
| Attribute | Detail |
|---|---|
| CWE ID | CWE-125 |
| Attack Vector | Network |
| CVSS Score | 7.5 |
| Impact | Denial of Service / Information Disclosure |
| Exploit Status | Proof of Concept available |
| KEV Status | Not Listed |
This behavior matches CWE-125 (Out-of-bounds Read): the software reads data past the end, or before the beginning, of the intended buffer.