Feb 20, 2026
GuardDog versions < 2.7.1 failed to validate compression ratios in ZIP archives. An attacker can supply a 'Zip Bomb' (e.g., a malicious Python wheel), causing the scanner to exhaust all available disk space and crash the host system.
GuardDog, DataDog's open-source CLI tool for identifying malicious PyPI and npm packages, contained a classic vulnerability: it didn't respect physics. Specifically, the tool's `safe_extract()` function was susceptible to Zip Bombs—maliciously crafted archives that explode from kilobytes to petabytes upon extraction. This vulnerability allows an attacker to crash CI/CD pipelines and developer workstations simply by asking GuardDog to scan a bad package.
GuardDog is a fantastic tool. It prowls through PyPI and npm, sniffing out typosquatting, malicious binaries, and sketchy maintainers. It’s the digital equivalent of a bomb-sniffing dog at the airport. But CVE-2026-22870 reveals a darkly ironic flaw: the bomb-sniffing dog would happily eat a bomb if it smelled like a treat.
The vulnerability lies in guarddog/utils/archives.py. The tool needs to inspect the contents of packages (which are usually just ZIP files with fancy extensions like .whl or .egg) to find malware. To do this, it extracts them.
The problem? It trusted the input. In the security world, trusting input is like accepting a drink from a stranger in a sketchy bar: it usually ends with you waking up in a bathtub full of ice, minus a kidney. GuardDog assumed that a 50KB file would extract to a reasonable size. It didn't account for the Zip Bomb.
To understand the flaw, we need to revisit the Zip Bomb (or the 'decompression bomb'). This isn't new tech; it's practically ancient history in internet years. The most famous example is 42.zip, a file that is 42 kilobytes on disk but expands to 4.5 petabytes of data.
How? The DEFLATE algorithm works by referencing repeated data. If I write "A" a million times, I don't need to store a million "A"s. I just store "A" and a command that says "repeat 999,999 times."
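To make the ratio concrete, here is a minimal sketch using Python's standard `zlib` module, which implements DEFLATE. The payload size is an arbitrary choice for illustration, and the exact compressed size will vary slightly between zlib versions.

```python
import zlib

# One million identical bytes: close to the best case for DEFLATE's back-references.
payload = b"A" * 1_000_000
compressed = zlib.compress(payload, 9)

ratio = len(payload) / len(compressed)
print(f"{len(payload)} bytes -> {len(compressed)} bytes (about {ratio:.0f}x)")

# Decompression restores every byte, so whoever extracts pays the full cost.
assert zlib.decompress(compressed) == payload
```

A single DEFLATE layer tops out at roughly a thousand to one, which is why bombs like 42.zip nest archives inside archives to multiply the effect.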
GuardDog's safe_extract() function (ironically named, as we'll see) used Python's standard zipfile library. While zipfile is robust, it is not defensive by default. It will happily do exactly what the archive tells it to do. If the archive says "write 50 gigabytes of zeros," Python grabs a shovel and starts digging a hole in your hard drive.
Let's look at the code. The vulnerability existed because the extraction loop was purely functional, lacking any resource sanity checks.
The Vulnerable Code:
```python
# guarddog/utils/archives.py (Pre-patch)
import zipfile


def safe_extract(source_archive, target_directory):
    with zipfile.ZipFile(source_archive, 'r') as zip_ref:
        # No checks. Just vibes.
        zip_ref.extractall(target_directory)
```
That's it. `extractall()` is a loaded gun. It extracts every member of the archive sequentially. An attacker simply needs to construct a .whl file containing a stream of overlapping files or massive repetitive data.
The Fix (Commit c3fb07b):
The DataDog team patched this in version 2.7.1 by introducing a validation layer, _check_compression_bomb. This function pre-calculates the uncompressed size before writing a single byte to disk.
```python
# guarddog/utils/archives.py (Patched)
MAX_FILE_COUNT = 10000
MAX_UNCOMPRESSED_SIZE = 1024 * 1024 * 1024  # 1GB
# MAX_COMPRESSION_RATIO is not shown in this excerpt.


def _check_compression_bomb(zip_file):
    total_size = 0
    total_files = 0
    for info in zip_file.infolist():
        total_files += 1
        total_size += info.file_size
        # 1. Check total size limit
        if total_size > MAX_UNCOMPRESSED_SIZE:
            raise ValueError("Archive too large")
        # 2. Check compression ratio
        # (prevents small files expanding 1000x)
        ratio = info.file_size / info.compress_size if info.compress_size > 0 else 0
        if ratio > MAX_COMPRESSION_RATIO:
            raise ValueError("Compression ratio too high")
```
This is the standard mitigation: trust, but verify (and then throw an exception if the math looks suspicious).
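The same idea can be tried out in isolation. The sketch below is not GuardDog's code: `check_declared_sizes`, `make_zip`, and the ratio threshold of 100 are hypothetical names and values chosen for illustration. It builds two small in-memory archives and runs a metadata-only check against each.

```python
import io
import zipfile

# Illustrative limits; the values GuardDog ships may differ.
MAX_UNCOMPRESSED_SIZE = 1024 * 1024 * 1024  # 1 GB total
MAX_COMPRESSION_RATIO = 100                 # hypothetical threshold


def check_declared_sizes(zf: zipfile.ZipFile) -> None:
    """Reject an archive using only the sizes declared in its metadata."""
    total = 0
    for info in zf.infolist():
        total += info.file_size
        if total > MAX_UNCOMPRESSED_SIZE:
            raise ValueError("Archive too large")
        if info.compress_size > 0:
            if info.file_size / info.compress_size > MAX_COMPRESSION_RATIO:
                raise ValueError("Compression ratio too high")


def make_zip(name: str, data: bytes) -> zipfile.ZipFile:
    """Build a small in-memory archive to exercise the check."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    buf.seek(0)
    return zipfile.ZipFile(buf)


benign = make_zip("README.txt", b"hello world, nothing to see here\n")
check_declared_sizes(benign)  # passes silently

suspicious = make_zip("zeros.bin", b"\0" * 10_000_000)  # 10 MB of zeros
try:
    check_declared_sizes(suspicious)
    rejected = False
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```

Note that a check like this only sees the sizes the archive declares about itself, so it works best as a cheap first gate rather than the only line of defense.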
Exploiting this is trivially easy for anyone who knows how to use a terminal. You don't need buffer overflows or heap grooming. You just need a lot of zeros.
The Attack Chain:
1. Craft the Bomb: Create a text file full of zeros. Compress it. Copy it. Compress the copies. Repeat until you have a dense block of entropy.
2. Package It: Rename your zip to `malicious-package-1.0.0-py3-none-any.whl`.
3. Delivery: Publish this package to a public repo, or simply create a requirements.txt pointing to a URL you control.
4. The Trigger: Wait for a developer or a CI pipeline to run `guarddog verify malicious-package-1.0.0-py3-none-any.whl`.
5. The Boom: GuardDog begins extraction. The host machine's disk I/O spikes to 100%. The filesystem fills up. If this is a Kubernetes node or a CI runner, the pod gets evicted or the job fails with `No space left on device`.
> [!NOTE]
> Re-exploitation Potential: While the patch covers disk usage, a potential weakness may remain in symlink handling. The fix reads symlink targets into memory (`zip_file.read(zip_info)`). If an attacker creates a zip entry that claims to be a symlink but is actually 4GB of text, GuardDog might try to load that entire 4GB into RAM to check the path, swapping the disk exhaustion DoS for a memory exhaustion DoS.
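
One way to close that kind of gap is to cap how many bytes are ever read for a symlink target. The following is a sketch under stated assumptions, not GuardDog's implementation: `is_symlink`, `read_link_target`, `add_symlink`, and the 4096-byte cap are all hypothetical. It relies on `ZipFile.open()` returning a file-like object whose `read(n)` decompresses at most `n` bytes.

```python
import io
import stat
import zipfile

MAX_LINK_TARGET_BYTES = 4096  # hypothetical cap, on the order of a maximum path length


def is_symlink(info: zipfile.ZipInfo) -> bool:
    """Unix mode bits are stored in the upper 16 bits of external_attr."""
    return stat.S_ISLNK(info.external_attr >> 16)


def read_link_target(zf: zipfile.ZipFile, info: zipfile.ZipInfo) -> str:
    """Read a symlink target without buffering more than the cap (plus one byte)."""
    with zf.open(info) as member:
        data = member.read(MAX_LINK_TARGET_BYTES + 1)
    if len(data) > MAX_LINK_TARGET_BYTES:
        raise ValueError("Symlink target is implausibly long")
    return data.decode("utf-8", errors="replace")


def add_symlink(zf: zipfile.ZipFile, name: str, target: bytes) -> None:
    """Test helper: store `target` as the body of an entry flagged as a symlink."""
    info = zipfile.ZipInfo(name)
    info.create_system = 3  # Unix
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(info, target, compress_type=zipfile.ZIP_DEFLATED)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    add_symlink(zf, "ok_link", b"../lib/target.py")
    add_symlink(zf, "huge_link", b"A" * 1_000_000)  # a "symlink" that is really bulk data

results = {}
with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        if is_symlink(info):
            try:
                results[info.filename] = read_link_target(zf, info)
            except ValueError:
                results[info.filename] = None  # rejected

print(results)
```

Reading one byte past the cap is what lets the function tell "exactly at the limit" apart from "over the limit" without decompressing the rest of the entry.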
The impact here is purely Availability (High). There is no data exfiltration or remote code execution (RCE). However, in modern DevSecOps, availability is security. If I can crash your scanners, I might be able to sneak a real exploit past them while your ops team is busy debugging the crash.
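Remediation:
Upgrade to the fixed release with `pip install "guarddog>=2.7.1"` (the quotes stop the shell from treating `>` as a redirect).
CVSS vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
| Product | Affected Versions | Fixed Version |
|---|---|---|
| GuardDog (DataDog) | < 2.7.1 | 2.7.1 |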
| Attribute | Detail |
|---|---|
| CWE ID | CWE-409 |
| CVSS v3.1 | 7.5 (High) |
| Attack Vector | Network (via Malicious Package) |
| Impact | Denial of Service (Disk/Resource Exhaustion) |
| EPSS Score | 0.00054 (Low Probability) |
| Fix Commit | c3fb07b4838945f42497e78b7a02bcfb1e63969b |
CWE-409: Improper Handling of Highly Compressed Data (Data Amplification)