Feb 19, 2026 · 6 min read
The `keccak` crate for Rust contained a critical unsoundness in its optional ARMv8 assembly optimization. Developers used post-indexing assembly instructions that modified registers (x0, x1, x8) but told the compiler these registers were immutable inputs (`in`). This lie to the compiler constitutes Undefined Behavior, potentially causing the optimizer to generate broken code that corrupts memory or miscalculates cryptographic states.
A deep-dive analysis of a technical unsoundness in the Rust `keccak` crate's ARMv8 assembly backend. By misrepresenting register constraints to the LLVM compiler, the implementation created a divergence between the hardware state and the compiler's abstract model, leading to Undefined Behavior (UB) and potential memory corruption scenarios.
Cryptography is an eternal war between mathematical correctness and raw execution speed. In the Rust ecosystem, we pride ourselves on memory safety, borrowing rules, and the borrow checker's iron fist. But when you need to hash gigabytes of data per second, safe Rust sometimes isn't enough. You reach for the unsafe keyword, and occasionally, you drop directly into inline assembly (asm!).
This is exactly what the keccak crate (part of the RustCrypto organization) did. To squeeze every ounce of performance out of ARMv8 processors (like the one in your shiny MacBook or heavy-duty AWS Graviton instance), they implemented the Keccak-f[1600] permutation using hand-tuned assembly. It's a standard move: bypass the compiler's heuristics to utilize specific hardware instructions.
However, writing inline assembly in a high-level language is a handshake agreement with the compiler. You promise to tell the compiler exactly which registers you touch, read, or clobber. If you lie—even accidentally—the compiler's optimizer, which assumes you are a rational actor, will punish you. In this case, the keccak developers made a classic mistake: they modified 'input' registers behind the compiler's back.
To understand this bug, you need to understand how LLVM (the default backend for `rustc`) handles inline assembly constraints. When you define an `asm!` block, you classify each operand. `in("x0")` tells the compiler: "I am reading from this register, and I promise it will hold the same value when the block exits." There is no exemption for inputs you no longer care about: if the assembly changes the register, the operand must also be declared as an output.
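To make the operand classes concrete, here is a minimal sketch that is not taken from the `keccak` crate. The helper name `add_one` is hypothetical. Because the instruction overwrites its operand, the operand is declared `inout`; declaring it `in` would break the promise that the register is unchanged on exit. The pure Rust fallback exists only so the example builds on other targets.

```rust
#[cfg(any(target_arch = "aarch64", target_arch = "x86_64"))]
use core::arch::asm;

/// Hypothetical helper: returns `x + 1` (wrapping), computed in inline assembly.
#[cfg(target_arch = "aarch64")]
pub fn add_one(mut x: u64) -> u64 {
    // SAFETY: only the declared register is touched; no memory or stack access.
    unsafe {
        asm!(
            "add {x}, {x}, #1",
            // `inout`: read on entry, overwritten by the instruction,
            // and the new value is written back to the Rust variable `x`.
            x = inout(reg) x,
            options(pure, nomem, nostack),
        );
    }
    x
}

/// Same idea on x86_64 (Intel syntax is the default for `asm!` there).
#[cfg(target_arch = "x86_64")]
pub fn add_one(mut x: u64) -> u64 {
    // SAFETY: only the declared register (and the flags) are touched.
    unsafe {
        asm!("add {x}, 1", x = inout(reg) x, options(pure, nomem, nostack));
    }
    x
}

/// Fallback for any other target, so the sketch compiles everywhere.
#[cfg(not(any(target_arch = "aarch64", target_arch = "x86_64")))]
pub fn add_one(x: u64) -> u64 {
    x.wrapping_add(1)
}
```

Calling `add_one(41)` yields `42`. Swapping `inout` for `in` would still assemble, which is exactly why this class of bug is easy to miss: the assembler cannot check the promise made to the compiler.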
The vulnerability lies in src/armv8.rs. The assembly code utilized ARM64's post-indexed addressing mode. Look at this instruction pattern used in the vulnerable code:
st1.1d {v20-v23}, [x0], #32
In plain English, this instruction says: "Store the vector registers v20 through v23 into the memory address at x0, and then increment x0 by 32 bytes." That [x0], #32 syntax is the smoking gun. It is an auto-increment.
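As an illustration only, the effect of that instruction can be modelled in safe Rust. The function name `st1_post_index` and the use of a byte offset in place of a real address are assumptions made for this sketch; the four lanes stand in for the low 64 bits of `v20` through `v23`.

```rust
/// Software model of `st1.1d {v20-v23}, [x0], #32`.
///
/// `x0` is a byte offset into `mem` (assumed to be a multiple of 8),
/// standing in for the address held in the real register.
pub fn st1_post_index(mem: &mut [u64], x0: &mut usize, lanes: [u64; 4]) {
    let first = *x0 / 8;
    // Store four 64-bit lanes at the address currently in x0 ...
    mem[first..first + 4].copy_from_slice(&lanes);
    // ... then write the incremented address back into x0 (post-index).
    *x0 += 32;
}
```

The `&mut usize` parameter is the point of the model: the address register is an output of the instruction, not just an input.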
The hardware executes this. `x0` changes. It physically holds a new memory address. But the Rust code wrapping this assembly declared `x0` as an `in` operand. The compiler, trusting the developer, assumes `x0` holds the same value when the block exits as it did on entry. This creates a split reality: the hardware sees `x0 + 32`, but the compiler's model of the program still believes `x0` is unchanged.
Let's look at the diff. It is a masterclass in how a few characters can mean the difference between 'secure' and 'undefined behavior'. The issue wasn't just x0 (state pointer); it also affected x1 (constants pointer) and x8 (loop counter).
The Vulnerable Code:
unsafe {
    asm!(
        // ... instructions omitted ...
        // (`{{` and `}}` are how literal braces are written in an asm! template)
        "st1.1d {{v20-v23}}, [x0], #32", // <--- HARDWARE MODIFIES x0
        "st1.1d {{v24}}, [x0]",
        // The Lie:
        in("x0") state.as_mut_ptr(),
        in("x1") crate::RC[24-round_count..].as_ptr(),
        in("x8") round_count,
        // ...
    );
}

Because of `in("x0")`, if the compiler decides to use the value of `state.as_mut_ptr()` after this assembly block, it might just reuse the original value it cached in a register, or assume the register still holds the start of the buffer. It has no idea the assembly code advanced the pointer.
The Fixed Code:
unsafe {
    asm!(
        // ... same instructions ...
        "st1.1d {{v20-v23}}, [x0], #32",
        "st1.1d {{v24}}, [x0]",
        // The Truth:
        inout("x0") state.as_mut_ptr() => _,
        inout("x1") crate::RC[24-round_count..].as_ptr() => _,
        inout("x8") round_count => _,
        // ...
    );
}

The fix changes `in` to `inout`. Crucially, it adds `=> _`. This syntax tells Rust: "I am taking this value in, I am modifying it, and the result is garbage/clobbered (`_`). Do not rely on the value of this register after this block executes." This forces the compiler to reload the pointer if it needs it again, rather than using a stale, corrupted register.
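The same `inout(...) => _` pattern can be tried on a much smaller example. The sketch below is not the crate's code: `store_pairs` is a hypothetical helper that uses a post-indexed `stp` on AArch64, with a plain Rust fallback so it builds on other targets.

```rust
#[cfg(target_arch = "aarch64")]
use core::arch::asm;

/// Hypothetical helper: writes `[a, b, a, b]` into `buf`.
#[cfg(target_arch = "aarch64")]
pub fn store_pairs(buf: &mut [u64; 4], a: u64, b: u64) {
    // SAFETY: the two `stp` instructions write exactly 32 bytes,
    // which is the size of `buf`.
    unsafe {
        asm!(
            "stp {a}, {b}, [{p}], #16", // store 16 bytes, then p += 16
            "stp {a}, {b}, [{p}]",      // store the next 16 bytes
            // The post-index writeback modifies `p`, so it must be `inout`.
            // `=> _` discards the final value instead of keeping it.
            p = inout(reg) buf.as_mut_ptr() => _,
            a = in(reg) a, // genuinely read-only: never written by the asm
            b = in(reg) b,
            options(nostack),
        );
    }
}

/// Fallback for non-AArch64 targets, so the sketch compiles everywhere.
#[cfg(not(target_arch = "aarch64"))]
pub fn store_pairs(buf: &mut [u64; 4], a: u64, b: u64) {
    *buf = [a, b, a, b];
}
```

Only `p` needs `inout` here. `a` and `b` are never written by the assembly, so `in` is an honest description of them.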
So how do we weaponize this? With current versions of the Rust compiler and LLVM, you might get lucky. The generated code might not reuse `x0` immediately after the block, or it might reload it anyway due to register pressure. That is why this is classified as unsoundness rather than as a directly exploitable bug such as an RCE.
However, to an attacker or a researcher, this is a Time Bomb.
Imagine a build at `opt-level = 3`, the default for Cargo's release profile. The optimizer sees that `x0` holds the address of `state`. After the assembly block, the code might do something like `state[0] = 0`.

1. The compiler sees that `x0` is `in`. It assumes `x0` is preserved.
2. Instead of recomputing the address of `state` for the subsequent write, the compiler emits instructions to write to `[x0]`, assuming `x0` still points to the start of the buffer.
3. In reality, the assembly has advanced `x0` by 32 bytes.
4. The write `state[0] = 0` actually writes to `state[4]` (assuming 64-bit words).

In a cryptographic context, this is catastrophic. We aren't just crashing; we are corrupting the internal state of a hash function. If we can control the input to influence the loop count or the flow, we might be able to desynchronize the state enough to weaken the hash, leak key material (if used in a MAC), or cause a buffer overflow if `x0` is incremented past the bounds of the valid memory region.
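The arithmetic behind that scenario can be checked with a small safe-Rust model. This is a simulation of the scenario, not a reproduction of the miscompilation; `write_through_stale_x0` and its parameters are hypothetical names chosen for the sketch.

```rust
/// The Keccak-f[1600] state is 25 lanes of 64 bits.
pub const LANES: usize = 25;

/// Models a write that the compiler believes targets `state[0]`, issued
/// through a register the assembly has advanced by `advance_bytes`.
/// Returns the index that was actually written.
pub fn write_through_stale_x0(
    state: &mut [u64; LANES],
    advance_bytes: usize,
    value: u64,
) -> usize {
    // Compiler's view: x0 still points at the first lane.
    let believed_index = 0;
    // Hardware's view: x0 has moved forward by `advance_bytes`.
    let actual_index = believed_index + advance_bytes / core::mem::size_of::<u64>();
    // This model is bounds-checked; a raw store through the register is not.
    state[actual_index] = value;
    actual_index
}
```

With `advance_bytes = 32` the function returns `4`: lane 0 is left untouched and lane 4 is overwritten. Larger advances walk toward the end of the 200-byte state, and in this model they panic on the bounds check, where a real stray store would simply land outside the buffer.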
The remediation is straightforward: stop lying to the compiler. The patch applied in RustCrypto/sponges PR #101 correctly identifies the registers as inout.
If you are a user of keccak (or crates that depend on it like sha3), you need to check if you have the asm feature enabled. It is off by default, which saves the vast majority of users. If you do use it for performance on ARMv8:
- Ensure `keccak` is at version 0.1.6 or higher.
- Run `cargo tree | grep keccak` to see which version you are pulling in.
- If you cannot update, disable the `asm` feature. The pure Rust implementation is slower but semantically correct and safe.

This incident serves as a reminder: `unsafe` in Rust transfers the responsibility of correctness from the compiler to the human. And humans are terrible at tracking invisible hardware side-effects like post-increment registers.
| Product | Affected Versions | Fixed Version |
|---|---|---|
| keccak (RustCrypto) | < 0.1.6 | 0.1.6 |

| Attribute | Detail |
|---|---|
| Vulnerability Type | Undefined Behavior / Unsoundness |
| Language | Rust / AArch64 Assembly |
| Root Cause | Incorrect Inline Assembly Register Constraints |
| Affected Component | keccak crate (armv8.rs) |
| Impact | Potential Memory Corruption / Logic Errors |
| Exploit Status | Theoretical / Compiler-Dependent |
The software relies on the correctness of register constraints in inline assembly. Incorrect constraints lead to a mismatch between compiler assumptions and actual hardware state, resulting in undefined behavior.