Feb 18, 2026·6 min read
Devs swapped arguments in `skb_queue_splice()`, causing the kernel to dump a socket's receive queue into a temporary stack list. When the function returns, the list head vanishes, leaving packet buffers pointing to garbage memory. High impact for system stability.
A logic error in the Linux kernel's io_uring networking command path allows a local attacker to corrupt kernel memory and cause a Denial of Service. By triggering a specific timestamp retry condition, an attacker makes the kernel splice the target socket's entire receive queue onto a temporary stack variable instead of merging the temporary list back into the socket. This results in data loss and potential Use-After-Free scenarios once the stack frame is destroyed.
io_uring is the Linux kernel's answer to the question: "How fast can we go if we stop checking the speed limit?" It's an asynchronous I/O interface that minimizes syscall overhead, making it the darling of high-performance networking applications and the bane of security auditors everywhere. Userspace places commands in a submission ring buffer shared with the kernel, which executes them in bulk and reports results through a matching completion ring.
Because it touches so many complex subsystems—filesystems, networking, polling—it has a massive attack surface. In this specific episode of "Kernel Oops," we are looking at io_uring/cmd_net.c, the code responsible for handling network-specific commands via the ring. This path is critical because it bridges the raw, unprivileged world of ring buffers with the sensitive, complex structures of the kernel's networking stack.
The vulnerability here isn't a complex race condition or an integer overflow. It's a classic case of developer fatigue: a simple argument transposition that turns a routine cleanup operation into a memory corruption primitive.
To understand this bug, you have to look at skb_queue_splice(). This is a standard kernel helper function used to merge two lists of Socket Buffers (SKBs). Its signature looks roughly like this:
```c
void skb_queue_splice(const struct sk_buff_head *list, struct sk_buff_head *head);
```
The function moves every element of `list` to the front of `head`. It only relinks the nodes, though: `list` itself is not reinitialized (the sibling helper `skb_queue_splice_init()` does that), so its head still points at SKBs that now belong to `head`. In this code path, io_uring needed to retry retrieving a timestamp. It held a local, temporary list of SKBs it had already pulled, and since the operation was being retried, it needed to put those packets back onto the socket's main receive queue.
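To make the `skb_queue_splice()` semantics concrete, here is a minimal userspace model in C. It is a sketch, not kernel code: `struct queue` and `struct node` are hypothetical stand-ins for `struct sk_buff_head` and `struct sk_buff`, and `queue_splice()` / `queue_splice_init()` mirror the pointer updates of the kernel's `skb_queue_splice()` and `skb_queue_splice_init()` helpers from `include/linux/skbuff.h`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model: 'struct queue' stands in for struct
 * sk_buff_head and 'struct node' for struct sk_buff. Not kernel code. */
struct node {
    struct node *next, *prev;
    int id;
};

struct queue {
    struct node anchor;   /* sentinel node that closes the circular list */
    unsigned int qlen;
};

static void queue_init(struct queue *q)
{
    q->anchor.next = q->anchor.prev = &q->anchor;
    q->qlen = 0;
}

static int queue_empty(const struct queue *q)
{
    return q->anchor.next == &q->anchor;
}

/* Append n at the tail of q. */
static void queue_tail(struct queue *q, struct node *n)
{
    assert(n != NULL);
    n->next = &q->anchor;
    n->prev = q->anchor.prev;
    q->anchor.prev->next = n;
    q->anchor.prev = n;
    q->qlen++;
}

/* The same four pointer writes as the kernel's __skb_queue_splice(). */
static void splice_between(const struct queue *list,
                           struct node *prev, struct node *next)
{
    struct node *first = list->anchor.next;
    struct node *last = list->anchor.prev;

    first->prev = prev;
    prev->next = first;
    last->next = next;
    next->prev = last;
}

/* Model of skb_queue_splice(list, head): moves list's nodes to the FRONT
 * of head. Like the kernel helper, it leaves 'list' itself untouched. */
static void queue_splice(const struct queue *list, struct queue *head)
{
    if (!queue_empty(list)) {
        splice_between(list, &head->anchor, head->anchor.next);
        head->qlen += list->qlen;
    }
}

/* Model of skb_queue_splice_init(): same move, then resets 'list'. */
static void queue_splice_init(struct queue *list, struct queue *head)
{
    if (!queue_empty(list)) {
        splice_between(list, &head->anchor, head->anchor.next);
        head->qlen += list->qlen;
        queue_init(list);
    }
}
```

Splicing a local queue holding nodes 1 and 2 into a socket queue holding node 3 yields the order 1, 2, 3 in the socket queue. After `queue_splice()`, the local queue's head and `qlen` still describe nodes it no longer owns; only `queue_splice_init()` resets it to empty.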
The developer intended to say: "Take my local temporary items and put them back on the socket."
Instead, they wrote code that said: "Take the entire socket receive queue and dump it onto my local temporary list."
This is the programming equivalent of trying to pour a glass of water into the ocean, but accidentally defining the physics such that the ocean tries to fit inside your glass. The socket's packets are relinked onto a `struct sk_buff_head` that lives on the kernel stack, while the socket's own queue head still points at them.
Let's look at the smoking gun. The vulnerability resides in how the kernel handles the retry path for timestamp retrieval. The fix is embarrassingly simple, highlighting exactly where the logic inverted.
The Vulnerable Code:

```c
// io_uring/cmd_net.c (simplified)
if (unlikely(ret)) {
    // OOPS: splicing the socket queue (sk->sk_receive_queue)
    // INTO the local list (list)
    skb_queue_splice(&sk->sk_receive_queue, &list);
    return ret;
}
```

The Fix (Commit c85d2cfc5e24):

```c
// The arguments are swapped back to sanity
if (unlikely(ret)) {
    // CORRECT: splicing the local list (list)
    // BACK INTO the socket queue (sk->sk_receive_queue)
    skb_queue_splice(&list, &sk->sk_receive_queue);
    return ret;
}
```

When `skb_queue_splice()` runs in the vulnerable version, it rewrites the boundary `prev` and `next` pointers so that the socket's SKBs become part of the list anchored at `list`; the first SKB's `prev`, for example, now points at `list`. Crucially, `list` is a local variable declared on the stack of the current function, and because the helper does not reinitialize its source, `sk->sk_receive_queue` still points at those same SKBs. When the function returns (which it does immediately after the splice), that stack frame is destroyed, and the SKBs are left with pointers dangling into invalid stack memory.
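The difference between the vulnerable and the fixed call can be reproduced in a small userspace model. This sketch assumes hypothetical types (`struct queue` for `sk_buff_head`, `struct node` for `sk_buff`) and a hypothetical `requeue_on_retry()` standing in for the retry path; the `swapped_args` flag selects the vulnerable or the fixed argument order. The check runs before the function returns, because inspecting a pointer into a dead stack frame afterwards would itself be undefined behavior in C.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model: 'struct queue' ~ struct sk_buff_head,
 * 'struct node' ~ struct sk_buff. Not kernel code. */
struct node { struct node *next, *prev; };
struct queue { struct node anchor; unsigned int qlen; };

static void queue_init(struct queue *q)
{
    q->anchor.next = q->anchor.prev = &q->anchor;
    q->qlen = 0;
}

static void queue_tail(struct queue *q, struct node *n)
{
    n->next = &q->anchor;
    n->prev = q->anchor.prev;
    q->anchor.prev->next = n;
    q->anchor.prev = n;
    q->qlen++;
}

/* skb_queue_splice(list, head) model: list's nodes go to the front of
 * head; 'list' itself is NOT reinitialised. */
static void queue_splice(const struct queue *list, struct queue *head)
{
    struct node *first = list->anchor.next, *last = list->anchor.prev;
    struct node *prev = &head->anchor, *next = head->anchor.next;

    if (first == &list->anchor)
        return;                         /* empty source: nothing to move */
    first->prev = prev;
    prev->next = first;
    last->next = next;
    next->prev = last;
    head->qlen += list->qlen;
}

/* Models the retry path: 'pending' holds SKBs already pulled into an
 * on-stack list that must be handed back before returning. Returns true
 * if, just before 'list' dies, a node reachable from the socket queue
 * points into this stack frame. */
static bool requeue_on_retry(struct queue *sk_queue, struct node *pending,
                             int npending, bool swapped_args)
{
    struct queue list;                  /* lives on this stack frame */
    bool dangling = false;

    queue_init(&list);
    for (int i = 0; i < npending; i++)
        queue_tail(&list, &pending[i]);

    if (swapped_args)
        queue_splice(sk_queue, &list);  /* vulnerable: socket queue -> stack */
    else
        queue_splice(&list, sk_queue);  /* fixed: stack list -> socket queue */

    /* Walk at most qlen nodes from the socket's head, looking for links
     * into 'list'. */
    struct node *p = sk_queue->anchor.next;
    for (unsigned int i = 0; i < sk_queue->qlen && p != &sk_queue->anchor; i++) {
        if (p->prev == &list.anchor || p->next == &list.anchor)
            dangling = true;
        p = p->next;
    }
    return dangling;  /* after this return, &list.anchor is invalid */
}
```

With `swapped_args` set, the socket queue's first node ends up pointing at the function's local `list`, while the socket queue's own head and `qlen` are left unchanged. With the fixed order, the pending node simply lands at the front of the socket queue and nothing refers to the stack.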
Exploiting this requires a bit of finesse but is conceptually straightforward. We need to force the kernel into the timestamp retry path while the target socket has a populated receive queue.
1. Ensure the target socket's `sk->sk_receive_queue` is non-empty.
2. Submit an `IORING_OP_URING_CMD` that triggers the specific timestamp retrieval logic. This might involve setting socket options such as `SO_TIMESTAMP` and crafting a request that causes a temporary failure/retry condition.
3. The retry path calls `skb_queue_splice()` with the swapped arguments, relinking the socket's packets onto the stack list. The socket's queue head is not reinitialized, so it still points at SKBs whose `prev`/`next` pointers now lead to the stack.
4. The function returns and `list` goes out of scope. The moved SKBs now have `next` and `prev` pointers referring to stack memory that was just released.

If the attacker can trigger a new kernel function call immediately afterwards and overwrite that stack space with controlled data, they might control the contents of the stale list head that the orphaned SKBs point to. When another networking path later walks those SKBs (for example, when the socket is closed and its queue is purged), it follows the corrupted pointers. Best case: a kernel Oops (DoS). Worst case: an arbitrary write or Use-After-Free leading to privilege escalation.
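From the defender's side, the corruption described in these steps shows up as a broken back-link in the socket queue. The sketch below is a bounded integrity walk, similar in spirit to the checks `CONFIG_DEBUG_LIST` performs for `struct list_head`. It is a userspace model with hypothetical types and a hypothetical `queue_consistent()` helper, not a kernel API; in the kernel, a wild `next` pointer could fault before any such check completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model: 'struct queue' ~ struct sk_buff_head,
 * 'struct node' ~ struct sk_buff. Not kernel code. */
struct node { struct node *next, *prev; };
struct queue { struct node anchor; unsigned int qlen; };

static void queue_init(struct queue *q)
{
    q->anchor.next = q->anchor.prev = &q->anchor;
    q->qlen = 0;
}

static void queue_tail(struct queue *q, struct node *n)
{
    n->next = &q->anchor;
    n->prev = q->anchor.prev;
    q->anchor.prev->next = n;
    q->anchor.prev = n;
    q->qlen++;
}

/* True only if 'q' is a well-formed circular list of exactly q->qlen
 * nodes: every hop satisfies next->prev == current, and the walk gets
 * back to the anchor after qlen + 1 hops. The hop limit means a corrupted
 * cycle cannot make it loop forever. */
static bool queue_consistent(const struct queue *q)
{
    const struct node *p = &q->anchor;

    for (unsigned int hops = 0; hops <= q->qlen; hops++) {
        const struct node *next = p->next;

        if (next == NULL || next->prev != p)
            return false;               /* broken back-link */
        p = next;
        if (p == &q->anchor)
            return hops == q->qlen;     /* back at the head: length must match */
    }
    return false;                       /* never returned to the head */
}
```

Redirecting only the first node's `prev` to a different list head, which is what the swapped splice does to the socket's first SKB, is enough to make the walk fail; so is a stale `qlen` that no longer matches the number of linked nodes.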
While this requires local access, io_uring is often available to unprivileged users. This makes it a prime candidate for local privilege escalation (LPE) chains. Even if a full LPE is difficult to stabilize due to the chaotic nature of networking structures, the Denial of Service (DoS) potential is massive.
Imagine a multi-tenant environment (like a Kubernetes node) where a malicious pod shares the kernel. By triggering this bug, the attacker can effectively "blackhole" networking for specific sockets or crash the entire node by corrupting the slab allocator or triggering a panic in the networking stack. In the context of io_uring, which is designed for high-throughput I/O, this bug turns the performance engine into a self-destruct mechanism.
The fix is already merged into stable trees. If you are running a 6.17 kernel older than 6.17.10 (that is, 6.17 through 6.17.9), you are likely vulnerable; 6.17.10 and later include the fix.
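For a quick triage of the running kernel, the affected range from the table (>= 6.17, < 6.17.10) can be checked against the release string reported by `uname(2)`. This sketch rests on a strong assumption: it treats the release as an upstream version number, whereas distribution kernels frequently backport fixes without changing it, so a match means "check your vendor's advisory", not "confirmed vulnerable". `release_in_affected_range()` and `running_kernel_in_affected_range()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Heuristic: true if an upstream-style release string falls in the
 * affected range >= 6.17, < 6.17.10. Vendor backports are NOT detected. */
static bool release_in_affected_range(const char *release)
{
    int major = 0, minor = 0, patch = 0;

    /* "6.17.7-arch1-1" -> 6, 17, 7; "6.17-rc3" -> 6, 17 (patch stays 0). */
    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) < 2)
        return false;
    return major == 6 && minor == 17 && patch < 10;
}

/* Applies the heuristic to the running kernel via uname(2). */
static bool running_kernel_in_affected_range(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return false;
    return release_in_affected_range(u.release);
}
```

For example, `6.17.9` and `6.17.7-arch1-1` match, while `6.17.10` and `6.16.12` do not.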
Immediate Actions:
Disable `io_uring` via sysctl. Set `kernel.io_uring_disabled = 1` (or `2` to disable it for everyone) if your production workload does not strictly require it. This is a heavy hammer but effective.

Since this is a logic bug in a kernel code path, no WAF or network rule will save you. This is purely a kernel-space logic failure.
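To confirm which `io_uring` policy a host is actually enforcing, the `kernel.io_uring_disabled` value can be read from procfs. In this sketch, `read_io_uring_disabled()` and `describe_io_uring_policy()` are hypothetical helpers; the value meanings (0 enabled, 1 restricted to `CAP_SYS_ADMIN` holders or members of `kernel.io_uring_group`, 2 disabled for everyone) follow the kernel's sysctl documentation.

```c
#include <assert.h>
#include <stdio.h>

/* Reads the kernel.io_uring_disabled sysctl through procfs. Returns the
 * value, or -1 if the file is missing (older kernels) or unreadable. */
static int read_io_uring_disabled(void)
{
    FILE *f = fopen("/proc/sys/kernel/io_uring_disabled", "r");
    int value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Maps the sysctl value to the policy it selects. */
static const char *describe_io_uring_policy(int value)
{
    switch (value) {
    case 0:
        return "enabled for all processes";
    case 1:
        return "disabled for unprivileged processes "
               "(CAP_SYS_ADMIN holders and kernel.io_uring_group members are exempt)";
    case 2:
        return "disabled for all processes";
    default:
        return "unknown (sysctl missing or unreadable)";
    }
}
```

Changing the value requires root, for example `sysctl -w kernel.io_uring_disabled=2`. The setting only blocks new `io_uring_setup()` calls; rings that already exist keep working, so long-running processes may need a restart before the mitigation fully applies.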
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H

| Product | Affected Versions | Fixed Version |
|---|---|---|
| Linux Kernel | >= 6.17, < 6.17.10 | 6.17.10 |
| Attribute | Detail |
|---|---|
| Attack Vector | Local (System Call) |
| CVSS v3.1 | 7.8 (High) |
| CWE ID | CWE-682 (Incorrect Calculation / Argument Swap) |
| Impact | Memory Corruption / DoS |
| EPSS Score | 0.00026 (Low Probability) |
| Exploit Status | No Public PoC |