vLLM model deployer (#3032)

dudeperf3ct · strickvl · htahir1 · web-flow · commit 032e695c3eaf · 2024-11-04T14:32:43.000+01:00
* First pass vllm model deployer

* Fix the first pass implementation

* Add first working version

* Remove debugging log statements

* Pin vllm integration packages

* Support for additional arguments from vllm engine

* Remove vllm deployer step code

* Add vllm documentation

* Update vllm documentation

* Update readme

* Update service implementation and logo url

* Use correct endpoint url

* Update vllm endpoint config

* Pin upper bound for vllm package

* Rename vllm endpoint

* Update toc.md with vllm docs entry

* Update model-deployers.md with vllm entry

* Update model-deployers.md with databricks entry

---------

Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
Co-authored-by: Hamza Tahir <hamza@zenml.io>
Co-authored-by: Safoine El Khabich <34200873+safoinme@users.noreply.github.com>
diff --git a/docs/book/component-guide/model-deployers/model-deployers.md b/docs/book/component-guide/model-deployers/model-deployers.md
@@ -44,6 +44,8 @@ integrations:
 | [BentoML](bentoml.md)              | `bentoml` | `bentoml`     | Build and Deploy ML models locally or for production grade (Cloud, K8s)      |
 | [Seldon Core](seldon.md)           | `seldon`  | `seldon Core` | Built on top of Kubernetes to deploy models for production grade environment |
 | [Hugging Face](huggingface.md) | `huggingface` | `huggingface` | Deploys ML model on Hugging Face Inference Endpoints |
+| [Databricks](databricks.md) | `databricks` | `databricks` | Deploys models to Databricks Inference Endpoints |
+| [vLLM](vllm.md)                | `vllm`  | `vllm`      | Deploys LLMs locally using vLLM |
 | [Custom Implementation](custom.md) | _custom_  |               | Extend the Artifact Store abstraction and provide your own implementation    |
 
 {% hint style="info" %}
diff --git a/docs/book/component-guide/model-deployers/vllm.md b/docs/book/component-guide/model-deployers/vllm.md
@@ -0,0 +1,74 @@
+---
+description: Deploying your LLM locally with vLLM.
+---
+
+# vLLM
+
+[vLLM](https://docs.vllm.ai/en/latest/) is a fast and easy-to-use library for LLM inference and serving.
+
+## When to use it?
+
+Use the vLLM Model Deployer when you need:
+
+* LLM serving with state-of-the-art throughput behind an OpenAI-compatible API server
+* Continuous batching of incoming requests
+* Quantization: GPTQ, AWQ, INT4, INT8, and FP8
+* Features such as PagedAttention, speculative decoding, and chunked prefill
+
+## How do you deploy it?
+
+The vLLM Model Deployer flavor is provided by the vLLM ZenML integration, so you need to install it on your local machine to be able to deploy your models. You can do this by running the following command:
+
+```bash
+zenml integration install vllm -y
+```
+
+To register the vLLM model deployer with ZenML you need to run the following command:
+
+```bash
+zenml model-deployer register vllm_deployer --flavor=vllm
+```
+
+The ZenML integration will provision a local vLLM deployment server as a daemon process that will continue to run in the background to serve the latest vLLM model.
+
+## How do you use it?
+
+If you'd like to see this in action, check out this example of a [deployment pipeline](https://github.com/zenml-io/zenml-projects/blob/79f67ea52c3908b9b33c9a41eef18cb7d72362e8/llm-vllm-deployer/pipelines/deploy_pipeline.py#L25).
+
+### Deploy an LLM
+
+The [vllm_model_deployer_step](https://github.com/zenml-io/zenml-projects/blob/79f67ea52c3908b9b33c9a41eef18cb7d72362e8/llm-vllm-deployer/steps/vllm_deployer.py#L32) exposes a `VLLMDeploymentService` that you can use in your pipeline. Here is an example snippet:
+
+```python
+
+from zenml import pipeline
+from typing import Annotated
+from steps.vllm_deployer import vllm_model_deployer_step
+from zenml.integrations.vllm.services.vllm_deployment import VLLMDeploymentService
+
+
+@pipeline()
+def deploy_vllm_pipeline(
+    model: str,
+    timeout: int = 1200,
+) -> Annotated[VLLMDeploymentService, "GPT2"]:
+    service = vllm_model_deployer_step(
+        model=model,
+        timeout=timeout,
+    )
+    return service
+```
+
+Here is an [example](https://github.com/zenml-io/zenml-projects/tree/79f67ea52c3908b9b33c9a41eef18cb7d72362e8/llm-vllm-deployer) of running a GPT-2 model using vLLM.
+
+#### Configuration
+
+Within the `VLLMDeploymentService` you can configure:
+
+* `model`: Name or path of the Hugging Face model to use.
+* `tokenizer`: Name or path of the Hugging Face tokenizer to use. If unspecified, model name or path will be used.
+* `served_model_name`: The model name(s) used in the API. If not specified, the model name will be the same as the `model` argument.
+* `trust_remote_code`: Whether to trust remote code from Hugging Face.
+* `tokenizer_mode`: The tokenizer mode. Allowed choices: `auto`, `slow`, `mistral`.
+* `dtype`: Data type for model weights and activations. Allowed choices: `auto`, `half`, `float16`, `bfloat16`, `float`, `float32`.
+* `revision`: The specific model version to use. It can be a branch name, a tag name, or a commit id. If unspecified, will use the default version.
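Reviewer sketch (not part of the commit): the documentation above says the deployer creates an OpenAI-compatible API server, so a deployed model should be reachable through the usual `/v1/completions` route. The snippet below only builds such a request with the standard library. The base URL (vLLM's default port 8000 on localhost) and the model name `gpt2` are assumptions for illustration; the URL of a ZenML-managed service may differ, and the `model` field has to match `served_model_name` (or `model` if that is unset). The helper names are hypothetical.

```python
import json
from urllib import request

# Assumption: vLLM's OpenAI-compatible server on its default local port.
# A ZenML-managed deployment may expose a different URL.
BASE_URL = "http://localhost:8000/v1"


def build_completion_request(
    prompt: str,
    model: str = "gpt2",  # assumed; must match the served model name
    base_url: str = BASE_URL,
    max_tokens: int = 32,
) -> request.Request:
    """Build (but do not send) a POST request for the completions route."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return request.Request(
        url=f"{base_url.rstrip('/')}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: request.Request, timeout: float = 30.0) -> dict:
    """Send the request; this needs a running deployment to succeed."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


req = build_completion_request("ZenML is")
print(req.get_method(), req.full_url)
```

With a deployment running, `send(req)["choices"][0]["text"]` would hold the generated text in the OpenAI completions response format. The integration also pins `openai>=1.0.0`, so the official client is an alternative to raw HTTP.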
diff --git a/docs/book/toc.md b/docs/book/toc.md
@@ -265,6 +265,7 @@
   * [BentoML](component-guide/model-deployers/bentoml.md)
   * [Hugging Face](component-guide/model-deployers/huggingface.md)
   * [Databricks](component-guide/model-deployers/databricks.md)
+  * [vLLM](component-guide/model-deployers/vllm.md)
   * [Develop a Custom Model Deployer](component-guide/model-deployers/custom.md)
 * [Step Operators](component-guide/step-operators/step-operators.md)
   * [Amazon SageMaker](component-guide/step-operators/sagemaker.md)
@@ -313,4 +314,4 @@
 * [SDK & CLI reference](https://sdkdocs.zenml.io/)
 * [How do I...?](reference/how-do-i.md)
 * [Community & content](reference/community-and-content.md)
-* [FAQ](reference/faq.md)
+* [FAQ](reference/faq.md)
diff --git a/src/zenml/integrations/__init__.py b/src/zenml/integrations/__init__.py
@@ -45,14 +45,15 @@
 from zenml.integrations.label_studio import LabelStudioIntegration  # noqa
 from zenml.integrations.langchain import LangchainIntegration  # noqa
 from zenml.integrations.lightgbm import LightGBMIntegration  # noqa
+
 # from zenml.integrations.llama_index import LlamaIndexIntegration  # noqa
 from zenml.integrations.mlflow import MlflowIntegration  # noqa
 from zenml.integrations.neptune import NeptuneIntegration  # noqa
 from zenml.integrations.neural_prophet import NeuralProphetIntegration  # noqa
 from zenml.integrations.numpy import NumpyIntegration  # noqa
 from zenml.integrations.openai import OpenAIIntegration  # noqa
 from zenml.integrations.pandas import PandasIntegration  # noqa
-from zenml.integrations.pigeon import PigeonIntegration # noqa
+from zenml.integrations.pigeon import PigeonIntegration  # noqa
 from zenml.integrations.pillow import PillowIntegration  # noqa
 from zenml.integrations.polars import PolarsIntegration  # noqa
 from zenml.integrations.prodigy import ProdigyIntegration  # noqa
@@ -78,3 +79,4 @@
 from zenml.integrations.wandb import WandbIntegration  # noqa
 from zenml.integrations.whylogs import WhylogsIntegration  # noqa
 from zenml.integrations.xgboost import XgboostIntegration  # noqa
+from zenml.integrations.vllm import VLLMIntegration  # noqa
diff --git a/src/zenml/integrations/constants.py b/src/zenml/integrations/constants.py
@@ -76,4 +76,5 @@
 VERTEX = "vertex"
 XGBOOST = "xgboost"
 VAULT = "vault"
+VLLM = "vllm"
 LIGHTNING = "lightning"
diff --git a/src/zenml/integrations/vllm/__init__.py b/src/zenml/integrations/vllm/__init__.py
@@ -0,0 +1,50 @@
+#  Copyright (c) ZenML GmbH 2024. All Rights Reserved.
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at:
+#
+#       https://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+#  or implied. See the License for the specific language governing
+#  permissions and limitations under the License.
+"""Initialization for the ZenML vLLM integration."""
+from typing import List, Type
+from zenml.integrations.integration import Integration
+from zenml.stack import Flavor
+from zenml.logger import get_logger
+from zenml.integrations.constants import VLLM
+
+VLLM_MODEL_DEPLOYER = "vllm"
+
+logger = get_logger(__name__)
+
+
+class VLLMIntegration(Integration):
+    """Definition of vLLM integration for ZenML."""
+
+    NAME = VLLM
+
+    REQUIREMENTS = ["vllm>=0.6.0,<0.7.0", "openai>=1.0.0"]
+
+    @classmethod
+    def activate(cls) -> None:
+        """Activates the integration."""
+        from zenml.integrations.vllm import services
+
+    @classmethod
+    def flavors(cls) -> List[Type[Flavor]]:
+        """Declare the stack component flavors for the vLLM integration.
+
+        Returns:
+            List of stack component flavors for this integration.
+        """
+        from zenml.integrations.vllm.flavors import VLLMModelDeployerFlavor
+
+        return [VLLMModelDeployerFlavor]
+
+
+VLLMIntegration.check_installation()
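Reviewer sketch (not part of the commit): `REQUIREMENTS` pins vLLM to `>=0.6.0,<0.7.0`. The helper below illustrates which release numbers fall inside that range. It is a simplification that compares only the numeric `major.minor.patch` prefix and ignores pre-release ordering rules; real tooling should rely on pip or the `packaging` library. Both function names are hypothetical.

```python
import re
from typing import Tuple

# Matches the leading "major.minor.patch" part of a version string.
_RELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


def release_tuple(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings like '0.6.3.post1'."""
    match = _RELEASE.match(version)
    if match is None:
        raise ValueError(f"Unsupported version string: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return (major, minor, patch)


def satisfies_vllm_pin(version: str) -> bool:
    """Approximate check against the range '>=0.6.0,<0.7.0'."""
    return (0, 6, 0) <= release_tuple(version) < (0, 7, 0)


print(satisfies_vllm_pin("0.6.3"))  # True
print(satisfies_vllm_pin("0.7.0"))  # False
```

The upper bound means a later vLLM minor release would not be picked up until the pin is raised.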
diff --git a/src/zenml/integrations/vllm/flavors/__init__.py b/src/zenml/integrations/vllm/flavors/__init__.py
@@ -0,0 +1,21 @@
+#  Copyright (c) ZenML GmbH 2024. All Rights Reserved.
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at:
+#
+#       https://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+#  or implied. See the License for the specific language governing
+#  permissions and limitations under the License.
+"""vLLM integration flavors."""
+
+from zenml.integrations.vllm.flavors.vllm_model_deployer_flavor import (  # noqa
+    VLLMModelDeployerConfig,
+    VLLMModelDeployerFlavor,
+)
+
+__all__ = ["VLLMModelDeployerConfig", "VLLMModelDeployerFlavor"]
diff --git a/src/zenml/integrations/vllm/flavors/vllm_model_deployer_flavor.py b/src/zenml/integrations/vllm/flavors/vllm_model_deployer_flavor.py
@@ -0,0 +1,91 @@
+#  Copyright (c) ZenML GmbH 2024. All Rights Reserved.
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at:
+#
+#       https://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+#  or implied. See the License for the specific language governing
+#  permissions and limitations under the License.
+"""vLLM model deployer flavor."""
+
+from typing import TYPE_CHECKING, Optional, Type
+
+from zenml.integrations.vllm import VLLM_MODEL_DEPLOYER
+from zenml.model_deployers.base_model_deployer import (
+    BaseModelDeployerConfig,
+    BaseModelDeployerFlavor,
+)
+
+if TYPE_CHECKING:
+    from zenml.integrations.vllm.model_deployers import VLLMModelDeployer
+
+
+class VLLMModelDeployerConfig(BaseModelDeployerConfig):
+    """Configuration for vLLM Inference model deployer."""
+
+    service_path: str = ""
+
+
+class VLLMModelDeployerFlavor(BaseModelDeployerFlavor):
+    """vLLM model deployer flavor."""
+
+    @property
+    def name(self) -> str:
+        """Name of the flavor.
+
+        Returns:
+            The name of the flavor.
+        """
+        return VLLM_MODEL_DEPLOYER
+
+    @property
+    def docs_url(self) -> Optional[str]:
+        """A url to point at docs explaining this flavor.
+
+        Returns:
+            A flavor docs url.
+        """
+        return self.generate_default_docs_url()
+
+    @property
+    def sdk_docs_url(self) -> Optional[str]:
+        """A url to point at SDK docs explaining this flavor.
+
+        Returns:
+            A flavor SDK docs url.
+        """
+        return self.generate_default_sdk_docs_url()
+
+    @property
+    def logo_url(self) -> str:
+        """A url to represent the flavor in the dashboard.
+
+        Returns:
+            The flavor logo.
+        """
+        return "https://public-flavor-logos.s3.eu-central-1.amazonaws.com/model_deployer/vllm.png"
+
+    @property
+    def config_class(self) -> Type[VLLMModelDeployerConfig]:
+        """Returns `VLLMModelDeployerConfig` config class.
+
+        Returns:
+            The config class.
+        """
+        return VLLMModelDeployerConfig
+
+    @property
+    def implementation_class(self) -> Type["VLLMModelDeployer"]:
+        """Implementation class for this flavor.
+
+        Returns:
+            The implementation class.
+        """
+        from zenml.integrations.vllm.model_deployers import VLLMModelDeployer
+
+        return VLLMModelDeployer
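Reviewer sketch (not part of the commit): the flavor imports `VLLMModelDeployer` inside `implementation_class`, while the module-level import is guarded by `TYPE_CHECKING`. A likely benefit is that the flavor can be loaded and listed without importing the implementation module and its dependencies. The self-contained example below shows the same deferred-import pattern; `LazyFlavor` is a hypothetical stand-in and not a ZenML class.

```python
import importlib
from typing import Any


class LazyFlavor:
    """Hypothetical stand-in for a flavor; resolves its class on demand."""

    def __init__(self, name: str, module_path: str, class_name: str) -> None:
        self.name = name
        self._module_path = module_path
        self._class_name = class_name

    @property
    def implementation_class(self) -> Any:
        # The import runs only on access, so constructing or listing the
        # flavor never requires the implementation module to be importable.
        module = importlib.import_module(self._module_path)
        return getattr(module, self._class_name)


# Works even though the module does not exist, because nothing is imported yet.
missing = LazyFlavor("missing", "no_such_module_for_demo", "Impl")
print(missing.name)

# A stdlib class stands in for the real implementation class here.
flavor = LazyFlavor("demo", "collections", "OrderedDict")
print(flavor.implementation_class.__name__)
```

Accessing `missing.implementation_class` would raise `ModuleNotFoundError`, which is the point at which a missing integration dependency would surface.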
diff --git a/src/zenml/integrations/vllm/model_deployers/__init__.py b/src/zenml/integrations/vllm/model_deployers/__init__.py
@@ -0,0 +1,19 @@
+#  Copyright (c) ZenML GmbH 2024. All Rights Reserved.
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at:
+#
+#       https://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+#  or implied. See the License for the specific language governing
+#  permissions and limitations under the License.
+"""Initialization of the vLLM model deployers."""
+from zenml.integrations.vllm.model_deployers.vllm_model_deployer import (  # noqa
+    VLLMModelDeployer,
+)
+
+__all__ = ["VLLMModelDeployer"]
diff --git a/src/zenml/integrations/vllm/model_deployers/vllm_model_deployer.py b/src/zenml/integrations/vllm/model_deployers/vllm_model_deployer.py
diff --git a/src/zenml/integrations/vllm/services/__init__.py b/src/zenml/integrations/vllm/services/__init__.py
diff --git a/src/zenml/integrations/vllm/services/vllm_deployment.py b/src/zenml/integrations/vllm/services/vllm_deployment.py