Commit 032e695

dudeperf3ct, strickvl, htahir1, and safoinme authored
vLLM model deployer (#3032)
* First pass vllm model deployer
* Fix the first pass implementation
* Add first working version
* Remove debugging log statements
* Pin vllm integration packages
* Support for additional arguments from vllm engine
* Remove vllm deployer step code
* Add vllm documentation
* Update vllm documentation
* Update readme
* Update service implementation and logo url
* Use correct endpoint url
* Update vllm endpoint config
* Pin upper bound for vllm package
* Rename vllm endpoint
* Update toc.md with vllm docs entry
* Update model-deployers.md with vllm entry
* Update model-deployers.md with databricks entry

---------

Co-authored-by: Alex Strick van Linschoten <[email protected]>
Co-authored-by: Hamza Tahir <[email protected]>
Co-authored-by: Safoine El Khabich <[email protected]>
1 parent 0287a16 commit 032e695

12 files changed: +742 -2 lines changed

docs/book/component-guide/model-deployers/model-deployers.md

Lines changed: 2 additions & 0 deletions
@@ -44,6 +44,8 @@ integrations:
 | [BentoML](bentoml.md) | `bentoml` | `bentoml` | Build and Deploy ML models locally or for production grade (Cloud, K8s) |
 | [Seldon Core](seldon.md) | `seldon` | `seldon Core` | Built on top of Kubernetes to deploy models for production grade environment |
 | [Hugging Face](huggingface.md) | `huggingface` | `huggingface` | Deploys ML model on Hugging Face Inference Endpoints |
+| [Databricks](databricks.md) | `databricks` | `databricks` | Deploying models to Databricks Inference Endpoints with Databricks |
+| [vLLM](vllm.md) | `vllm` | `vllm` | Deploys LLM using vLLM locally |
 | [Custom Implementation](custom.md) | _custom_ | | Extend the Artifact Store abstraction and provide your own implementation |

 {% hint style="info" %}
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+---
+description: Deploying your LLM locally with vLLM.
+---
+
+# vLLM
+
+[vLLM](https://docs.vllm.ai/en/latest/) is a fast and easy-to-use library for LLM inference and serving.
+
+## When to use it?
+
+You should use the vLLM Model Deployer if you need:
+
+* Deployment of large language models with state-of-the-art serving throughput behind an OpenAI-compatible API server
+* Continuous batching of incoming requests
+* Quantization: GPTQ, AWQ, INT4, INT8, and FP8
+* Features such as PagedAttention, speculative decoding, and chunked prefill
+
+## How do you deploy it?
+
+The vLLM Model Deployer flavor is provided by the vLLM ZenML integration, so you need to install it on your local machine to be able to deploy your models. You can do this by running the following command:
+
+```bash
+zenml integration install vllm -y
+```
+
+To register the vLLM model deployer with ZenML you need to run the following command:
+
+```bash
+zenml model-deployer register vllm_deployer --flavor=vllm
+```
+
+The ZenML integration will provision a local vLLM deployment server as a daemon process that will continue to run in the background to serve the latest vLLM model.
+
+## How do you use it?
+
+If you'd like to see this in action, check out this example of a [deployment pipeline](https://github.com/zenml-io/zenml-projects/blob/79f67ea52c3908b9b33c9a41eef18cb7d72362e8/llm-vllm-deployer/pipelines/deploy_pipeline.py#L25).
+
+### Deploy an LLM
+
+The [vllm_model_deployer_step](https://github.com/zenml-io/zenml-projects/blob/79f67ea52c3908b9b33c9a41eef18cb7d72362e8/llm-vllm-deployer/steps/vllm_deployer.py#L32) exposes a `VLLMDeploymentService` that you can use in your pipeline. Here is an example snippet:
+
+```python
+
+from zenml import pipeline
+from typing import Annotated
+from steps.vllm_deployer import vllm_model_deployer_step
+from zenml.integrations.vllm.services.vllm_deployment import VLLMDeploymentService
+
+
+@pipeline()
+def deploy_vllm_pipeline(
+    model: str,
+    timeout: int = 1200,
+) -> Annotated[VLLMDeploymentService, "GPT2"]:
+    service = vllm_model_deployer_step(
+        model=model,
+        timeout=timeout,
+    )
+    return service
+```
+
+Here is an [example](https://github.com/zenml-io/zenml-projects/tree/79f67ea52c3908b9b33c9a41eef18cb7d72362e8/llm-vllm-deployer) of running a GPT-2 model using vLLM.
+
+#### Configuration
+
+Within the `VLLMDeploymentService` you can configure:
+
+* `model`: Name or path of the Hugging Face model to use.
+* `tokenizer`: Name or path of the Hugging Face tokenizer to use. If unspecified, model name or path will be used.
+* `served_model_name`: The model name(s) used in the API. If not specified, the model name will be the same as the `model` argument.
+* `trust_remote_code`: Trust remote code from Hugging Face.
+* `tokenizer_mode`: The tokenizer mode. Allowed choices: ['auto', 'slow', 'mistral']
+* `dtype`: Data type for model weights and activations. Allowed choices: ['auto', 'half', 'float16', 'bfloat16', 'float', 'float32']
+* `revision`: The specific model version to use. It can be a branch name, a tag name, or a commit id. If unspecified, will use the default version.
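
The added documentation says the deployer creates an OpenAI-compatible API server. As a rough sketch of what a client call could look like, the snippet below builds (without sending) a request for the `/v1/completions` route that vLLM's OpenAI-compatible server exposes. The helper name `build_completion_request`, the base URL `http://localhost:8000`, and the `gpt2` model name are illustrative assumptions and are not part of this commit; the actual host and port depend on how the ZenML service is configured.

```python
import json
from urllib import request


def build_completion_request(
    base_url: str, model: str, prompt: str, max_tokens: int = 32
) -> request.Request:
    """Build (but do not send) a POST request for an OpenAI-style completion.

    `base_url` is an assumption: use whatever endpoint the running
    deployment service actually reports.
    """
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and model name, for illustration only.
req = build_completion_request("http://localhost:8000/", "gpt2", "Hello, my name is")
```

With a server actually running, passing `req` to `urllib.request.urlopen` would send the request; the sketch stops short of that so it can be read and run without a deployment.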

docs/book/toc.md

Lines changed: 2 additions & 1 deletion
@@ -265,6 +265,7 @@
 * [BentoML](component-guide/model-deployers/bentoml.md)
 * [Hugging Face](component-guide/model-deployers/huggingface.md)
 * [Databricks](component-guide/model-deployers/databricks.md)
+* [vLLM](component-guide/model-deployers/vllm.md)
 * [Develop a Custom Model Deployer](component-guide/model-deployers/custom.md)
 * [Step Operators](component-guide/step-operators/step-operators.md)
 * [Amazon SageMaker](component-guide/step-operators/sagemaker.md)
@@ -313,4 +314,4 @@
 * [SDK & CLI reference](https://sdkdocs.zenml.io/)
 * [How do I...?](reference/how-do-i.md)
 * [Community & content](reference/community-and-content.md)
-* [FAQ](reference/faq.md)
+* [FAQ](reference/faq.md)

src/zenml/integrations/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -45,14 +45,15 @@
 from zenml.integrations.label_studio import LabelStudioIntegration # noqa
 from zenml.integrations.langchain import LangchainIntegration # noqa
 from zenml.integrations.lightgbm import LightGBMIntegration # noqa
+
 # from zenml.integrations.llama_index import LlamaIndexIntegration # noqa
 from zenml.integrations.mlflow import MlflowIntegration # noqa
 from zenml.integrations.neptune import NeptuneIntegration # noqa
 from zenml.integrations.neural_prophet import NeuralProphetIntegration # noqa
 from zenml.integrations.numpy import NumpyIntegration # noqa
 from zenml.integrations.openai import OpenAIIntegration # noqa
 from zenml.integrations.pandas import PandasIntegration # noqa
-from zenml.integrations.pigeon import PigeonIntegration # noqa
+from zenml.integrations.pigeon import PigeonIntegration # noqa
 from zenml.integrations.pillow import PillowIntegration # noqa
 from zenml.integrations.polars import PolarsIntegration # noqa
 from zenml.integrations.prodigy import ProdigyIntegration # noqa
@@ -78,3 +79,4 @@
 from zenml.integrations.wandb import WandbIntegration # noqa
 from zenml.integrations.whylogs import WhylogsIntegration # noqa
 from zenml.integrations.xgboost import XgboostIntegration # noqa
+from zenml.integrations.vllm import VLLMIntegration # noqa

src/zenml/integrations/constants.py

Lines changed: 1 addition & 0 deletions
@@ -76,4 +76,5 @@
 VERTEX = "vertex"
 XGBOOST = "xgboost"
 VAULT = "vault"
+VLLM = "vllm"
 LIGHTNING = "lightning"
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+#
+# https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+# or implied. See the License for the specific language governing
+# permissions and limitations under the License.
+"""Initialization for the ZenML vLLM integration."""
+from typing import List, Type
+from zenml.integrations.integration import Integration
+from zenml.stack import Flavor
+from zenml.logger import get_logger
+from zenml.integrations.constants import VLLM
+
+VLLM_MODEL_DEPLOYER = "vllm"
+
+logger = get_logger(__name__)
+
+
+class VLLMIntegration(Integration):
+    """Definition of vLLM integration for ZenML."""
+
+    NAME = VLLM
+
+    REQUIREMENTS = ["vllm>=0.6.0,<0.7.0", "openai>=1.0.0"]
+
+    @classmethod
+    def activate(cls) -> None:
+        """Activates the integration."""
+        from zenml.integrations.vllm import services
+
+    @classmethod
+    def flavors(cls) -> List[Type[Flavor]]:
+        """Declare the stack component flavors for the vLLM integration.
+
+        Returns:
+            List of stack component flavors for this integration.
+        """
+        from zenml.integrations.vllm.flavors import VLLMModelDeployerFlavor
+
+        return [VLLMModelDeployerFlavor]
+
+
+VLLMIntegration.check_installation()
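
The `REQUIREMENTS` entry `vllm>=0.6.0,<0.7.0` pins both a lower and an upper bound, which matches the "Pin upper bound for vllm package" item in the commit message. The sketch below illustrates which versions such a range admits. It is a deliberately simplified check that only understands plain dotted releases; it is not how pip or ZenML evaluate specifiers, and the helper names `parse_release` and `in_pinned_range` are made up for this illustration.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted release such as "0.6.3" into a 3-part int tuple.

    Simplification: pre-release or post-release suffixes (e.g. "0.7.0rc1")
    are not supported and raise ValueError.
    """
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))  # treat "0.6" like "0.6.0"
    return tuple(parts)


def in_pinned_range(version: str) -> bool:
    """Return True if `version` satisfies >=0.6.0,<0.7.0."""
    lower, upper = (0, 6, 0), (0, 7, 0)
    return lower <= parse_release(version) < upper


for candidate in ("0.5.9", "0.6.0", "0.6.3", "0.7.0"):
    print(candidate, in_pinned_range(candidate))
```

For real dependency resolution, a full PEP 440 implementation should be used instead of this tuple comparison.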
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+#
+# https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+# or implied. See the License for the specific language governing
+# permissions and limitations under the License.
+"""vLLM integration flavors."""
+
+from zenml.integrations.vllm.flavors.vllm_model_deployer_flavor import ( # noqa
+    VLLMModelDeployerConfig,
+    VLLMModelDeployerFlavor,
+)
+
+__all__ = ["VLLMModelDeployerConfig", "VLLMModelDeployerFlavor"]
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+#
+# https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+# or implied. See the License for the specific language governing
+# permissions and limitations under the License.
+"""vLLM model deployer flavor."""
+
+from typing import TYPE_CHECKING, Optional, Type
+
+from zenml.integrations.vllm import VLLM_MODEL_DEPLOYER
+from zenml.model_deployers.base_model_deployer import (
+    BaseModelDeployerConfig,
+    BaseModelDeployerFlavor,
+)
+
+if TYPE_CHECKING:
+    from zenml.integrations.vllm.model_deployers import VLLMModelDeployer
+
+
+class VLLMModelDeployerConfig(BaseModelDeployerConfig):
+    """Configuration for vLLM Inference model deployer."""
+
+    service_path: str = ""
+
+
+class VLLMModelDeployerFlavor(BaseModelDeployerFlavor):
+    """vLLM model deployer flavor."""
+
+    @property
+    def name(self) -> str:
+        """Name of the flavor.
+
+        Returns:
+            The name of the flavor.
+        """
+        return VLLM_MODEL_DEPLOYER
+
+    @property
+    def docs_url(self) -> Optional[str]:
+        """A url to point at docs explaining this flavor.
+
+        Returns:
+            A flavor docs url.
+        """
+        return self.generate_default_docs_url()
+
+    @property
+    def sdk_docs_url(self) -> Optional[str]:
+        """A url to point at SDK docs explaining this flavor.
+
+        Returns:
+            A flavor SDK docs url.
+        """
+        return self.generate_default_sdk_docs_url()
+
+    @property
+    def logo_url(self) -> str:
+        """A url to represent the flavor in the dashboard.
+
+        Returns:
+            The flavor logo.
+        """
+        return "https://public-flavor-logos.s3.eu-central-1.amazonaws.com/model_deployer/vllm.png"
+
+    @property
+    def config_class(self) -> Type[VLLMModelDeployerConfig]:
+        """Returns `VLLMModelDeployerConfig` config class.
+
+        Returns:
+            The config class.
+        """
+        return VLLMModelDeployerConfig
+
+    @property
+    def implementation_class(self) -> Type["VLLMModelDeployer"]:
+        """Implementation class for this flavor.
+
+        Returns:
+            The implementation class.
+        """
+        from zenml.integrations.vllm.model_deployers import VLLMModelDeployer
+
+        return VLLMModelDeployer
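
The flavor file imports `VLLMModelDeployer` in two places: under `TYPE_CHECKING` for the return annotation, and inside `implementation_class` for the actual value. Presumably this keeps the flavor importable without loading the deployer module until it is needed. The following is a minimal, self-contained sketch of that deferred-import pattern; `LazyFlavor` is a hypothetical class, and the standard-library `decimal` module merely stands in for a heavier optional dependency.

```python
from typing import TYPE_CHECKING, Type

if TYPE_CHECKING:
    # Seen only by static type checkers; skipped at runtime.
    from decimal import Decimal


class LazyFlavor:
    """Hypothetical stand-in for a flavor class (not ZenML code)."""

    @property
    def implementation_class(self) -> Type["Decimal"]:
        """Return the implementation class, importing it on first access."""
        # The import runs only when the property is read, so merely
        # importing the module that defines LazyFlavor stays cheap.
        from decimal import Decimal

        return Decimal


impl = LazyFlavor().implementation_class
print(impl.__name__)
```

The string annotation `Type["Decimal"]` is what lets the name appear in the signature without a runtime import at module level.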
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+#
+# https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+# or implied. See the License for the specific language governing
+# permissions and limitations under the License.
+"""Initialization of the vLLM model deployers."""
+from zenml.integrations.vllm.model_deployers.vllm_model_deployer import ( # noqa
+    VLLMModelDeployer,
+)
+
+__all__ = ["VLLMModelDeployer"]
