# Docker Quickstart Guide

This guide will help you get started with the basics of using ServerlessLLM with Docker. Please make sure you have Docker installed on your system and have installed the ServerlessLLM CLI following the [installation guide](./installation.md).

## Local Test Using Docker

First, let's start a local Docker-based Ray cluster to test ServerlessLLM.

### Step 1: Build Docker Images

Run the following commands to build the Docker images:

```bash
docker build . -t serverlessllm/sllm-serve
docker build -f Dockerfile.worker . -t serverlessllm/sllm-serve-worker
```
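
To confirm both images were built, you can list them before moving on (a quick sanity check; the image names match the build commands above):

```bash
# List the ServerlessLLM images built in the previous step
docker images | grep serverlessllm
```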

### Step 2: Configuration

Ensure that you have a directory for storing your models and set the `MODEL_FOLDER` environment variable to its absolute path (the bind mounts used in Step 3 require an absolute path):

```bash
export MODEL_FOLDER=/path/to/models
```
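
If the directory does not exist yet, you can create it and export its absolute path in one step. This is a minimal sketch; `$HOME/models` is just an example location, so substitute your own:

```bash
# Create a host directory for model storage and point MODEL_FOLDER at it
mkdir -p "$HOME/models"
export MODEL_FOLDER="$HOME/models"
```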

Also, check if the Docker network `sllm` exists and create it if it doesn't:

```bash
if ! docker network ls | grep -q "sllm"; then
  echo "Docker network 'sllm' does not exist. Creating network..."
  docker network create sllm
else
  echo "Docker network 'sllm' already exists."
fi
```
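
You can also inspect the network afterwards to confirm it was created (optional):

```bash
# Show the configuration of the sllm network
docker network inspect sllm
```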

### Step 3: Start the Ray Head and Worker Nodes

Run the following commands to start the Ray head node and worker nodes:

#### Start Ray Head Node

```bash
docker run -d --name ray_head \
  --runtime nvidia \
  --network sllm \
  -p 6379:6379 \
  -p 8343:8343 \
  --gpus '"device=none"' \
  serverlessllm/sllm-serve

sleep 5
```
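
Before starting the workers, you can check the head node's logs to make sure Ray came up cleanly. The exact log lines depend on the image, so treat this as a rough sanity check:

```bash
# Tail the head node logs and look for messages indicating Ray has started
docker logs --tail 20 ray_head
```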

#### Start Ray Worker Nodes

```bash
docker run -d --name ray_worker_0 \
  --runtime nvidia \
  --network sllm \
  --gpus '"device=0"' \
  --env WORKER_ID=0 \
  --mount type=bind,source=$MODEL_FOLDER,target=/models \
  serverlessllm/sllm-serve-worker

docker run -d --name ray_worker_1 \
  --runtime nvidia \
  --network sllm \
  --gpus '"device=1"' \
  --env WORKER_ID=1 \
  --mount type=bind,source=$MODEL_FOLDER,target=/models \
  serverlessllm/sllm-serve-worker
```
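
To verify that both workers joined the cluster, you can query Ray's status from the head node. This assumes the `ray` CLI lives under `/opt/conda/bin` inside the image, as the `sllm-serve` path used in Step 4 suggests:

```bash
# Expect one head node and two worker nodes in the output
docker exec ray_head sh -c "/opt/conda/bin/ray status"
```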

### Step 4: Start ServerlessLLM Serve

Run the following command inside the head node container to start the ServerlessLLM server:

```bash
docker exec ray_head sh -c "/opt/conda/bin/sllm-serve start"
```
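
To confirm the server is listening on port 8343, you can probe it with `curl`. This is only a connectivity check, not an official health endpoint; any HTTP status code (even a 404 for the bare root path) means the port is reachable:

```bash
# Print the HTTP status code returned for the server's root path
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8343/
```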

### Step 5: Deploy a Model Using sllm-cli

Open a new terminal, activate the `sllm` environment, and set the `LLM_SERVER_URL` environment variable:

```bash
conda activate sllm
export LLM_SERVER_URL=http://localhost:8343/
```

Deploy a model to the ServerlessLLM server using `sllm-cli`:

```bash
sllm-cli deploy --model "facebook/opt-2.7b"
```
> Note: This command may take some time, as it downloads the model from the Hugging Face Model Hub.
> You can use any model from the [Hugging Face Model Hub](https://huggingface.co/models) by specifying the model name in the `--model` argument.

Expected output:

```plaintext
INFO xx-xx xx:xx:xx deploy.py:36] Deploying model facebook/opt-2.7b with default configuration.
INFO xx-xx xx:xx:xx deploy.py:49] Model registered successfully.
```

### Step 6: Query the Model

Now, you can query the model with any OpenAI-compatible API client. For example, you can use the following `curl` command:

```bash
curl http://localhost:8343/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "facebook/opt-2.7b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is your name?"}
    ]
  }'
```

Expected output:

```plaintext
{"id":"chatcmpl-8b4773e9-a98b-41db-8163-018ed3dc65e2","object":"chat.completion","created":1720183759,"model":"facebook/opt-2.7b","choices":[{"index":0,"message":{"role":"assistant","content":"system: You are a helpful assistant.\nuser: What is your name?\nsystem: I am a helpful assistant.\n"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":26,"total_tokens":42}}
```
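
If you only want the assistant's reply rather than the full JSON response, you can pipe the same request through `jq` (assuming `jq` is installed on your machine):

```bash
# Send the same chat request and extract only the assistant's message content
curl -s http://localhost:8343/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "facebook/opt-2.7b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is your name?"}
    ]
  }' | jq -r '.choices[0].message.content'
```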

### Cleanup

If you need to stop and remove the containers, you can use the following commands:

```bash
docker exec ray_head sh -c "ray stop"
docker exec ray_worker_0 sh -c "ray stop"
docker exec ray_worker_1 sh -c "ray stop"

docker stop ray_head ray_worker_0 ray_worker_1
docker rm ray_head ray_worker_0 ray_worker_1
docker network rm sllm
```
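
If you also want to remove the images built in Step 1, you can delete them as well (optional; only do this once you no longer need them):

```bash
# Remove the ServerlessLLM images from the local Docker image cache
docker rmi serverlessllm/sllm-serve serverlessllm/sllm-serve-worker
```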