
Commit 0f6e3cd

Document Sync by Tina
1 parent 52c2382 commit 0f6e3cd

File tree

3 files changed: +189 −6 lines changed


docs/stable/cli/sllm_cli_doc.md

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
## ServerlessLLM CLI Documentation

### Overview
`sllm-cli` is a command-line interface (CLI) tool designed for managing and interacting with ServerlessLLM models. This document provides an overview of the available commands and their usage.

### Getting Started

Before using the `sllm-cli` commands, you need to start the ServerlessLLM cluster. Follow the guides below to set up your cluster:

- [Installation Guide](../getting_started/installation.md)
- [Docker Quickstart Guide](../getting_started/docker_quickstart.md)
- [Quickstart Guide](../getting_started/quickstart.md)

After setting up the ServerlessLLM cluster, you can use the commands listed below to manage and interact with your models.

### Example Workflow

1. **Deploy a Model**
> Deploy a model using the model name, which must be a Hugging Face pretrained model name, e.g. "facebook/opt-1.3b" rather than just "opt-1.3b".
```bash
sllm-cli deploy --model facebook/opt-1.3b
```

2. **Generate Output**
```bash
echo '{
  "model": "facebook/opt-1.3b",
  "messages": [
    {
      "role": "user",
      "content": "Please introduce yourself."
    }
  ],
  "temperature": 0.7,
  "max_tokens": 50
}' > input.json
sllm-cli generate input.json
```

3. **Delete a Model**
```bash
sllm-cli delete facebook/opt-1.3b
```

### sllm-cli deploy
Deploy a model using a configuration file or model name.

##### Usage
```bash
sllm-cli deploy [OPTIONS]
```

##### Options
- `--model <model_name>`
  - Model name to deploy with the default configuration. The model name must be a Hugging Face pretrained model name. You can find the list of available models [here](https://huggingface.co/models).
- `--config <config_path>`
  - Path to the JSON configuration file.

##### Example
```bash
sllm-cli deploy --model facebook/opt-1.3b
sllm-cli deploy --config /path/to/config.json
```

##### Example Configuration File (`config.json`)
```json
{
  "model": "facebook/opt-1.3b",
  "backend": "transformers",
  "num_gpus": 1,
  "auto_scaling_config": {
    "metric": "concurrency",
    "target": 1,
    "min_instances": 0,
    "max_instances": 10
  },
  "backend_config": {
    "pretrained_model_name_or_path": "facebook/opt-1.3b",
    "device_map": "auto",
    "torch_dtype": "float16"
  }
}
```
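If you keep the deployment parameters in a file like the one above, it can help to confirm the file is valid JSON before deploying. A minimal sketch, assuming Python is available on the machine where you run the CLI:

```bash
# Validate config.json with Python's standard library, then deploy with it.
python -m json.tool config.json > /dev/null \
  && sllm-cli deploy --config config.json
```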
### sllm-cli delete
Delete deployed models by name.

##### Usage
```bash
sllm-cli delete [MODELS]
```

##### Arguments
- `MODELS`
  - Space-separated list of model names to delete.

##### Example
```bash
sllm-cli delete facebook/opt-1.3b facebook/opt-2.7b meta/llama2
```
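Because `delete` accepts a space-separated list, you can also drive it from a file of model names. A small sketch, assuming a hypothetical `models.txt` that lists one deployed model per line:

```bash
# models.txt (hypothetical) contains one model name per line;
# xargs passes them all to a single `sllm-cli delete` invocation.
xargs sllm-cli delete < models.txt
```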
### sllm-cli generate
Generate outputs using the deployed model.

##### Usage
```bash
sllm-cli generate [OPTIONS] <input_path>
```

##### Options
- `-t`, `--threads <num_threads>`
  - Number of parallel generation processes. Default is 1.

##### Arguments
- `input_path`
  - Path to the JSON input file.

##### Example
```bash
sllm-cli generate --threads 4 /path/to/request.json
```

##### Example Request File (`request.json`)
```json
{
  "model": "facebook/opt-1.3b",
  "messages": [
    {
      "role": "user",
      "content": "Please introduce yourself."
    }
  ],
  "temperature": 0.3,
  "max_tokens": 50
}
```
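Because the request file follows the OpenAI chat-completions format, the same payload can also be sent directly to the server's OpenAI-compatible endpoint. A sketch assuming the default server address used in the Docker quickstart (`http://localhost:8343`); adjust it to your deployment:

```bash
# Send request.json straight to the chat-completions endpoint with curl.
curl http://localhost:8343/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json
```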
### sllm-cli replay
Replay requests based on workload and dataset.

##### Usage
```bash
sllm-cli replay [OPTIONS]
```

##### Options
- `--workload <workload_path>`
  - Path to the JSON workload file.

- `--dataset <dataset_path>`
  - Path to the JSON dataset file.

- `--output <output_path>`
  - Path to the output JSON file for latency results. Default is `latency_results.json`.

##### Example
```bash
sllm-cli replay --workload /path/to/workload.json --dataset /path/to/dataset.json --output /path/to/output.json
```
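The measured latencies are written to the file given by `--output` (here `/path/to/output.json`). A quick way to look them over, assuming `jq` is installed:

```bash
# Pretty-print the replay latency results for a quick inspection.
jq . /path/to/output.json
```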
### sllm-cli update
Update a deployed model using a configuration file or model name.

##### Usage
```bash
sllm-cli update [OPTIONS]
```

##### Options
- `--model <model_name>`
  - Model name to update with the default configuration.

- `--config <config_path>`
  - Path to the JSON configuration file.

##### Example
```bash
sllm-cli update --model facebook/opt-1.3b
sllm-cli update --config /path/to/config.json
```
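As a concrete sketch of an update, you could raise the autoscaling ceiling for an already-deployed model by re-applying a configuration file in the same format shown for `deploy`. Whether `update` accepts exactly the same schema as `deploy` is an assumption here, so treat this as illustrative:

```bash
# Hypothetical example: bump max_instances from 10 to 20 for facebook/opt-1.3b.
cat > update_config.json <<'EOF'
{
  "model": "facebook/opt-1.3b",
  "backend": "transformers",
  "num_gpus": 1,
  "auto_scaling_config": {
    "metric": "concurrency",
    "target": 1,
    "min_instances": 0,
    "max_instances": 20
  },
  "backend_config": {
    "pretrained_model_name_or_path": "facebook/opt-1.3b",
    "device_map": "auto",
    "torch_dtype": "float16"
  }
}
EOF
sllm-cli update --config update_config.json
```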

docs/stable/getting_started/docker_quickstart.md

Lines changed: 3 additions & 3 deletions
@@ -101,7 +101,7 @@ export LLM_SERVER_URL=http://localhost:8343/
Deploy a model to the ServerlessLLM server using the `sllm-cli`:

```bash
-sllm-cli deploy --model facebook/opt-2.7b
+sllm-cli deploy --model facebook/opt-1.3b
```
> Note: This command will spend some time downloading the model from the Hugging Face Model Hub.
> You can use any model from the [Hugging Face Model Hub](https://huggingface.co/models) by specifying the model name in the `--model` argument.
@@ -120,7 +120,7 @@ Now, you can query the model by any OpenAI API client. For example, you can use
curl http://localhost:8343/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
-    "model": "facebook/opt-2.7b",
+    "model": "facebook/opt-1.3b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is your name?"}
@@ -131,7 +131,7 @@ curl http://localhost:8343/v1/chat/completions \
Expected output:

```plaintext
-{"id":"chatcmpl-8b4773e9-a98b-41db-8163-018ed3dc65e2","object":"chat.completion","created":1720183759,"model":"facebook/opt-2.7b","choices":[{"index":0,"message":{"role":"assistant","content":"system: You are a helpful assistant.\nuser: What is your name?\nsystem: I am a helpful assistant.\n"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":26,"total_tokens":42}}%
+{"id":"chatcmpl-8b4773e9-a98b-41db-8163-018ed3dc65e2","object":"chat.completion","created":1720183759,"model":"facebook/opt-1.3b","choices":[{"index":0,"message":{"role":"assistant","content":"system: You are a helpful assistant.\nuser: What is your name?\nsystem: I am a helpful assistant.\n"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":26,"total_tokens":42}}%
```

### Deleting a Model

docs/stable/getting_started/quickstart.md

Lines changed: 5 additions & 3 deletions
@@ -25,14 +25,14 @@ ray start --address=localhost:6379 --num-cpus=4 --num-gpus=2 \
Now, let's start ServerlessLLM.

-First, start ServerlessLLM Serve (i.e., `sllm-serve`)
+First, in a new terminal, start ServerlessLLM Serve (i.e., `sllm-serve`):

```bash
conda activate sllm
sllm-serve start
```

-Next start ServerlessLLM Store server. This server will use `./models` as the storage path by default.
+Next, in another new terminal, start the ServerlessLLM Store server. This server will use `./models` as the storage path by default.

```bash
conda activate sllm
@@ -41,7 +41,9 @@ sllm-store-server
Everything is set!

-Next, let's deploy a model to the ServerlessLLM server. You can deploy a model by running the following command:
+By now you have opened four terminals: one running the local Ray cluster head node, one running the Ray worker node, one running ServerlessLLM Serve, and one running the ServerlessLLM Store server.
+
+Next, open another new terminal and deploy a model to the ServerlessLLM server by running the following command:

```bash
conda activate sllm
