Commit 95c3fc8

update: manual sync
1 parent 713eed2 commit 95c3fc8

7 files changed: +286 -1 lines changed
Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
## ServerlessLLM CLI Documentation

### Overview
`sllm-cli` is a command-line interface (CLI) tool designed for managing and interacting with ServerlessLLM models. This document provides an overview of the available commands and their usage.

### Getting Started

Before using the `sllm-cli` commands, you need to start the ServerlessLLM cluster. Follow the guides below to set up your cluster:

- [Installation Guide](../getting_started/installation.md)
- [Docker Quickstart Guide](../getting_started/docker_quickstart.md)
- [Quickstart Guide](../getting_started/quickstart.md)

After setting up the ServerlessLLM cluster, you can use the commands listed below to manage and interact with your models.

### sllm-cli deploy
Deploy a model using a configuration file or model name.

##### Usage
```bash
sllm-cli deploy [OPTIONS]
```

##### Options
- `--model <model_name>`
  - Model name to deploy with default configuration.

- `--config <config_path>`
  - Path to the JSON configuration file.

##### Example
```bash
sllm-cli deploy --model facebook/opt-1.3b
sllm-cli deploy --config /path/to/config.json
```

##### Example Configuration File (`config.json`)
```json
{
    "model": "facebook/opt-1.3b",
    "backend": "transformers",
    "num_gpus": 1,
    "auto_scaling_config": {
        "metric": "concurrency",
        "target": 1,
        "min_instances": 0,
        "max_instances": 10
    },
    "backend_config": {
        "pretrained_model_name_or_path": "facebook/opt-1.3b",
        "device_map": "auto",
        "torch_dtype": "float16"
    }
}
```

### sllm-cli delete
Delete deployed models by name.

##### Usage
```bash
sllm-cli delete [MODELS]
```

##### Arguments
- `MODELS`
  - Space-separated list of model names to delete.

##### Example
```bash
sllm-cli delete facebook/opt-1.3b facebook/opt-2.7b meta/llama2
```

### sllm-cli generate
Generate outputs using the deployed model.

##### Usage
```bash
sllm-cli generate [OPTIONS] <input_path>
```

##### Options
- `-t`, `--threads <num_threads>`
  - Number of parallel generation processes. Default is 1.

##### Arguments
- `input_path`
  - Path to the JSON input file.

##### Example
```bash
sllm-cli generate --threads 4 /path/to/request.json
```

##### Example Request File (`request.json`)
```json
{
    "model": "facebook/opt-1.3b",
    "messages": [
        {
            "role": "user",
            "content": "Please introduce yourself."
        }
    ],
    "temperature": 0.3,
    "max_tokens": 50
}
```

### sllm-cli replay
Replay requests based on workload and dataset.

##### Usage
```bash
sllm-cli replay [OPTIONS]
```

##### Options
- `--workload <workload_path>`
  - Path to the JSON workload file.

- `--dataset <dataset_path>`
  - Path to the JSON dataset file.

- `--output <output_path>`
  - Path to the output JSON file for latency results. Default is `latency_results.json`.

##### Example
```bash
sllm-cli replay --workload /path/to/workload.json --dataset /path/to/dataset.json --output /path/to/output.json
```
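
The exact layout of the workload and dataset files is not defined in this document; the sketch below is only an illustration, and every field name and value in it is an assumption rather than part of the documented interface.

```bash
# Hypothetical input files for `sllm-cli replay` -- the real schema may differ.
# workload.json: per-model request arrival times in seconds (illustrative).
cat > /path/to/workload.json <<'EOF'
{
  "facebook/opt-1.3b": [0.0, 1.5, 3.2]
}
EOF

# dataset.json: prompts to replay (illustrative).
cat > /path/to/dataset.json <<'EOF'
[
  {"input": "Please introduce yourself."},
  {"input": "What is serverless inference?"}
]
EOF
```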

### sllm-cli update
Update a deployed model using a configuration file or model name.

##### Usage
```bash
sllm-cli update [OPTIONS]
```

##### Options
- `--model <model_name>`
  - Model name to update with default configuration.

- `--config <config_path>`
  - Path to the JSON configuration file.

##### Example
```bash
sllm-cli update --model facebook/opt-1.3b
sllm-cli update --config /path/to/config.json
```
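
The configuration file passed to `sllm-cli update` presumably follows the same schema as the deploy `config.json` shown earlier; that is an assumption based on the shared option, not something this document states. A minimal sketch under that assumption, raising the auto-scaling limits of an already deployed model (the changed values are illustrative):

```json
{
    "model": "facebook/opt-1.3b",
    "backend": "transformers",
    "num_gpus": 1,
    "auto_scaling_config": {
        "metric": "concurrency",
        "target": 1,
        "min_instances": 1,
        "max_instances": 20
    },
    "backend_config": {
        "pretrained_model_name_or_path": "facebook/opt-1.3b",
        "device_map": "auto",
        "torch_dtype": "float16"
    }
}
```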

### Example Workflow

1. **Deploy a Model**
   ```bash
   sllm-cli deploy --model facebook/opt-1.3b
   ```

2. **Generate Output**
   ```bash
   echo '{
     "model": "facebook/opt-1.3b",
     "messages": [
       {
         "role": "user",
         "content": "Please introduce yourself."
       }
     ],
     "temperature": 0.7,
     "max_tokens": 50
   }' > input.json
   sllm-cli generate input.json
   ```

3. **Delete a Model**
   ```bash
   sllm-cli delete facebook/opt-1.3b
   ```

docs/stable/getting_started/docker_quickstart.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+---
+sidebar_position: 2
+---
+
 # Docker Quickstart Guide
 
 This guide will help you get started with the basics of using ServerlessLLM with Docker. Please make sure you have Docker installed on your system and have installed ServerlessLLM CLI following the [installation guide](./installation.md).

docs/stable/getting_started/installation.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+---
+sidebar_position: 0
+---
+
 # Installations
 
 ## Requirements

docs/stable/getting_started/quickstart.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+---
+sidebar_position: 1
+---
+
 # Quickstart Guide
 
 This guide will help you get started with the basics of using ServerlessLLM. Please make sure you have installed the ServerlessLLM following the [installation guide](./installation.md).

docs/stable/store/_category_.json

Lines changed: 2 additions & 1 deletion
@@ -2,6 +2,7 @@
   "label": "ServerlessLLM Store",
   "position": 5,
   "link": {
-    "type": "generated-index"
+    "type": "generated-index",
+    "description": "`sllm-store` is an internal library of ServerlessLLM that provides high-performance model loading from local storage into GPU memory. You can also install and use this library in your own projects."
   }
 }

docs/stable/store/checkpoint_store.md

Whitespace-only changes.

docs/stable/store/quickstart.md

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
---
sidebar_position: 0
---

# Quickstart Guide

ServerlessLLM Store (`sllm-store`) is a Python library that supports fast model checkpoint loading from multi-tier storage (i.e., DRAM, SSD, HDD) into GPUs.

ServerlessLLM Store provides a model manager and two key functions:
- `save_model`: Convert a HuggingFace model into a loading-optimized format and save it to a local path.
- `load_model`: Load a model into given GPUs.

## Requirements
- OS: Ubuntu 20.04
- Python: 3.10
- GPU: compute capability 7.0 or higher

## Installations

### Create a virtual environment
```bash
conda create -n sllm-store python=3.10 -y
conda activate sllm-store
```

### Install with pip
```bash
pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ serverless_llm_store==0.0.1.dev3
```

## Usage Examples
:::tip
We highly recommend using a fast storage device (e.g., NVMe SSD) to store the model files for the best experience.
For example, create a directory `models` on the NVMe SSD and link it to the local path.
```bash
mkdir -p /mnt/nvme/models # Replace '/mnt/nvme' with your NVMe SSD path.
ln -s /mnt/nvme/models ./models
```
:::

1. Convert a model to ServerlessLLM format and save it to a local path:
   ```python
   from serverless_llm_store import save_model

   # Load a model from the HuggingFace model hub.
   import torch
   from transformers import AutoModelForCausalLM
   model = AutoModelForCausalLM.from_pretrained('facebook/opt-1.3b', torch_dtype=torch.float16)

   # Replace './models' with your local path.
   save_model(model, './models/facebook/opt-1.3b')
   ```

2. Launch the checkpoint store server in a separate process:
   ```bash
   # 'mem_pool_size' is the maximum size of the memory pool in GB. It should be larger than the model size.
   sllm-store-server --storage_path $PWD/models --mem_pool_size 32
   ```

   <!-- Running the server using a container:

   ```bash
   docker build -t checkpoint_store_server -f Dockerfile .
   # Make sure the models have been downloaded using examples/save_model.py script
   docker run -it --rm -v $PWD/models:/app/models checkpoint_store_server
   ``` -->

3. Load the model in your project and run inference:
   ```python
   import time
   import torch
   from serverless_llm_store import load_model

   # warm up the GPU
   for _ in range(torch.cuda.device_count()):
       torch.randn(1).cuda()

   start = time.time()
   model = load_model("facebook/opt-1.3b", device_map="auto", torch_dtype=torch.float16, storage_path="./models/", fully_parallel=True)
   # Please note the loading time depends on the model size and the hardware bandwidth.
   print(f"Model loading time: {time.time() - start:.2f}s")

   from transformers import AutoTokenizer

   tokenizer = AutoTokenizer.from_pretrained('facebook/opt-1.3b')
   inputs = tokenizer('Hello, my dog is cute', return_tensors='pt').to("cuda")
   outputs = model.generate(**inputs)
   print(tokenizer.decode(outputs[0], skip_special_tokens=True))
   ```

4. Clean up by pressing `Ctrl+C` to stop the server process.
