## ServerlessLLM CLI Documentation

### Overview
`sllm-cli` is a command-line interface (CLI) tool designed for managing and interacting with ServerlessLLM models. This document provides an overview of the available commands and their usage.

### Getting Started

Before using the `sllm-cli` commands, you need to start the ServerlessLLM cluster. Follow the guides below to set up your cluster:

- [Installation Guide](../getting_started/installation.md)
- [Docker Quickstart Guide](../getting_started/docker_quickstart.md)
- [Quickstart Guide](../getting_started/quickstart.md)

After setting up the ServerlessLLM cluster, you can use the commands listed below to manage and interact with your models.

### Example Workflow

1. **Deploy a Model**
   > Deploy a model by its name, which must be a Hugging Face pretrained model name, e.g. `facebook/opt-1.3b` rather than `opt-1.3b`.
   ```bash
   sllm-cli deploy --model facebook/opt-1.3b
   ```

2. **Generate Output**
   ```bash
   echo '{
     "model": "facebook/opt-1.3b",
     "messages": [
       {
         "role": "user",
         "content": "Please introduce yourself."
       }
     ],
     "temperature": 0.7,
     "max_tokens": 50
   }' > input.json
   sllm-cli generate input.json
   ```

3. **Delete a Model**
   ```bash
   sllm-cli delete facebook/opt-1.3b
   ```

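Once a model is deployed, you can also query it directly over the cluster's OpenAI-compatible HTTP API instead of going through `sllm-cli generate`. The sketch below assumes the default head-node address `127.0.0.1:8343` used in the quickstart guides; adjust the host and port to match your cluster.

```bash
# Query the deployed model via the OpenAI-compatible chat endpoint.
# Assumes the ServerlessLLM head node is reachable at 127.0.0.1:8343.
curl http://127.0.0.1:8343/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "facebook/opt-1.3b",
        "messages": [{"role": "user", "content": "Please introduce yourself."}],
        "max_tokens": 50
      }'
```
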
### sllm-cli deploy
Deploy a model using a configuration file or model name.

##### Usage
```bash
sllm-cli deploy [OPTIONS]
```

##### Options
- `--model <model_name>`
  - Model name to deploy with the default configuration. The model name must be a Hugging Face pretrained model name. You can find the list of available models [here](https://huggingface.co/models).

- `--config <config_path>`
  - Path to the JSON configuration file.

##### Example
```bash
sllm-cli deploy --model facebook/opt-1.3b
sllm-cli deploy --config /path/to/config.json
```

##### Example Configuration File (`config.json`)
```json
{
  "model": "facebook/opt-1.3b",
  "backend": "transformers",
  "num_gpus": 1,
  "auto_scaling_config": {
    "metric": "concurrency",
    "target": 1,
    "min_instances": 0,
    "max_instances": 10
  },
  "backend_config": {
    "pretrained_model_name_or_path": "facebook/opt-1.3b",
    "device_map": "auto",
    "torch_dtype": "float16"
  }
}
```

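In this configuration, `auto_scaling_config` governs how many instances serve the model (here scaling on request concurrency, from zero idle instances up to ten), while `backend_config` is passed through to the chosen backend. Assuming the file above is saved as `config.json` in the current directory, deploying it and reusing the `input.json` request from the workflow looks like:

```bash
# Deploy with the example configuration, then send a request to the model.
sllm-cli deploy --config config.json
sllm-cli generate input.json
```
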
### sllm-cli delete
Delete deployed models by name.

##### Usage
```bash
sllm-cli delete [MODELS]
```

##### Arguments
- `MODELS`
  - Space-separated list of model names to delete.

##### Example
```bash
sllm-cli delete facebook/opt-1.3b facebook/opt-2.7b meta/llama2
```

### sllm-cli generate
Generate outputs using the deployed model.

##### Usage
```bash
sllm-cli generate [OPTIONS] <input_path>
```

##### Options
- `-t`, `--threads <num_threads>`
  - Number of parallel generation processes. Default is 1.

##### Arguments
- `input_path`
  - Path to the JSON input file.

##### Example
```bash
sllm-cli generate --threads 4 /path/to/request.json
```

##### Example Request File (`request.json`)
```json
{
  "model": "facebook/opt-1.3b",
  "messages": [
    {
      "role": "user",
      "content": "Please introduce yourself."
    }
  ],
  "temperature": 0.3,
  "max_tokens": 50
}
```

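Putting the pieces together, you can create the request file and run a batch of parallel generations in one step; this sketch only combines the commands shown above:

```bash
# Write the example request to request.json, then run 4 parallel generations.
cat > request.json <<'EOF'
{
  "model": "facebook/opt-1.3b",
  "messages": [
    {"role": "user", "content": "Please introduce yourself."}
  ],
  "temperature": 0.3,
  "max_tokens": 50
}
EOF
sllm-cli generate --threads 4 request.json
```
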
### sllm-cli replay
Replay requests based on a workload file and a dataset file.

##### Usage
```bash
sllm-cli replay [OPTIONS]
```

##### Options
- `--workload <workload_path>`
  - Path to the JSON workload file.

- `--dataset <dataset_path>`
  - Path to the JSON dataset file.

- `--output <output_path>`
  - Path to the output JSON file for latency results. Default is `latency_results.json`.

##### Example
```bash
sllm-cli replay --workload /path/to/workload.json --dataset /path/to/dataset.json --output /path/to/output.json
```

### sllm-cli update
Update a deployed model using a configuration file or model name.

##### Usage
```bash
sllm-cli update [OPTIONS]
```

##### Options
- `--model <model_name>`
  - Model name to update with the default configuration.

- `--config <config_path>`
  - Path to the JSON configuration file.

##### Example
```bash
sllm-cli update --model facebook/opt-1.3b
sllm-cli update --config /path/to/config.json
```
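
A common reason to update a deployment is to change its scaling behavior. As a sketch, assuming `update` accepts the same configuration schema as `deploy`, you could raise `max_instances` in the earlier `config.json` and re-apply it:

```bash
# Assumption: update accepts the same JSON schema shown for deploy.
# After editing auto_scaling_config.max_instances in config.json
# (e.g. 10 -> 20), re-apply the configuration to the deployed model:
sllm-cli update --config config.json
```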