
Commit 0f6e3cd

Document Sync by Tina
1 parent 52c2382 commit 0f6e3cd

File tree

3 files changed: +189 −6 lines changed


docs/stable/cli/sllm_cli_doc.md

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
## ServerlessLLM CLI Documentation

### Overview
`sllm-cli` is a command-line interface (CLI) tool designed for managing and interacting with ServerlessLLM models. This document provides an overview of the available commands and their usage.

### Getting Started

Before using the `sllm-cli` commands, you need to start the ServerlessLLM cluster. Follow the guides below to set up your cluster:

- [Installation Guide](../getting_started/installation.md)
- [Docker Quickstart Guide](../getting_started/docker_quickstart.md)
- [Quickstart Guide](../getting_started/quickstart.md)

After setting up the ServerlessLLM cluster, you can use the commands listed below to manage and interact with your models.

### Example Workflow

1. **Deploy a Model**
> Deploy a model using the model name, which must be a Hugging Face pretrained model name, e.g. "facebook/opt-1.3b" rather than just "opt-1.3b".
```bash
sllm-cli deploy --model facebook/opt-1.3b
```

2. **Generate Output**
```bash
echo '{
  "model": "facebook/opt-1.3b",
  "messages": [
    {
      "role": "user",
      "content": "Please introduce yourself."
    }
  ],
  "temperature": 0.7,
  "max_tokens": 50
}' > input.json
sllm-cli generate input.json
```

3. **Delete a Model**
```bash
sllm-cli delete facebook/opt-1.3b
```

### sllm-cli deploy
Deploy a model using a configuration file or model name.

##### Usage
```bash
sllm-cli deploy [OPTIONS]
```

##### Options
- `--model <model_name>`
  - Model name to deploy with the default configuration. The model name must be a Hugging Face pretrained model name. You can find the list of available models [here](https://huggingface.co/models).
- `--config <config_path>`
  - Path to the JSON configuration file.

##### Example
```bash
sllm-cli deploy --model facebook/opt-1.3b
sllm-cli deploy --config /path/to/config.json
```

##### Example Configuration File (`config.json`)
```json
{
  "model": "facebook/opt-1.3b",
  "backend": "transformers",
  "num_gpus": 1,
  "auto_scaling_config": {
    "metric": "concurrency",
    "target": 1,
    "min_instances": 0,
    "max_instances": 10
  },
  "backend_config": {
    "pretrained_model_name_or_path": "facebook/opt-1.3b",
    "device_map": "auto",
    "torch_dtype": "float16"
  }
}
```
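If you keep the deployment parameters in a file like the one above, it can help to confirm the file is valid JSON before deploying. A minimal sketch, assuming Python is available on the machine where you run the CLI:

```bash
# Validate config.json with Python's standard library, then deploy with it.
python -m json.tool config.json > /dev/null \
  && sllm-cli deploy --config config.json
```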
### sllm-cli delete
Delete deployed models by name.

##### Usage
```bash
sllm-cli delete [MODELS]
```

##### Arguments
- `MODELS`
  - Space-separated list of model names to delete.

##### Example
```bash
sllm-cli delete facebook/opt-1.3b facebook/opt-2.7b meta/llama2
```
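Because `delete` accepts a space-separated list, you can also drive it from a file of model names. A small sketch, assuming a hypothetical `models.txt` that lists one deployed model per line:

```bash
# models.txt (hypothetical) contains one model name per line;
# xargs passes them all to a single `sllm-cli delete` invocation.
xargs sllm-cli delete < models.txt
```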
### sllm-cli generate
Generate outputs using the deployed model.

##### Usage
```bash
sllm-cli generate [OPTIONS] <input_path>
```

##### Options
- `-t`, `--threads <num_threads>`
  - Number of parallel generation processes. Default is 1.

##### Arguments
- `input_path`
  - Path to the JSON input file.

##### Example
```bash
sllm-cli generate --threads 4 /path/to/request.json
```

##### Example Request File (`request.json`)
```json
{
  "model": "facebook/opt-1.3b",
  "messages": [
    {
      "role": "user",
      "content": "Please introduce yourself."
    }
  ],
  "temperature": 0.3,
  "max_tokens": 50
}
```
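Because the request file follows the OpenAI chat-completions format, the same payload can also be sent directly to the server's OpenAI-compatible endpoint. A sketch assuming the default server address used in the Docker quickstart (`http://localhost:8343`); adjust it to your deployment:

```bash
# Send request.json straight to the chat-completions endpoint with curl.
curl http://localhost:8343/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json
```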
### sllm-cli replay
Replay requests based on workload and dataset.

##### Usage
```bash
sllm-cli replay [OPTIONS]
```

##### Options
- `--workload <workload_path>`
  - Path to the JSON workload file.

- `--dataset <dataset_path>`
  - Path to the JSON dataset file.

- `--output <output_path>`
  - Path to the output JSON file for latency results. Default is `latency_results.json`.

##### Example
```bash
sllm-cli replay --workload /path/to/workload.json --dataset /path/to/dataset.json --output /path/to/output.json
```
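The measured latencies are written to the file given by `--output` (here `/path/to/output.json`). A quick way to look them over, assuming `jq` is installed:

```bash
# Pretty-print the replay latency results for a quick inspection.
jq . /path/to/output.json
```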
### sllm-cli update
Update a deployed model using a configuration file or model name.

##### Usage
```bash
sllm-cli update [OPTIONS]
```

##### Options
- `--model <model_name>`
  - Model name to update with the default configuration.

- `--config <config_path>`
  - Path to the JSON configuration file.

##### Example
```bash
sllm-cli update --model facebook/opt-1.3b
sllm-cli update --config /path/to/config.json
```
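As a concrete sketch of an update, you could raise the autoscaling ceiling for an already-deployed model by re-applying a configuration file in the same format shown for `deploy`. Whether `update` accepts exactly the same schema as `deploy` is an assumption here, so treat this as illustrative:

```bash
# Hypothetical example: bump max_instances from 10 to 20 for facebook/opt-1.3b.
cat > update_config.json <<'EOF'
{
  "model": "facebook/opt-1.3b",
  "backend": "transformers",
  "num_gpus": 1,
  "auto_scaling_config": {
    "metric": "concurrency",
    "target": 1,
    "min_instances": 0,
    "max_instances": 20
  },
  "backend_config": {
    "pretrained_model_name_or_path": "facebook/opt-1.3b",
    "device_map": "auto",
    "torch_dtype": "float16"
  }
}
EOF
sllm-cli update --config update_config.json
```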

docs/stable/getting_started/docker_quickstart.md

Lines changed: 3 additions & 3 deletions
@@ -101,7 +101,7 @@ export LLM_SERVER_URL=http://localhost:8343/
Deploy a model to the ServerlessLLM server using the `sllm-cli`:

```bash
-sllm-cli deploy --model facebook/opt-2.7b
+sllm-cli deploy --model facebook/opt-1.3b
```
> Note: This command will spend some time downloading the model from the Hugging Face Model Hub.
> You can use any model from the [Hugging Face Model Hub](https://huggingface.co/models) by specifying the model name in the `--model` argument.
@@ -120,7 +120,7 @@ Now, you can query the model by any OpenAI API client. For example, you can use
curl http://localhost:8343/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
-    "model": "facebook/opt-2.7b",
+    "model": "facebook/opt-1.3b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is your name?"}
@@ -131,7 +131,7 @@ curl http://localhost:8343/v1/chat/completions \
Expected output:

```plaintext
-{"id":"chatcmpl-8b4773e9-a98b-41db-8163-018ed3dc65e2","object":"chat.completion","created":1720183759,"model":"facebook/opt-2.7b","choices":[{"index":0,"message":{"role":"assistant","content":"system: You are a helpful assistant.\nuser: What is your name?\nsystem: I am a helpful assistant.\n"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":26,"total_tokens":42}}%
+{"id":"chatcmpl-8b4773e9-a98b-41db-8163-018ed3dc65e2","object":"chat.completion","created":1720183759,"model":"facebook/opt-1.3b","choices":[{"index":0,"message":{"role":"assistant","content":"system: You are a helpful assistant.\nuser: What is your name?\nsystem: I am a helpful assistant.\n"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":26,"total_tokens":42}}%
```

### Deleting a Model

docs/stable/getting_started/quickstart.md

Lines changed: 5 additions & 3 deletions
@@ -25,14 +25,14 @@ ray start --address=localhost:6379 --num-cpus=4 --num-gpus=2 \
Now, let's start ServerlessLLM.

-First, start ServerlessLLM Serve (i.e., `sllm-serve`)
+First, in a new terminal, start ServerlessLLM Serve (i.e., `sllm-serve`):

```bash
conda activate sllm
sllm-serve start
```

-Next start ServerlessLLM Store server. This server will use `./models` as the storage path by default.
+Next, in another new terminal, start the ServerlessLLM Store server. This server will use `./models` as the storage path by default.

```bash
conda activate sllm
@@ -41,7 +41,9 @@ sllm-store-server
Everything is set!

-Next, let's deploy a model to the ServerlessLLM server. You can deploy a model by running the following command:
+By now you have opened four terminals: one running the local Ray cluster head node, one running the Ray worker node, one running ServerlessLLM Serve, and one running the ServerlessLLM Store server.
+
+Next, open another new terminal and deploy a model to the ServerlessLLM server by running the following command:

```bash
conda activate sllm
