diff --git a/docs/docs/concepts/dev-environments.md b/docs/docs/concepts/dev-environments.md
index 1c4a45571..145316ad5 100644
--- a/docs/docs/concepts/dev-environments.md
+++ b/docs/docs/concepts/dev-environments.md
@@ -212,6 +212,9 @@ and their quantity. Examples: `nvidia` (one NVIDIA GPU), `A100` (one A100), `A10
 If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure
 `shm_size`, e.g. set it to `16GB`.
 
+> If you’re unsure which offers (hardware configurations) are available from the configured backends, use the
+> [`dstack offer`](../reference/cli/dstack/offer.md#list-gpu-offers) command to list them.
+
 ### Python version
 
 If you don't specify `image`, `dstack` uses its base Docker image pre-configured with
diff --git a/docs/docs/concepts/fleets.md b/docs/docs/concepts/fleets.md
index 6dcd82df3..b59848ab7 100644
--- a/docs/docs/concepts/fleets.md
+++ b/docs/docs/concepts/fleets.md
@@ -121,6 +121,9 @@ and their quantity. Examples: `nvidia` (one NVIDIA GPU), `A100` (one A100), `A10
 Currently, only 8 TPU cores can be specified, supporting single TPU device workloads.
 Multi-TPU support is coming soon.
 
+> If you’re unsure which offers (hardware configurations) are available from the configured backends, use the
+> [`dstack offer`](../reference/cli/dstack/offer.md#list-gpu-offers) command to list them.
+
 #### Blocks { #cloud-blocks }
 
 For cloud fleets, `blocks` function the same way as in SSH fleets.
diff --git a/docs/docs/concepts/services.md b/docs/docs/concepts/services.md
index 3abdd50d2..a75cd19b3 100644
--- a/docs/docs/concepts/services.md
+++ b/docs/docs/concepts/services.md
@@ -359,6 +359,9 @@ and their quantity. Examples: `nvidia` (one NVIDIA GPU), `A100` (one A100), `A10
 If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure
 `shm_size`, e.g. set it to `16GB`.
 
+> If you’re unsure which offers (hardware configurations) are available from the configured backends, use the
+> [`dstack offer`](../reference/cli/dstack/offer.md#list-gpu-offers) command to list them.
+
 ### Python version
 
 If you don't specify `image`, `dstack` uses its base Docker image pre-configured with
diff --git a/docs/docs/concepts/tasks.md b/docs/docs/concepts/tasks.md
index 8d0739d34..237de649b 100644
--- a/docs/docs/concepts/tasks.md
+++ b/docs/docs/concepts/tasks.md
@@ -234,6 +234,9 @@ and their quantity. Examples: `nvidia` (one NVIDIA GPU), `A100` (one A100), `A10
 If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure
 `shm_size`, e.g. set it to `16GB`.
 
+> If you’re unsure which offers (hardware configurations) are available from the configured backends, use the
+> [`dstack offer`](../reference/cli/dstack/offer.md#list-gpu-offers) command to list them.
+
 ### Python version
 
 If you don't specify `image`, `dstack` uses its base Docker image pre-configured with
diff --git a/docs/docs/guides/protips.md b/docs/docs/guides/protips.md
index 98af5495f..ddadbac22 100644
--- a/docs/docs/guides/protips.md
+++ b/docs/docs/guides/protips.md
@@ -274,13 +274,24 @@ To run in detached mode, use `-d` with `dstack apply`.
 > If you detached the CLI, you can always re-attach to a run via [`dstack attach`](../reference/cli/dstack/attach.md).
 
-## GPU
+## GPU specification
 
 `dstack` natively supports NVIDIA GPU, AMD GPU, and Google Cloud TPU accelerator chips.
 
-The `gpu` property within [`resources`](../reference/dstack.yml/dev-environment.md#resources) (or the `--gpu` option with `dstack apply`)
+The `gpu` property within [`resources`](../reference/dstack.yml/dev-environment.md#resources) (or the `--gpu` option with [`dstack apply`](../reference/cli/dstack/apply.md) or
+[`dstack offer`](../reference/cli/dstack/offer.md))
 allows specifying not only memory size but also GPU vendor, names, their memory, and quantity.
 
+The general format is: `<vendor>:<comma-separated names>:<memory range>:<quantity range>`.
+
+Each component is optional.
+
+Ranges can be:
+
+* **Closed** (e.g. `24GB..80GB` or `1..8`)
+* **Open** (e.g. `24GB..` or `1..`)
+* **Single values** (e.g. `1` or `24GB`)
+
 Examples:
 
 - `1` (any GPU)
@@ -308,7 +319,36 @@ The GPU vendor is indicated by one of the following case-insensitive values:
 
 Currently, you can't specify other than 8 TPU cores. This means only single host workloads are supported.
 Support for multiple hosts is coming soon.
 
-## Monitoring metrics
+## Offers
+
+If you're unsure which offers (hardware configurations) are available from the configured backends, use the
+[`dstack offer`](../reference/cli/dstack/offer.md#list-gpu-offers) command.
+
+
```shell
$ dstack offer --gpu H100:1.. --max-offers 10
Getting offers...
---> 100%

 #   BACKEND     REGION     INSTANCE TYPE          RESOURCES                                      SPOT  PRICE
 1   datacrunch  FIN-01     1H100.80S.30V          30xCPU, 120GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.19
 2   datacrunch  FIN-02     1H100.80S.30V          30xCPU, 120GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.19
 3   datacrunch  FIN-02     1H100.80S.32V          32xCPU, 185GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.19
 4   datacrunch  ICE-01     1H100.80S.32V          32xCPU, 185GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.19
 5   runpod      US-KS-2    NVIDIA H100 PCIe       16xCPU, 251GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.39
 6   runpod      CA         NVIDIA H100 80GB HBM3  24xCPU, 251GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.69
 7   nebius      eu-north1  gpu-h100-sxm           16xCPU, 200GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.95
 8   runpod      AP-JP-1    NVIDIA H100 80GB HBM3  20xCPU, 251GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.99
 9   runpod      CA-MTL-1   NVIDIA H100 80GB HBM3  28xCPU, 251GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.99
 10  runpod      CA-MTL-2   NVIDIA H100 80GB HBM3  26xCPU, 125GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.99
 ...
 Shown 10 of 99 offers, $127.816 max
```

+
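+If you need the list in machine-readable form (for scripting), the same query can be run with `--json`;
+see the [`dstack offer`](../reference/cli/dstack/offer.md) reference for the output schema. A minimal example:

```shell
$ dstack offer --gpu H100:1.. --json
```

+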
+
+## Metrics
 
 While `dstack` allows the use of any third-party monitoring tools (e.g., Weights and Biases), you can
 also monitor container metrics such as CPU, memory, and GPU usage using the [built-in
diff --git a/docs/docs/installation/index.md b/docs/docs/installation/index.md
index 5632cef06..92c6723b6 100644
--- a/docs/docs/installation/index.md
+++ b/docs/docs/installation/index.md
@@ -80,6 +80,8 @@ The server can run on your laptop or any environment with access to the cloud an
 
+To verify that backends are properly configured, use the [`dstack offer`](../reference/cli/dstack/offer.md#list-gpu-offers) command to list available GPU offers.
+
 !!! info "Server deployment"
     For more details on server deployment options, see the [Server deployment](../guides/server-deployment.md) guide.
diff --git a/docs/docs/reference/cli/dstack/offer.md b/docs/docs/reference/cli/dstack/offer.md
new file mode 100644
index 000000000..13299b8ef
--- /dev/null
+++ b/docs/docs/reference/cli/dstack/offer.md
@@ -0,0 +1,146 @@
+# dstack offer
+
+Displays offers (hardware configurations) available from the configured backends (or offers that match already provisioned fleets).
+
+The output includes backend, region, instance type, resources, spot availability, and pricing details.
+
+## Usage
+
+This command accepts most of the same arguments as [`dstack apply`](apply.md).
+
+ +```shell +$ dstack offer --help +#GENERATE# +``` + +
+
+## Examples
+
+### List GPU offers
+
+The `--gpu` flag accepts the same specification format as the `gpu` property in [`dev environment`](../../../concepts/dev-environments.md), [`task`](../../../concepts/tasks.md),
+[`service`](../../../concepts/services.md), and [`fleet`](../../../concepts/fleets.md) configurations.
+
+The general format is: `<vendor>:<comma-separated names>:<memory range>:<quantity range>`.
+
+Each component is optional.
+
+Ranges can be:
+
+* **Closed** (e.g. `24GB..80GB` or `1..8`)
+* **Open** (e.g. `24GB..` or `1..`)
+* **Single values** (e.g. `1` or `24GB`)
+
+Examples:
+
+* `--gpu nvidia` (any NVIDIA GPU)
+* `--gpu nvidia:1..8` (from one to eight NVIDIA GPUs)
+* `--gpu A10,A100` (single NVIDIA A10 or A100 GPU)
+* `--gpu A100:80GB` (single NVIDIA A100 with 80GB vRAM)
+* `--gpu 24GB..80GB` (any GPU with 24GB to 80GB vRAM)
+
+The following example lists offers with one or more H100 GPUs:
+
+
```shell
$ dstack offer --gpu H100:1.. --max-offers 10
Getting offers...
---> 100%

 #   BACKEND     REGION     INSTANCE TYPE          RESOURCES                                      SPOT  PRICE
 1   datacrunch  FIN-01     1H100.80S.30V          30xCPU, 120GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.19
 2   datacrunch  FIN-02     1H100.80S.30V          30xCPU, 120GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.19
 3   datacrunch  FIN-02     1H100.80S.32V          32xCPU, 185GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.19
 4   datacrunch  ICE-01     1H100.80S.32V          32xCPU, 185GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.19
 5   runpod      US-KS-2    NVIDIA H100 PCIe       16xCPU, 251GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.39
 6   runpod      CA         NVIDIA H100 80GB HBM3  24xCPU, 251GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.69
 7   nebius      eu-north1  gpu-h100-sxm           16xCPU, 200GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.95
 8   runpod      AP-JP-1    NVIDIA H100 80GB HBM3  20xCPU, 251GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.99
 9   runpod      CA-MTL-1   NVIDIA H100 80GB HBM3  28xCPU, 251GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.99
 10  runpod      CA-MTL-2   NVIDIA H100 80GB HBM3  26xCPU, 125GB, 1xH100 (80GB), 100.0GB (disk)  no    $2.99
 ...
 Shown 10 of 99 offers, $127.816 max
```

+
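+The spec components can also be combined in a single query. As a hypothetical example (using only the
+syntax documented above), the following asks for offers with two to four A100s with 40GB of memory each:

```shell
$ dstack offer --gpu A100:40GB:2..4 --max-offers 5
```

+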
+
+### JSON format
+
+Use `--json` to output offers in JSON format.
+
+ +```shell +$ dstack offer --gpu amd --json +{ + "project": "main", + "user": "admin", + "resources": { + "cpu": { + "min": 2, + "max": null + }, + "memory": { + "min": 8.0, + "max": null + }, + "shm_size": null, + "gpu": { + "vendor": "amd", + "name": null, + "count": { + "min": 1, + "max": 1 + }, + "memory": null, + "total_memory": null, + "compute_capability": null + }, + "disk": { + "size": { + "min": 100.0, + "max": null + } + } + }, + "max_price": null, + "spot": null, + "reservation": null, + "offers": [ + { + "backend": "runpod", + "region": "EU-RO-1", + "instance_type": "AMD Instinct MI300X OAM", + "resources": { + "cpus": 24, + "memory_mib": 289792, + "gpus": [ + { + "name": "MI300X", + "memory_mib": 196608, + "vendor": "amd" + } + ], + "spot": false, + "disk": { + "size_mib": 102400 + }, + "description": "24xCPU, 283GB, 1xMI300X (192GB), 100.0GB (disk)" + }, + "spot": false, + "price": 2.49, + "availability": "available" + } + ], + "total_offers": 1 +} +``` + +
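+The JSON output is convenient for scripting. For example, assuming `jq` is installed, each offer can be
+reduced to a backend/region/price summary (the field names follow the schema shown above). For the single
+AMD offer above, this prints one line:

```shell
$ dstack offer --gpu amd --json | jq -r '.offers[] | [.backend, .region, .price] | @tsv'
runpod	EU-RO-1	2.49
```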
diff --git a/mkdocs.yml b/mkdocs.yml
index 51039885e..6f90fdb54 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -231,6 +231,7 @@ nav:
           - dstack metrics: docs/reference/cli/dstack/metrics.md
           - dstack config: docs/reference/cli/dstack/config.md
           - dstack fleet: docs/reference/cli/dstack/fleet.md
+          - dstack offer: docs/reference/cli/dstack/offer.md
           - dstack volume: docs/reference/cli/dstack/volume.md
           - dstack gateway: docs/reference/cli/dstack/gateway.md
       - API: