TKE AI Playbook

Project Overview

This project provides Kubernetes-based scripts for AI large language model (LLM) operations, including model downloading, inference service deployment, and performance benchmarking, enabling end-to-end AI workflows on Tencent Kubernetes Engine (TKE).

Prerequisites

Kubernetes cluster (recommended v1.28+)
Tencent Cloud CFS storage (or compatible storage solution)
GPU nodes (3 * H20 nodes used in this project)

Capabilities

Model Download

Use the Model Download Utility to download models to CFS storage for reuse across inference services.

Inference Service Deployment

dynamo: NVIDIA's distributed inference framework (open-sourced at GTC 2025), supporting multiple inference engines including TRT-LLM, vLLM, and SGLang.

Performance Benchmarking

LLM Benchmark Suite

Examples

dynamo

Single-Node PD Disaggregation

Prerequisites:

1 x 8 GPU Node.

Deploys:

1 x VllmWorker (4 GPUs for decode phase)
4 x PrefillWorker (1 GPU each for prefill phase)

bash examples/dynamo/single-node/deploy.sh

3 Nodes PD Disaggregation

[TODO]

Scripts

Quick Test for OpenAI-format API Endpoint

Script: test-openai-api
Usage:

API_ENDPOINT=<your-api-endpoint> bash scripts/test-openai-api.sh

# Test localhost:8080
API_ENDPOINT=http://localhost:8080 bash scripts/test-openai-api.sh

Generate Model Download Job

Script: tke-llm-downloader
Usage:

# Download 'deepseek-ai/DeepSeek-R1' model to PVC 'ai-model' with 6 concurrency
bash scripts/tke-llm-downloader.sh --pvc ai-model --completions 6 --parallelism 6 --model deepseek-ai/DeepSeek-R1

# Download 'Qwen/Qwen3-32B' model to PVC 'ai-model' with 3 concurrency
bash scripts/tke-llm-downloader.sh --pvc ai-model --completions 3 --parallelism 3 --model Qwen/Qwen3-32B

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
dockerfiles		dockerfiles
examples		examples
helm-charts		helm-charts
scripts		scripts
.gitignore		.gitignore
CodeOfConduct.md		CodeOfConduct.md
CodeOfConduct_zh.md		CodeOfConduct_zh.md
Contributing.md		Contributing.md
Contributing_zh.md		Contributing_zh.md
README.md		README.md
README_zh.md		README_zh.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

TKE AI Playbook

Project Overview

Prerequisites

Capabilities

Model Download

Inference Service Deployment

Performance Benchmarking

Examples

dynamo

Single-Node PD Disaggregation

3 Nodes PD Disaggregation

Scripts

Quick Test for OpenAI-format API Endpoint

Generate Model Download Job

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 4

Languages

tkestack/tke-ai-playbook

Folders and files

Latest commit

History

Repository files navigation

TKE AI Playbook

Project Overview

Prerequisites

Capabilities

Model Download

Inference Service Deployment

Performance Benchmarking

Examples

dynamo

Single-Node PD Disaggregation

3 Nodes PD Disaggregation

Scripts

Quick Test for OpenAI-format API Endpoint

Generate Model Download Job

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 4

Languages

Packages