Labels
- Something isn't working
- Automated tests, build checks, GitHub Actions, system stability & efficiency.
- Help or insights needed from the community
- PRs initiated by the community
- Specialized/modified CUDA kernels in TRTLLM for LLM ops, beyond standard TRT. Dev & perf.
- Pull requests that update a dependency file
- Deploying TRTLLM with separated, distributed components (params, kv-cache, compute). Arch & perf.
- TRTLLM's textual/illustrative materials: API refs, guides, tutorials. Improvement & clarity.
- This issue or pull request already exists
- Suggestions for improving, or complaints about, TRTLLM's ease of use
- New feature or request, including support for new models, dtypes, or functionality
- General operational aspects of TRTLLM execution not in other categories.
- Extra attention is needed
- Setting up and building TRTLLM: compilation, pip install, dependencies, env config, CMake.
- KV-cache management for efficient LLM inference
- High-level LLM Python API & tools (e.g., trtllm-llmapi-launch) for TRTLLM inference/workflows; see the usage sketch after this list.
- Parameter-Efficient Fine-Tuning (PEFT) like LoRA/P-tuning in TRTLLM: adapter use & perf.
- Lower-precision formats (INT8/INT4/FP8) for TRTLLM quantization (AWQ, GPTQ).
- Memory utilization in TRTLLM: leak/OOM handling, footprint optimization, memory profiling.
- Further information is needed from the requester before developers can help
- Request to add a new model
- A known limitation, not a bug.
- trtllm-serve's OpenAI-compatible API: endpoint behavior, request/response formats, feature parity; see the client sketch after this list.
- TRTLLM model inference speed, throughput, and efficiency: latency, benchmarks, regressions, optimizations.
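
For context on the high-level LLM Python API item above, here is a minimal, hedged usage sketch. The import path and call pattern follow TensorRT-LLM's documented LLM API quickstart; the model name, prompt, and sampling settings are placeholder assumptions, not values taken from this page.

```python
# Minimal sketch of the high-level LLM Python API.
# Model name and sampling settings are placeholder assumptions.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # HF model id or local checkpoint path
params = SamplingParams(max_tokens=32, temperature=0.8)

outputs = llm.generate(["Hello, my name is"], params)
for out in outputs:
    print(out.outputs[0].text)  # generated completion for each prompt
```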
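Similarly, for the trtllm-serve OpenAI-compatible API item, a hedged client-side sketch: it assumes a trtllm-serve instance is already running locally on port 8000 and uses the standard openai Python client; the base URL, model name, and prompt are assumptions for illustration only.

```python
# Hypothetical client call against a locally running trtllm-serve instance.
# Base URL, port, and model name are assumptions, not values from this page.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=32,
)
print(resp.choices[0].message.content)
```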