This scheduler makes optimized routing decisions for inference requests to the llm-d inference framework.
It provides an "Endpoint Picker" (EPP) component that schedules incoming inference requests to the platform via a Kubernetes Gateway, according to configurable scheduler plugins (for more details, see the Architecture Documentation).
The EPP extends the Gateway API Inference Extension (GIE) project, which provides the API resources and machinery for scheduling. We add some custom features that are specific to llm-d here, such as P/D Disaggregation.
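As a rough illustration of the plugin model, a scorer-style plugin ranks candidate endpoints and the scheduler picks the best one. The types, names, and weights below are hypothetical (this is not the actual GIE plugin API); it is only a sketch of the idea:

```go
package main

import "fmt"

// Endpoint is a simplified stand-in for a model-serving pod the
// scheduler can route to (hypothetical type, for illustration only).
type Endpoint struct {
	Name       string
	QueueDepth int  // number of queued requests on this pod
	PrefixHit  bool // whether this pod has the prompt prefix cached
}

// Scorer sketches a scheduler plugin: it assigns each candidate
// endpoint a score, where higher is better.
type Scorer interface {
	Score(e Endpoint) float64
}

// loadAwareScorer prefers shorter queues and rewards prefix-cache
// hits, loosely mirroring the kinds of signals scheduler plugins
// consider (the weights here are made up).
type loadAwareScorer struct{}

func (loadAwareScorer) Score(e Endpoint) float64 {
	score := -float64(e.QueueDepth)
	if e.PrefixHit {
		score += 10
	}
	return score
}

// pickEndpoint returns the highest-scoring endpoint.
func pickEndpoint(s Scorer, endpoints []Endpoint) Endpoint {
	best := endpoints[0]
	bestScore := s.Score(best)
	for _, e := range endpoints[1:] {
		if sc := s.Score(e); sc > bestScore {
			best, bestScore = e, sc
		}
	}
	return best
}

func main() {
	eps := []Endpoint{
		{Name: "pod-a", QueueDepth: 8},
		{Name: "pod-b", QueueDepth: 3, PrefixHit: true},
		{Name: "pod-c", QueueDepth: 1},
	}
	// pod-b wins: its cache hit outweighs its slightly longer queue.
	fmt.Println(pickEndpoint(loadAwareScorer{}, eps).Name) // prints "pod-b"
}
```

In the real system, multiple plugins of this kind are combined; llm-d-specific plugins (e.g. for P/D disaggregation) plug into the same scheduling cycle as the general-purpose GIE ones.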
A compatible Gateway API implementation serves as the Gateway. That implementation must be Envoy-based and support ext-proc (external processing), as this is currently the callback mechanism the EPP relies on to make routing decisions to model-serving workloads.
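Conceptually, the ext-proc exchange works like this: for each request, the gateway hands the request headers to the EPP, and the EPP answers with a header mutation naming the destination endpoint. The sketch below models only that shape; the header name and function signature are illustrative assumptions, not the real Envoy ext-proc gRPC API or the llm-d implementation:

```go
package main

import "fmt"

// processRequest models the EPP side of one ext-proc callback: given
// the incoming request headers and the candidate endpoints, it returns
// a header mutation telling the gateway where to route the request.
// (Hypothetical header name and selection logic, for illustration.)
func processRequest(headers map[string]string, endpoints []string) map[string]string {
	// A real EPP would run its scheduler plugins here; we just pick
	// the first endpoint as a placeholder.
	dest := endpoints[0]
	return map[string]string{"x-destination-endpoint": dest}
}

func main() {
	mutation := processRequest(
		map[string]string{":path": "/v1/completions"},
		[]string{"10.0.0.12:8000", "10.0.0.13:8000"},
	)
	// The gateway then forwards the request to the endpoint named here.
	fmt.Println(mutation["x-destination-endpoint"]) // prints "10.0.0.12:8000"
}
```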
Contributions are welcome!
For large changes please create an issue first describing the change so the maintainers can do an assessment, and work on the details with you. See DEVELOPMENT.md for details on how to work with the codebase.
Note that in general, features should go to the upstream Gateway API Inference Extension (GIE) project first if applicable. The GIE is a major dependency of ours, and it is where most general-purpose inference features live. If you have a feature that you feel is general purpose, it probably belongs in the GIE; if it is llm-d specific, it belongs here. If you're not sure which project your feature belongs in, feel free to create a discussion or ask on Slack.