Description
Gloo Edge Product
Open Source
Gloo Edge Version
main
Is your feature request related to a problem? Please describe.
Gateway API Inference Extension (formerly llm-instance-gateway) is a project that originated from wg-serving and is sponsored by SIG Network. The project provides APIs, a load balancing algorithm, ext-proc code, and controllers to support advanced routing of LLM traffic.
Describe the solution you'd like
Add the following support:
- Create an enhancement proposal that provides the API design and implementation details: Adds EP-10411: Gateway API Inference Extension Support #10420.
- Update the GatewayClassParameters API to surface user-facing configuration for supported inference extensions: Adds InferenceExtension to GatewayParameters #10601. ATM not needed since the feature is auto-enabled if the inference extension CRDs are present in the cluster.
- Update the configuration API to enable/disable the gateway-api-inference-extension feature. ATM not needed since the feature is auto-enabled if the inference extension CRDs are present in the cluster.
- Update Helm charts to install k8sgateway with the gateway-api-inference-extension feature based on the provided configuration. ATM not needed since the feature is auto-enabled if the inference extension CRDs are present in the cluster.
- Add gateway-api-inference-extension as a supported extension.
- Add controllers that reconcile gateway-api-inference-extension custom resources, e.g. InferencePool. The controller should be optional, i.e. only run if the configuration option is enabled and the gateway-api-inference-extension CRDs exist.
- Update RBAC rules to allow gateway-api-inference-extension controllers to get, list, watch, etc. gateway-api-inference-extension custom resources.
- Update the deployer pkg to manage the required gateway-api-inference-extension resources, e.g. Deployment, to run the ext-proc server.
- Add InferencePool as a supported HTTPRoute backend reference.
- Update the translator pkg to translate HTTPRoutes referencing an InferencePool resource.
- Update the proxy_syncer pkg to translate gateway-api-inference-extension CRs into Gloo Proxies and sync the proxy client with the newly translated proxies.
- Update the reporter pkg to support reporting gateway-api-inference-extension CRD status.
- Add initial e2e tests for this feature.
- Update CI to run e2e tests.
- Add initial user docs: InferenceExtension in kgateway kgateway.dev#70. Owner: @artberger.
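
As context for the HTTPRoute backend reference task above, a route targeting an InferencePool would look roughly like the sketch below. The group/kind follow the upstream inference extension's alpha API and the names are illustrative, so the exact group string may differ:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: my-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io   # assumed inference extension API group
      kind: InferencePool
      name: my-pool
```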
- Add `failureMode` support as a follow-up to Add Initial Gateway API Inference Extension Support #10411.
- Update the deployer to support an HTTPRoute switching between Service and InferencePool backendRefs (xref).
- Improve EPP RBAC based on EPP: Use Dedicated Service Account kubernetes-sigs/gateway-api-inference-extension#224. Either turn the ClusterRole and ClusterRoleBinding into a Role and RoleBinding, or have the first EPP create a common CR/CRB if it does not exist; additional InferencePools then add their ServiceAccount to the `subjects` of the common ClusterRoleBinding and remove their entry upon InferencePool deletion. This additional complexity may not be worth the benefit of having a common CR/CRB for all EPPs.
- Track Extension Auto-Provisioning kubernetes-sigs/gateway-api-inference-extension#507 for the status of auto-provisioning InferencePool infra and adjust the deployer accordingly.
- For multiple backends on one route, investigate using `RouteAction_WeightedClusters`, which may use the current `ExtProcPerRoute` approach or put the ExtProc as an upstream Cluster filter (xref).
- Investigate using the standard EDS cluster created for the ext-proc service instead of creating a separate one (xref). Consider changing the model from an endpoint picker per upstream to one per Gateway.
- Investigate whether or not to remove the finalizer from the InferencePool controller.
- The `usedPools` field of `endpointPickerPass` should be `map[string]map[types.NamespacedName]*ir.InferencePool` to support per-filter-chain tracking (xref).
- Implement the Override Host LB policy (Envoy PR and Infer Ext PR).
- Run benchmarks and publish results.
Describe alternatives you've considered
Do not support the Gateway API Inference Extension project.
Additional Context
No response