Add Initial Gateway API Inference Extension Support #10411

@danehans

Description

Gloo Edge Product: Open Source

Gloo Edge Version: main

Is your feature request related to a problem? Please describe.

Gateway API Inference Extension (formerly llm-instance-gateway) is a project that originated from wg-serving and is sponsored by SIG Network. The project provides APIs, a load balancing algorithm, ext-proc code, and controllers to support advanced routing of LLM traffic.

Describe the solution you'd like

Add the following support:

  • Create an enhancement proposal that provides the API design and implementation details: Adds EP-10411: Gateway API Inference Extension Support #10420.
  • Update the GatewayClassParameters API to surface user-facing configuration for supported inference extensions: Adds InferenceExtension to GatewayParameters #10601. Not currently needed, since the feature is auto-enabled when the inference extension CRDs are present in the cluster.
  • Update the configuration API to enable/disable the gateway-api-inference-extension feature. Not currently needed, for the same reason.
  • Update Helm charts to install k8sgateway with the gateway-api-inference-extension feature based on the provided configuration. Not currently needed, for the same reason.
  • Add gateway-api-inference-extension as a supported extension.
  • Add controllers that reconcile gateway-api-inference-extension custom resources, e.g. InferencePool. The controller should be optional, i.e. only run if the configuration option is enabled and the gateway-api-inference-extension CRDs exist.
  • Update RBAC rules to allow gateway-api-inference-extension controllers to get, list, watch, etc. gateway-api-inference-extension custom resources.
  • Update the deployer pkg to manage the required gateway-api-inference-extension resources, e.g. Deployment, to run the ext-proc server.
  • Add InferencePool as a supported HTTPRoute backend reference.
  • Update the translator pkg to translate HTTPRoutes referencing an InferencePool resource.
  • Update the proxy_syncer pkg to translate gateway-api-inference-extension CRs into Gloo Proxies and sync the proxy client with the newly translated proxies.
  • Update the reporter pkg to support reporting gateway-api-inference-extension CRD status.
  • Add initial e2e tests for this feature.
  • Update CI to run e2e tests.
  • Add initial user docs: InferenceExtension in kgateway kgateway.dev#70. Owner: @artberger.
  • Add failureMode support as a follow-up to this issue (#10411).
  • Update the deployer to support an HTTPRoute switching between Service and InferencePool backendRefs (xref).
  • Improve EPP RBAC based on EPP: Use Dedicated Service Account kubernetes-sigs/gateway-api-inference-extension#224. Either convert the ClusterRole and ClusterRoleBinding into a Role and RoleBinding, or have the first EPP create a common CR/CRB if one does not exist; additional InferencePools would then add their ServiceAccount to the subjects of the common ClusterRoleBinding and remove their entry upon InferencePool deletion. This added complexity may not be worth the benefit of having a common CR/CRB for all EPPs.
  • Track Extension Auto-Provisioning kubernetes-sigs/gateway-api-inference-extension#507 for the status of auto-provisioning InferencePool infra and adjust the deployer accordingly.
  • For multiple backends on one route, investigate using RouteAction_WeightedClusters, which may either keep the current ExtProcPerRoute approach or attach the ExtProc as an upstream Cluster filter (xref).
  • Investigate reusing the standard EDS cluster created for the ext-proc service instead of creating a separate one (xref). Consider changing the model from one endpoint picker per upstream to one per Gateway.
  • Investigate whether to remove the finalizer from the InferencePool controller.
  • The usedPools field of endpointPickerPass should be map[string]map[types.NamespacedName]*ir.InferencePool to support per-filter-chain (xref).
  • Implement Override Host LB policy (Envoy PR and Infer Ext PR).
  • Run benchmarks and publish results.
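
To make the HTTPRoute backend-reference item above concrete, here is a minimal sketch of an HTTPRoute whose backendRef points at an InferencePool instead of a Service. The route, gateway, and pool names are made up for illustration; the group and kind follow the Gateway API Inference Extension CRDs:

```yaml
# Hypothetical example: route LLM traffic to an InferencePool backend.
# Names ("inference-gateway", "llama-pool") are illustrative only.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
    - name: inference-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - group: inference.networking.x-k8s.io
          kind: InferencePool
          name: llama-pool
```

The translator pkg would recognize this backendRef group/kind and wire the route to the endpoint-picker ext-proc flow rather than a plain Service cluster.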
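
For the controller and deployer items, a sketch of the custom resource being reconciled may help. Field names follow the project's v1alpha2 InferencePool API as currently understood; the selector labels, port, and EPP service name are illustrative assumptions:

```yaml
# Sketch of an InferencePool; values are hypothetical.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llama-pool
spec:
  # Selects the model-server pods that belong to this pool.
  selector:
    app: llama-server
  targetPortNumber: 8000
  # The endpoint picker (EPP) ext-proc service for this pool,
  # which the deployer would manage (e.g. its Deployment).
  extensionRef:
    name: llama-pool-epp
```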
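
For the RBAC item, a sketch of the additional rules the controllers would need. The exact resource list and verbs depend on the final design; this assumes the inference.networking.x-k8s.io API group and a ClusterRole named for illustration:

```yaml
# Hypothetical RBAC fragment for the inference extension controllers.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kgateway-inference-extension
rules:
  # Read access to the inference extension custom resources.
  - apiGroups: ["inference.networking.x-k8s.io"]
    resources: ["inferencepools", "inferencemodels"]
    verbs: ["get", "list", "watch"]
  # Status reporting on reconciled resources (see the reporter pkg item).
  - apiGroups: ["inference.networking.x-k8s.io"]
    resources: ["inferencepools/status", "inferencemodels/status"]
    verbs: ["update", "patch"]
```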

Describe alternatives you've considered

Do not support the Gateway API Inference Extension project.

Additional Context

No response

Metadata

Labels

  • Area: K8S Gateway API (Issues related to the Kubernetes Gateway API)
  • Prioritized (Indicating issue prioritized to be worked on in RFE stream)
  • Priority: High (Required in next 3 months to make progress, bugs that affect multiple users, or very bad UX)
  • Size: XL (>2 weeks)
  • Type: Enhancement (New feature or request)
  • scope/2.0