Skip to content

add flag to paginate list calls #177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

oliviassss
Copy link
Contributor

What type of PR is this?

Which issue does this PR fix:
When the pod scale is large, the current unpaginated list pod call will heavy load apiserver and lead to apiserver timeout.
Add a flag --list-page-size to paginate k8s list pod calls, default value is 1000.

What does this PR do / Why do we need it:
Avoid apiserver timeout issue that may lead to controller crash when pod scale is large.

If an issue # is not available please add steps to reproduce and the controller logs:

Testing done on this change:

Automation added to e2e:

Will this PR introduce any new dependencies?:

Will this break upgrades or downgrades. Has updating a running cluster been tested?:

Does this PR introduce any user-facing change?:


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@oliviassss oliviassss requested a review from a team as a code owner May 16, 2025 23:27
Copy link

@M00nF1sh M00nF1sh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm for this code alone,but i have below concerns:

  1. have we tested that this pagination is really happening in apiServer from audit logs? are we use real client instead of cached one?
    • edit: just read through the entire codebase, from the code seems we are using the cached client, i'm wondering how this would work at all..
  2. i'm curious that there isn't pagination support in client-go or controller-runtime?

@M00nF1sh M00nF1sh self-requested a review May 16, 2025 23:42
Copy link

@M00nF1sh M00nF1sh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k8sClient shall be a real client to make this work, and seems we are using cached one, unless i made some mistake here.
https://github.com/aws/amazon-network-policy-controller-k8s/blob/main/pkg/config/runtime_config.go#L96

@oliviassss
Copy link
Contributor Author

@M00nF1sh thanks, yes it's a cached client. I'm running the controller in a cluster with 130k pods to confirm from apiserver side.

@yash97 yash97 changed the title add flag to paginzate list calls add flag to paginate list calls May 17, 2025
@oliviassss
Copy link
Contributor Author

oliviassss commented May 17, 2025

i'm curious that there isn't pagination support in client-go or controller-runtime

client-go enforces a default page size as 500, but for init list call, with RV=0, the limit is ignored. so apiserver ignores page size when RV=0. saw this from source code comment
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher.go#L693

// computeListLimit determines whether the cacher should
// apply a limit to an incoming LIST request and returns its value.
//
// note that this function doesn't check RVM nor the Continuation token.
// these parameters are validated by the shouldDelegateList function.
//
// as of today, the limit is ignored for requests that set RV == 0
func computeListLimit(opts storage.ListOptions) int64 {
	if opts.Predicate.Limit <= 0 || opts.ResourceVersion == "0" {
		return 0
	}
	return opts.Predicate.Limit
}

have we tested that this pagination is really happening in apiServer from audit logs? are we use real client instead of cached one?

I confirmed this behavior from 130k-pod cluster

  1. During startup only 1 list pods call is made (although the limit=500, I don't see pagination due to the reason above)
  2. During reconciliation, only watch calls are made, no list calls (should because it's list from a cached client). Also, in reconciliation the list pod calls have selector, so it's not listing all pods, which is less concerned. I will do more scale test to confirm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants