Retry initializing informers to allow for network instability on node restart #3688
Conversation
var informer cache.Informer
var err error
i := 0
for {
Nit: I'd prefer an indexed for loop here instead of checking i == 2.
Implemented Kalle's recommendation to use the dskit backoff package instead.
Nit: we could try to use github.com/grafana/dskit/backoff for retries, like we do in a lot of other places.
Retry initializing informers to allow for network instability on node restart (#3688)
* Retry initializing informers to allow for network instability on node restart
* use dskit backoff
* return error on failure
* Run go mod tidy
Co-authored-by: Sam DeHaan <[email protected]>
Co-authored-by: Tolya Korniltsev <[email protected]>
PR Description
Immediately after a node restart, the Alloy pod is not always able to reach the API server because the node's network is still initializing. There may be a more correct way to handle this, but as a quick solution I've added a retry mechanism. This solved the problem for me when testing with a local kind cluster, but I'd like interested parties to test the fix in their environments as well.
Which issue(s) this PR fixes
Fixes #1853
Notes to the Reviewer
PR Checklist