Retry initializing informers to allow for network instability on node restart #3688
Conversation
var informer cache.Informer
var err error
i := 0
for {
Nit: I'd prefer an indexed for loop here instead of checking i == 2.
Implemented Kalle's recommendation to use the dskit backoff package instead.
Nit: we could try to use github.com/grafana/dskit/backoff for retries, like we do in a lot of other places.
Retry initializing informers to allow for network instability on node restart (#3688)
* Retry initializing informers to allow for network instability on node restart
* use dskit backoff
* return error on failure
* Run go mod tidy
Co-authored-by: Sam DeHaan <[email protected]>
Co-authored-by: Tolya Korniltsev <[email protected]>
PR Description
Immediately after a node restart, the Alloy pod is not always able to reach the API server because the node's network is still initializing. There may be a more correct way to handle this, but as a quick solution I've added a retry mechanism. This solved the problem for me when testing with a local kind cluster, but I'd like interested parties to test the fix in their environments as well.
Which issue(s) this PR fixes
Fixes #1853
Notes to the Reviewer
PR Checklist