Use exponentially increasing retry delays for pending runs #2519
Fixes #2420
Previously, pending runs were retried with a fixed 15s delay, which created a large number of job submissions when capacity was persistently unavailable. The new logic uses exponentially increasing retry delays capped at 10m. As a result, a run is retried at most 144 times/day once the cap is reached, compared to 5760 retries/day previously (a 40x reduction). Retry latency does not change for failed and retried jobs unless capacity stays unavailable for a long time. Runs that wait for capacity for hours or days may wait an additional 5-10m.
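For illustration, a minimal sketch of a capped exponential delay schedule follows. It is not the code from this PR: the function name `retry_delay`, the 15s starting delay, and the doubling factor are assumptions made for the example. Only the 10m cap and the old 15s fixed delay come from the description above.

```python
from datetime import timedelta

# Assumption: start from the old fixed delay of 15s.
BASE_DELAY = timedelta(seconds=15)
# From the PR description: delays are capped at 10 minutes.
MAX_DELAY = timedelta(minutes=10)
# Assumption: the delay doubles on every attempt.
FACTOR = 2


def retry_delay(attempt: int) -> timedelta:
    """Return the delay before retry number `attempt` (0-based), capped at MAX_DELAY."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Clamp the exponent so very large attempt numbers cannot overflow timedelta.
    exponent = min(attempt, 32)
    delay = timedelta(seconds=BASE_DELAY.total_seconds() * FACTOR**exponent)
    return min(delay, MAX_DELAY)


def max_retries_per_day(delay: timedelta) -> int:
    """Upper bound on retries per day when every retry waits `delay`."""
    return int(timedelta(days=1) / delay)


if __name__ == "__main__":
    for attempt in range(8):
        print(attempt, retry_delay(attempt))
    print("old bound:", max_retries_per_day(BASE_DELAY))
    print("new bound:", max_retries_per_day(MAX_DELAY))
```

Under these assumptions the delay reaches the 10m cap on the seventh attempt, so a run that briefly fails to get capacity is still retried quickly, while a run that has been waiting for a long time settles at one retry every 10 minutes.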
There is still an issue of too many job submissions being returned in the API, e.g. for a run that sits in pending for months. This will be addressed in a separate issue if it proves necessary.