
Backoff retry delay in status check is increasing to hours rendering services offline #2897

Open
@eugene-sadovsky

Spring Boot Admin Server information

  • Version:
    3.1.4

  • Spring Boot version:
    3.1.0

Client information

  • Used discovery mechanism:
    Consul

Description

The exponential back-off delay in de.codecentric.boot.admin.server.services.IntervalCheck is increasing to hours. I noticed that after SBA has been running for 2+ weeks, previously registered services go offline for hours and then become available again. Restarting SBA helps right away. This is always accompanied by the error message: Unexpected error in status-check: reactor.core.Exceptions$OverflowException: Could not emit tick NN due to lack of requests (interval doesn't support small downstream requests that replenish slower than the ticks)
After some investigation, it looks like this happens when the checkAllInstances method times out (takes longer to complete than the status-check interval) and triggers a retry. The back-off interval keeps increasing with each failure over the lifetime of the SBA process and eventually grows to hours. It actually takes about 12+ retries. The situation improved after lowering spring.boot.admin.timeout.health to 3 seconds. By default, the health endpoint timeout is equal to spring.boot.admin.status-interval (10s).
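
To put numbers on the growth described above, here is a minimal arithmetic sketch. It assumes the retry delay starts at 1 second and doubles after every failure, with no upper bound and no reset after a successful check. The 1-second starting value, the doubling factor, and the class and method names are assumptions for illustration and are not taken from the SBA source; any random jitter is ignored.

```java
import java.time.Duration;

public class BackoffGrowth {

    // Assumption: the first retry waits 1 second (illustrative value only).
    public static final Duration MIN_BACKOFF = Duration.ofSeconds(1);

    /**
     * Delay before the given retry (1-based) when the delay doubles on
     * every failure and the failure counter is never reset.
     */
    public static Duration delayBeforeRetry(int retry) {
        if (retry < 1 || retry > 40) {
            throw new IllegalArgumentException("retry must be between 1 and 40");
        }
        // retry 1 -> 1x, retry 2 -> 2x, retry 3 -> 4x, ...
        return MIN_BACKOFF.multipliedBy(1L << (retry - 1));
    }

    public static void main(String[] args) {
        // Print the schedule for the first 14 failures.
        for (int retry = 1; retry <= 14; retry++) {
            Duration delay = delayBeforeRetry(retry);
            System.out.printf("retry %2d -> wait %d s (%s)%n",
                    retry, delay.getSeconds(), delay);
        }
    }
}
```

Under these assumptions the 13th retry already waits 4096 seconds, a little over an hour, which is in line with the "12+ retries" observation: a handful of isolated timeouts spread over weeks is enough if the counter is never reset.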

Here's the code snippet that reproduces this behavior. It slows down with each retry.
