Description
I have noticed that the scale-up Lambda occasionally (roughly 200 times in the past 6 hours, for example) returns the message "No runner will be created, job is not queued."
This is an issue because I'm using ephemeral runners: if no runner is created for a given job A, that job keeps waiting until a runner with the same labels becomes available. But any runner with those labels was necessarily created for some other job B, so either A or B will run and the other stays stuck waiting. Each runner that is not created therefore strands a job, which then consumes the runner created for the next job, so the problem compounds with every occurrence.
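For context on where the message comes from: with enable_job_queued_check enabled, the scale-up Lambda re-checks the job's status against the GitHub API before creating a runner. Below is a minimal sketch of that kind of check, assuming @octokit/rest; the function name and wiring are illustrative, not the module's actual code.

import { Octokit } from '@octokit/rest';

// Illustrative sketch of a job-queued check (names are made up; this is not
// the module's code). Before provisioning an instance, the scale-up Lambda
// asks GitHub whether the job that triggered the webhook is still queued.
async function shouldCreateRunner(
  octokit: Octokit,
  owner: string,
  repo: string,
  jobId: number,
): Promise<boolean> {
  const { data: job } = await octokit.rest.actions.getJobForWorkflowRun({
    owner,
    repo,
    job_id: jobId,
  });
  // If the API reports anything other than "queued" (for example because a
  // runner created for a different job just picked this job up), no runner
  // is created and "No runner will be created, job is not queued." is
  // logged, leaving one job in the pool without a runner.
  return job.status === 'queued';
}

With non-ephemeral runners a skipped creation is harmless, since an idle runner will eventually pick the job up; with ephemeral runners every skipped creation leaves the pool permanently one runner short.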
It's happening in all my instances of this module (37 right now) and in all runner configurations (2-8 per instance), whether spot or on-demand. The common factor is that everything is ephemeral.
An example of a runner config is below:
"default" = {
matcherConfig = {
labelMatchers = [["self-hosted", var.project, "default"]]
exactMatch = true
}
runner_config = {
delay_webhook_event = 0
enable_ephemeral_runners = true
enable_job_queued_check = true
minimum_running_time_in_minutes = 5
enable_organization_runners = true
enable_on_demand_failover_for_errors = ["InsufficientInstanceCapacity"]
instance_allocation_strategy = "price-capacity-optimized"
instance_target_capacity_type = "spot"
instance_types = ["m6i.large", "m5.large"]
runner_architecture = "x64"
runner_as_root = true
runner_extra_labels = [var.project, "default"]
runner_group_name = var.project
runner_os = "linux"
runners_maximum_count = 30
userdata_post_install = "docker login -u ${local.docker_hub_user} -p ${local.docker_hub_password}"
}
}
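For completeness, my understanding of exactMatch = true is that every label a workflow job requests must be covered by the matcher entry, which is why all jobs carrying these three labels compete for the same pool of ephemeral runners. An illustrative sketch (the helper name and shape are made up, not the module's code):

// Hypothetical helper showing exact-match label routing.
function matchesExactly(jobLabels: string[], matcherLabels: string[]): boolean {
  // Every label requested by the job must appear in the matcher entry;
  // a single unmatched label rejects the job for this runner config.
  const known = new Set(matcherLabels.map((l) => l.toLowerCase()));
  return jobLabels.every((l) => known.has(l.toLowerCase()));
}

// A job requesting ["self-hosted", var.project, "default"] is routed to
// this config; a job adding any other label is not.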