docker_image_availability: timeout skopeo inspect #5228
@@ -168,7 +168,7 @@ def is_available_skopeo_image(self, image, default_registries):
             registries = [registry]

         for registry in registries:
-            args = {"_raw_params": "skopeo inspect --tls-verify=false docker://{}/{}".format(registry, image)}
+            args = {"_raw_params": "timeout 10 skopeo inspect --tls-verify=false docker://{}/{}".format(registry, image)}
Would have been nice if a timeout flag were supported directly in the inspect subcommand :)
Yeah. Or in the ansible command module. Alas.
However "timeout" comes from coreutils and does the job fine.
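For context on what the coreutils wrapper does: `timeout N cmd` stops `cmd` once N seconds have elapsed and, by default, exits with status 124. Below is a minimal sketch of the same idea driven from plain Python rather than from the Ansible module; the helper names `wrap_with_timeout` and `run_limited` are hypothetical and are not part of openshift-ansible.

```python
import subprocess

# Exit status GNU coreutils `timeout` returns when the time limit is hit
# (assuming --preserve-status is not used).
TIMED_OUT_STATUS = 124


def wrap_with_timeout(argv, seconds):
    """Prefix an argument list with coreutils `timeout` (hypothetical helper)."""
    return ["timeout", str(seconds)] + list(argv)


def run_limited(argv, seconds):
    """Run argv under `timeout`; return (exit_status, timed_out)."""
    proc = subprocess.run(wrap_with_timeout(argv, seconds))
    return proc.returncode, proc.returncode == TIMED_OUT_STATUS
```

On a host with coreutils installed, `run_limited(["sleep", "5"], 1)` gives a quick way to see the behaviour, with `sleep 5` standing in for a skopeo call against an unreachable registry.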
Set a 10 second timeout when using skopeo to inspect remote registries, so that it does not wait for a TCP timeout to fail if they are unreachable.
[merge]
[test]ing while waiting on the merge queue
Evaluated for openshift ansible test up to acf014f
LGTM
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/546/) (Base Commit: f51bcec) (PR Branch Commit: acf014f)
Flake openshift/origin#16013 and unexpected
Let's try again, [merge]
Evaluated for openshift ansible merge up to acf014f
continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/943/) (Base Commit: 248f75d) (PR Branch Commit: acf014f)
https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/943/ looks like only flakes: openshift/origin#12137 and openshift/origin#10773. @sdodson or @rhcarvalho, can we just merge this and not waste more queue time?
Could this timeout value be a parameter somehow? We have some "image not available" deployment failures in our CI due to network or upstream congestion. It happens especially with a lot of concurrent deployments.
Could this timeout value be a parameter somehow?
It could be. But a skopeo lookup should normally be almost instantaneous,
so if it's not, that indicates you're likely to have a pretty hard time
actually getting anything from your registry. If thirty seconds of timeout
(3 tries * 10s) isn't enough, how much is going to be reasonable? If
that's a frequent occurrence I might say your network is flaky enough that
you should disable the check in CI, or your CI is always going to be flaky.
But you're the user, you may well know better than I do. And it's not like
it would be hard to add parameters for this. Open an issue for it if you
want and we can make it happen.
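As a rough sketch of what such a parameter might look like, the command string from the diff can take the timeout as an argument. Nothing below exists in the PR: the function names, the `timeout_seconds` parameter, and the example registry and image are all assumptions for illustration. The second helper just restates the "3 tries * 10s" arithmetic from the reply above.

```python
def skopeo_inspect_command(registry, image, timeout_seconds=10):
    """Build the command string from the diff, with the timeout as an argument."""
    return "timeout {} skopeo inspect --tls-verify=false docker://{}/{}".format(
        timeout_seconds, registry, image
    )


def worst_case_wait(timeout_seconds=10, tries=3):
    """Upper bound, in seconds, on waiting for one unreachable registry.

    Ignores any delay between tries.
    """
    return timeout_seconds * tries


# Hypothetical registry and image names, with a longer timeout for a congested CI network.
print(skopeo_inspect_command("registry.example.com", "openshift/origin", 30))
print(worst_case_wait())  # 3 tries * 10s = 30
```

Raising the timeout scales the worst case linearly, so a 30 second value with three tries would mean up to 90 seconds per unreachable registry.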
Set a 10 second timeout when using skopeo to inspect remote registries,
so that it does not wait for a TCP timeout to fail if they are unreachable.
(I am hesitant to set it any lower, although most of the time this should return in a couple of seconds; I do not want to add more opportunities for network flakiness to break things.)
Aimed at improving UX for disconnected installs, per https://bugzilla.redhat.com/show_bug.cgi?id=1480195