docker_image_availability: timeout skopeo inspect #5228

sosiouxme · 2017-08-25T20:33:03Z

Set a 10 second timeout when using skopeo to inspect remote registries,
so that the check fails quickly instead of waiting for a TCP timeout when they are unreachable.

(I am hesitant to set it any lower. Most of the time this should return in a couple of seconds, but I do not want to add more opportunities for network flakiness to break things.)

Aimed at improving UX for disconnected installs, per https://bugzilla.redhat.com/show_bug.cgi?id=1480195

sosiouxme · 2017-08-25T20:36:46Z

aos-ci-test

juanvallejo · 2017-08-25T20:45:48Z

roles/openshift_health_checker/openshift_checks/docker_image_availability.py

@@ -168,7 +168,7 @@ def is_available_skopeo_image(self, image, default_registries):
            registries = [registry]

        for registry in registries:
-            args = {"_raw_params": "skopeo inspect --tls-verify=false docker://{}/{}".format(registry, image)}
+            args = {"_raw_params": "timeout 10 skopeo inspect --tls-verify=false docker://{}/{}".format(registry, image)}


It would have been nice if a timeout flag were supported directly in the inspect subcommand :)

Yeah. Or in the ansible command module. Alas.
However "timeout" comes from coreutils and does the job fine.
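As a rough illustration of the change in the diff above, the command string can be built with the timeout made explicit. This is only a sketch: the helper names `skopeo_inspect_command` and `timed_out` and the registry name in the usage note are hypothetical and are not part of the PR. The one external fact it relies on is that coreutils `timeout` exits with status 124 when the time limit is hit (unless `--preserve-status` is given).

```python
# Exit status coreutils `timeout` returns when the wrapped command
# was still running at the time limit.
TIMEOUT_EXIT_STATUS = 124


def skopeo_inspect_command(registry, image, timeout_seconds=10):
    """Build the raw command string from the diff, with the timeout as an argument.

    Hypothetical helper: the PR itself hard-codes "timeout 10".
    """
    return "timeout {} skopeo inspect --tls-verify=false docker://{}/{}".format(
        timeout_seconds, registry, image)


def timed_out(return_code):
    """Tell a timeout apart from any other skopeo failure."""
    return return_code == TIMEOUT_EXIT_STATUS
```

For example, `skopeo_inspect_command("registry.example.com", "openshift/origin")` yields the same string the patched line would produce for that registry and image, and a caller could use `timed_out(rc)` to report "registry unreachable" separately from "image not found".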

openshift-bot · 2017-08-26T05:03:58Z

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 337003d (logs)

openshift-bot · 2017-08-26T05:06:23Z

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 337003d (logs)

sosiouxme · 2017-08-28T13:42:13Z

aos-ci-test

openshift-bot · 2017-08-28T14:21:12Z

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for acf014f (logs)

openshift-bot · 2017-08-28T14:23:45Z

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for acf014f (logs)

sosiouxme · 2017-08-28T14:41:59Z

[merge]

openshift-bot · 2017-08-28T14:47:47Z

[test]ing while waiting on the merge queue

openshift-bot · 2017-08-28T14:55:45Z

Evaluated for openshift ansible test up to acf014f

rhcarvalho

LGTM

openshift-bot · 2017-08-28T17:02:42Z

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/546/) (Base Commit: f51bcec) (PR Branch Commit: acf014f)

rhcarvalho · 2017-08-29T12:08:42Z

Flake openshift/origin#16013 and unexpected tox error.

...
py35-flake8 create: /data/src/github.com/openshift/openshift-ansible/.tox/py35-flake8
Traceback (most recent call last):
  File "/usr/bin/tox", line 11, in <module>
    sys.exit(cmdline())
  File "/usr/lib/python2.7/site-packages/tox/session.py", line 39, in main
    retcode = Session(config).runcommand()
  File "/usr/lib/python2.7/site-packages/tox/session.py", line 392, in runcommand
    return self.subcommand_test()
  File "/usr/lib/python2.7/site-packages/tox/session.py", line 543, in subcommand_test
    if self.setupenv(venv):
  File "/usr/lib/python2.7/site-packages/tox/session.py", line 451, in setupenv
    status = venv.update(action=action)
  File "/usr/lib/python2.7/site-packages/tox/venv.py", line 167, in update
    self.hook.tox_testenv_create(action=action, venv=self)
  File "/usr/lib/python2.7/site-packages/pluggy/__init__.py", line 680, in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
  File "/usr/lib/python2.7/site-packages/pluggy/__init__.py", line 240, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/usr/lib/python2.7/site-packages/pluggy/__init__.py", line 234, in <lambda>
    methods, kwargs, specopts=hook.spec_opts, hook=hook
  File "/usr/lib/python2.7/site-packages/pluggy/callers.py", line 110, in execute
    return outcome.get_result()[0]
  File "/usr/lib/python2.7/site-packages/pluggy/callers.py", line 53, in get_result
    _reraise(*ex)  # noqa
  File "/usr/lib/python2.7/site-packages/pluggy/callers.py", line 91, in execute
    res = hook_impl.function(*args)
  File "/usr/lib/python2.7/site-packages/tox/venv.py", line 426, in tox_testenv_create
    config_interpreter = venv.getsupportedinterpreter()
  File "/usr/lib/python2.7/site-packages/tox/venv.py", line 206, in getsupportedinterpreter
    return self.envconfig.getsupportedinterpreter()
  File "/usr/lib/python2.7/site-packages/tox/config.py", line 653, in getsupportedinterpreter
    info = self.config.interpreters.get_info(envconfig=self)
  File "/usr/lib/python2.7/site-packages/tox/interpreters.py", line 28, in get_info
    executable = self.get_executable(envconfig)
  File "/usr/lib/python2.7/site-packages/tox/interpreters.py", line 23, in get_executable
    exe = self.hook.tox_get_python_executable(envconfig=envconfig)
  File "/usr/lib/python2.7/site-packages/pluggy/__init__.py", line 680, in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
  File "/usr/lib/python2.7/site-packages/pluggy/__init__.py", line 240, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/usr/lib/python2.7/site-packages/pluggy/__init__.py", line 234, in <lambda>
    methods, kwargs, specopts=hook.spec_opts, hook=hook
  File "/usr/lib/python2.7/site-packages/pluggy/callers.py", line 110, in execute
    return outcome.get_result()[0]
IndexError: list index out of range
++ export status=FAILURE
++ status=FAILURE
+ set +o xtrace
########## FINISHED STAGE: FAILURE: RUN UNIT TESTS [00h 02m 26s] ##########

Let's try again, [merge]

openshift-bot · 2017-08-29T12:15:30Z

Evaluated for openshift ansible merge up to acf014f

openshift-bot · 2017-08-29T21:07:43Z

continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/943/) (Base Commit: 248f75d) (PR Branch Commit: acf014f)

sosiouxme · 2017-08-30T13:48:45Z

https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/943/

Looks like only flakes openshift/origin#12137 and openshift/origin#10773

@sdodson or @rhcarvalho can we just merge this and not waste more queue time...

fridim · 2018-01-19T16:44:38Z

 "_raw_params": "timeout 10 skopeo inspect --tls-verify=false "

Could this timeout value be a parameter somehow? We have some "image not available" deployment failures in our CI due to network or upstream congestion. It happens especially with a lot of concurrent deployments.
IMO it would be better to be able to increase the timeout or the number of retries instead of deactivating the check. WDYT?

sosiouxme · 2018-01-19T19:47:42Z

Could this timeout value be a parameter somehow?

It could be. But a skopeo lookup should normally be almost instantaneous, so if it's not, that indicates you're likely to have a pretty hard time actually getting anything from your registry. If thirty seconds of timeout (3 tries * 10s) isn't enough, how much is going to be reasonable? If that's a frequent occurrence I might say your network is flaky enough that you should disable the check in CI, or your CI is always going to be flaky. But you're the user, you may well know better than I do. And it's not like it would be hard to add parameters for this. Open an issue for it if you want and we can make it happen.
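A minimal sketch of what such parameters might look like, assuming the check simply repeats the inspect command until one attempt succeeds. Nothing here is taken from the actual module: `image_available`, `worst_case_wait`, and the injected `run_command` callable (which takes the raw command string and returns its exit status) are hypothetical names used for illustration only.

```python
def image_available(run_command, registry, image, timeout_seconds=10, tries=3):
    """Return True as soon as one inspect attempt exits with status 0.

    run_command is a hypothetical callable: it receives the raw command
    string and returns the exit status of running it.
    """
    command = "timeout {} skopeo inspect --tls-verify=false docker://{}/{}".format(
        timeout_seconds, registry, image)
    for _ in range(tries):
        if run_command(command) == 0:
            return True
    return False


def worst_case_wait(timeout_seconds=10, tries=3):
    """Approximate upper bound, in seconds, spent on one unreachable registry."""
    return timeout_seconds * tries
```

With the defaults, `worst_case_wait()` gives the thirty seconds mentioned above (3 tries * 10s). Raising either value trades a longer wait on a truly unreachable registry for fewer false failures on a congested network.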

sosiouxme requested a review from rhcarvalho August 25, 2017 20:33

sosiouxme requested a review from juanvallejo August 25, 2017 20:37

juanvallejo approved these changes Aug 25, 2017

juanvallejo reviewed Aug 25, 2017

docker_image_availability: timeout skopeo inspect

acf014f

Set a 10 second timeout when using skopeo to inspect remote registries, so that it does not wait for a tcp timeout to fail if they are unreachable.

rhcarvalho approved these changes Aug 28, 2017

rhcarvalho merged commit c749ed9 into openshift:master Aug 30, 2017
