docker_image_availability: timeout skopeo inspect #5228

Merged
merged 1 commit into from
Aug 30, 2017

@sosiouxme sosiouxme commented Aug 25, 2017

Set a 10-second timeout when using skopeo to inspect remote registries,
so that the check fails quickly instead of waiting for a TCP timeout when they are unreachable.

(I am hesitant to set it any lower, although most of the time this should return in a couple of seconds, because I do not want to add more opportunities for network flakiness to break things.)

Aimed at improving UX for disconnected installs, per https://bugzilla.redhat.com/show_bug.cgi?id=1480195

@sosiouxme sosiouxme requested a review from rhcarvalho August 25, 2017 20:33
@sosiouxme
Member Author

aos-ci-test

@sosiouxme sosiouxme requested a review from juanvallejo August 25, 2017 20:37
@@ -168,7 +168,7 @@ def is_available_skopeo_image(self, image, default_registries):
             registries = [registry]
 
         for registry in registries:
-            args = {"_raw_params": "skopeo inspect --tls-verify=false docker://{}/{}".format(registry, image)}
+            args = {"_raw_params": "timeout 10 skopeo inspect --tls-verify=false docker://{}/{}".format(registry, image)}
Contributor

It would have been nice if a timeout flag were supported directly in the inspect subcommand :)

Member Author

Yeah. Or in the ansible command module. Alas.
However "timeout" comes from coreutils and does the job fine.
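
For readers unfamiliar with the coreutils wrapper mentioned here, the following is a minimal Python sketch of how the command string from the diff could be built and how its exit status might be interpreted. GNU `timeout` exits with status 124 when the time limit is reached. The helper names `build_inspect_command` and `classify_result`, and the outcome labels, are hypothetical and are not part of openshift-ansible.

```python
# Exit status that GNU coreutils `timeout` returns when the limit is hit.
TIMEOUT_EXIT_STATUS = 124


def build_inspect_command(registry, image, timeout_seconds=10):
    """Build the shell command string in the same format as the diff.

    Hypothetical helper: the real module formats the string inline.
    """
    return "timeout {} skopeo inspect --tls-verify=false docker://{}/{}".format(
        timeout_seconds, registry, image
    )


def classify_result(returncode):
    """Map an exit status to a coarse outcome (labels are illustrative)."""
    if returncode == 0:
        return "available"
    if returncode == TIMEOUT_EXIT_STATUS:
        return "timed out"
    return "failed"
```

With this shape, an unreachable registry would surface as "timed out" after roughly the configured number of seconds, rather than after the much longer TCP connection timeout.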

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 337003d (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 337003d (logs)

@sosiouxme
Member Author

aos-ci-test

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for acf014f (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for acf014f (logs)

@sosiouxme
Member Author

[merge]

@openshift-bot

[test]ing while waiting on the merge queue

@openshift-bot

Evaluated for openshift ansible test up to acf014f

Contributor

@rhcarvalho rhcarvalho left a comment

LGTM

@openshift-bot

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/546/) (Base Commit: f51bcec) (PR Branch Commit: acf014f)

@rhcarvalho
Contributor

Flake openshift/origin#16013 and unexpected tox error.

...
py35-flake8 create: /data/src/github.com/openshift/openshift-ansible/.tox/py35-flake8
Traceback (most recent call last):
  File "/usr/bin/tox", line 11, in <module>
    sys.exit(cmdline())
  File "/usr/lib/python2.7/site-packages/tox/session.py", line 39, in main
    retcode = Session(config).runcommand()
  File "/usr/lib/python2.7/site-packages/tox/session.py", line 392, in runcommand
    return self.subcommand_test()
  File "/usr/lib/python2.7/site-packages/tox/session.py", line 543, in subcommand_test
    if self.setupenv(venv):
  File "/usr/lib/python2.7/site-packages/tox/session.py", line 451, in setupenv
    status = venv.update(action=action)
  File "/usr/lib/python2.7/site-packages/tox/venv.py", line 167, in update
    self.hook.tox_testenv_create(action=action, venv=self)
  File "/usr/lib/python2.7/site-packages/pluggy/__init__.py", line 680, in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
  File "/usr/lib/python2.7/site-packages/pluggy/__init__.py", line 240, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/usr/lib/python2.7/site-packages/pluggy/__init__.py", line 234, in <lambda>
    methods, kwargs, specopts=hook.spec_opts, hook=hook
  File "/usr/lib/python2.7/site-packages/pluggy/callers.py", line 110, in execute
    return outcome.get_result()[0]
  File "/usr/lib/python2.7/site-packages/pluggy/callers.py", line 53, in get_result
    _reraise(*ex)  # noqa
  File "/usr/lib/python2.7/site-packages/pluggy/callers.py", line 91, in execute
    res = hook_impl.function(*args)
  File "/usr/lib/python2.7/site-packages/tox/venv.py", line 426, in tox_testenv_create
    config_interpreter = venv.getsupportedinterpreter()
  File "/usr/lib/python2.7/site-packages/tox/venv.py", line 206, in getsupportedinterpreter
    return self.envconfig.getsupportedinterpreter()
  File "/usr/lib/python2.7/site-packages/tox/config.py", line 653, in getsupportedinterpreter
    info = self.config.interpreters.get_info(envconfig=self)
  File "/usr/lib/python2.7/site-packages/tox/interpreters.py", line 28, in get_info
    executable = self.get_executable(envconfig)
  File "/usr/lib/python2.7/site-packages/tox/interpreters.py", line 23, in get_executable
    exe = self.hook.tox_get_python_executable(envconfig=envconfig)
  File "/usr/lib/python2.7/site-packages/pluggy/__init__.py", line 680, in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
  File "/usr/lib/python2.7/site-packages/pluggy/__init__.py", line 240, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/usr/lib/python2.7/site-packages/pluggy/__init__.py", line 234, in <lambda>
    methods, kwargs, specopts=hook.spec_opts, hook=hook
  File "/usr/lib/python2.7/site-packages/pluggy/callers.py", line 110, in execute
    return outcome.get_result()[0]
IndexError: list index out of range
++ export status=FAILURE
++ status=FAILURE
+ set +o xtrace
########## FINISHED STAGE: FAILURE: RUN UNIT TESTS [00h 02m 26s] ##########

Let's try again, [merge]

@openshift-bot

Evaluated for openshift ansible merge up to acf014f

@openshift-bot

continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/943/) (Base Commit: 248f75d) (PR Branch Commit: acf014f)

@sosiouxme
Member Author

sosiouxme commented Aug 30, 2017

https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/943/

Looks like only flakes openshift/origin#12137 and openshift/origin#10773

@sdodson or @rhcarvalho can we just merge this and not waste more queue time...

@rhcarvalho rhcarvalho merged commit c749ed9 into openshift:master Aug 30, 2017
@fridim

fridim commented Jan 19, 2018

 "_raw_params": "timeout 10 skopeo inspect --tls-verify=false "

Could this timeout value be made a parameter somehow? We have some "image not available" deployment failures in our CI due to network or upstream congestion. It happens especially with a lot of concurrent deployments.
IMO it would be better to be able to increase the timeout or the number of retries instead of deactivating the check, wdyt?
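
One way the request above could look in code is sketched below, purely as an illustration: the timeout and the number of attempts become arguments instead of the hard-coded `10`. The function `inspect_with_retries`, its parameters, and the `run_command` callable (assumed to take a command string and return an exit status) are all hypothetical; nothing in this thread confirms that such an option exists in openshift-ansible.

```python
def inspect_with_retries(run_command, registry, image,
                         timeout_seconds=10, attempts=3):
    """Return True if any attempt to inspect the image exits with status 0.

    run_command is an assumed callable: it receives the command string
    and returns the process exit status.
    """
    command = "timeout {} skopeo inspect --tls-verify=false docker://{}/{}".format(
        timeout_seconds, registry, image
    )
    for _ in range(attempts):
        if run_command(command) == 0:
            return True
    # Every attempt failed or timed out.
    return False
```

Passing the runner in as a callable keeps the sketch independent of how the command is actually executed, and makes it easy to exercise with a fake runner in unit tests.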

@sosiouxme
Member Author

sosiouxme commented Jan 19, 2018 via email
