Disable oc start-build timeout #12438

Closed
f-w opened this issue Jan 10, 2017 · 4 comments · Fixed by #12484

Labels
component/build kind/bug priority/P1

Comments

f-w commented Jan 10, 2017

Hi,
Since a recent upgrade, our on-premise OS cluster has been seeing long build-start delays while builds wait for pod allocation. oc start-build from a binary source has a 5-minute timeout (raised from 1 minute in #7489). So by the time the build finally starts on the server, the oc client has already quit. The server does not know this and keeps waiting indefinitely for the client to upload the binary. In my case oc runs from Jenkins as part of CI. This causes a chain of failures: the build run policy is set to Serial, so the first build, which failed because of the timeout, blocks all subsequent builds.

In this scenario, a hard-coded timeout enforced only on the oc client side doesn't seem necessary in the first place; users can check the OS console to find out what is stuck. I suggest one of the following changes:

  1. Disable the timeout altogether.
  2. Make the timeout configurable through an optional oc start-build parameter.
  3. Enforce the timeout on both the client and the server side.
Version
$ oc version
oc v1.3.1
kubernetes v1.3.0+52492b4
features: Basic-Auth

Server https://console.pathfinder.gov.bc.ca:8443
openshift v3.3.1.7
kubernetes v1.3.0+52492b4

pweil- added the kind/bug label Jan 11, 2017
Contributor

bparees commented Jan 11, 2017

I'm leaning towards updating the binary instantiate logic so that, when it times out waiting for the build/pod, it cancels the build it created.
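A minimal Python sketch of that proposal, assuming the server-side step can be reduced to three hypothetical hooks: a readiness signal for the build pod, a function that streams the upload into it, and a function that cancels the build. None of these names come from the OpenShift code base. The point is the ordering: on timeout, cancel the build first, then report the error, so the build is never left pending.

```python
import threading
from typing import Callable


class BuildStartTimeout(Exception):
    """Raised when the build pod does not become ready in time."""


def instantiate_binary(
    pod_ready: threading.Event,
    stream_upload: Callable[[], None],
    cancel_build: Callable[[], None],
    timeout_s: float,
) -> None:
    """Hypothetical sketch: wait for the build pod; on timeout, cancel the
    build that was just created instead of silently abandoning it."""
    # Event.wait returns False if the timeout elapsed before the event was set.
    if not pod_ready.wait(timeout_s):
        cancel_build()  # the proposed change: don't leave the build pending
        raise BuildStartTimeout(f"build pod not ready after {timeout_s}s")
    # Pod is up: forward the client's binary content into it.
    stream_upload()


# Usage: a pod that never becomes ready (the event is never set).
cancelled = []
try:
    instantiate_binary(
        threading.Event(), lambda: None, lambda: cancelled.append("myapp-1"), 0.1
    )
except BuildStartTimeout as err:
    print(err)  # build pod not ready after 0.1s
print(cancelled)  # ['myapp-1']
```

In the thread the server-side wait is about 5 minutes; the 0.1 s here only keeps the example fast.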

Contributor

bparees commented Jan 11, 2017

(I should also note that a cluster that takes 5 minutes to start a pod appears to be unhealthy or under-provisioned, and should be investigated from that perspective as well.)

Author

f-w commented Jan 11, 2017

The delayed start is a separate issue that is being investigated. Running oc cancel-build on timeout is a good idea that I can adopt. But it is still questionable whether the benefit of imposing a client-side timeout outweighs the harm, especially when it is hard-coded. Had there been no timeout, I wouldn't have had these build failures. I don't mind waiting a few extra minutes, knowing the pods always get allocated eventually. And if a timeout is needed, it is trivial to wrap the command with the Unix timeout utility instead of hard-coding one.
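A client-side sketch of the workaround described above, written in Python rather than as a shell script: run a command under a caller-chosen timeout and, if it expires, run a cleanup command such as oc cancel-build. The helper name run_with_cleanup and the example oc arguments are hypothetical. subprocess.run with a timeout kills only the local process, which is exactly why the explicit cleanup step is needed: the server-side build keeps waiting otherwise.

```python
import subprocess
from typing import Optional, Sequence


def run_with_cleanup(
    cmd: Sequence[str],
    timeout_s: float,
    cleanup: Optional[Sequence[str]] = None,
) -> Optional[int]:
    """Run cmd with a configurable timeout (hypothetical helper).

    Returns cmd's exit code, or None if it timed out. On timeout the
    optional cleanup command is run, e.g. an oc cancel-build for the
    build that was started."""
    try:
        # On expiry, subprocess.run kills the child and raises TimeoutExpired.
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        if cleanup is not None:
            # Killing the local client does not stop the server-side build,
            # so ask the server to cancel it explicitly.
            subprocess.run(cleanup, check=False)
        return None


# Hypothetical CI usage; "myapp" and "<build-name>" are placeholders:
# run_with_cleanup(["oc", "start-build", "myapp", "--from-dir=.", "--follow"],
#                  timeout_s=15 * 60,
#                  cleanup=["oc", "cancel-build", "<build-name>"])
```

Keeping the timeout as a parameter rather than a constant matches the request in option 2 of the issue: the CI job, not the client binary, decides how long to wait.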

Contributor

bparees commented Jan 11, 2017

The 5-minute timeout is a server-side timeout: it's the API server waiting 5 minutes for the pod to start so that it can stream the content coming from the client into the pod.

I'm not sure exactly what is timing out on the client side (I haven't dug into the code), but it's probably because the API server closes the connection after spending 5 minutes waiting for the pod to become available. The fundamental problem is that the API server doesn't cancel the build when it abandons it.
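Why an abandoned, never-cancelled build is so damaging in the reporter's setup can be shown with a toy Python model of a Serial run policy (not OpenShift code; the class and build names are hypothetical). Builds from one BuildConfig run one at a time, oldest first, so a build stuck waiting for an upload that will never arrive holds up every build behind it until something cancels it.

```python
from collections import deque


class SerialBuildQueue:
    """Toy model of a Serial run policy: builds from one BuildConfig run
    one at a time, oldest first. Not OpenShift code."""

    def __init__(self):
        self.pending = deque()  # builds not yet finished, oldest first
        self.finished = []      # (build name, final phase), in finish order

    def submit(self, name):
        self.pending.append(name)

    def running(self):
        # Under Serial, only the oldest unfinished build may make progress.
        return self.pending[0] if self.pending else None

    def complete_running(self):
        # The running build received its input and finished normally.
        self.finished.append((self.pending.popleft(), "Complete"))

    def cancel(self, name):
        # Cancelling a build removes it, unblocking the builds behind it.
        self.pending.remove(name)
        self.finished.append((name, "Cancelled"))


q = SerialBuildQueue()
for name in ("myapp-1", "myapp-2", "myapp-3"):
    q.submit(name)

# myapp-1's client gave up, so no binary will ever be uploaded for it.
# Nothing completes it, and myapp-2/myapp-3 wait behind it indefinitely.
print(q.running())  # myapp-1

# Cancelling the abandoned build lets the queue drain again.
q.cancel("myapp-1")
print(q.running())  # myapp-2
```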


3 participants