Disable oc start-build timeout #12438

f-w · 2017-01-10T18:32:43Z

Hi,
Our on-premise OpenShift cluster has been experiencing long build start delays while waiting for pod allocation since a recent upgrade. Because oc start-build from binary source has a timeout of 5 minutes (bumped up from 1 minute thanks to #7489), by the time the build finally starts on the server, the oc client has already quit. The server is unaware of this and keeps waiting indefinitely for the client to upload the binary. In my case oc runs from Jenkins as part of CI. This leads to a chain of failures: the build strategy is set to serial, so the first build that failed due to the timeout blocks all subsequent builds.

In this scenario, a hard-coded timeout on the oc client side alone doesn't seem necessary in the first place. The user can go to the OpenShift console to find out what is stuck. I suggest making one of the following changes:

Disable the timeout altogether.
Allow it to be changed by making it an optional oc start-build parameter.
Enforce the timeout on both the client and the server side.

Version

$ oc version
oc v1.3.1
kubernetes v1.3.0+52492b4
features: Basic-Auth

Server https://console.pathfinder.gov.bc.ca:8443
openshift v3.3.1.7
kubernetes v1.3.0+52492b4


bparees · 2017-01-11T15:21:20Z

i'm leaning towards updating the binary instantiate logic to, upon timing out while waiting for the build/pod, cancel the build it created.

bparees · 2017-01-11T15:23:44Z

(I should also note that a cluster that takes 5 minutes to start a pod would seem to be unhealthy or under provisioned and should be investigated from that perspective as well).

f-w · 2017-01-11T17:32:39Z

The delayed start is a separate issue that is being investigated. Running oc cancel-build on timeout is a good idea that I can adopt. But whether the benefit of imposing a client-side timeout outweighs the harm is still questionable, especially if it's hard-coded. Had the timeout been removed, I wouldn't have had the build failures. I don't mind waiting a few extra minutes, knowing the pods always get allocated eventually. It's also trivial to use the unix timeout command if needed, which avoids hard-coding.
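
For illustration, a minimal sketch of that idea: wrap the build command in the unix timeout command and run a cleanup step if the limit is hit. The function name run_with_limit and its arguments are hypothetical, and the sketch assumes GNU coreutils timeout, which exits with status 124 when the limit is reached.

```shell
#!/bin/sh
# Hypothetical helper, not part of oc.
# Usage: run_with_limit <seconds> <cleanup-command-string> <command> [args...]
run_with_limit() {
    limit=$1
    cleanup=$2
    shift 2

    # Run the real command under a caller-chosen limit instead of a hard-coded one.
    timeout "$limit" "$@"
    rc=$?

    # GNU coreutils timeout exits with 124 when the time limit was reached.
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s, running cleanup" >&2
        sh -c "$cleanup"
    fi
    return "$rc"
}
```

In a CI job the wrapped command would be the oc start-build invocation and the cleanup string an oc cancel-build call for the stuck build; how the job learns the build name is left open here.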

bparees · 2017-01-11T17:54:07Z

the 5 minute timeout is a server side timeout. it's the api server waiting 5 minutes for the pod to start so it can stream the content coming from the client, into the pod.

i'm not sure exactly what's timing out on the client side (I haven't dug into the code) but it's probably a result of the api server closing the connection after it spends 5 minutes waiting for the pod to be available. The fundamental problem is that the api server doesn't cancel the build when it abandons it.
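
To make the intended behavior concrete, here is a rough shell sketch of "wait for the pod with a deadline, and cancel rather than abandon". It is only an illustration of the logic described above, not the api server code; wait_or_cancel and its arguments are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper illustrating the logic only.
# Usage: wait_or_cancel <deadline-seconds> <ready-check-command> <cancel-command>
wait_or_cancel() {
    deadline=$1
    ready_check=$2
    on_abandon=$3
    waited=0

    # Poll roughly once per second until the check succeeds or the deadline passes.
    while [ "$waited" -lt "$deadline" ]; do
        if sh -c "$ready_check"; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # Deadline passed: cancel the build instead of leaving it waiting for input.
    sh -c "$on_abandon"
    return 1
}
```

The point of the sketch is the last step: the party that gives up waiting is also the one that cancels, so a serial queue is not left blocked behind a build nobody is feeding.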

pweil- added component/build priority/P1 labels Jan 11, 2017

pweil- assigned bparees Jan 11, 2017

pweil- added the kind/bug label Jan 11, 2017

bparees mentioned this issue Jan 13, 2017

cancel binary builds if they hang #12484

Merged

openshift-bot closed this as completed in #12484 Jan 24, 2017
