Hi,
Our on-premise OpenShift cluster has been experiencing extended build start delays waiting for pod allocation after a recent upgrade. Because `oc start-build` from binary source has a timeout of 5 minutes (bumped up from 1 minute thanks to #7489), by the time the build finally starts on the server, the oc client has already quit. The server is unaware of this and continues to wait indefinitely for the client to upload the binary. In my case oc runs from Jenkins as part of CI. This leads to a chain of failures, because the build strategy is set to serial and the first build that failed due to the timeout blocks all subsequent builds.
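For context, this is roughly the kind of CI step being described (the BuildConfig name `my-app` is a placeholder, not taken from the report):

```sh
# Upload the local workspace as binary source and stream the build log.
# The client has to stay connected until the build pod is allocated.
oc start-build my-app --from-dir=. --follow
```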
In the above scenario, a hard-coded timeout on the oc client side alone doesn't seem needed in the first place; the user can go to the OpenShift console to find out what is stuck. I suggest making one of the following changes:
Disable the timeout altogether
Allow it to be changed by making it an optional `oc start-build` parameter
Enforce the timeout on both the client and server side
Version
```
$ oc version
oc v1.3.1
kubernetes v1.3.0+52492b4
features: Basic-Auth

Server https://console.pathfinder.gov.bc.ca:8443
openshift v3.3.1.7
kubernetes v1.3.0+52492b4
```
(I should also note that a cluster that takes 5 minutes to start a pod would seem to be unhealthy or under-provisioned, and should be investigated from that perspective as well.)
The delayed start is a separate issue being investigated. Running `oc cancel-build` when the timeout hits is a good idea I can adopt. But whether the benefit of imposing a client-side timeout outweighs the harm is still questionable, especially if it's hard-coded. Had the timeout been removed, I wouldn't have had the build failures. I don't mind waiting a few extra minutes, knowing the pods always get allocated eventually. It's also trivial to use the unix `timeout` command if needed, to avoid hard-coding.
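For example, something along these lines could wrap the `oc start-build` call in the Jenkins job. It is only a sketch: the BuildConfig name `my-app` and the `buildconfig` label selector are placeholders, the 600-second limit is arbitrary, and the exact flags available depend on the oc version.

```sh
#!/bin/sh
# Bound the wait with coreutils `timeout` instead of relying on the
# hard-coded client/server timeout.
if ! timeout 600 oc start-build my-app --from-dir=. --follow; then
    # The client gave up; cancel the newest build of this BuildConfig
    # so an abandoned build doesn't block the serial build queue.
    latest=$(oc get builds -l buildconfig=my-app \
        --sort-by=.metadata.creationTimestamp -o name | tail -n 1)
    [ -n "$latest" ] && oc cancel-build "${latest##*/}"
    exit 1
fi
```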
The 5-minute timeout is a server-side timeout: it's the API server waiting 5 minutes for the pod to start so it can stream the content coming from the client into the pod.
I'm not sure exactly what's timing out on the client side (I haven't dug into the code), but it's probably a result of the API server closing the connection after it spends 5 minutes waiting for the pod to be available. The fundamental problem is that the API server doesn't cancel the build when it abandons it.