Router fails to get endpoints during e2e #8403
Early in the run:
@Kargakis @ironcladlou should this happen in e2e? Getting two deployments of the router? |
NM, that's because of how the router install works. It's triggering a reload by changing the env for the syn eater. |
This seems wrong:
|
|
So it looks like we schedule both deployment pods. The second one? starts running, and creates at least one pod in the second group. It scales up, then for some reason gets scaled down (failed, death?) at
It is incredibly worrying to me that we don't see the logs of the failed pods - it may be because they were empty, but I'm not seeing how the deployment logs could be empty at all. |
Looks like we had two deployment pods running at the same time. That shouldn't be happening? |
Moving this to @pweil-'s team for triage |
@Kargakis - I haven't been able to reproduce this yet. Of note in the logs, there were some scheduling issues due to the node not being registered. I tried to reproduce that by deleting my node and re-adding it right after a back-off in scheduling deploy-1 (after I triggered a config change), but that didn't seem to work. Sending your way to continue looking. Will check in in the morning |
May be useful to run only install_router and then wait for endpoints, then tear it down over and over. The deploy race doesn't seem to be correlated with other workloads. |
The second deployment comes from the config generator
What might have changed the router template between 1 and 2? Or is it probably a race? Looking |
ouch |
It seems that the first deployment is cancelled by the deploymentconfig controller because it noticed the second version:
The deployment correctly switches from New to Failed:
Later, the already existing deployer is being scheduled
The deployer controller and the deployer don't care much about the deployment phase, so eventually the deployer pod will complete successfully and switch the deployment from Failed to Complete. Still not sure why the second deployment started - we've probably been hitting this forever, but after #8163 we started cleaning up the cancel annotation on successful deployers. I think we should enforce the deployment phase direction for now and fail deployers if a deployment is already failed. |
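A minimal sketch of the phase-direction rule being proposed in the comment above. The phase names and helpers are hypothetical stand-ins rather than the real deployapi types; the point is only that a deployment already in a terminal phase (Failed or Complete) is never moved again, so a late-finishing deployer cannot flip Failed back to Complete:

```go
package main

import "fmt"

// Phase mirrors the deployment phases discussed in this thread (hypothetical
// names, not the real deployapi constants).
type Phase string

const (
	PhaseNew      Phase = "New"
	PhasePending  Phase = "Pending"
	PhaseRunning  Phase = "Running"
	PhaseComplete Phase = "Complete"
	PhaseFailed   Phase = "Failed"
)

// rank orders phases so a deployment can only move "forward"; terminal
// phases (Complete, Failed) share the highest rank and are never left.
func rank(p Phase) int {
	switch p {
	case PhaseNew:
		return 0
	case PhasePending:
		return 1
	case PhaseRunning:
		return 2
	case PhaseComplete, PhaseFailed:
		return 3
	}
	return -1
}

// canTransition rejects any update that would move a deployment backward or
// flip one terminal phase to another (e.g. Failed -> Complete).
func canTransition(from, to Phase) bool {
	if rank(from) == 3 {
		return false // already terminal; nothing may change it
	}
	return rank(to) > rank(from)
}

func main() {
	fmt.Println(canTransition(PhaseFailed, PhaseComplete)) // false: the case hit in this issue
	fmt.Println(canTransition(PhaseNew, PhaseFailed))      // true: normal cancellation
}
```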
The second deployment happens because of Line 628 in 28e2e75
I think #8417 and #8418 should be enough for this |
Saw this happen again https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin/14441/artifact/origin/artifacts/test-end-to-end-docker/logs/container-router-1-deploy-deployment.log
|
@Kargakis One thing I don't think we accounted for is the strategy code itself checking to ensure the deployment it's about to execute is still valid. Even though we take steps to prevent their scheduling in the first place, and handle the outcomes more correctly, it's still possible the invalidated deployer pod gets scheduled and then spins up before deadlines or deletions are effective. I'm thinking the strategy code itself needs to inspect the state of things and bail. This is evidenced by the output @smarterclayton brought: the strategy code knew about the later deployment which should invalidate the executing code, but instead just scales it down and keeps going. |
I think by this point we have more than enough context to exit early if we detect the deployment is superseded. |
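A sketch of that early exit, assuming the strategy can read its own deployment's version and the config's latest version; the helper below is a hypothetical illustration, not the real strategy code:

```go
package main

import (
	"fmt"
	"os"
)

// ensureNotSuperseded is a hypothetical pre-flight check for the strategy:
// refuse to execute a deployment whose version is lower than the config's
// latest version, instead of scaling the newer deployment down.
func ensureNotSuperseded(targetVersion, latestVersion int64) error {
	if targetVersion < latestVersion {
		return fmt.Errorf("deployment version %d has been superseded by version %d; exiting without making changes",
			targetVersion, latestVersion)
	}
	return nil
}

func main() {
	// router-1-deploy waking up after router-2 has already deployed.
	if err := ensureNotSuperseded(1, 2); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```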
Question is - are we running multiple here, or just one? And what changed that suddenly we're running parallel deployments? Is this new? Old? Why did this start happening now? |
Looked to me like router-2 deployed successfully, and then the router-1-deploy pod got scheduled and executed. The common deployer logic which detects the "from" deployment and "old" deployments has a problem: it considers the latest of all completed deployments not equal to the target (router-1 in this case) to be the "from" and everything else to be "old". So if router-2 is completed before router-1-deploy is scheduled and router-1-deploy gets an opportunity to execute, router-2 is considered "from". I think version checking here will correct it. I'm not sure it has anything to do with parallelism- it still seems to be related to a pending deployer pod getting hung up in scheduling and getting a chance to run after a newer deployment was already created and deployed. @Kargakis something else I noticed after a fresh look this morning is that the deploymentConfigController should be waiting for cancelled deployments to have an outcome before creating the new one. If the old deployment has a deployer pod hanging out in pending awaiting scheduling, I'm not sure why a new version would even be created leading to a new version/deployer pod unless the old deployment erroneously got updated to failed/complete before its deployer pod finished. So in addition to the common deployer code not doing the right thing when we get into this situation, it's not yet clear to me how router-2 was allowed to exist to begin with prior to the completion of router-1. |
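A sketch of the version check suggested here, using hypothetical types in place of the real RC and annotation plumbing. The rule is that only completed deployments older than the target can be treated as "from" or "old", so a stale deployer never scales down a newer deployment:

```go
package main

import "fmt"

// deployment is a hypothetical stand-in for a deployment RC plus the version
// read from its version annotation.
type deployment struct {
	Name     string
	Version  int64
	Complete bool
}

// fromDeployment only considers completed deployments *older* than the target
// as candidates for "from", so a stale deployer (router-1-deploy) never
// treats a newer, already-complete deployment (router-2) as the thing to
// scale down.
func fromDeployment(target deployment, all []deployment) (deployment, bool) {
	var from deployment
	found := false
	for _, d := range all {
		if !d.Complete || d.Version >= target.Version {
			continue // newer or incomplete deployments are never "from"
		}
		if !found || d.Version > from.Version {
			from, found = d, true
		}
	}
	return from, found
}

func main() {
	target := deployment{Name: "router-1", Version: 1}
	all := []deployment{{Name: "router-2", Version: 2, Complete: true}}
	if _, ok := fromDeployment(target, all); !ok {
		fmt.Println("no valid 'from' deployment; the stale deployer should bail")
	}
}
```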
Didn't realize @Kargakis is out of the office- I'll be digging into this later in the morning. |
Instead of waiting for previous cancelled deployments, can't we just delete the older deployers? Also, the deployer code seems broken. |
Thinking about this yet again, patching the deployer CLI is only a bandaid. The router-2 RC shouldn't exist since router-1-deploy exists/is non-terminal and router-1 is presumably not complete/failed. That seems to be the root issue from what I can see. |
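To make the suspected root issue concrete, here is a sketch of the guard the deploymentConfigController seems to need, under the assumption that it can see each prior RC's phase and whether its deployer pod has resolved (all names below are hypothetical):

```go
package main

import "fmt"

// existingDeployment is a hypothetical view of an older deployment: the RC's
// phase plus whether its deployer pod has reached a terminal state.
type existingDeployment struct {
	Name             string
	Phase            string // "New", "Pending", "Running", "Complete", "Failed"
	DeployerTerminal bool   // deployer pod is gone, succeeded, or failed
}

// canCreateNextDeployment sketches the rule discussed above: a new RC should
// only be created once every prior deployment is in a terminal phase *and*
// its deployer pod has itself resolved, so router-2 cannot appear while
// router-1-deploy is still pending scheduling.
func canCreateNextDeployment(prior []existingDeployment) bool {
	for _, d := range prior {
		terminalPhase := d.Phase == "Complete" || d.Phase == "Failed"
		if !terminalPhase || !d.DeployerTerminal {
			return false
		}
	}
	return true
}

func main() {
	prior := []existingDeployment{{Name: "router-1", Phase: "Failed", DeployerTerminal: false}}
	fmt.Println(canCreateNextDeployment(prior)) // false: router-1-deploy is still out there
}
```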
Wouldn't it be confusing if a second deployment is expected but we still haven't created it? |
We should NEVER have more than one deployer pod running at a time. Ever. |
Agreed. That's why I am proposing deletion of older deployers when a newer one is created. |
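A sketch of that proposal, with a hypothetical pod-deleter interface standing in for the real client: when the deployer for version N is about to be created, any deployer pods for older versions are deleted first so two deployers never run at once:

```go
package main

import "fmt"

// podDeleter is a hypothetical stand-in for the kube pod client used by the
// controller.
type podDeleter interface {
	DeletePod(namespace, name string) error
}

type deployerPod struct {
	Name    string
	Version int64
}

// deleteOlderDeployers removes any lingering deployer pods that belong to
// deployments older than newVersion before the new deployer is created.
func deleteOlderDeployers(c podDeleter, namespace string, pods []deployerPod, newVersion int64) error {
	for _, p := range pods {
		if p.Version < newVersion {
			if err := c.DeletePod(namespace, p.Name); err != nil {
				return fmt.Errorf("deleting stale deployer %s: %v", p.Name, err)
			}
		}
	}
	return nil
}

// fakeDeleter just logs deletions for demonstration.
type fakeDeleter struct{}

func (fakeDeleter) DeletePod(ns, name string) error {
	fmt.Printf("deleted %s/%s\n", ns, name)
	return nil
}

func main() {
	pods := []deployerPod{{Name: "router-1-deploy", Version: 1}}
	_ = deleteOlderDeployers(fakeDeleter{}, "default", pods, 2)
}
```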
This is a problem: https://github.com/openshift/origin/blob/master/pkg/deploy/controller/deployment/controller.go#L61 If we set the RC phase to failed without waiting for the deployer pods to resolve to a terminal phase, the deployment config controller will see the RC in a terminal phase and decide that there are no outstanding deployments, and make a new one. I don't think we should ever transition the RC phase to a terminal phase unless:
Otherwise we can't rely on the RC terminal phases as being true in terms of deployer pods. |
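The exact conditions in the comment above didn't survive formatting, but one plausible reading is sketched below with hypothetical names: the RC only moves to a terminal phase once any deployer pod for it has itself finished or no longer exists.

```go
package main

import "fmt"

// deployerPodState is a hypothetical summary of the deployer pod for an RC.
type deployerPodState struct {
	Exists   bool
	Finished bool // pod phase is Succeeded or Failed
}

// mayMarkTerminal sketches the guard discussed above: the deployment
// controller may only move the RC to Failed/Complete once the deployer pod
// has resolved (or was never created), so a terminal RC phase reliably means
// no deployer pod is still going to run.
func mayMarkTerminal(pod deployerPodState) bool {
	return !pod.Exists || pod.Finished
}

func main() {
	// router-1-deploy exists but is stuck Pending on scheduling:
	fmt.Println(mayMarkTerminal(deployerPodState{Exists: true, Finished: false})) // false
}
```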
Good catch, I think this is the only place we do such a thing and never … |
Actually, that failed update is only happening when the RC is in New, in which case how would any deployer pods have been created to cause an issue? Maybe I jumped the gun... |
We create the deployer when the deployment is New, right? |
I just noticed that we create the pod and then break without transitioning out of New, so if before the next sync there's a cancellation, we'll enter the scenario I describe. So yeah, I think my analysis still stands. |
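One way to close that window, sketched with hypothetical types (this is an illustration, not necessarily what #8478 does): create the deployer pod and move the deployment out of New in the same sync, so a cancellation arriving before the next sync can no longer see a New deployment with a live deployer pod.

```go
package main

import "fmt"

// These types are hypothetical stand-ins for the controller's deployment and
// pod handling; the point is only the ordering of the two steps.
type deployment struct {
	Name  string
	Phase string
}

type podCreator func(deployerName string) error

// handleNewDeployment creates the deployer pod and updates the phase in the
// same sync, instead of breaking after pod creation and leaving the phase at
// New until the next sync (where a cancellation can slip in between).
func handleNewDeployment(d *deployment, create podCreator) error {
	if d.Phase != "New" {
		return nil
	}
	if err := create(d.Name + "-deploy"); err != nil {
		return err
	}
	d.Phase = "Pending" // updated immediately, not on a later sync
	return nil
}

func main() {
	d := &deployment{Name: "router-1", Phase: "New"}
	_ = handleNewDeployment(d, func(name string) error {
		fmt.Println("created deployer pod", name)
		return nil
	})
	fmt.Println("deployment phase:", d.Phase)
}
```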
@ironcladlou - assigning this to you since you're already working it and @Kargakis is on PTO until the end of the week. |
@smarterclayton is this now considered fixed as of #8478? |
@ironcladlou can we have an issue to fix the deployer too? Not a blocker but I think we should fix it for 1.3 |
It'll always be racing with something (cancellation, other deployments it might just miss when looking, etc.). Seems more valuable to keep on top of enforcing the "only 1 deployer pod should exist at a time" invariant externally. It's still probably useful to have a "last line of defense" in the deployer code. Not sure about 1.3 - I'll let @pweil- and @smarterclayton chime in there. |
Related: #8500 |
Deployers should bail if their invariants are violated and they detect it. We can do more in 1.3. Leaving open to ensure it's well and truly gone. |
removing from the blocker list for now then |
Opened #8972 for the deployer. Is there anything else tbd here? |
No. |
https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5537/
Happening fairly frequently