Can't install OKD4.5 on UPI (static pods not created?) #275
Comments
etcd-member on bootstrap has a lot of messages like:
It seems it's not happy that the node registers itself with a short name. Could you check the DHCP settings?
Well, it looked correct:
However, I've changed DHCP to serve the hostname as an FQDN (as OpenShift 3.11 required; there's no mention of that in OKD4's docs), and the installation seems to be progressing! Thanks.
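For anyone hitting the same symptom, a quick way to see what name a node actually received from DHCP is to check whether it contains a domain part. This is only a minimal sketch: `classify_hostname` is a hypothetical helper, and it only looks at the shape of the string, not at whether DNS resolves it.

```shell
# Hypothetical helper: report whether a hostname looks like an FQDN
# (at least one character on each side of a dot) or a short name.
classify_hostname() {
    case "$1" in
        ?*.?*) echo "fqdn" ;;
        *)     echo "short" ;;
    esac
}
```

On a node, something like `classify_hostname "$(hostname)"` should print `short` when DHCP handed out only the short name.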
Interesting, I didn't expect this to be a requirement. @hexfusion OCP4 still requires nodes to have an FQDN as a hostname for the etcd cluster to assemble, right?
I need a little time to understand the logs.
Hello, I am facing a very similar issue, where the etcd cluster is never fully formed during bootstrapping. I am using OKD GA, but on real bare-metal servers.
Version
How reproducible: 100%
@mliker is it reproducible on the latest stable release - https://origin-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4-stable/release/4.5.0-0.okd-2020-08-12-020541?
@vrutkovs yes, pretty much
$ oc get pods --all-namespaces -owide|grep etcd
openshift-etcd-operator etcd-operator-86658dff85-z4kz8 0/1 CrashLoopBackOff 5 17m 10.128.0.2 n1.devopsie.cloud <none> <none>
openshift-etcd etcd-n2.devopsie.cloud 3/4 CrashLoopBackOff 6 11m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-etcd etcd-n3.devopsie.cloud 4/4 Running 2 13m 135.181.21.53 n3.devopsie.cloud <none> <none>
openshift-etcd installer-2-n2.devopsie.cloud 0/1 Completed 0 11m 10.130.0.12 n2.devopsie.cloud <none> <none>
openshift-etcd installer-2-n3.devopsie.cloud 0/1 Completed 0 13m 10.129.0.6 n3.devopsie.cloud <none> <none>
openshift-machine-config-operator etcd-quorum-guard-7bb76959df-cj28g 0/1 Running 0 13m 135.181.21.51 n1.devopsie.cloud <none> <none>
openshift-machine-config-operator etcd-quorum-guard-7bb76959df-w529d 0/1 Running 0 13m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-machine-config-operator etcd-quorum-guard-7bb76959df-w7fc9 1/1 Running 0 13m 135.181.21.53 n3.devopsie.cloud <none> <none>
Version
$ openshift-install version
openshift-install 4.5.0-0.okd-2020-08-12-020541
built from commit 699277bb61706731d687b9e40700ebf4630b0851
release image quay.io/openshift/okd@sha256:6974c414be62aee4fde24fe47ccfff97c2854ddc37eb196f3f3bcda2fdec17b4
$ oc version
Client Version: 4.5.0-0.okd-2020-08-12-020541
Server Version: 4.5.0-0.okd-2020-08-12-020541
Kubernetes Version: v1.18.3
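When scanning listings like the one above, it can help to filter for pods whose READY column shows fewer ready containers than expected. The following is a rough sketch, not an official tool: `unready_pods` is a hypothetical name, and it assumes the column layout of `oc get pods --all-namespaces` (namespace, name, READY, STATUS as the first four fields).

```shell
# Hypothetical filter: read `oc get pods --all-namespaces` output on stdin and
# print namespace/name, READY and STATUS for pods with fewer ready containers
# than total. The header row and Completed pods are skipped.
unready_pods() {
    awk '$1 != "NAMESPACE" && $4 != "Completed" {
        split($3, r, "/")
        if (r[1] + 0 < r[2] + 0) print $1 "/" $2, $3, $4
    }'
}
```

It would be used as `oc get pods --all-namespaces | unready_pods`; for the listing above it should leave the crash-looping etcd-operator and etcd pods plus the two quorum guards that are not yet ready.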
@mliker this is a different bug - etcd storage is not sufficiently fast:
@vrutkovs right, sorry about that, I used a slightly less beefy bootstrap machine. I re-ran it and the result is the same as with the GA version: two master etcd instances start, but not the third one.
on bootstrap and
first master. Seems |
Right ... How can I apply a static IP configuration to the masters (via a machine config, I presume)? There is a limitation in the server provider's network: a point-to-point route needs to be injected, or a /32 netmask specified, for the servers to be able to talk to one another, which will not be the case when the servers use DHCP ... I completely forgot about this since I never use DHCP with this provider ...
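For reference, one common way to express static addressing at install time is a dracut-style `ip=` kernel argument. The sketch below only assembles that string; `static_ip_karg` is a hypothetical helper, and the addresses, hostname, and interface name in the usage example are placeholders rather than values from this cluster. Whether a /32 netmask alone is enough on a given provider, or an extra route is also needed, is an assumption to verify.

```shell
# Hypothetical helper: build a dracut-style static network kernel argument.
# Field order follows dracut's ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<hostname>:<interface>:none
# Arguments: $1=ip  $2=gateway  $3=netmask  $4=hostname  $5=interface
static_ip_karg() {
    printf 'ip=%s::%s:%s:%s:%s:none\n' "$1" "$2" "$3" "$4" "$5"
}
```

For example, `static_ip_karg 192.0.2.10 192.0.2.1 255.255.255.255 n1.example.com eno1` yields a single `ip=...` string that could be appended to the installer's kernel command line, typically alongside a `nameserver=` argument.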
Alright, the coreos-installer accepts kernel parameters to configure a static network. This got me past the bootstrap phase; however, the install process does not complete.
$ openshift-install --dir=. wait-for install-complete --log-level=debug
DEBUG OpenShift Installer 4.5.0-0.okd-2020-08-12-020541
DEBUG Built from commit 699277bb61706731d687b9e40700ebf4630b0851
DEBUG Fetching Install Config...
DEBUG Loading Install Config...
DEBUG Loading SSH Key...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Cluster Name...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Pull Secret...
DEBUG Loading Platform...
DEBUG Using Install Config loaded from state file
DEBUG Reusing previously-fetched Install Config
INFO Waiting up to 30m0s for the cluster at https://api.okd.devopsie.cloud:6443 to initialize...
DEBUG Still waiting for the cluster to initialize: Working towards 4.5.0-0.okd-2020-08-12-020541: 49% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.5.0-0.okd-2020-08-12-020541: 63% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.5.0-0.okd-2020-08-12-020541: 68% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.5.0-0.okd-2020-08-12-020541: 69% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.5.0-0.okd-2020-08-12-020541: 70% complete
DEBUG Still waiting for the cluster to initialize: Multiple errors are preventing progress:
* Could not update oauthclient "console" (352 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cloud-credential-operator/cloud-credential-operator-alerts" (187 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cluster-samples-operator/samples-operator-alerts" (304 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (499 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-machine-approver/cluster-machine-approver" (509 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (8 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-config-operator/config-operator" (114 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-image-registry/image-registry" (505 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (219 of 584): the server does not recognize this resource, check extension API servers
DEBUG Still waiting for the cluster to initialize: Multiple errors are preventing progress:
* Cluster operator dns is reporting a failure: Not all desired DNS DaemonSets available
* Could not update oauthclient "console" (352 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cloud-credential-operator/cloud-credential-operator-alerts" (187 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cluster-samples-operator/samples-operator-alerts" (304 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (499 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-machine-approver/cluster-machine-approver" (509 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (8 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-config-operator/config-operator" (114 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-image-registry/image-registry" (505 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (219 of 584): the server does not recognize this resource, check extension API servers
* deployment openshift-cluster-version/cluster-version-operator is not available MinimumReplicasUnavailable: Deployment does not have minimum availability.
DEBUG Still waiting for the cluster to initialize: Multiple errors are preventing progress:
* Cluster operator dns is reporting a failure: Not all desired DNS DaemonSets available
* Could not update oauthclient "console" (352 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cloud-credential-operator/cloud-credential-operator-alerts" (187 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cluster-samples-operator/samples-operator-alerts" (304 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (499 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-machine-approver/cluster-machine-approver" (509 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (8 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-config-operator/config-operator" (114 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-image-registry/image-registry" (505 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (219 of 584): the server does not recognize this resource, check extension API servers
* deployment openshift-cluster-version/cluster-version-operator is not available MinimumReplicasUnavailable: Deployment does not have minimum availability.
$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication Unknown Unknown True 23m
cloud-credential True False False 26m
cluster-autoscaler
config-operator
console
csi-snapshot-controller 4.5.0-0.okd-2020-08-12-020541 False True False 21m
dns 4.5.0-0.okd-2020-08-12-020541 False True True 17m
etcd 4.5.0-0.okd-2020-08-12-020541 True False True 21m
image-registry
ingress
insights
kube-apiserver 4.5.0-0.okd-2020-08-12-020541 True True True 20m
kube-controller-manager 4.5.0-0.okd-2020-08-12-020541 True True True 21m
kube-scheduler 4.5.0-0.okd-2020-08-12-020541 True True True 20m
kube-storage-version-migrator 4.5.0-0.okd-2020-08-12-020541 False False False 17m
machine-api
machine-approver 4.5.0-0.okd-2020-08-12-020541 True False False 19m
machine-config 4.5.0-0.okd-2020-08-12-020541 False False True 6m34s
marketplace
monitoring
network 4.5.0-0.okd-2020-08-12-020541 True True True 23m
node-tuning 4.5.0-0.okd-2020-08-12-020541 False False False 17m
openshift-apiserver 4.5.0-0.okd-2020-08-12-020541 False False True 23m
openshift-controller-manager False True False 23m
openshift-samples
operator-lifecycle-manager 4.5.0-0.okd-2020-08-12-020541 True False False 22m
operator-lifecycle-manager-catalog 4.5.0-0.okd-2020-08-12-020541 True False False 22m
operator-lifecycle-manager-packageserver False True False 22m
service-ca 4.5.0-0.okd-2020-08-12-020541 True True False 23m
storage
$ oc get pod --all-namespaces -owide|grep -v Completed
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
openshift-apiserver-operator openshift-apiserver-operator-d8f6754bc-w8fpb 1/1 Running 1 27m 10.128.0.10 n1.devopsie.cloud <none> <none>
openshift-apiserver apiserver-85f8557969-4dngw 1/1 Running 0 20m 10.129.0.18 n2.devopsie.cloud <none> <none>
openshift-apiserver apiserver-85f8557969-cnjdc 1/1 Running 0 20m 10.128.0.23 n1.devopsie.cloud <none> <none>
openshift-apiserver apiserver-85f8557969-wsb9l 1/1 Running 0 20m 10.130.0.15 n3.devopsie.cloud <none> <none>
openshift-authentication-operator authentication-operator-59b89c9558-m6859 1/1 Running 1 27m 10.129.0.2 n2.devopsie.cloud <none> <none>
openshift-cluster-machine-approver machine-approver-ff454df65-h2mc2 2/2 Running 0 27m 135.181.21.51 n1.devopsie.cloud <none> <none>
openshift-cluster-node-tuning-operator cluster-node-tuning-operator-5db74f66c9-4mcqk 1/1 Running 0 27m 10.128.0.9 n1.devopsie.cloud <none> <none>
openshift-cluster-node-tuning-operator tuned-gdp7n 1/1 Running 0 23m 135.181.21.51 n1.devopsie.cloud <none> <none>
openshift-cluster-node-tuning-operator tuned-wlwtv 1/1 Running 0 23m 135.181.21.53 n3.devopsie.cloud <none> <none>
openshift-cluster-node-tuning-operator tuned-x4b5g 1/1 Running 0 23m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-cluster-storage-operator csi-snapshot-controller-operator-d75bd9698-gv5xc 1/1 Running 0 27m 10.130.0.8 n3.devopsie.cloud <none> <none>
openshift-cluster-version cluster-version-operator-59cf8c9687-lswcc 1/1 Running 0 27m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-controller-manager-operator openshift-controller-manager-operator-6cc65588dc-9hwvc 1/1 Running 1 27m 10.128.0.7 n1.devopsie.cloud <none> <none>
openshift-dns-operator dns-operator-5644c89bf5-cqmgj 2/2 Running 0 27m 10.128.0.12 n1.devopsie.cloud <none> <none>
openshift-dns dns-default-bjx74 3/3 Running 0 22m 10.129.0.7 n2.devopsie.cloud <none> <none>
openshift-dns dns-default-j6bpx 3/3 Running 0 22m 10.128.0.16 n1.devopsie.cloud <none> <none>
openshift-dns dns-default-s6qdk 3/3 Running 0 22m 10.130.0.5 n3.devopsie.cloud <none> <none>
openshift-etcd-operator etcd-operator-86658dff85-5qfh9 1/1 Running 1 27m 10.130.0.2 n3.devopsie.cloud <none> <none>
openshift-etcd etcd-n1.devopsie.cloud 4/4 Running 2 22m 135.181.21.51 n1.devopsie.cloud <none> <none>
openshift-etcd etcd-n2.devopsie.cloud 4/4 Running 0 21m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-etcd etcd-n3.devopsie.cloud 4/4 Running 0 21m 135.181.21.53 n3.devopsie.cloud <none> <none>
openshift-etcd revision-pruner-2-n2.devopsie.cloud 0/1 ContainerCreating 0 20m <none> n2.devopsie.cloud <none> <none>
openshift-etcd revision-pruner-2-n3.devopsie.cloud 0/1 ContainerCreating 0 20m <none> n3.devopsie.cloud <none> <none>
openshift-kube-apiserver-operator kube-apiserver-operator-75965db895-v6tc9 1/1 Running 1 27m 10.128.0.4 n1.devopsie.cloud <none> <none>
openshift-kube-apiserver installer-4-n2.devopsie.cloud 0/1 Pending 0 17m <none> n2.devopsie.cloud <none> <none>
openshift-kube-apiserver kube-apiserver-n1.devopsie.cloud 4/4 Running 0 20m 135.181.21.51 n1.devopsie.cloud <none> <none>
openshift-kube-apiserver kube-apiserver-n2.devopsie.cloud 4/4 Running 1 21m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-kube-apiserver kube-apiserver-n3.devopsie.cloud 4/4 Running 0 21m 135.181.21.53 n3.devopsie.cloud <none> <none>
openshift-kube-apiserver revision-pruner-3-n1.devopsie.cloud 0/1 Pending 0 17m <none> n1.devopsie.cloud <none> <none>
openshift-kube-controller-manager-operator kube-controller-manager-operator-64c9c96f99-bjzw6 1/1 Running 1 27m 10.128.0.6 n1.devopsie.cloud <none> <none>
openshift-kube-controller-manager installer-6-n2.devopsie.cloud 0/1 Pending 0 18m <none> n2.devopsie.cloud <none> <none>
openshift-kube-controller-manager kube-controller-manager-n1.devopsie.cloud 4/4 Running 0 21m 135.181.21.51 n1.devopsie.cloud <none> <none>
openshift-kube-controller-manager kube-controller-manager-n2.devopsie.cloud 4/4 Running 0 20m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-kube-controller-manager kube-controller-manager-n3.devopsie.cloud 4/4 Running 0 22m 135.181.21.53 n3.devopsie.cloud <none> <none>
openshift-kube-controller-manager revision-pruner-5-n2.devopsie.cloud 0/1 ContainerCreating 0 20m <none> n2.devopsie.cloud <none> <none>
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-7b7c8dc4ff-nwblc 1/1 Running 1 27m 10.128.0.5 n1.devopsie.cloud <none> <none>
openshift-kube-scheduler installer-6-n3.devopsie.cloud 0/1 Pending 0 17m <none> n3.devopsie.cloud <none> <none>
openshift-kube-scheduler openshift-kube-scheduler-n1.devopsie.cloud 2/2 Running 0 22m 135.181.21.51 n1.devopsie.cloud <none> <none>
openshift-kube-scheduler openshift-kube-scheduler-n2.devopsie.cloud 1/2 Running 0 20m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-kube-scheduler openshift-kube-scheduler-n3.devopsie.cloud 2/2 Running 0 22m 135.181.21.53 n3.devopsie.cloud <none> <none>
openshift-kube-scheduler revision-pruner-5-n2.devopsie.cloud 0/1 Pending 0 18m <none> n2.devopsie.cloud <none> <none>
openshift-kube-storage-version-migrator-operator kube-storage-version-migrator-operator-8649ff8f6b-q527q 1/1 Running 1 27m 10.128.0.8 n1.devopsie.cloud <none> <none>
openshift-kube-storage-version-migrator migrator-6879df9b64-27652 1/1 Running 0 23m 10.129.0.9 n2.devopsie.cloud <none> <none>
openshift-machine-config-operator etcd-quorum-guard-7bb76959df-2wvg2 1/1 Running 0 23m 135.181.21.51 n1.devopsie.cloud <none> <none>
openshift-machine-config-operator etcd-quorum-guard-7bb76959df-4b48w 1/1 Running 0 23m 135.181.21.53 n3.devopsie.cloud <none> <none>
openshift-machine-config-operator etcd-quorum-guard-7bb76959df-xl26r 1/1 Running 0 23m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-machine-config-operator machine-config-controller-58466596b7-klqzk 1/1 Running 0 22m 10.130.0.6 n3.devopsie.cloud <none> <none>
openshift-machine-config-operator machine-config-daemon-2zmvz 2/2 Running 0 23m 135.181.21.51 n1.devopsie.cloud <none> <none>
openshift-machine-config-operator machine-config-daemon-b5nqv 2/2 Running 0 23m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-machine-config-operator machine-config-daemon-psb65 2/2 Running 0 23m 135.181.21.53 n3.devopsie.cloud <none> <none>
openshift-machine-config-operator machine-config-operator-dc7b774db-lttmc 1/1 Running 0 27m 10.128.0.3 n1.devopsie.cloud <none> <none>
openshift-machine-config-operator machine-config-server-2qznq 1/1 Running 0 22m 135.181.21.51 n1.devopsie.cloud <none> <none>
openshift-machine-config-operator machine-config-server-qwcsc 1/1 Running 0 22m 135.181.21.53 n3.devopsie.cloud <none> <none>
openshift-machine-config-operator machine-config-server-zq8tg 1/1 Running 0 22m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-multus multus-admission-controller-bpv9n 2/2 Running 0 23m 10.128.0.15 n1.devopsie.cloud <none> <none>
openshift-multus multus-admission-controller-gcstm 2/2 Running 0 23m 10.130.0.3 n3.devopsie.cloud <none> <none>
openshift-multus multus-admission-controller-zw4rh 2/2 Running 0 23m 10.129.0.6 n2.devopsie.cloud <none> <none>
openshift-multus multus-l7tf8 1/1 Running 0 24m 135.181.21.53 n3.devopsie.cloud <none> <none>
openshift-multus multus-ngx9z 1/1 Running 0 24m 135.181.21.51 n1.devopsie.cloud <none> <none>
openshift-multus multus-pqmpt 1/1 Running 0 24m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-network-operator network-operator-5fd9fc7877-h8n28 1/1 Running 0 27m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-operator-lifecycle-manager catalog-operator-7c69897695-ncgcj 1/1 Running 0 27m 10.128.0.13 n1.devopsie.cloud <none> <none>
openshift-operator-lifecycle-manager olm-operator-5dccd555b9-jvvmk 1/1 Running 0 27m 10.128.0.14 n1.devopsie.cloud <none> <none>
openshift-operator-lifecycle-manager packageserver-575b5f889f-fmbrh 0/1 Pending 0 2m36s <none> <none> <none> <none>
openshift-operator-lifecycle-manager packageserver-5bf96f9f4-hvxvv 0/1 Pending 0 7m37s <none> <none> <none> <none>
openshift-operator-lifecycle-manager packageserver-664d5f495b-btppw 1/1 Terminating 0 22m 10.128.0.17 n1.devopsie.cloud <none> <none>
openshift-operator-lifecycle-manager packageserver-664d5f495b-vs6tw 1/1 Terminating 0 22m 10.129.0.11 n2.devopsie.cloud <none> <none>
openshift-operator-lifecycle-manager packageserver-f55776dbf-dv96t 0/1 Pending 0 2m37s <none> <none> <none> <none>
openshift-sdn ovs-7gxz8 1/1 Running 0 24m 135.181.21.51 n1.devopsie.cloud <none> <none>
openshift-sdn ovs-wphnw 1/1 Running 0 24m 135.181.21.53 n3.devopsie.cloud <none> <none>
openshift-sdn ovs-zlqtn 1/1 Running 0 24m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-sdn sdn-b2c6s 1/1 Running 0 24m 135.181.21.51 n1.devopsie.cloud <none> <none>
openshift-sdn sdn-controller-82kns 1/1 Running 0 24m 135.181.21.51 n1.devopsie.cloud <none> <none>
openshift-sdn sdn-controller-89vcr 1/1 Running 0 24m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-sdn sdn-controller-zl692 1/1 Running 0 24m 135.181.21.53 n3.devopsie.cloud <none> <none>
openshift-sdn sdn-mw4lq 1/1 Running 0 24m 135.181.21.52 n2.devopsie.cloud <none> <none>
openshift-sdn sdn-xcqcr 1/1 Running 0 24m 135.181.21.53 n3.devopsie.cloud <none> <none>
openshift-service-ca-operator service-ca-operator-5bd75ff5fd-zk6br 1/1 Running 1 27m 10.128.0.2 n1.devopsie.cloud <none> <none>
openshift-service-ca service-ca-6d97bcb4b6-28w2n 1/1 Running 0 23m 10.129.0.3 n2.devopsie.cloud <none> <none>
Not sure if the bootstrap log bundle will help, but here it is: log bundle
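The `oc get co` table above is hard to read because some operators report no version and some report nothing at all. As a rough aid, the status columns can be read from the right-hand side so that a missing VERSION does not shift them. This is a sketch under that assumption; `unhealthy_operators` is a hypothetical name.

```shell
# Hypothetical filter: read `oc get co` output on stdin and list operators that
# are not Available=True, are not Degraded=False, or have not reported yet.
# Columns are counted from the end (SINCE, DEGRADED, PROGRESSING, AVAILABLE)
# because the VERSION column may be empty.
unhealthy_operators() {
    awk '$1 == "NAME" { next }
         NF < 5 { print $1, "no status reported"; next }
         {
             avail = $(NF - 3); degraded = $(NF - 1)
             if (avail != "True" || degraded != "False")
                 print $1, "Available=" avail, "Degraded=" degraded
         }'
}
```

Piping the table through it (`oc get co | unhealthy_operators`) should drop healthy rows such as cloud-credential and keep rows like dns, etcd, and the operators that have not reported any status.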
Here is the complete log from
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Cluster Name...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Pull Secret...
DEBUG Loading Platform...
DEBUG Using Install Config loaded from state file
DEBUG Reusing previously-fetched Install Config
INFO Waiting up to 30m0s for the cluster at https://api.okd.devopsie.cloud:6443 to initialize...
DEBUG Still waiting for the cluster to initialize: Working towards 4.5.0-0.okd-2020-08-12-020541: 49% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.5.0-0.okd-2020-08-12-020541: 63% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.5.0-0.okd-2020-08-12-020541: 68% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.5.0-0.okd-2020-08-12-020541: 69% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.5.0-0.okd-2020-08-12-020541: 70% complete
DEBUG Still waiting for the cluster to initialize: Multiple errors are preventing progress:
* Could not update oauthclient "console" (352 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cloud-credential-operator/cloud-credential-operator-alerts" (187 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cluster-samples-operator/samples-operator-alerts" (304 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (499 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-machine-approver/cluster-machine-approver" (509 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (8 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-config-operator/config-operator" (114 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-image-registry/image-registry" (505 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (219 of 584): the server does not recognize this resource, check extension API servers
DEBUG Still waiting for the cluster to initialize: Multiple errors are preventing progress:
* Cluster operator dns is reporting a failure: Not all desired DNS DaemonSets available
* Could not update oauthclient "console" (352 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cloud-credential-operator/cloud-credential-operator-alerts" (187 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cluster-samples-operator/samples-operator-alerts" (304 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (499 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-machine-approver/cluster-machine-approver" (509 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (8 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-config-operator/config-operator" (114 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-image-registry/image-registry" (505 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (219 of 584): the server does not recognize this resource, check extension API servers
* deployment openshift-cluster-version/cluster-version-operator is not available MinimumReplicasUnavailable: Deployment does not have minimum availability.
DEBUG Still waiting for the cluster to initialize: Multiple errors are preventing progress:
* Cluster operator dns is reporting a failure: Not all desired DNS DaemonSets available
* Could not update oauthclient "console" (352 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cloud-credential-operator/cloud-credential-operator-alerts" (187 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cluster-samples-operator/samples-operator-alerts" (304 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (499 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-machine-approver/cluster-machine-approver" (509 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (8 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-config-operator/config-operator" (114 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-image-registry/image-registry" (505 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (219 of 584): the server does not recognize this resource, check extension API servers
* deployment openshift-cluster-version/cluster-version-operator is not available MinimumReplicasUnavailable: Deployment does not have minimum availability.
DEBUG Still waiting for the cluster to initialize: Multiple errors are preventing progress:
* Cluster operator dns is reporting a failure: Not all desired DNS DaemonSets available
* Cluster operator machine-config is reporting a failure: Failed to resync 4.5.0-0.okd-2020-08-12-020541 because: timed out waiting for the condition during waitForDaemonsetRollout: Daemonset machine-config-daemon is not ready. status: (desired: 3, updated: 3, ready: 0, unavailable: 3)
* Could not update oauthclient "console" (352 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cloud-credential-operator/cloud-credential-operator-alerts" (187 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cluster-samples-operator/samples-operator-alerts" (304 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (499 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-machine-approver/cluster-machine-approver" (509 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (8 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-config-operator/config-operator" (114 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-image-registry/image-registry" (505 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (219 of 584): the server does not recognize this resource, check extension API servers
* deployment openshift-cluster-version/cluster-version-operator is not available MinimumReplicasUnavailable: Deployment does not have minimum availability.
ERROR Cluster operator authentication Degraded is True with ConfigObservation_Error::IngressStateEndpoints_MissingEndpoints::RouterCerts_NoRouterCertSecret: ConfigObservationDegraded: secret "v4-0-config-system-router-certs" not found
RouterCertsDegraded: secret/v4-0-config-system-router-certs -n openshift-authentication: could not be retrieved: secret "v4-0-config-system-router-certs" not found
IngressStateEndpointsDegraded: No endpoints found for oauth-server
INFO Cluster operator authentication Progressing is Unknown with NoData:
INFO Cluster operator authentication Available is Unknown with NoData:
INFO Cluster operator csi-snapshot-controller Progressing is True with _AsExpected: Progressing: Waiting for Deployment to deploy csi-snapshot-controller pods
INFO Cluster operator csi-snapshot-controller Available is False with _AsExpected: Available: Waiting for Deployment to deploy csi-snapshot-controller pods
ERROR Cluster operator dns Degraded is True with NotAllDNSesAvailable: Not all desired DNS DaemonSets available
INFO Cluster operator dns Progressing is True with Reconciling: At least 1 DNS DaemonSet is progressing.
INFO Cluster operator dns Available is False with DNSUnavailable: No DNS DaemonSets available
ERROR Cluster operator etcd Degraded is True with NodeController_MasterNodesReady: NodeControllerDegraded: The master nodes not ready: node "n1.devopsie.cloud" not ready since 2020-09-01 12:50:50 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.), node "n2.devopsie.cloud" not ready since 2020-09-01 12:50:50 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.), node "n3.devopsie.cloud" not ready since 2020-09-01 12:50:50 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
ERROR Cluster operator kube-apiserver Degraded is True with NodeController_MasterNodesReady::NodeInstaller_InstallerPodFailed: NodeControllerDegraded: The master nodes not ready: node "n1.devopsie.cloud" not ready since 2020-09-01 12:50:50 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.), node "n2.devopsie.cloud" not ready since 2020-09-01 12:50:50 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.), node "n3.devopsie.cloud" not ready since 2020-09-01 12:50:50 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
NodeInstallerDegraded: 1 nodes are failing on revision 2:
NodeInstallerDegraded: ; 1 nodes are failing on revision 3:
NodeInstallerDegraded: static pod of revision 3 has been installed, but is not ready while new revision 4 is pending
INFO Cluster operator kube-apiserver Progressing is True with NodeInstaller: NodeInstallerProgressing: 2 nodes are at revision 0; 1 nodes are at revision 3; 0 nodes have achieved new revision 4
ERROR Cluster operator kube-controller-manager Degraded is True with NodeController_MasterNodesReady: NodeControllerDegraded: The master nodes not ready: node "n1.devopsie.cloud" not ready since 2020-09-01 12:50:50 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.), node "n2.devopsie.cloud" not ready since 2020-09-01 12:50:50 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.), node "n3.devopsie.cloud" not ready since 2020-09-01 12:50:50 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
INFO Cluster operator kube-controller-manager Progressing is True with NodeInstaller: NodeInstallerProgressing: 3 nodes are at revision 5; 0 nodes have achieved new revision 7
ERROR Cluster operator kube-scheduler Degraded is True with NodeController_MasterNodesReady::NodeInstaller_InstallerPodFailed: NodeControllerDegraded: The master nodes not ready: node "n1.devopsie.cloud" not ready since 2020-09-01 12:50:50 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.), node "n2.devopsie.cloud" not ready since 2020-09-01 12:50:50 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.), node "n3.devopsie.cloud" not ready since 2020-09-01 12:50:50 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
NodeInstallerDegraded: 1 nodes are failing on revision 4:
NodeInstallerDegraded: ; 1 nodes are failing on revision 5:
NodeInstallerDegraded: static pod of revision 5 has been installed, but is not ready while new revision 6 is pending
INFO Cluster operator kube-scheduler Progressing is True with NodeInstaller: NodeInstallerProgressing: 2 nodes are at revision 0; 1 nodes are at revision 5; 0 nodes have achieved new revision 6
INFO Cluster operator kube-storage-version-migrator Available is False with _NoMigratorPod: Available: deployment/migrator.openshift-kube-storage-version-migrator: no replicas are available
ERROR Cluster operator machine-config Degraded is True with MachineConfigDaemonFailed: Failed to resync 4.5.0-0.okd-2020-08-12-020541 because: timed out waiting for the condition during waitForDaemonsetRollout: Daemonset machine-config-daemon is not ready. status: (desired: 3, updated: 3, ready: 0, unavailable: 3)
INFO Cluster operator machine-config Available is False with : Cluster not available for 4.5.0-0.okd-2020-08-12-020541
ERROR Cluster operator network Degraded is True with RolloutHung: DaemonSet "openshift-multus/multus" rollout is not making progress - last change 2020-09-01T12:51:50Z
DaemonSet "openshift-sdn/sdn-controller" rollout is not making progress - last change 2020-09-01T12:51:50Z
DaemonSet "openshift-sdn/ovs" rollout is not making progress - last change 2020-09-01T12:51:50Z
DaemonSet "openshift-sdn/sdn" rollout is not making progress - last change 2020-09-01T12:51:50Z
INFO Cluster operator network Progressing is True with Deploying: DaemonSet "openshift-multus/multus" is not available (awaiting 3 nodes)
DaemonSet "openshift-multus/multus-admission-controller" is waiting for other operators to become ready
DaemonSet "openshift-sdn/sdn-controller" is not yet scheduled on any nodes
DaemonSet "openshift-sdn/ovs" is not available (awaiting 3 nodes)
DaemonSet "openshift-sdn/sdn" is not available (awaiting 3 nodes)
INFO Cluster operator node-tuning Available is False with TunedUnavailable: DaemonSet "tuned" has no available Pod(s).
ERROR Cluster operator openshift-apiserver Degraded is True with APIServerDeployment_UnavailablePod: APIServerDeploymentDegraded: 3 of 3 requested instances are unavailable for apiserver.openshift-apiserver
INFO Cluster operator openshift-apiserver Available is False with APIServerDeployment_NoPod::APIServices_PreconditionNotReady: APIServicesAvailable: PreconditionNotReady
APIServerDeploymentAvailable: no apiserver.openshift-apiserver pods available on any node.
INFO Cluster operator openshift-controller-manager Progressing is True with _DesiredStateNotYetAchieved: Progressing: daemonset/controller-manager: observed generation is 0, desired generation is 8.
Progressing: daemonset/controller-manager: number available is 0, desired number available > 1
INFO Cluster operator openshift-controller-manager Available is False with _NoPodsAvailable: Available: no daemon pods available on any node.
INFO Cluster operator operator-lifecycle-manager-packageserver Available is False with :
INFO Cluster operator operator-lifecycle-manager-packageserver Progressing is True with : Working toward 0.15.1
INFO Cluster operator service-ca Progressing is True with _ManagedDeploymentsAvailable: Progressing:
Progressing: service-ca does not have available replicas
FATAL failed to initialize the cluster: Multiple errors are preventing progress:
* Cluster operator dns is reporting a failure: Not all desired DNS DaemonSets available
* Cluster operator machine-config is reporting a failure: Failed to resync 4.5.0-0.okd-2020-08-12-020541 because: timed out waiting for the condition during waitForDaemonsetRollout: Daemonset machine-config-daemon is not ready. status: (desired: 3, updated: 3, ready: 0, unavailable: 3)
* Could not update oauthclient "console" (352 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cloud-credential-operator/cloud-credential-operator-alerts" (187 of 584): the server does not recognize this resource, check extension API servers
* Could not update prometheusrule "openshift-cluster-samples-operator/samples-operator-alerts" (304 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (499 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-machine-approver/cluster-machine-approver" (509 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (8 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-config-operator/config-operator" (114 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-image-registry/image-registry" (505 of 584): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (219 of 584): the server does not recognize this resource, check extension API servers
* deployment openshift-cluster-version/cluster-version-operator is not available MinimumReplicasUnavailable: Deployment does not have minimum availability. |
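When installer output is this long, it can help to reduce it to the operators that are actually unhealthy. Below is a minimal sketch, assuming the `LEVEL Cluster operator NAME CONDITION is STATUS with REASON: MESSAGE` line shape seen in the log above. That shape is inferred from this output, not from a documented format, and the names `parse_condition` and `failing_operators` are hypothetical helpers, not part of any OKD tooling.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class OperatorCondition(NamedTuple):
    level: str
    operator: str
    condition: str
    status: str
    reason: str
    message: str


# Assumed line shape, inferred from the installer output in this thread.
# The reason may be empty ("with :") or contain "::" between sub-reasons.
_LINE = re.compile(
    r"^(?P<level>INFO|ERROR)\s+Cluster operator (?P<operator>\S+) "
    r"(?P<condition>\w+) is (?P<status>\w+) with "
    r"(?P<reason>\S*):(?:\s+(?P<message>.*))?$"
)


def parse_condition(line: str) -> Optional[OperatorCondition]:
    """Parse one log line; return None for continuation or unrelated lines."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    return OperatorCondition(
        m["level"], m["operator"], m["condition"],
        m["status"], m["reason"], m["message"] or "",
    )


def failing_operators(lines: Iterable[str]) -> List[str]:
    """Names of operators reported as Degraded=True or Available=False."""
    failing = set()
    for line in lines:
        cond = parse_condition(line)
        if cond is None:
            continue
        if (cond.condition, cond.status) in {("Degraded", "True"), ("Available", "False")}:
            failing.add(cond.operator)
    return sorted(failing)
```

Feeding it the lines of a saved installer log returns a sorted list of operator names, which is easier to compare between runs than the raw output.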
Sorry for all the noise, I managed to install in the end. Since I only deleted the ignition files before recreating them, a stale .openshift_install_state.json was left around. |
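To avoid the same trap, the install directory can be checked for leftovers before regenerating ignition configs. This is a rough sketch, assuming the only files of interest are the hidden `.openshift_install_state.json` mentioned above and any `*.ign` files in the top level of the directory. The function name `stale_install_assets` is hypothetical, and nothing is deleted automatically.

```python
from pathlib import Path
from typing import List

# Hidden state file named in the comment above.
STATE_FILE = ".openshift_install_state.json"


def stale_install_assets(install_dir: str) -> List[Path]:
    """Return leftover installer files found directly in install_dir.

    Only reports files; deciding whether to remove them is left to the user.
    """
    root = Path(install_dir)
    leftovers: List[Path] = []
    state = root / STATE_FILE
    if state.is_file():
        leftovers.append(state)
    # Ignition configs from a previous run (assumed to sit at the top level).
    leftovers.extend(sorted(root.glob("*.ign")))
    return leftovers


if __name__ == "__main__":
    import sys

    for path in stale_install_assets(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"leftover from a previous run: {path}")
```

An empty result means the directory looks clean with respect to these two kinds of files. It says nothing about other assets the installer may have written.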
Describe the bug
I'm installing OKD in a 3-master, 0-worker configuration on KVM VMs. The installation fails overall, although the master nodes come up:
Most of the controllers fail:
From what I gathered, static pods for etcd are not created.
Version
UPI, 4.5.0-0.okd-2020-07-14-153706-ga
For the host OS, I've tried both FCOS stable 32.20200629.3.0 and testing 32.20200715.2.2.
How reproducible
100%
Log bundle
log bundle (107MB)