Pods frequently get stuck in Terminating state #2100
Unanswered
vaibhavdhingra asked this question in Q&A
Might be related to https://access.redhat.com/solutions/7074052
Same here, except on a release later than the one mentioned in the Red Hat doc.
We recently migrated our application from OKD 3.10 to OKD 4.15, and since then pods frequently get stuck in the Terminating state.
When we checked the CRI-O and kubelet logs, we found the error messages below. This leads to high CPU usage on the nodes, and in the end we have to restart the crio service to bring CPU utilization back down, or sometimes restart the node as well:
Crio logs:
Jan 22 06:06:16 ip-10-4-8-115 crio[2349]: time="2025-01-22 06:06:16.092761673Z" level=warning msg="Stopping container fbafe1cadf9f333ddff6b8596b704fe554e9a0c80312ba14a8267e87e9349694 with stop signal timed out. Killing"
Jan 22 06:06:16 ip-10-4-8-115 crio[2349]: time="2025-01-22 06:06:16.095655185Z" level=error msg="Killing container a333f0af16b4dffe9b234f797a78eab359f188a9af6708abdd4b18e19027c0ce failed: /usr/bin/runc --root /run/runc --systemd-cgroup kill a333f0af16b4dffe9b234f797a78eab359f188a9af6708abdd4b18e19027c0ce KILL failed: time="2025-01-22T06:06:16Z" level=error msg="container not running"\n : exit status 1"
Jan 22 06:06:16 ip-10-4-8-115 crio[2349]: time="2025-01-22 06:06:16.095719037Z" level=warning msg="Stopping container a333f0af16b4dffe9b234f797a78eab359f188a9af6708abdd4b18e19027c0ce with stop signal timed out. Killing"
Jan 22 06:06:16 ip-10-4-8-115 crio[2349]: time="2025-01-22 06:06:16.096466434Z" level=error msg="Killing container fbafe1cadf9f333ddff6b8596b704fe554e9a0c80312ba14a8267e87e9349694 failed: /usr/bin/runc --root /run/runc --systemd-cgroup kill fbafe1cadf9f333ddff6b8596b704fe554e9a0c80312ba14a8267e87e9349694 KILL failed: time="2025-01-22T06:06:16Z" level=error msg="container not running"\n : exit status 1"
Jan 22 06:06:16 ip-10-4-8-115 crio[2349]: time="2025-01-22 06:06:16.096535066Z" level=warning msg="Stopping container fbafe1cadf9f333ddff6b8596b704fe554e9a0c80312ba14a8267e87e9349694 with stop signal timed out. Killing"
Jan 22 06:06:16 ip-10-4-8-115 crio[2349]: time="2025-01-22 06:06:16.099274491Z" level=error msg="Killing container a333f0af16b4dffe9b234f797a78eab359f188a9af6708abdd4b18e19027c0ce failed: /usr/bin/runc --root /run/runc --systemd-cgroup kill a333f0af16b4dffe9b234f797a78eab359f188a9af6708abdd4b18e19027c0ce KILL failed: time="2025-01-22T06:06:16Z" level=error msg="container not running"\n : exit status 1"
Jan 22 06:06:16 ip-10-4-8-115 crio[2349]: time="2025-01-22 06:06:16.099319893Z" level=warning msg="Stopping container a333f0af16b4dffe9b234f797a78eab359f188a9af6708abdd4b18e19027c0ce with stop signal timed out. Killing"
Jan 22 06:06:16 ip-10-4-8-115 crio[2349]: time="2025-01-22 06:06:16.100352075Z" level=error msg="Killing container fbafe1cadf9f333ddff6b8596b704fe554e9a0c80312ba14a8267e87e9349694 failed: /usr/bin/runc --root /run/runc --systemd-cgroup kill fbafe1cadf9f333ddff6b8596b704fe554e9a0c80312ba14a8267e87e9349694 KILL failed: time="2025-01-22T06:06:16Z" level=error msg="container not running"\n : exit status 1"
Jan 22 06:06:16 ip-10-4-8-115 crio[2349]: time="2025-01-22 06:06:16.100469053Z" level=warning msg="Stopping container fbafe1cadf9f333ddff6b8596b704fe554e9a0c80312ba14a8267e87e9349694 with stop signal timed out. Killing"
Jan 22 06:06:16 ip-10-4-8-115 crio[2349]: time="2025-01-22 06:06:16.103884815Z" level=error msg="Killing container a333f0af16b4dffe9b234f797a78eab359f188a9af6708abdd4b18e19027c0ce failed: /usr/bin/runc --root /run/runc --systemd-cgroup kill a333f0af16b4dffe9b234f797a78eab359f188a9af6708abdd4b18e19027c0ce KILL failed: time="2025-01-22T06:06:16Z" level=error msg="container not running"\n : exit status 1"
Jan 22 06:06:36 ip-10-4-8-115 systemd-journald[808]: Suppressed 20916 messages from crio.service
Jan 22 06:06:36 ip-10-4-8-115 crio[2349]: time="2025-01-22 06:06:36.364664188Z" level=error msg="Killing container fbafe1cadf9f333ddff6b8596b704fe554e9a0c80312ba14a8267e87e9349694 failed: /usr/bin/runc --root /run/runc --systemd-cgroup kill fbafe1cadf9f333ddff6b8596b704fe554e9a0c80312ba14a8267e87e9349694 KILL failed: time="2025-01-22T06:06:36Z" level=error msg="container not running"\n : exit status 1"
Jan 22 06:06:36 ip-10-4-8-115 crio[2349]: time="2025-01-22 06:06:36.364710356Z" level=warning msg="Stopping container fbafe1cadf9f333ddff6b8596b704fe554e9a0c80312ba14a8267e87e9349694 with stop signal timed out. Killing"
Jan 22 06:06:36 ip-10-4-8-115 crio[2349]: time="2025-01-22 06:06:36.365026749Z" level=error msg="Killing container a333f0af16b4dffe9b234f797a78eab359f188a9af6708abdd4b18e19027c0ce failed: /usr/bin/runc --root /run/runc --systemd-cgroup kill a333f0af16b4dffe9b234f797a78eab359f188a9af6708abdd4b18e19027c0ce KILL failed: time="2025-01-22T06:06:36Z" level=error msg="container not running"\n : exit status 1"
Jan 22 06:06:36 ip-10-4-8-115 crio[2349]: time="2025-01-22 06:06:36.365085934Z" level=warning msg="Stopping container a333f0af16b4dffe9b234f797a78eab359f188a9af6708abdd4b18e19027c0ce with stop signal timed out. Killing"
Kubelet Logs:
Jan 22 06:06:06 ip-10-4-8-115 kubenswrapper[2440]: E0122 06:06:06.796728 2440 kuberuntime_container.go:775] "Kill container failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" pod="gconf/logd-1-n28bt" podUID="06335a95-7aea-4b27-b0d8-71852a2187e2" containerName="logd" containerID={"Type":"cri-o","ID":"fbafe1cadf9f333ddff6b8596b704fe554e9a0c80312ba14a8267e87e9349694"}
Jan 22 06:06:08 ip-10-4-8-115 kubenswrapper[2440]: I0122 06:06:08.174996 2440 prober.go:107] "Probe failed" probeType="Readiness" pod="gconf/logd-2-cq5qs" podUID="03d3e326-e628-4681-b9dd-cbd236c3441f" containerName="logd" probeResult="failure" output="Get \"http://10.131.3.101:8080/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Jan 22 06:06:13 ip-10-4-8-115 kubenswrapper[2440]: I0122 06:06:13.105818 2440 prober.go:107] "Probe failed" probeType="Readiness" pod="gconf/logd-1-n28bt" podUID="06335a95-7aea-4b27-b0d8-71852a2187e2" containerName="logd" probeResult="failure" output="Get \"http://10.131.2.51:8080/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Jan 22 06:06:18 ip-10-4-8-115 kubenswrapper[2440]: I0122 06:06:18.175774 2440 prober.go:107] "Probe failed" probeType="Readiness" pod="gconf/logd-2-cq5qs" podUID="03d3e326-e628-4681-b9dd-cbd236c3441f" containerName="logd" probeResult="failure" output="Get \"http://10.131.3.101:8080/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Jan 22 06:06:23 ip-10-4-8-115 kubenswrapper[2440]: I0122 06:06:23.105984 2440 prober.go:107] "Probe failed" probeType="Readiness" pod="gconf/logd-1-n28bt" podUID="06335a95-7aea-4b27-b0d8-71852a2187e2" containerName="logd" probeResult="failure" output="Get \"http://10.131.2.51:8080/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Jan 22 06:06:28 ip-10-4-8-115 kubenswrapper[2440]: I0122 06:06:28.175894 2440 prober.go:107] "Probe failed" probeType="Readiness" pod="gconf/logd-2-cq5qs" podUID="03d3e326-e628-4681-b9dd-cbd236c3441f" containerName="logd" probeResult="failure" output="Get \"http://10.131.3.101:8080/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Jan 22 06:06:33 ip-10-4-8-115 kubenswrapper[2440]: I0122 06:06:33.106109 2440 prober.go:107] "Probe failed" probeType="Readiness" pod="gconf/logd-1-n28bt" podUID="06335a95-7aea-4b27-b0d8-71852a2187e2" containerName="logd" probeResult="failure" output="Get \"http://10.131.2.51:8080/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Jan 22 06:06:38 ip-10-4-8-115 kubenswrapper[2440]: I0122 06:06:38.175665 2440 prober.go:107] "Probe failed" probeType="Readiness" pod="gconf/logd-2-cq5qs" podUID="03d3e326-e628-4681-b9dd-cbd236c3441f" containerName="logd" probeResult="failure" output="Get \"http://10.131.3.101:8080/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Jan 22 06:06:43 ip-10-4-8-115 kubenswrapper[2440]: I0122 06:06:43.105894 2440 prober.go:107] "Probe failed" probeType="Readiness" pod="gconf/logd-1-n28bt" podUID="06335a95-7aea-4b27-b0d8-71852a2187e2" containerName="logd" probeResult="failure" output="Get \"http://10.131.2.51:8080/ready\": dial tcp 10.131.2.51:8080: i/o timeout (Client.Timeout exceeded while awaiting headers)"
Jan 22 06:06:48 ip-10-4-8-115 kubenswrapper[2440]: I0122 06:06:48.176874 2440 prober.go:107] "Probe failed" probeType="Readiness" pod="gconf/logd-2-cq5qs" podUID="03d3e326-e628-4681-b9dd-cbd236c3441f" containerName="logd" probeResult="failure" output="Get \"http://10.131.3.101:8080/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Jan 22 06:06:53 ip-10-4-8-115 kubenswrapper[2440]: I0122 06:06:53.106147 2440 prober.go:107] "Probe failed" probeType="Readiness" pod="gconf/logd-1-n28bt" podUID="06335a95-7aea-4b27-b0d8-71852a2187e2" containerName="logd" probeResult="failure" output="Get \"http://10.131.2.51:8080/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Jan 22 06:06:58 ip-10-4-8-115 kubenswrapper[2440]: I0122 06:06:58.175459 2440 prober.go:107] "Probe failed" probeType="Readiness" pod="gconf/logd-2-cq5qs" podUID="03d3e326-e628-4681-b9dd-cbd236c3441f" containerName="logd" probeResult="failure" output="Get \"http://10.131.3.101:8080/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
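For anyone comparing notes on this, a rough way to see which containers are caught in the kill loop, and which pods they belong to, is to count the "Killing container ... failed" entries per container ID and join them with the kubelet "Kill container failed" entries. The sketch below is only that: the regular expressions are derived from the excerpts quoted above and are not a stable log format, and the names `summarize`, `KILL_FAILED` and `KUBELET_KILL` are made up for illustration.

```python
import re
from collections import Counter

# Assumption: both patterns are inferred from the log lines quoted in this
# post; other CRI-O or kubelet versions may word these messages differently.
KILL_FAILED = re.compile(r'msg="Killing container ([0-9a-f]{64}) failed:')
KUBELET_KILL = re.compile(
    r'"Kill container failed".*?pod="([^"]+)".*?"ID":"([0-9a-f]{64})"'
)


def summarize(journal_text):
    """Return (failed-kill count per container ID, container ID -> pod)."""
    # Searching the whole text (not line by line) also copes with pasted
    # logs where several journal entries ended up on one line.
    counts = Counter(KILL_FAILED.findall(journal_text))
    pods = {cid: pod for pod, cid in KUBELET_KILL.findall(journal_text)}
    return counts, pods


def report(journal_text):
    """Format one line per container, most failures first."""
    counts, pods = summarize(journal_text)
    return [
        f"{cid[:13]} kill_failures={n} pod={pods.get(cid, 'unknown')}"
        for cid, n in counts.most_common()
    ]
```

To use it, save the crio and kubelet journal output from the node to a file (the name `journal.txt` is hypothetical) and call `print("\n".join(report(open("journal.txt").read())))`. Because journald reported suppressing messages from crio.service, the counts are a lower bound rather than the true number of attempts.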
We also tried to stop the containers manually with the command below, but it failed with the following error:
crictl stop fbafe1cadf9f3
E0122 06:12:44.613093 1667320 remote_runtime.go:349] "StopContainer from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" containerID="fbafe1cadf9f3"
FATA[0002] stopping the container "fbafe1cadf9f3": rpc error: code = DeadlineExceeded desc = context deadline exceeded
crio version: 1.28.2
OKD version: 4.15.0-0.okd-2024-03-10-010116
K8s version: v1.28.7+6e2789b