Description
What happened?
We consistently see Chrome nodes get stuck in the Terminating state.
I'm not sure I can provide the exact steps to reproduce, but I'm happy to share logs from a system where this is happening.
Command used to start Selenium Grid with Docker (or Kubernetes)
global:
  seleniumGrid:
    imageRegistry: {{ fvt_image_registry }}/selenium
    imagePullSecret: xxx
hub:
  imageTag: 4.18.1-20240224
chromeNode:
  imageTag: 122.0-20240224
  resources:
    requests:
      cpu: "0.1"
firefoxNode:
  enabled: false
edgeNode:
  enabled: false
autoscaling:
  enabled: true
  scalingType: job
  scaledOptions:
    maxReplicaCount: 999
  scaledJobOptions:
    scalingStrategy:
      strategy: default
ingress:
  hostname: selenium-grid.local
  path: /selenium
Relevant log output
NAME                                             READY  STATUS       RESTARTS     AGE    IP              NODE          NOMINATED NODE  READINESS GATES
keda-operator-d44bc8ffc-f7rzk                    1/1    Running      0            161m   172.30.107.176  10.74.145.3   <none>          <none>
keda-operator-metrics-apiserver-b994566dc-8b59f  1/1    Running      1 (14h ago)  14h    172.30.180.80   10.74.145.8   <none>          <none>
selenium-grid-selenium-chrome-node-25f9x-5b6dz   1/1    Terminating  0            3h24m  172.30.93.77    10.48.76.225  <none>          <none>
selenium-grid-selenium-chrome-node-5cx4x-j9clf   1/1    Terminating  0            174m   172.30.180.158  10.74.145.8   <none>          <none>
selenium-grid-selenium-chrome-node-5hpg8-m2xdz   1/1    Running      0            160m   172.30.202.92   10.48.76.223  <none>          <none>
selenium-grid-selenium-chrome-node-7mt5f-gwgm6   1/1    Running      0            160m   172.30.139.72   10.74.145.20  <none>          <none>
selenium-grid-selenium-chrome-node-cmggr-vvl2l   1/1    Running      0            160m   172.30.180.178  10.74.145.8   <none>          <none>
selenium-grid-selenium-chrome-node-fmq9j-qlvmj   1/1    Terminating  0            174m   172.30.107.131  10.74.145.3   <none>          <none>
selenium-grid-selenium-chrome-node-fxgnj-hb8qf   1/1    Terminating  0            174m   172.30.93.97    10.48.76.225  <none>          <none>
selenium-grid-selenium-chrome-node-gsrsp-h9tzz   1/1    Terminating  0            3h16m  172.30.93.83    10.48.76.225  <none>          <none>
selenium-grid-selenium-chrome-node-xd72s-vws74   1/1    Terminating  0            3h24m  172.30.202.89   10.48.76.223  <none>          <none>
selenium-grid-selenium-chrome-node-xm8h6-69wl8   1/1    Terminating  0            3h24m  172.30.139.32   10.74.145.20  <none>          <none>
selenium-grid-selenium-chrome-node-xqt4h-hkt67   1/1    Terminating  0            3h16m  172.30.139.102  10.74.145.20  <none>          <none>
selenium-grid-selenium-hub-5f49c8fc47-vzmt6      1/1    Running      0            14h    172.30.202.81   10.48.76.223  <none>          <none>
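To make the pattern in this listing easier to quantify, here is a minimal Python sketch that picks out pods that have been Terminating for longer than a threshold. It assumes the listing is available as plain text in the column layout shown above (NAME, READY, STATUS, RESTARTS, AGE, IP, NODE, ...). The `SAMPLE` rows are copied from the output above; the helper names `age_to_minutes` and `stuck_pods` and the 10-minute threshold are illustrative only and not part of Selenium, KEDA, or Kubernetes.

```python
import re

# A few rows copied from the listing above (header omitted).
SAMPLE = """\
keda-operator-d44bc8ffc-f7rzk 1/1 Running 0 161m 172.30.107.176 10.74.145.3 <none> <none>
selenium-grid-selenium-chrome-node-25f9x-5b6dz 1/1 Terminating 0 3h24m 172.30.93.77 10.48.76.225 <none> <none>
selenium-grid-selenium-chrome-node-5hpg8-m2xdz 1/1 Running 0 160m 172.30.202.92 10.48.76.223 <none> <none>
selenium-grid-selenium-chrome-node-fmq9j-qlvmj 1/1 Terminating 0 174m 172.30.107.131 10.74.145.3 <none> <none>
"""


def age_to_minutes(age: str) -> int:
    """Convert an age such as '3h24m', '174m' or '14h' to whole minutes.

    Seconds (for example '45s') are ignored, so very young pods count as 0.
    """
    units = {"d": 1440, "h": 60, "m": 1}
    return sum(int(value) * units[unit]
               for value, unit in re.findall(r"(\d+)([dhm])", age))


def stuck_pods(listing: str, min_minutes: int = 10):
    """Return (name, node, age_minutes) for pods Terminating at least min_minutes."""
    result = []
    for line in listing.splitlines():
        # RESTARTS can look like "1 (14h ago)"; drop the parenthesised part
        # so that whitespace splitting keeps the columns aligned.
        fields = re.sub(r"\s\(.*?\)", "", line).split()
        if len(fields) < 7 or fields[0] == "NAME":
            continue
        name, _ready, status, _restarts, age, _ip, node = fields[:7]
        minutes = age_to_minutes(age)
        if status == "Terminating" and minutes >= min_minutes:
            result.append((name, node, minutes))
    return result


if __name__ == "__main__":
    for name, node, minutes in stuck_pods(SAMPLE):
        print(f"{name} on {node}: Terminating for about {minutes} min")
```

On the four sample rows this reports the two Terminating pods, at 204 and 174 minutes. Grouping the result by node could help show whether the stuck pods cluster on particular workers.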
Chrome node log
2024-03-11 18:58:48,902 INFO Included extra file "/etc/supervisor/conf.d/selenium.conf" during parsing
2024-03-11 18:58:48,905 INFO RPC interface 'supervisor' initialized
2024-03-11 18:58:48,905 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-03-11 18:58:48,906 INFO supervisord started with pid 8
2024-03-11 18:58:49,909 INFO spawned: 'xvfb' with pid 9
2024-03-11 18:58:49,912 INFO spawned: 'vnc' with pid 10
2024-03-11 18:58:49,915 INFO spawned: 'novnc' with pid 11
2024-03-11 18:58:49,917 INFO spawned: 'selenium-node' with pid 12
2024-03-11 18:58:49,938 INFO success: selenium-node entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
E: [pulseaudio] main.c: Daemon startup failed.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Appending Selenium options: --session-timeout 300
Appending Selenium options: --register-period 60
Appending Selenium options: --register-cycle 5
Appending Selenium options: --heartbeat-period 30
Appending Selenium options: --log-level INFO
Generating Selenium Config
Setting up SE_NODE_HOST...
Tracing is disabled
Selenium Grid Node configuration:
[events]
publish = "tcp://selenium-grid-selenium-hub.selenium:4442"
subscribe = "tcp://selenium-grid-selenium-hub.selenium:4443"
[server]
port = "5555"
[node]
grid-url = "http://admin:admin@selenium-grid-selenium-hub.selenium:4444"
session-timeout = "300"
override-max-sessions = false
detect-drivers = false
drain-after-session-count = 1
max-sessions = 1
[[node.driver-configuration]]
display-name = "chrome"
stereotype = '{"browserName": "chrome", "browserVersion": "122.0", "platformName": "Linux", "goog:chromeOptions": {"binary": "/usr/bin/google-chrome"}}'
max-sessions = 1
Starting Selenium Grid Node...
2024-03-11 18:58:51,777 INFO success: xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 18:58:51,777 INFO success: vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 18:58:51,777 INFO success: novnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
18:58:51.977 INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
18:58:51.985 INFO [OpenTelemetryTracer.createTracer] - Using OpenTelemetry for tracing
18:58:52.554 INFO [UnboundZmqEventBus.<init>] - Connecting to tcp://selenium-grid-selenium-hub.selenium:4442 and tcp://selenium-grid-selenium-hub.selenium:4443
18:58:52.767 INFO [UnboundZmqEventBus.<init>] - Sockets created
18:58:53.781 INFO [UnboundZmqEventBus.<init>] - Event bus ready
18:58:54.166 INFO [NodeServer.createHandlers] - Reporting self as: http://172.30.93.77:5555
18:58:54.252 INFO [NodeOptions.getSessionFactories] - Detected 1 available processors
18:58:54.763 INFO [NodeOptions.report] - Adding chrome for {"browserName": "chrome","browserVersion": "122.0","goog:chromeOptions": {"binary": "\u002fusr\u002fbin\u002fgoogle-chrome"},"platformName": "linux","se:noVncPort": 7900,"se:vncEnabled": true} 1 times
2024-03-11T18:58:54UTC [Probe.Startup] - Wait for the Node to report its status
18:58:54.884 INFO [Node.<init>] - Binding additional locator mechanisms: relative
18:58:55.373 INFO [NodeServer$1.start] - Starting registration process for Node http://172.30.93.77:5555
18:58:55.375 INFO [NodeServer.execute] - Started Selenium node 4.18.1 (revision b1d3319b48): http://172.30.93.77:5555
18:58:55.464 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
18:58:55.984 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
18:58:57.964 INFO [LocalNode.checkSessionCount] - Draining Node, configured sessions value (1) has been reached.
18:58:57.972 INFO [LocalNode.newSession] - Session created by the Node. Id: b03b291c5d5108416cf0ac1327aeeda8, Caps: Capabilities {acceptInsecureCerts: true, browserName: chrome-headless-shell, browserVersion: 122.0.6261.69, chrome: {chromedriverVersion: 122.0.6261.69 (81bc525b6a36..., userDataDir: /tmp/.org.chromium.Chromium...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:43491}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: Proxy(), se:bidiEnabled: false, se:cdp: ws://admin:admin@selenium-g..., se:cdpVersion: 122.0.6261.69, se:vnc: ws://admin:admin@selenium-g..., se:vncEnabled: true, se:vncLocalAddress: ws://172.30.93.77:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
2024-03-11T18:58:58UTC [Probe.Startup] - Node responds the ID: 81a24be5-1a59-402b-9407-62e5be087a72 with status: UP
2024-03-11T18:58:58UTC [Probe.Startup] - Grid responds a matched Node ID: 81a24be5-1a59-402b-9407-62e5be087a72
2024-03-11T18:58:58UTC [Probe.Startup] - Node ID: 81a24be5-1a59-402b-9407-62e5be087a72 is found in the Grid. Node is ready.
19:03:34.310 INFO [SessionSlot.stop] - Stopping session b03b291c5d5108416cf0ac1327aeeda8
19:03:34.350 INFO [LocalNode.stopTimedOutSession] - Node draining complete!
19:03:35.357 INFO [NodeServer.lambda$createHandlers$3] - Shutting down
2024-03-11 19:03:35,722 INFO exited: selenium-node (exit status 0; expected)
2024-03-11 19:03:35,722 WARN received SIGINT indicating exit request
2024-03-11 19:03:35,723 INFO waiting for xvfb, vnc, novnc to die
2024-03-11 19:03:37,727 INFO stopped: novnc (terminated by SIGTERM)
2024-03-11 19:03:38,730 INFO stopped: vnc (terminated by SIGTERM)
2024-03-11 19:03:38,731 INFO waiting for xvfb to die
2024-03-11 19:03:39,732 INFO stopped: xvfb (terminated by SIGTERM)
Chrome node pod description:
Name: selenium-grid-selenium-chrome-node-25f9x-5b6dz
Namespace: selenium
Priority: 0
Service Account: selenium-grid-selenium-serviceaccount
Node: 10.48.76.225/10.48.76.225
Start Time: Mon, 11 Mar 2024 18:58:22 +0000
Labels: app=selenium-grid-selenium-chrome-node
app.kubernetes.io/component=selenium-grid-4.18.1-20240224
app.kubernetes.io/instance=selenium-grid
app.kubernetes.io/managed-by=helm
app.kubernetes.io/name=selenium-grid-selenium-chrome-node
app.kubernetes.io/version=4.18.1-20240224
controller-uid=449ef28c-fc3f-4da4-8b3f-31469fb86d9d
helm.sh/chart=selenium-grid-0.28.4
job-name=selenium-grid-selenium-chrome-node-25f9x
scaledjob.keda.sh/name=selenium-grid-selenium-chrome-node
Annotations: checksum/event-bus-configmap: 2698802d0bbf358d1634b47dff1ef36c5fc2501a27a9d2eef02c7874eb9496f8
checksum/logging-configmap: 7f721b250f90c8a5877dc9217b97f0a14392b420edf0e5af105a60944d2b9dc3
checksum/node-configmap: 3b6c0fffa6e6a10d57e5455ce21e1e7ee55e0638f15ff521b8c96fe8c10d8e91
checksum/server-configmap: ac6520a86bfffa04b4946bbce02ac8f1be341d800f4ab09f4e7cf274f74d3770
cni.projectcalico.org/containerID: a880de0585776e16db52c8eb9c290406e1b69dfd510cd15fc0c421cd6a9ed1dc
cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
k8s.v1.cni.cncf.io/network-status:
[{
"name": "k8s-pod-network",
"ips": [
"172.30.93.77"
],
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status:
[{
"name": "k8s-pod-network",
"ips": [
"172.30.93.77"
],
"default": true,
"dns": {}
}]
openshift.io/scc: restricted-v2
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Terminating (lasts 3h15m)
Termination Grace Period: 30s
SeccompProfile: RuntimeDefault
IP: 172.30.93.77
IPs:
IP: 172.30.93.77
Controlled By: Job/selenium-grid-selenium-chrome-node-25f9x
Containers:
selenium-grid-selenium-chrome-node:
Container ID: cri-o://f177006210af63a402465588e62b42ca96387427102f59824d4a6b29b197ab21
Image: docker-na-private.artifactory.swg-devops.com/wiotp-docker-local/selenium/node-chrome:122.0-20240224
Image ID: docker-na-private.artifactory.swg-devops.com/wiotp-docker-local/selenium/node-chrome@sha256:3b50643ff9885215c9142fefecace7b17efdde64a8c863ef702bd4a6c3e6a378
Port: 5555/TCP
Host Port: 0/TCP
State: Running
Started: Mon, 11 Mar 2024 18:58:48 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 1Gi
Startup: exec [bash -c /opt/selenium/nodeProbe.sh Startup >> /proc/1/fd/1] delay=0s timeout=60s period=5s #success=1 #failure=12
Environment Variables from:
selenium-grid-selenium-event-bus ConfigMap Optional: false
selenium-grid-selenium-node-config ConfigMap Optional: false
selenium-grid-selenium-logging-config ConfigMap Optional: false
selenium-grid-selenium-server-config ConfigMap Optional: false
selenium-grid-selenium-secrets Secret Optional: false
Environment:
SE_OTEL_SERVICE_NAME: selenium-grid-selenium-chrome-node
SE_NODE_PORT: 5555
SE_NODE_REGISTER_PERIOD: 60
SE_NODE_REGISTER_CYCLE: 5
Mounts:
/dev/shm from dshm (rw)
/opt/selenium/nodePreStop.sh from selenium-grid-selenium-node-config (rw,path="nodePreStop.sh")
/opt/selenium/nodeProbe.sh from selenium-grid-selenium-node-config (rw,path="nodeProbe.sh")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n6b64 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
selenium-grid-selenium-node-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: selenium-grid-selenium-node-config
Optional: false
dshm:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: 1Gi
kube-api-access-n6b64:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedPreStopHook 165m kubelet Exec lifecycle hook ([bash -c /opt/selenium/nodePreStop.sh >> /proc/1/fd/1]) for Container "selenium-grid-selenium-chrome-node" in Pod "selenium-grid-selenium-chrome-node-25f9x-5b6dz_selenium(8ff39bf5-08ec-4467-b609-0db98b02ff8c)" failed - error: rpc error: code = Unknown desc = command error: time="2024-03-11T14:37:35-05:00" level=fatal msg="nsexec-1[153181]: failed to open /proc/187153/ns/ipc: No such file or directory"
time="2024-03-11T14:37:35-05:00" level=fatal msg="nsexec-0[153169]: failed to sync with stage-1: next state: Invalid argument"
time="2024-03-11T14:37:35-05:00" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1, message: ""
Warning FailedPreStopHook 135m kubelet Exec lifecycle hook ([bash -c /opt/selenium/nodePreStop.sh >> /proc/1/fd/1]) for Container "selenium-grid-selenium-chrome-node" in Pod "selenium-grid-selenium-chrome-node-25f9x-5b6dz_selenium(8ff39bf5-08ec-4467-b609-0db98b02ff8c)" failed - error: rpc error: code = Unknown desc = command error: time="2024-03-11T15:08:06-05:00" level=fatal msg="nsexec-1[358906]: failed to open /proc/187153/ns/ipc: No such file or directory"
time="2024-03-11T15:08:06-05:00" level=fatal msg="nsexec-0[358893]: failed to sync with stage-1: next state: Invalid argument"
time="2024-03-11T15:08:06-05:00" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1, message: ""
Warning FailedPreStopHook 104m kubelet Exec lifecycle hook ([bash -c /opt/selenium/nodePreStop.sh >> /proc/1/fd/1]) for Container "selenium-grid-selenium-chrome-node" in Pod "selenium-grid-selenium-chrome-node-25f9x-5b6dz_selenium(8ff39bf5-08ec-4467-b609-0db98b02ff8c)" failed - error: rpc error: code = Unknown desc = command error: time="2024-03-11T15:38:36-05:00" level=fatal msg="nsexec-1[149417]: failed to open /proc/187153/ns/ipc: No such file or directory"
time="2024-03-11T15:38:36-05:00" level=fatal msg="nsexec-0[149388]: failed to sync with stage-1: next state: Invalid argument"
time="2024-03-11T15:38:36-05:00" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1, message: ""
Warning FailedPreStopHook 74m kubelet Exec lifecycle hook ([bash -c /opt/selenium/nodePreStop.sh >> /proc/1/fd/1]) for Container "selenium-grid-selenium-chrome-node" in Pod "selenium-grid-selenium-chrome-node-25f9x-5b6dz_selenium(8ff39bf5-08ec-4467-b609-0db98b02ff8c)" failed - error: rpc error: code = Unknown desc = command error: time="2024-03-11T16:09:06-05:00" level=fatal msg="nsexec-1[454447]: failed to open /proc/187153/ns/ipc: No such file or directory"
time="2024-03-11T16:09:06-05:00" level=fatal msg="nsexec-0[454427]: failed to sync with stage-1: next state: Invalid argument"
time="2024-03-11T16:09:06-05:00" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1, message: ""
Warning FailedPreStopHook 43m kubelet Exec lifecycle hook ([bash -c /opt/selenium/nodePreStop.sh >> /proc/1/fd/1]) for Container "selenium-grid-selenium-chrome-node" in Pod "selenium-grid-selenium-chrome-node-25f9x-5b6dz_selenium(8ff39bf5-08ec-4467-b609-0db98b02ff8c)" failed - error: rpc error: code = Unknown desc = command error: time="2024-03-11T16:39:37-05:00" level=fatal msg="nsexec-1[259528]: failed to open /proc/187153/ns/ipc: No such file or directory"
time="2024-03-11T16:39:37-05:00" level=fatal msg="nsexec-0[259505]: failed to sync with stage-1: next state: Invalid argument"
time="2024-03-11T16:39:37-05:00" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1, message: ""
Normal Killing 13m (x7 over 3h16m) kubelet Stopping container selenium-grid-selenium-chrome-node
Warning FailedKillPod 13m (x6 over 165m) kubelet error killing pod: [failed to "KillContainer" for "selenium-grid-selenium-chrome-node" with KillContainerError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded", failed to "KillPodSandbox" for "8ff39bf5-08ec-4467-b609-0db98b02ff8c" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
Warning FailedPreStopHook 13m kubelet Exec lifecycle hook ([bash -c /opt/selenium/nodePreStop.sh >> /proc/1/fd/1]) for Container "selenium-grid-selenium-chrome-node" in Pod "selenium-grid-selenium-chrome-node-25f9x-5b6dz_selenium(8ff39bf5-08ec-4467-b609-0db98b02ff8c)" failed - error: rpc error: code = Unknown desc = command error: time="2024-03-11T17:10:07-05:00" level=fatal msg="nsexec-1[51553]: failed to open /proc/187153/ns/ipc: No such file or directory"
time="2024-03-11T17:10:07-05:00" level=fatal msg="nsexec-0[51532]: failed to sync with stage-1: next state: Invalid argument"
time="2024-03-11T17:10:07-05:00" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1, message: ""
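The FailedPreStopHook events above all fail on the same path, /proc/187153/ns/ipc, and seem to recur at a steady interval. Here is a minimal Python sketch to check the spacing, assuming the `time="..."` stamps are copied out of the events. The lines below are trimmed to the timestamp and level, and `retry_gaps` is a hypothetical helper name, not a Kubernetes or CRI-O API.

```python
import re
from datetime import datetime

# Timestamps of the six FailedPreStopHook events above, oldest first
# (each line trimmed to its time= and level= fields).
EVENT_LINES = [
    'time="2024-03-11T14:37:35-05:00" level=fatal',
    'time="2024-03-11T15:08:06-05:00" level=fatal',
    'time="2024-03-11T15:38:36-05:00" level=fatal',
    'time="2024-03-11T16:09:06-05:00" level=fatal',
    'time="2024-03-11T16:39:37-05:00" level=fatal',
    'time="2024-03-11T17:10:07-05:00" level=fatal',
]


def retry_gaps(lines):
    """Return the gaps in seconds between consecutive distinct timestamps."""
    stamps = set()
    for line in lines:
        match = re.search(r'time="([^"]+)"', line)
        if match:
            # fromisoformat accepts the "-05:00" offset used in these events.
            stamps.add(datetime.fromisoformat(match.group(1)))
    ordered = sorted(stamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    for gap in retry_gaps(EVENT_LINES):
        print(f"{gap / 60:.1f} min between hook attempts")
```

For these six timestamps the gaps are all about 30.5 minutes, which suggests a fixed retry cycle rather than random failures. What drives that cycle is not visible in the events themselves.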
Operating System
OpenShift 4.12.49
Docker Selenium version (image tag)
4.18.1
Selenium Grid chart version (chart version)
0.28.4