Description
Type of question
General context and help around the operator-sdk
Question
What did you do?
Install operator-sdk v0.28.0
What did you expect to see?
Operator startup
What did you see instead? Under which circumstances?
The operatorhubio-catalog pod does not start:
runner@arc-runners-x2src-runner-mxhq2:~$ kubectl get pods -A | grep operatorhubio
olm operatorhubio-catalog-gqxnw 0/1 CrashLoopBackOff 15 (4m6s ago) 55m
runner@arc-runners-x2src-runner-mxhq2:~$ kubectl describe pods -n olm operatorhubio-catalog-gqxnw | tail -n 5
Normal Pulled 52m kubelet Successfully pulled image "quay.io/operatorhubio/catalog:latest" in 16.469534578s
Normal Created 52m (x2 over 54m) kubelet Created container registry-server
Normal Started 52m (x2 over 54m) kubelet Started container registry-server
Warning Unhealthy 5m47s (x150 over 54m) kubelet Startup probe failed: timeout: failed to connect service ":50051" within 1s
Warning BackOff 42s (x132 over 42m) kubelet Back-off restarting failed container
runner@arc-runners-x2src-runner-mxhq2:~$ kubectl logs -n olm operatorhubio-catalog-gqxnw
time="2024-05-21T10:21:02Z" level=info msg="starting pprof endpoint" address="localhost:6060"
time="2024-05-21T10:21:02Z" level=info msg="found existing cache contents" backend=pogreb.v1 cache=/tmp/cache configs=/configs
The process appears to hang for 2 to 3 minutes at the last step logged above.
Environment
- operator-lifecycle-manager version: v0.28.0
- Kubernetes version information:
kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.0", GitCommit:"a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2", GitTreeState:"clean", BuildDate:"2022-09-01T23:30:43Z", GoVersion:"go1.19", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.27) and server (1.25) exceeds the supported minor version skew of +/-1
- Kubernetes cluster kind:
ARC and kind based:
kind version
kind v0.15.0 go1.19 linux/amd64
Additional context
The command /bin/opm serve /configs --cache-dir=/tmp/cache
takes about 2 to 3 minutes to start in this container, which causes the startupProbe to fail. This happens on only one of our infrastructures. Is there a way to increase the probe duration, or to debug what is happening in the opm process?
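To make the question about probe duration concrete: as far as I understand the Kubernetes probe settings, a container has roughly initialDelaySeconds + failureThreshold × periodSeconds to pass its startup probe before the kubelet restarts it. The sketch below only illustrates that arithmetic. The probe values in it are hypothetical, since I have not confirmed what OLM sets on the catalog pod, and the 180 seconds is the upper end of the startup time observed above.

```python
import math


def startup_budget_seconds(failure_threshold, period_seconds, initial_delay_seconds=0):
    """Approximate time a container gets to pass its startup probe."""
    return initial_delay_seconds + failure_threshold * period_seconds


def min_failure_threshold(startup_seconds, period_seconds, initial_delay_seconds=0):
    """Smallest failureThreshold whose budget covers the given startup time."""
    remaining = max(startup_seconds - initial_delay_seconds, 0)
    return max(math.ceil(remaining / period_seconds), 1)


if __name__ == "__main__":
    # Hypothetical probe settings, NOT read from the actual catalog pod spec.
    period = 10
    threshold = 10
    observed_startup = 180  # opm serve takes up to ~3 minutes here

    budget = startup_budget_seconds(threshold, period)
    print("budget (s):", budget)
    print("covers observed startup:", budget >= observed_startup)
    print("failureThreshold needed:", min_failure_threshold(observed_startup, period))
```

If the real values turn out to be in this range, the budget would be well short of the observed startup time, which would match the restart loop shown in the events.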