Startup probe kills "/bin/opm serve" process and prevents operatorhubio pod from starting #3269

Open
@fjammes

Description

Type of question

General context and help around the operator-sdk

Question

What did you do?

Install operator-lifecycle-manager (OLM) v0.28.0

What did you expect to see?

Operator startup

What did you see instead? Under which circumstances?

The operatorhubio catalog pod does not start:

runner@arc-runners-x2src-runner-mxhq2:~$ kubectl get pods -A | grep operatorhubio
olm                  operatorhubio-catalog-gqxnw                  0/1     CrashLoopBackOff   15 (4m6s ago)   55m
runner@arc-runners-x2src-runner-mxhq2:~$ kubectl describe pods -n olm operatorhubio-catalog-gqxnw | tail -n 5
  Normal   Pulled     52m                    kubelet            Successfully pulled image "quay.io/operatorhubio/catalog:latest" in 16.469534578s
  Normal   Created    52m (x2 over 54m)      kubelet            Created container registry-server
  Normal   Started    52m (x2 over 54m)      kubelet            Started container registry-server
  Warning  Unhealthy  5m47s (x150 over 54m)  kubelet            Startup probe failed: timeout: failed to connect service ":50051" within 1s
  Warning  BackOff    42s (x132 over 42m)    kubelet            Back-off restarting failed container
runner@arc-runners-x2src-runner-mxhq2:~$ kubectl logs  -n olm operatorhubio-catalog-gqxnw
time="2024-05-21T10:21:02Z" level=info msg="starting pprof endpoint" address="localhost:6060"
time="2024-05-21T10:21:02Z" level=info msg="found existing cache contents" backend=pogreb.v1 cache=/tmp/cache configs=/configs

The process appears to freeze for 2 to 3 minutes at the last step logged above.
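Since the log shows a pprof endpoint on localhost:6060, one way to see what opm is doing during the stall might be to dump its goroutine stacks. The following is only a sketch under assumptions: it assumes the endpoint serves the standard Go net/http/pprof paths (not confirmed here for this opm build), the function name dump_opm_goroutines is hypothetical, and it only works while the container is running between restarts.

```shell
#!/usr/bin/env bash
# Sketch: dump goroutine stacks from the pprof endpoint of a running opm container.
# Assumption: localhost:6060 inside the pod serves the standard net/http/pprof paths.

dump_opm_goroutines() {
  local ns="$1" pod="$2" port="${3:-6060}"
  # Forward the pod-local pprof port to this machine in the background.
  kubectl port-forward -n "$ns" "pod/$pod" "$port:$port" >/dev/null 2>&1 &
  local pf_pid=$!
  sleep 2  # give the tunnel a moment to come up
  # debug=2 asks pprof for a full stack trace of every goroutine.
  curl -s "http://localhost:$port/debug/pprof/goroutine?debug=2"
  # Stop the port-forward and reap it quietly.
  kill "$pf_pid" 2>/dev/null || true
  wait "$pf_pid" 2>/dev/null || true
}
```

For the pod above this would be called as `dump_opm_goroutines olm operatorhubio-catalog-gqxnw > goroutines.txt`, ideally during the 2 to 3 minute window after the "found existing cache contents" line.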

Environment

  • operator-lifecycle-manager version: v0.28.0

  • Kubernetes version information:

kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.0", GitCommit:"a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2", GitTreeState:"clean", BuildDate:"2022-09-01T23:30:43Z", GoVersion:"go1.19", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.27) and server (1.25) exceeds the supported minor version skew of +/-1

  • Kubernetes cluster kind:

ARC and kind based:

kind version
kind v0.15.0 go1.19 linux/amd64

Additional context

The command /bin/opm serve /configs --cache-dir=/tmp/cache takes about 2 to 3 minutes to start in this container, which causes the startupProbe to fail and the container to be restarted. This occurs on only one of our infrastructures. Is there a way to increase the probe duration, or to debug what is happening in the opm process?
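To judge how far the 2 to 3 minute startup is from the allowed window, it may help to read the probe settings from the pod and compute the budget. In Kubernetes, a container has roughly initialDelaySeconds + failureThreshold × periodSeconds to pass its startup probe before it is killed. The sketch below uses hypothetical function names and illustrative numbers, not values read from this cluster; because the catalog pod is managed by OLM, manual edits to its probe may not persist.

```shell
#!/usr/bin/env bash
# Sketch: estimate how long a container may take to start before the
# startup probe gives up and the kubelet restarts it.

# Budget in seconds = initialDelaySeconds + failureThreshold * periodSeconds.
startup_budget_seconds() {
  local initial_delay="$1" failure_threshold="$2" period="$3"
  echo $(( initial_delay + failure_threshold * period ))
}

# Print the startup probe configured on the pod's first container.
# Needs kubectl access to the cluster, so it is defined but not called here.
show_startup_probe() {
  local ns="$1" pod="$2"
  kubectl get pod -n "$ns" "$pod" -o jsonpath='{.spec.containers[0].startupProbe}'
}

# Illustrative values only (assumed, not taken from this cluster):
startup_budget_seconds 0 10 10   # prints 100
```

Running `show_startup_probe olm operatorhubio-catalog-gqxnw` shows the real values to plug in. If the resulting budget is well under 120 to 180 seconds, the restarts seen above would be expected on the slow infrastructure.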

Metadata

    Labels

    lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.