Subscription is stuck when installing same operator multiple times into different namespaces at different dates #3210
Labels
kind/bug
Categorizes issue or PR as related to a bug.
lifecycle/stale
Denotes an issue or PR has remained open with no activity and has become stale.
Bug Report
This one is really odd, and might be related to how bundle unpack Job names are generated from the hash value of the bundle (and namespaces?).
When I installed an operator into one namespace and then, on another day (actually 20 days later), tried to install the same operator into a different namespace, the second Subscription hung and was never reconciled.
What did you do?
On Apr 4th, installed operand-deployment-lifecycle-manager.v4.0.0 into namespace cp30test. All good there.
On Apr 24th, attempted to install the same package (same catalog source, same channel, same package name) but into namespace cp46test.
The Subscription in cp46test is hung, i.e. it is never fully reconciled, except that its status field is updated to say that all the catalog sources are healthy.
In namespace openshift-operator-lifecycle-manager, the catalog-operator-8586f5974d-khh7g Pod logs error messages like:
E0424 20:12:47.382632 1 queueinformer_operator.go:319] sync "cp46test" failed: bundle unpacking failed with an error: jobs.batch "8d67f73b77c43214c1f31adf025bfc258a4b6d671a34f339926a897eb6d45c6" already exists
Indeed, a Job named 8d67f73b77c43214c1f31adf025bfc258a4b6d671a34f339926a897eb6d45c6 exists in the openshift-marketplace namespace.
So it seems that OLM fails to install a second instance of the same operator if the hash function produces a name collision for the bundle unpack Job or for the ConfigMap which holds the bundle details:
https://github.com/openshift/operator-framework-olm/blob/master/staging/operator-lifecycle-manager/pkg/controller/bundle/bundle_unpacker.go#L92
https://github.com/openshift/operator-framework-olm/blob/master/staging/operator-lifecycle-manager/pkg/controller/bundle/bundle_unpacker.go#L665
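The suspected failure mode can be illustrated with a small, self-contained sketch. It is not taken from OLM: jobName is a hypothetical naming scheme that hashes only the bundle image reference and trims the digest to 63 characters (the length of the Job name in the log above), jobStore is a stand-in for the Jobs of a single namespace, and the image reference is made up. Under those assumptions, a second install of the same bundle maps to the same Job name, and a create call that does not tolerate an existing object fails.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// jobName is a hypothetical naming scheme (an assumption, not OLM's code):
// hash the bundle image reference only, then trim the hex digest to 63
// characters, the length of the Job name seen in the log.
func jobName(bundleImage string) string {
	sum := sha256.Sum256([]byte(bundleImage))
	return hex.EncodeToString(sum[:])[:63]
}

// jobStore stands in for the Jobs that live in one namespace, for example
// the namespace of the catalog source.
type jobStore map[string]bool

// create mimics a create call that does not tolerate an existing object.
func (s jobStore) create(name string) error {
	if s[name] {
		return fmt.Errorf("jobs.batch %q already exists", name)
	}
	s[name] = true
	return nil
}

func main() {
	marketplace := jobStore{}
	image := "registry.example.com/odlm-bundle:v4.0.0" // made-up image reference

	// First Subscription (cp30test): the unpack Job is created and left behind.
	fmt.Println("cp30test:", marketplace.create(jobName(image)))

	// Second Subscription (cp46test), same bundle: the Job name is identical
	// because the Subscription's namespace is not part of the hash input.
	fmt.Println("cp46test:", marketplace.create(jobName(image)))
}
```

In this sketch the first call returns no error and the second returns an "already exists" error, which has the same shape as the message logged by the catalog operator.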
Attaching the relevant YAML resources. An OCP must-gather was also generated, but at 176 MB it is above the size limit; I can submit a requested subset of files, or you can ping me to get access to the whole package.
What did you expect to see?
I would like to see the second installation of the operator, in a separate namespace, working just fine.
What did you see instead? Under which circumstances?
As above, the install hung.
Environment
Kubernetes version information:
Kubernetes cluster kind: OCP
k8s version: v1.27.11+ec42b99
Possible Solution
A mitigation is to manually remove the Job which completed on Apr 4th; the installation then proceeds.
Additional context
N/A