[bug] Operator becoming non-functional after transient RBAC changes #1419
Comments
cc. @lburgazzoli
Probably related to this issue:
Related to #1170 also.
@csviri do we have an integration test for this?
yep: https://github.com/java-operator-sdk/java-operator-sdk/blob/2cb616c4c4fd0094ee6e3a0ef2a0ea82173372bf/operator-framework/src/test/java/io/javaoperatorsdk/operator/InformerRelatedBehaviorITS.java it is a little special, since it is not trivial to test. See javadocs on the test class.
awesome! thanks!
Bug Report
Hi all and thanks for the amazing project!
I was looking at real-world edge cases where the functionality of the operator gets compromised because the Informers crash in the background.
A little playing with RBAC resources while the operator is running renders it completely unresponsive to any CR event.
What did you do?
1. Start a minikube cluster.
2. Apply a sample CR:
   kubectl apply -f sample-operators/tomcat-operator/k8s/tomcat-sample1.yaml
3. Delete the operator's ServiceAccount:
   kubectl delete serviceaccount/tomcat-operator -n tomcat-operator
Now the operator becomes completely unresponsive, e.g. to changes on the test-tomcat1 CR or to applying a second sample:
   kubectl apply -f sample-operators/tomcat-operator/k8s/tomcat-sample2.yaml
What did you expect to see?
The operator pod should (probably) restart if it loses access to the API, in order to be able to restore the communication. Alternatively, the situation should be handled and the connection of the SharedInformers somehow restored.
What did you see instead? Under which circumstances?
The operator remains unresponsive but alive.
Environment
Kubernetes cluster type:
minikube
java-operator-sdk version (from pom.xml): main
$ java -version
$ kubectl version
Possible Solution
The best would be to have callback endpoint in the Controller that gets called if an error happens with the
SharedInformer
s, so that the user can decide what to do.At a very minimum, in this specific situation, I do believe that crashing the Operator is the correct behavior, but it would be nice to have a more generic mechanism for handling
SharedInformer
s failures that are currently happening in background.Additional context
During my test, I verified that the communication with the API server gets restored if the API server becomes temporarily unavailable, that's great work 👍
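To make the suggestion above concrete, here is a minimal sketch of what such a user-facing callback could look like. All names (`InformerErrorHandler`, `Action`, the sample policy) are hypothetical illustrations, not part of the java-operator-sdk API:

```java
// Hypothetical sketch of a callback the SDK could invoke when a
// SharedInformer fails in the background, letting the user decide
// whether to crash the operator or retry. Names are illustrative only.
public class InformerErrorHandlingSketch {

    /** Decision returned by the user-supplied callback. */
    enum Action { STOP_OPERATOR, RESTART_INFORMER, IGNORE }

    /** Hypothetical callback, invoked with the resource type and failure cause. */
    interface InformerErrorHandler {
        Action onInformerError(String resourceClass, Throwable cause);
    }

    /** Example policy: treat RBAC (403) failures as fatal, retry everything else. */
    static Action defaultPolicy(String resourceClass, Throwable cause) {
        String msg = cause.getMessage();
        if (msg != null && msg.contains("403")) {
            // Crashing lets the pod restart and rebuild informers with fresh permissions.
            return Action.STOP_OPERATOR;
        }
        return Action.RESTART_INFORMER;
    }

    public static void main(String[] args) {
        InformerErrorHandler handler = InformerErrorHandlingSketch::defaultPolicy;
        System.out.println(handler.onInformerError("Tomcat",
                new RuntimeException("watch failed: 403 Forbidden")));
        System.out.println(handler.onInformerError("Tomcat",
                new RuntimeException("connection reset")));
    }
}
```

With a hook like this, the "crash on RBAC loss" behavior from this report would just be the default policy, while users with special requirements could plug in their own handler.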