Skip to content

fix: do not update cache if the execution failed #319

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

metacosm
Copy link
Collaborator

Should fix #310

@metacosm metacosm requested a review from csviri January 22, 2021 22:36
@metacosm metacosm self-assigned this Jan 22, 2021
@ppatierno
Copy link
Contributor

@metacosm I have tested this running the CustomService based example but removing the Namespaced implementation from it which drives the exception I raised in #310.
The fix doesn't work because that code is never reached in the eventProcessingFinished, because the following condition is true and return:

if (retry != null && postExecutionControl.exceptionDuringExecution()) {
        handleRetryOnException(executionScope);
        return;
      }

It seems to be right because in the end due to an exception, the operator sdk has to retry.
The problem I see is that on retry, it's lost the reason why the previous execution was failed.
Could it be useful to save the postExecutionControl instance in the current executionScope as "previous exception in case of retry" so that on next handleDispatch (or handleCreateOrUpdate) we have that in the execution scope and we can evaluate the customResourceUpdatedDuringExecution at the right time?
It's just the first idea came to my mind, maybe there is a different approach for you who knows the framework better than me.

@ppatierno
Copy link
Contributor

@metacosm actually I tried a different way of thinking an opened this PR #320

@csviri
Copy link
Collaborator

csviri commented Jan 24, 2021

The problem is that we add the finalizer to the cached object, and even if the update fails the object itself remains changed.
IMHO this last PR would fix this particular problem, but I can imagine that this could cause problems elsewhere, since we are using just the instance from cache everywhere.
So the proper solution would be to clone the object when reading it from cache, so the cache always returns a clone of the custom resource not the instance in the cache itself. So even there is an exception in the dispatcher, we make sure that we use an instance that is not changed by us, but a version that represents the state in the API.

@csviri
Copy link
Collaborator

csviri commented Jan 24, 2021

#321
@ppatierno @metacosm what do you think?
@ppatierno could you test please if this solves the problem?

@metacosm
Copy link
Collaborator Author

@csviri I'd like to revisit with you the decision to use a cache because I'm still not convinced we need one and I'm afraid it might cause more issues than it will solve…

@ppatierno
Copy link
Contributor

At the beginning I thought that the cache was need for handling events for resources with newer generation only then I saw that this kind of skip is made by using this lastGenerationProcessedSuccessfully map. I would know why the resource cache is needed as well.

@csviri
Copy link
Collaborator

csviri commented Jan 24, 2021

@ppatierno @metacosm we can organize a call, happy to discuss the details. Its needed also for the generation aware processing also, since we are doing optimistic version control, but that is a details. The main idea is:

If we execute the controller, the custom resource is always received as a parameter in the controller, this naturally needed, since we would like to update its status, but mainly we always reconcile based on specs of a custom resource. Since we are not just listening to the changes of the custom resource itself, but we can have other event sources ( where we execute the controller if we receive event from different source - and want to provide the CR also in this case ), for this case we use the cache with the latest resource, since we are already watching for he changes for that resource its quite natural to cache it (not to call always the K8S Api for a fresh one).

@csviri
Copy link
Collaborator

csviri commented Jan 25, 2021

@ppatierno does the last PR fix your issue? I would say we should merge that in case does. Event if we want to change the behavior with caches that is something for a long term.

@ppatierno
Copy link
Contributor

@csviri yes I tested it yesterday and it fixed this case.

@csviri csviri mentioned this pull request Jan 25, 2021
@csviri csviri closed this Jan 25, 2021
@metacosm metacosm deleted the fix-finalizer branch February 19, 2021 09:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Controller create/update handling called even if exception when applying finalizer
3 participants