fix: do not update cache if the execution failed #319

metacosm · 2021-01-22T22:36:13Z

Should fix #310

ppatierno · 2021-01-23T14:42:01Z

@metacosm I have tested this running the CustomService based example but removing the Namespaced implementation from it which drives the exception I raised in #310.
The fix doesn't work because that code is never reached in the eventProcessingFinished, because the following condition is true and return:

if (retry != null && postExecutionControl.exceptionDuringExecution()) {
        handleRetryOnException(executionScope);
        return;
      }

It seems to be right because in the end due to an exception, the operator sdk has to retry.
The problem I see is that on retry, it's lost the reason why the previous execution was failed.
Could it be useful to save the postExecutionControl instance in the current executionScope as "previous exception in case of retry" so that on next handleDispatch (or handleCreateOrUpdate) we have that in the execution scope and we can evaluate the customResourceUpdatedDuringExecution at the right time?
It's just the first idea came to my mind, maybe there is a different approach for you who knows the framework better than me.

ppatierno · 2021-01-23T15:02:16Z

@metacosm actually I tried a different way of thinking an opened this PR #320

csviri · 2021-01-24T12:24:18Z

The problem is that we add the finalizer to the cached object, and even if the update fails the object itself remains changed.
IMHO this last PR would fix this particular problem, but I can imagine that this could cause problems elsewhere, since we are using just the instance from cache everywhere.
So the proper solution would be to clone the object when reading it from cache, so the cache always returns a clone of the custom resource not the instance in the cache itself. So even there is an exception in the dispatcher, we make sure that we use an instance that is not changed by us, but a version that represents the state in the API.

csviri · 2021-01-24T12:38:55Z

#321
@ppatierno @metacosm what do you think?
@ppatierno could you test please if this solves the problem?

metacosm · 2021-01-24T13:32:50Z

@csviri I'd like to revisit with you the decision to use a cache because I'm still not convinced we need one and I'm afraid it might cause more issues than it will solve…

ppatierno · 2021-01-24T14:09:36Z

At the beginning I thought that the cache was need for handling events for resources with newer generation only then I saw that this kind of skip is made by using this lastGenerationProcessedSuccessfully map. I would know why the resource cache is needed as well.

csviri · 2021-01-24T15:39:50Z

@ppatierno @metacosm we can organize a call, happy to discuss the details. Its needed also for the generation aware processing also, since we are doing optimistic version control, but that is a details. The main idea is:

If we execute the controller, the custom resource is always received as a parameter in the controller, this naturally needed, since we would like to update its status, but mainly we always reconcile based on specs of a custom resource. Since we are not just listening to the changes of the custom resource itself, but we can have other event sources ( where we execute the controller if we receive event from different source - and want to provide the CR also in this case ), for this case we use the cache with the latest resource, since we are already watching for he changes for that resource its quite natural to cache it (not to call always the K8S Api for a fresh one).

csviri · 2021-01-25T16:14:02Z

@ppatierno does the last PR fix your issue? I would say we should merge that in case does. Event if we want to change the behavior with caches that is something for a long term.

ppatierno · 2021-01-25T16:43:16Z

@csviri yes I tested it yesterday and it fixed this case.

metacosm requested a review from csviri January 22, 2021 22:36

metacosm self-assigned this Jan 22, 2021

fix: do not update cache if the execution failed

716399e

metacosm force-pushed the fix-finalizer branch from ac82207 to 716399e Compare January 22, 2021 22:52

csviri mentioned this pull request Jan 25, 2021

clone on read from cache #321

Merged

csviri closed this Jan 25, 2021

metacosm deleted the fix-finalizer branch February 19, 2021 09:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: do not update cache if the execution failed #319

fix: do not update cache if the execution failed #319

Uh oh!

metacosm commented Jan 22, 2021

Uh oh!

ppatierno commented Jan 23, 2021

Uh oh!

ppatierno commented Jan 23, 2021

Uh oh!

csviri commented Jan 24, 2021 •

edited

Loading

Uh oh!

csviri commented Jan 24, 2021

Uh oh!

metacosm commented Jan 24, 2021

Uh oh!

ppatierno commented Jan 24, 2021

Uh oh!

csviri commented Jan 24, 2021

Uh oh!

csviri commented Jan 25, 2021

Uh oh!

ppatierno commented Jan 25, 2021

Uh oh!

Uh oh!

fix: do not update cache if the execution failed #319

fix: do not update cache if the execution failed #319

Uh oh!

Conversation

metacosm commented Jan 22, 2021

Uh oh!

ppatierno commented Jan 23, 2021

Uh oh!

ppatierno commented Jan 23, 2021

Uh oh!

csviri commented Jan 24, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

csviri commented Jan 24, 2021

Uh oh!

metacosm commented Jan 24, 2021

Uh oh!

ppatierno commented Jan 24, 2021

Uh oh!

csviri commented Jan 24, 2021

Uh oh!

csviri commented Jan 25, 2021

Uh oh!

ppatierno commented Jan 25, 2021

Uh oh!

Uh oh!

csviri commented Jan 24, 2021 •

edited

Loading