Skip to content

Caching just Updated Custom Resource Optimization alternatives #588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
csviri opened this issue Oct 6, 2021 · 9 comments · Fixed by #604
Closed

Caching just Updated Custom Resource Optimization alternatives #588

csviri opened this issue Oct 6, 2021 · 9 comments · Fixed by #604
Assignees
Labels
caching-epic kind/feature Categorizes issue or PR as related to a new feature.
Milestone

Comments

@csviri
Copy link
Collaborator

csviri commented Oct 6, 2021

In case we execute a controller and update the custom resource at the end, it can take time while we receive the custom resource event from the server and gets into the informers cache in CustomResoureEventSource . This causes problem in case when there are already events received during the execution. In next reconciliation , what happens instantly, we use an old custom resource, and the next update will fail on conflict.

We handled this situation in V1 but simply caching manually the updated custom resource from the response of the update api call. Currently we cannot do this with informers (well not with a very hacky way)

Note that if there is retry in place this is not a correctness issue just an optimization, but this is possibly happening quite often, and it's quite ugly. Since we see lots of retries.

Alternative solutions to V2:

  1. Don't execute the buffered events instantly, just schedule a new one time event in the close feature (~1-2 second?), usually we receive the update custom resource until that time into cache (not necessary to the Event Handler because of generation filtering). We have to be careful there this event producing should be cancelled in case the event arrives. (I prefer this)
    (This would be also reusable to optimize the re scheduling of events)

  2. Don't filter out the next event from custom resource event source. This will allow to receive the event about the update and execute the custom resource. Note that is hard to make correct (if not impossible now) in all cases, there are possible timing issues we have to investiage)

  3. the combination of 1. and 2.

@csviri csviri added this to the v2 milestone Oct 6, 2021
@metacosm
Copy link
Collaborator

metacosm commented Oct 6, 2021

Shouldn't event only occur when the informer is updated, though? In which situation would there be events that are not associated to an informer update?

@csviri
Copy link
Collaborator Author

csviri commented Oct 7, 2021

It does eventually, the problem is the timing. So we update a resource after controller execution, but instantly execute the buffered events if there are. This is fater then the event from k8s api received in the informer. So the cache in the informer not necessary has the up to date CR, but we know there is one. So we can handle this situation (for example in a way mentioned above)

@metacosm
Copy link
Collaborator

One possible option, though a tricky one to implement would be to delay updating the resource on the server until all the events are processed. Another option would be to have an ephemeral cache of the resource that would only span the current interaction (though I guess defining that interaction might be tricky as well).

@lburgazzoli
Copy link
Collaborator

Not sure how feasible it would be but guess if here is where the custom event filter comes into play as we can use the updated custom resource from the controller execution as one of the parameter of the filter (we need to update the events in the queue) and let the user decide what to do.

@csviri
Copy link
Collaborator Author

csviri commented Oct 11, 2021

I more and more leaning to a flag that we don't process the events until we receive the next event from the informer. This might be solvable relatively elegantly just by adding a new special event handler for this purpose from the CustomResourceEventSource informer to the DefaultEventHandler. Or just a flag into current event filtering.

Note that we know what event we expecting, thus with what resource version, but would ignore that, thus not check it.

@lburgazzoli
Copy link
Collaborator

I more and more leaning to a flag that we don't process the events until we receive the next event from the informer. This might be solvable relatively elegantly just by adding a new special event handler for this purpose from the CustomResourceEventSource informer to the DefaultEventHandler. Or just a flag into current event filtering.

What would happen in case of an update that does not result in a change to CR ?
I think you won't get an event in such case

Note that we know what event we expecting, thus with what resource version, but would ignore that, thus not check it.

@csviri
Copy link
Collaborator Author

csviri commented Oct 11, 2021

What would happen in case of an update that does not result in a change to CR ?
I think you won't get an event in such case

An update is a CR or Status update by definition. If there is no such update we don't have this problem. So we will just check this if there was a CR update.

@lburgazzoli
Copy link
Collaborator

Yes but the actual update may not happen if the resulting patch is empty right ?
I guess to be on the safe side we should take into account the resource version of the original resource vs the resource after the update right ?

@csviri
Copy link
Collaborator Author

csviri commented Oct 11, 2021

We don't support patches now (probably will), but yes.
This is already happening in the current implementation (in 1.x) . So the whole logic around this is executed only if the current resource version (returned from the update) is different from the one in the cache AND the preciouse resource version is in the cache. This way we kinda minimize the some other possibilities - like having also on other update meanwhile from kubectl for example on CR.

https://github.com/java-operator-sdk/java-operator-sdk/blob/98e3def23afebeee11062735dc1c4269f8bfe0c8/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/DefaultEventHandler.java#L298-L315

See line 312.

@csviri csviri self-assigned this Oct 13, 2021
@csviri csviri linked a pull request Oct 14, 2021 that will close this issue
@csviri csviri added kind/feature Categorizes issue or PR as related to a new feature. caching-epic labels Oct 14, 2021
@csviri csviri closed this as completed Oct 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
caching-epic kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants