blog: primary resource caching #2815
Merged
18 commits:

- 4f8ed1f blog: caching (csviri)
- 3c133b6 docs: blogpost about primary caching (csviri)
- e9e9508 wip (csviri)
- 3388a1a wip (csviri)
- 343a828 Update docs/content/en/blog/news/primary-cache-for-next-recon.md (csviri)
- 171b526 Update primary-cache-for-next-recon.md (csviri)
- 6294342 mermaid and improvement (csviri)
- 466facf date (csviri)
- f41390c title (csviri)
- 594598e docs (csviri)
- 48e9c81 improve (csviri)
- 9e74dd2 wording (csviri)
- a18765d improve (csviri)
- 3517ef1 docs: start improving wording (metacosm)
- 85a1be7 wording (csviri)
- 328f40a comment improve (csviri)
- 88f22b4 docs: improve (metacosm)
- d56c576 improvements (csviri)
---
title: How to guarantee allocated values for next reconciliation
date: 2025-05-22
author: >-
  [Attila Mészáros](https://github.com/csviri) and [Chris Laprun](https://github.com/metacosm)
---

We recently released v5.1 of Java Operator SDK (JOSDK). One of the highlights of this release concerns so-called
[allocated values](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#representing-allocated-values).

To describe the problem, let's say that our controller needs to create a resource that has a generated identifier, i.e.
a resource whose identifier cannot be directly derived from the custom resource's desired state as specified in its
`spec` field. To record the fact that the resource was successfully created, and to avoid attempting to create it again
in subsequent reconciliations, this type of controller typically stores the generated identifier in the custom
resource's `status` field.

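A minimal sketch of this pattern in plain Java follows. It deliberately uses no JOSDK types: `MyResource`, `Status`,
`ExternalService` and `AllocatedValueReconciler` are hypothetical names for illustration only. The reconciler calls the
external system only while the status does not yet record an allocated identifier, so repeated reconciliations of the
same, up-to-date resource do not create duplicates.

```java
import java.util.UUID;

// Hypothetical status holding the allocated (generated) identifier.
class Status {
  String allocatedId;
}

// Hypothetical custom resource: only the status matters for this sketch.
class MyResource {
  final Status status = new Status();
}

// Hypothetical external system that generates an identifier on creation.
class ExternalService {
  int createCalls = 0;

  String create() {
    createCalls++;
    return UUID.randomUUID().toString();
  }
}

class AllocatedValueReconciler {
  private final ExternalService service;

  AllocatedValueReconciler(ExternalService service) {
    this.service = service;
  }

  void reconcile(MyResource resource) {
    // The identifier cannot be derived from the spec, so the status is the
    // only record that the external resource already exists.
    if (resource.status.allocatedId == null) {
      resource.status.allocatedId = service.create();
    }
  }
}
```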
The Java Operator SDK relies on the informers' cache to retrieve resources. These caches, however, are only guaranteed
to be eventually consistent. Suppose some other event triggers a new reconciliation **before** the update we made to our
resource's status has had a chance to propagate to the cluster and then back to the informer cache. In that case, the
resource in the informer cache does **not** contain the latest version as modified by the reconciler. The new
reconciliation would then find no generated identifier in the resource status, and the reconciler would attempt to create
the resource again, which is not what we want.

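To make the race concrete, here is a toy simulation, not JOSDK code: the `server` map stands for the cluster,
`informerCache` for the informer's cache, and `propagate` for the watch event that eventually brings the cache up to
date. All names are hypothetical. A second reconciliation that runs before `propagate` reads the stale copy and
allocates a second identifier.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model: a resource is reduced to "name -> allocated id"
// (no entry means no identifier has been allocated yet).
class StaleCacheDemo {
  final Map<String, String> server = new HashMap<>();        // state stored in the cluster
  final Map<String, String> informerCache = new HashMap<>(); // eventually consistent copy
  int createCalls = 0;

  // Reads from the informer cache, as the reconciler would.
  void reconcile(String name) {
    String id = informerCache.get(name);
    if (id == null) {
      createCalls++;
      server.put(name, "id-" + createCalls); // the status update reaches the cluster...
      // ...but the informer cache only sees it once propagate() runs
    }
  }

  // Simulates the watch event that eventually updates the informer cache.
  void propagate(String name) {
    informerCache.put(name, server.get(name));
  }
}
```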
Java Operator SDK now provides a utility class,
[`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java),
to handle this particular use case. The utility updates the resource and keeps the updated version in an overlay cache
on top of the informer's cache, so your reconciler is guaranteed to see the most up-to-date version of the resource on
the next reconciliation:

```java
@Override
public UpdateControl<StatusPatchCacheCustomResource> reconcile(
    StatusPatchCacheCustomResource resource,
    Context<StatusPatchCacheCustomResource> context) {

  // omitted code

  // a fresh copy is needed only because we use the server-side apply (SSA) version of the update
  var freshCopy = createFreshCopy(resource);
  freshCopy
      .getStatus()
      .setValue(statusWithAllocatedValue());

  // patch the status through the utility instead of through UpdateControl:
  // the utility performs the patch and caches the updated resource for the next reconciliation
  PrimaryUpdateAndCacheUtils.ssaPatchStatusAndCacheResource(resource, freshCopy, context);

  // the status has already been patched by the utility, so no further update is requested here
  return UpdateControl.noUpdate();
}
```

How does `PrimaryUpdateAndCacheUtils` work?
There are multiple ways to solve this problem, but we only provide the solution described below. If you want to dig
deeper into the alternatives, see this [PR](https://github.com/operator-framework/java-operator-sdk/pull/2800/files).

The trick is to intercept the resource that the reconciler updated and cache that version in an additional cache on top
of the informer's cache. Subsequently, if the reconciler needs to read the resource, the SDK first checks whether it is
in the overlay cache and reads it from there if present; otherwise, it reads it from the informer's cache. If the
informer receives an event with a fresh resource, we always remove the resource from the overlay cache, since the event
carries a more recent version. But this **works only** if the reconciler updates the resource using **optimistic
locking**. If the update fails with a conflict because the resource has already been updated on the cluster before we
got the chance to apply our update, we simply wait and poll the informer cache until the new resource version from the
server appears in it, and then try to apply our updates again on top of that version, again with optimistic locking.

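The update-and-retry loop described above can be sketched as follows. This is a toy model, not the actual
`PrimaryUpdateAndCacheUtils` implementation: `Res`, `ConflictException`, `FakeApiServer` and `updateWithLock` are
hypothetical, the resource version is simplified to a counter (in Kubernetes it is an opaque string), and the informer
cache is a plain concurrent map updated by another thread. The polling interval and timeout are arbitrary assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical immutable resource: a version counter plus a status value.
record Res(long resourceVersion, String status) {}

class ConflictException extends RuntimeException {}

// Hypothetical API server that enforces optimistic locking on resourceVersion.
class FakeApiServer {
  private final Map<String, Res> store = new ConcurrentHashMap<>();

  synchronized Res get(String name) { return store.get(name); }

  synchronized void put(String name, Res res) { store.put(name, res); }

  // Succeeds only if the version the caller based its change on is still current.
  synchronized Res update(String name, Res desired) {
    Res current = store.get(name);
    if (current.resourceVersion() != desired.resourceVersion()) {
      throw new ConflictException();
    }
    Res updated = new Res(current.resourceVersion() + 1, desired.status());
    store.put(name, updated);
    return updated;
  }
}

class UpdateWithRetry {
  static final long POLL_MILLIS = 10;       // assumed polling interval
  static final long TIMEOUT_MILLIS = 5_000; // assumed upper bound on waiting

  // Applies `change` on top of the informer's copy using optimistic locking.
  // On conflict, waits until the informer cache holds a newer version, then retries.
  static Res updateWithLock(String name, FakeApiServer server,
      Map<String, Res> informerCache, UnaryOperator<Res> change) {
    Res base = informerCache.get(name);
    while (true) {
      try {
        return server.update(name, change.apply(base));
      } catch (ConflictException e) {
        long staleVersion = base.resourceVersion();
        long deadline = System.currentTimeMillis() + TIMEOUT_MILLIS;
        while (informerCache.get(name).resourceVersion() == staleVersion) {
          if (System.currentTimeMillis() > deadline) {
            throw new IllegalStateException("informer cache did not catch up", e);
          }
          try {
            Thread.sleep(POLL_MILLIS);
          } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while polling", ie);
          }
        }
        base = informerCache.get(name); // retry on top of the server's newer version
      }
    }
  }
}
```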
So why is optimistic locking required? We hinted at it above, but the gist of it is that if another party updates the
resource before we get a chance to, we wouldn't be able to handle the resulting situation correctly in all cases. The
informer would receive that new event before our own update had a chance to propagate. Without optimistic locking, there
would be no fail-proof way to determine which update should prevail (i.e. which occurred first), in particular if the
informer loses its connection to the cluster or in other edge cases (the joys of distributed computing!).

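The difference can be illustrated with a tiny in-memory model (hypothetical, not JOSDK or Kubernetes client code;
`Versioned` and `VersionedStore` are illustrative names). A blind, last-write-wins update always succeeds and silently
discards a concurrent change, while a conditional update tied to the version we read reports the conflict, so the caller
knows its view of the resource is outdated.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical versioned value standing in for a resource on the cluster.
record Versioned(long version, String value) {}

class VersionedStore {
  private final AtomicReference<Versioned> current =
      new AtomicReference<>(new Versioned(1, "initial"));

  Versioned read() {
    return current.get();
  }

  // Last write wins: always succeeds, silently discarding concurrent changes.
  Versioned blindWrite(String value) {
    return current.updateAndGet(v -> new Versioned(v.version() + 1, value));
  }

  // Optimistic locking: succeeds only if nobody changed the value since `seen` was read.
  // compareAndSet compares references, which works because `seen` is the object read() returned.
  boolean conditionalWrite(Versioned seen, String value) {
    return current.compareAndSet(seen, new Versioned(seen.version() + 1, value));
  }
}
```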
Optimistic locking simplifies the situation and provides us with stronger guarantees: if the update succeeds, we can be
sure that we have the proper resource version in our caches, and the next event will contain our update in all cases.
Because of that, we can also safely evict the cached resource from the overlay cache whenever we receive a new event.
The overlay cache is only used if the SDK detects that the original resource (i.e. the one before we applied our status
update in the example above) is still in the informer's cache.

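A minimal sketch of such an overlay cache follows, assuming resources are keyed by name and carry an opaque
`resourceVersion` that is only compared for equality. `Resource`, `OverlayCache`, `cacheIfStale` and `onInformerEvent`
are hypothetical names; this is not the SDK's actual implementation, and in a real operator the informer maintains its
own cache instead of receiving writes from a class like this one.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical resource: identified by name, with an opaque resourceVersion string.
record Resource(String name, String resourceVersion, String status) {}

class OverlayCache {
  private final Map<String, Resource> informerCache; // stands in for the informer's cache
  private final Map<String, Resource> overlay = new ConcurrentHashMap<>();

  OverlayCache(Map<String, Resource> informerCache) {
    this.informerCache = informerCache;
  }

  // Called after an update with optimistic locking succeeded.
  // `original` is the version the update was based on, `updated` the server's response.
  synchronized void cacheIfStale(Resource original, Resource updated) {
    Resource inInformer = informerCache.get(original.name());
    // Only use the overlay while the informer still holds the pre-update version;
    // resource versions are compared for equality only, never ordered.
    if (inInformer != null
        && Objects.equals(inInformer.resourceVersion(), original.resourceVersion())) {
      overlay.put(updated.name(), updated);
    }
  }

  // Simulates an informer event: with optimistic locking, any new event already
  // reflects our update, so the overlay entry can always be evicted.
  synchronized void onInformerEvent(Resource fresh) {
    informerCache.put(fresh.name(), fresh);
    overlay.remove(fresh.name());
  }

  // Reads prefer the overlay and fall back to the informer cache.
  Resource get(String name) {
    Resource cached = overlay.get(name);
    return cached != null ? cached : informerCache.get(name);
  }
}
```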
The following diagram sums up the process:

```mermaid
flowchart TD
    A["Update Resource with Lock"] --> B{"Is Successful"}
    B -- Fails on conflict --> D["Poll the Informer cache until resource updated"]
    D --> A
    B -- Yes --> n2{"Original resource still in informer cache?"}
    n2 -- Yes --> C["Cache the resource in overlay cache"]
    n2 -- No --> n3["Informer cache already contains up-to-date version, do not use overlay cache"]
```
Why do you return `noUpdate` here? Shouldn't it return `patchStatus` instead?

It should not; the utils do the patching.

So, to be clear, if a user wants to use the utils, they need to return `noUpdate`? Or is it just that it won't matter
if they return something else? Either way, that should be documented.

Regarding "using the utility instead of update control": yes, that might not be enough. I will expand on that, and also
open a separate PR for the core docs.