-
Notifications
You must be signed in to change notification settings - Fork 448
Acknowledge all ASO resource types before creating #5571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
/hold for squash |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5571 +/- ##
==========================================
+ Coverage 52.84% 52.97% +0.12%
==========================================
Files 278 278
Lines 29610 29674 +64
==========================================
+ Hits 15647 15719 +72
+ Misses 13146 13139 -7
+ Partials 817 816 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
@Jont828 points out that status should generally not be informing any reconcile logic. I now remember seeing that written down somewhere too, but can't find it in the API conventions or kubebuilder book. If you have a link that would be great to add here for posterity. I'll see if I can rework this PR. |
Reference 1 for the above text: https://book-v1.book.kubebuilder.io/basics/status_subresource Although, the above links are from older generations of kubebuilder, the underlying convention of not relying on Status of a resource to perform a reconcile logic still is widely used. |
Acknowledge all ASO resource types before creating
I've updated this to avoid using status from previous reconciliations as an input to anything. Main changes:
/retitle Acknowledge all ASO resource types before creating |
/retest |
2 similar comments
/retest |
/retest |
As of this comment, the apidiff job is expected to fail
|
Still reviewing this PR, little too dense to skim through. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @nojnhuh ! This is a cool PR, took some time to understand it. Sorry for the delay there.
I have some questions and suggestions listed below.
controllers/resource_reconciler.go
Outdated
@@ -39,6 +42,8 @@ import ( | |||
"sigs.k8s.io/cluster-api-provider-azure/util/tele" | |||
) | |||
|
|||
const ownedKindsAnnotation = "sigs.k8s.io/cluster-api-provider-azure-owned-aso-kinds" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should move this annotation to /azure/const.go
unless there is a specific reason to keep it here ?
We also should add explanatory comment on the significance of using this annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This annotation is strictly an implementation detail, so I'd prefer to keep here mostly to signal the intent that this shouldn't be used by anything else.
for _, spec := range r.owner.GetResourceStatuses() { | ||
newStatus, err := r.deleteResource(ctx, spec.Resource) | ||
ownedKindsValue := r.owner.GetAnnotations()[ownedKindsAnnotation] | ||
ownedKinds, err := parseOwnedKinds(ownedKindsValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It be great if we can add a supporting unit test for parseOwnedKinds()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this annotation is an implementation detail, I'd prefer to let the higher-level tests exercise this indirectly.
return false, fmt.Errorf("failed to parse %s annotation: %s", ownedKindsAnnotation, ownedKindsValue) | ||
} | ||
|
||
ownedObjs, err := r.ownedObjs(ctx, ownedKinds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same suggestion as above, would love to see a unit test validating r.ownedObjs()
.
controllers/resource_reconciler.go
Outdated
|
||
// partitionResources splits the sets of resources in spec and the current set | ||
// of owned, existing resources into three groups: | ||
// - unobservedTypeResources are of a type not yet known to this owning CAPZ resource. | ||
// - observedTypeResources are of a type already known to this owning CAPZ resource. | ||
// - deletedResources exist but are not defined in spec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we update deletedResources
to toBeDeletedResources
since the resources saved in this struct will be issued a delete later on ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- observedTypeResources are of a type already known to this owning CAPZ resource.
I might be missing the significance of the word "Type" here ?
When you say observedTypeResources
does "TypeResource" signify something other than a typed entity in Go ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we update below resources instead ?
unobservedTypeResources
tounrecordedResources
observedTypeResources
torecordedResources
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "type" in the name is important because that's the granularity at which CAPZ is tracking whether or not a resource is safe to create. CAPZ doesn't track whether or not it's seen every individual resource in spec.resources
, only the types of those resources. So "observedTypeResources" are "resources whose type has been observed."
} | ||
|
||
for _, spec := range observedTypeResources { | ||
spec.SetNamespace(r.owner.GetNamespace()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this update, spec.SetNamespace()
, result in validation-update-error due to immutability of Namespace on an ASO resource ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the resource's namespace in a SSA patch would create a new resource in that new namespace. I'm not sure how a user would get into that scenario here in practice though. CAPZ always makes sure the ASO resource's namespace matches that of its owning CAPZ resource.
controllers/resource_reconciler.go
Outdated
if ownedKinds.Has(typeMeta) { | ||
observedTypeResources = append(observedTypeResources, spec) | ||
} else { | ||
unobservedTypeResources = append(unobservedTypeResources, spec) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ownedKinds
also gets updated with the TypeMeta of the deletedResources
, will the user be able to recreate a deleted resource with a new name ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users can always create a resource of any ASO type. The annotation contains the types of the deleted resources only so CAPZ knows how to find resources which might be orphaned and construct delete requests for them.
controllers/resource_reconciler.go
Outdated
unobservedTypeResources, observedTypeResources, deletedResources := partitionResources(ownedKinds, r.resources, ownedObjs) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we incorporate the below suggestion for better readability ?
unobservedTypeResources, observedTypeResources, deletedResources := partitionResources(ownedKinds, r.resources, ownedObjs) | |
unrecordedResources, recordedResources, toBeDeletedResources := partitionResources(ownedKinds, r.resources, ownedObjs) | |
|
||
gvk := spec.GroupVersionKind() | ||
log.V(4).Info("applying resource", "resource", klog.KObj(spec), "resourceVersion", gvk.GroupVersion(), "resourceKind", gvk.Kind) | ||
err := r.Patch(ctx, spec, client.Apply, client.FieldOwner("capz-manager"), client.ForceOwnership) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding, how is the controller's GUID set upon creation ? Is client.ForceOwnership
enforcing the controller's GUID ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'm following your question entirely. If by "controller" you mean "the resource reference in the ASO resource's metadata.ownerReferences
with controller: true
and "GUID" you mean metadata.uid
, then this is how things work:
- AzureASOManagedControlPlane is created, API server sets
metadata.uid
- CAPZ reconciles AzureASOManagedControlPlane, seeing it defines a ManagedCluster in its
spec.resources
- CAPZ sets the ManagedCluster's owner references to include the AzureASOManagedControlPlane, setting the reference's
uid
to the AzureASOManagedControlPlane'smetadata.uid
.
The ForceOwnership only deals with how conflicts in the SSA patch are handled. The docs for that option say that most controllers should use this.
controllers/resource_reconciler.go
Outdated
r.owner.SetResourceStatuses(newResourceStatuses) | ||
|
||
return nil | ||
return len(unobservedTypeResources) > 0, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we only going to trigger a requeue if len(unobservedTypeResources) > 0
? Should we be triggering a requeue nonetheless ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only explicitly trigger a requeue here because we're already watching ASO objects and requeueing their owners when those ASO objects change. If this is the first reconciliation though and no ASO objects were created, then we can't requeue the CAPZ object based on ASO object changes (since the objects don't exist).
Actually, CAPZ already requeues the CAPZ object in reaction to its annotations changing though, so we don't need to do any explicit requeueing here.
} | ||
} | ||
|
||
func parseOwnedKinds(value string) (sets.Set[metav1.TypeMeta], error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably have a unit test for this function validating a scenario with an ownerKind with multiple /
in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've simplified this to use logic from k8s.io/apimachinery to parse the annotation, so maybe not worth testing this rigorously directly.
@nojnhuh Let me know if this is ready for review again! |
Un-assigning myself from this PR for now since I won't be able to review it again. :) |
rename variables
tweak annotation format
don't explicitly requeue
@willie-yao I think I've addressed all the feedback on this so far, so this is ready for another look when time permits. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@nojnhuh: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR fixes a potential issue where if CAPZ creates an ASO resource defined by an ASOAPI resource, then crashes or otherwise fails to record that resource in the PATCH to the CAPZ resource status at the end of the reconciliation loop, a quickly following delete of the CAPZ resource (or removing that ASO resource from
spec
) will not know to also delete that ASO resource if it's not recorded instatus
.This change makes CAPZ record each ASO resource group/version/kind in a
sigs.k8s.io/cluster-api-provider-azure-owned-aso-kinds
annotation and send those changes to the API server before continuing to actually create resources of that type. This way, no matter when a CAPZ ASO API resource is deleted, CAPZ can rely on any ASO resource type that needs to be deleted to be in that annotation.Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
I initially split this into a couple commits in case that's easier to review: one for the functional change and one to refactor things a bit.
TODOs:
Release note: