-
Notifications
You must be signed in to change notification settings - Fork 216
Bug 1873234: Manage ownership hand-off of metal3 between MAO and CBO for upgrade and downgrade #689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Marking this is as WIP because I am having trouble getting one UT to pass. I have added a comment against the test and part of the test I am having trouble with. I am continuing to work on getting that working. In the meantime, here are the rest of the changes for you to look at. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/hold
We're after feature freeze. This isn't addressing a bug at this point, and shouldn't be considered a release blocker in any case. Let's look at this after 4.7 opens.
@sadasu is this the one we need to get into 4.6 to support upgrades to 4.7? |
Yes. @dhellmann. |
OK, we'll need a bug, as @michaelgugino pointed out. |
@sadasu: This pull request references Bugzilla bug 1873234, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/bugzilla refresh |
@sadasu: This pull request references Bugzilla bug 1873234, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@sadasu: This pull request references Bugzilla bug 1873234, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
In an upgrade scenario, check if CBO is available to take ownership of the metal3 deployment. In that case, back off and let CBO manage it. In a downgrade/rollback scenario where the metal3 deployment is annotated as being owned by cluster-baremetal-operator, but the baremetal operator has been removed, take back control of the metal3 deployment.
/bugzilla refresh |
@dhellmann: This pull request references Bugzilla bug 1873234, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
expectedError bool | ||
}{ | ||
{ | ||
testCase: "Only maoOwnedAnnotation", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking these annotations in tests and not checking them in real code is not useful. In any case, I don't think we need the new CBO annotation in any case, at least not in this code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maoOwnedAnnotation was added as part of #424. Only cboOwnedAnnotation is added as part of this change. This annotation will not be added to the metal3 deployment by any component in 4.6. It will be added by CBO in 4.7. During some part of the 4.7 dev cycle, both MAO and CBO would be capable of managing the metal3 deployment and during that time, we want CBO to take precedence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's nothing in the actual product code that checks for this condition, and these tests don't call any MAO function that might result in both being present. So, these tests are unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @dhellmann explained it better than I could have:
"The 4.6 and 4.7 configurations for metal3 will be different. Therefore if we ever have a time when 2 components are both trying to update the Deployment, they're going to thrash making changes back and forth. There are at least 2 situations where that could come up."
In this scenario where MAO and CBO are both trying to make changes to metal3 Deployment, MAO would have added the maoOwnedAnnotation to metal3 deployment with configurationA. And, CBO would also try to update the Deployment with configurationB. In addition, CBO would add the cboOwnedAnnotation to its own metal3 Deployment. With the changes added to syncBaremetalControllers() we will allow CBO to take control and MAO will back off.
Yes, no MAO function will result in both these annotation to be set. But, MAO will have to react correctly to situation where both annotations are set. Hence the test.
/test e2e-azure-operator |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i mostly understand what is going on with this patch, but i have one question.
@sadasu: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for answering my question Sandhya
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: elmiko The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
/retest Please review the full test history for this PR and help us cut down flakes. |
@sadasu: All pull requests linked via external trackers have merged: Bugzilla bug 1873234 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@sadasu @dhellmann why do we need to do any check for the CBO to be available or annotations at all? Also https://github.com/openshift/machine-api-operator/blob/master/install/0000_30_machine-api-operator_04_metal3provisioning.crd.yaml and https://github.com/openshift/machine-api-operator/blob/master/install/0000_30_machine-api-operator_08_baremetalhost.crd.yaml would need to be removed. |
Covered in some detail by: and: |
thanks for the pointers @markmc! #689 (comment) |
In an upgrade scenario, check if CBO is available to take ownership of the metal3 deployment. In that case, back off and let cluster-baremetal-operator (CBO) manage it.
In a downgrade/rollback scenario where the metal3 deployment is
annotated as being owned by cluster-baremetal-operator, but the CBO
has been removed, take back control of the metal3
deployment.