Skip to content

Bug 1873234: Manage ownership hand-off of metal3 between MAO and CBO for upgrade and downgrade #689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 31, 2020

Conversation

sadasu
Copy link
Contributor

@sadasu sadasu commented Aug 27, 2020

In an upgrade scenario, check if CBO is available to take ownership of the metal3 deployment. In that case, back off and let cluster-baremetal-operator (CBO) manage it.

In a downgrade/rollback scenario where the metal3 deployment is
annotated as being owned by cluster-baremetal-operator, but the CBO
has been removed, take back control of the metal3
deployment.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 27, 2020
@sadasu
Copy link
Contributor Author

sadasu commented Aug 27, 2020

/cc @dhellmann @michaelgugino @elmiko

@sadasu
Copy link
Contributor Author

sadasu commented Aug 27, 2020

Marking this is as WIP because I am having trouble getting one UT to pass. I have added a comment against the test and part of the test I am having trouble with. I am continuing to work on getting that working. In the meantime, here are the rest of the changes for you to look at. Thanks!

Copy link
Contributor

@michaelgugino michaelgugino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

We're after feature freeze. This isn't addressing a bug at this point, and shouldn't be considered a release blocker in any case. Let's look at this after 4.7 opens.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 27, 2020
@dhellmann
Copy link
Contributor

@sadasu is this the one we need to get into 4.6 to support upgrades to 4.7?

@sadasu
Copy link
Contributor Author

sadasu commented Aug 27, 2020

@sadasu is this the one we need to get into 4.6 to support upgrades to 4.7?

Yes. @dhellmann.

@dhellmann
Copy link
Contributor

@sadasu is this the one we need to get into 4.6 to support upgrades to 4.7?

Yes. @dhellmann.

OK, we'll need a bug, as @michaelgugino pointed out.

@sadasu sadasu changed the title WIP: MAO to stop managing metal3 deployments if CBO is available Bug 1873234: MAO to stop managing metal3 deployments if CBO is available Aug 27, 2020
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Aug 27, 2020
@openshift-ci-robot
Copy link
Contributor

@sadasu: This pull request references Bugzilla bug 1873234, which is invalid:

  • expected the bug to target the "4.6.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1873234: MAO to stop managing metal3 deployments if CBO is available

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sadasu
Copy link
Contributor Author

sadasu commented Aug 27, 2020

/bugzilla refresh

@openshift-ci-robot
Copy link
Contributor

@sadasu: This pull request references Bugzilla bug 1873234, which is invalid:

  • expected the bug to target the "4.6.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sadasu sadasu changed the title Bug 1873234: MAO to stop managing metal3 deployments if CBO is available Bug 1873234: Manage ownership hand-off of metal3 between MAO and CBO for upgrade and downgrade Aug 27, 2020
@openshift-ci-robot
Copy link
Contributor

@sadasu: This pull request references Bugzilla bug 1873234, which is invalid:

  • expected the bug to target the "4.6.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1873234: Manage ownership hand-off of metal3 between MAO and CBO for upgrade and downgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 27, 2020
In an upgrade scenario, check if CBO is available to take ownership
of the metal3 deployment. In that case, back off and let CBO manage it.

In a downgrade/rollback scenario where the metal3 deployment is
annotated as being owned by cluster-baremetal-operator, but the
baremetal operator has been removed, take back control of the metal3
deployment.
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 27, 2020
@dhellmann
Copy link
Contributor

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Aug 27, 2020
@openshift-ci-robot
Copy link
Contributor

@dhellmann: This pull request references Bugzilla bug 1873234, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

expectedError bool
}{
{
testCase: "Only maoOwnedAnnotation",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking these annotations in tests and not checking them in real code is not useful. In any case, I don't think we need the new CBO annotation in any case, at least not in this code base.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maoOwnedAnnotation was added as part of #424. Only cboOwnedAnnotation is added as part of this change. This annotation will not be added to the metal3 deployment by any component in 4.6. It will be added by CBO in 4.7. During some part of the 4.7 dev cycle, both MAO and CBO would be capable of managing the metal3 deployment and during that time, we want CBO to take precedence.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing in the actual product code that checks for this condition, and these tests don't call any MAO function that might result in both being present. So, these tests are unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @dhellmann explained it better than I could have:
"The 4.6 and 4.7 configurations for metal3 will be different. Therefore if we ever have a time when 2 components are both trying to update the Deployment, they're going to thrash making changes back and forth. There are at least 2 situations where that could come up."

In this scenario where MAO and CBO are both trying to make changes to metal3 Deployment, MAO would have added the maoOwnedAnnotation to metal3 deployment with configurationA. And, CBO would also try to update the Deployment with configurationB. In addition, CBO would add the cboOwnedAnnotation to its own metal3 Deployment. With the changes added to syncBaremetalControllers() we will allow CBO to take control and MAO will back off.

Yes, no MAO function will result in both these annotation to be set. But, MAO will have to react correctly to situation where both annotations are set. Hence the test.

@sadasu
Copy link
Contributor Author

sadasu commented Aug 28, 2020

/test e2e-azure-operator

@sadasu
Copy link
Contributor Author

sadasu commented Aug 31, 2020

/retest

Copy link
Contributor

@michaelgugino michaelgugino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 31, 2020
Copy link
Contributor

@elmiko elmiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mostly understand what is going on with this patch, but i have one question.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Aug 31, 2020

@sadasu: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-azure 1a2c56d link /test e2e-azure

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link
Contributor

@elmiko elmiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for answering my question Sandhya
/approve

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elmiko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 31, 2020
@michaelgugino
Copy link
Contributor

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 31, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit f38acc8 into openshift:master Aug 31, 2020
@openshift-ci-robot
Copy link
Contributor

@sadasu: All pull requests linked via external trackers have merged:

Bugzilla bug 1873234 has been moved to the MODIFIED state.

In response to this:

Bug 1873234: Manage ownership hand-off of metal3 between MAO and CBO for upgrade and downgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@enxebre
Copy link
Member

enxebre commented Sep 1, 2020

@sadasu @dhellmann why do we need to do any check for the CBO to be available or annotations at all?
The CBO either is running or not. If it is running in 4.6 I'd expect this PR to remove syncBaremetalControllers and all baremetal specific code completely. Then during a 4.6 upgrade the CBO just takes over, in a downgrade the old mao code takes over.

Also https://github.com/openshift/machine-api-operator/blob/master/install/0000_30_machine-api-operator_04_metal3provisioning.crd.yaml and https://github.com/openshift/machine-api-operator/blob/master/install/0000_30_machine-api-operator_08_baremetalhost.crd.yaml would need to be removed.

@markmc
Copy link
Contributor

markmc commented Sep 1, 2020

@sadasu @dhellmann why do we need to do any check for the CBO to be available or annotations at all?

Covered in some detail by:

https://github.com/openshift/enhancements/blob/master/enhancements/baremetal/an-slo-for-baremetal.md#upgrade--downgrade--version-skew-strategy

and:

#689 (comment)

@enxebre
Copy link
Member

enxebre commented Sep 1, 2020

thanks for the pointers @markmc! #689 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants