Skip to content

validation changes for assisted bare metal deployments #388

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

dhellmann
Copy link
Contributor

@dhellmann dhellmann commented Jun 24, 2020

The assisted installer is currently bypassing the installer validation
checks for the baremetal platform by using artificial data or
workarounds. This enhancement describes how we can modify the
installer to avoid those workarounds while retaining all of the checks
needed for the installer-provisioned infrastructure workflow.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 24, 2020
@stbenjam
Copy link
Member

Thanks for writing this up. I think it captures all the discussions and alternative proposals.

By doing the minimal set of validations at the InstallConfig level you can produce manifests and ignitions without needing the bootstrap infrastructure available. There’s probably some benefits there by separating the concerns even without the assisted install case.

@yevgeny-shnaidman
Copy link

@dhellmann doesn't it look a little bit strange that for some platforms we enforce validations during InstallConfig loading, and for some only during Cluster generation?

@stbenjam
Copy link
Member

stbenjam commented Jun 25, 2020

Why do you think it’s strange? The provision and platform checks that already exist for other platforms are a better fit for the checks we are doing that are causing you problems.

@romfreiman
Copy link

romfreiman commented Jun 25, 2020

I'm fine with this approach (as I stated multiple times already). My only concern is whether we're not breaking API here.
I mean if now someone expect to fail during ignition phase (maybe there are even tests that are verifying this?) but will fail later on. Would it affect big customers?
Unfortunately I dont have this knowledge, hence raising the concern

@stbenjam
Copy link
Member

I don't think any IPI user will care, and it's probably more correct and consistent with other platforms with this approach. You'd be able to run create manifests and ignition-configs from your laptop, and then bring the result to where you intend to provision, for example.

No one may even notice the change for the IPI workflow, a good percentage of users just run create cluster directly, and moving these validations later will be imperceptible (milliseconds).


The assisted installer does not need the `bootstrapProvisioningIP`,
`bootstrapProvisioningIP`, `provisioningNetworkCIDR`,
`provisioningDHCPRange`, or `provisioningNetworkInterface` fields. The
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about clusterOSImage and bootstrapOSImage? We could validate that if specified (they're not in the assisted install), that it's a valid URL in the installConfig, and only validate it exists in ProvisionCheck.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a paragraph about splitting those checks.

@dhellmann dhellmann force-pushed the assisted-installer-validations branch from 01c51dd to 2f6703a Compare June 25, 2020 13:08
@stbenjam
Copy link
Member

Thanks, I'm happy with this.

@hardys is out for a couple days, @crawford could you have a look at the proposal?

/assign @crawford

which validations should apply in each case, so there is no need to
make the validations configurable.

### Add an environment variable
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure this is a non-starter. The installer team will reject this idea based on previous experience.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, that's why it's down here in the alternatives list.

@dhellmann dhellmann force-pushed the assisted-installer-validations branch from 2f6703a to e06603e Compare June 25, 2020 15:11
@markmc
Copy link
Contributor

markmc commented Jun 25, 2020

I think we're missing a description of how exactly openshift-install will be used in an assisted installation. It's touched on in this enhancement and #376, but I think anyone not close to the subject would come away quite confused.

Like, firstly an assisted install only ever uses "create ignition-configs" and not "create cluster". (I guess some would say this implies it's a UPI use case)

But rather than the typical platform: none for UPI, we want to use platform: baremetal to express some intent about how the cluster should be installed, and its final configuration.

My understanding of that intent is: (a) configuring the keepalived/haproxy API load balancer (which is IPI-ish), and (b) enable the baremetal-operator for day 2 machine API operations (usually only available for IPI). And not (c) "provision a libvirt VM with the bootstrap configuration, and a bunch of IPMI-controlled physical machines as control plane nodes using PXE" (definitely IPI).

For each baremetal.Platform field, can we make it clear which of those 3 capabilities the field relates to?

If we did, then this proposal becomes "a UPI-like use case for 2 of these 3 capabilities" and it would be more obvious which fields are required for that.

Related, AIUI the "configure the baremetal-operator" intent currently requires all of the fields in the metal3.io Provisioning? But we desire that to be a day 2 "set up PXE-based machine provisioning" thing? (That topic is touched on in openshift/machine-api-operator#560)

Describing this in terms of which validations run at which phase doesn't really resonate with me - it's more which platform: baremetal capabilities (of the three) are being requested, and which validations apply to those capabilities?

Maybe this enhancement could become more "allowing bare-metal UPI installs to use some baremetal IPI capabilities"?

@markmc
Copy link
Contributor

markmc commented Jun 25, 2020

My only concern is whether we're not breaking API here.

See my comment. If we talk about this in terms of relaxing the "you may only use these 3 capabilities together" to "you may use 2 of these capabilities, without the other" ... it becomes more obvious to me that this isn't an API-breaking change. (i.e. changing foo(bar, blaa, baz) to foo(bar, blaa, baz=None) isn't an API break)

@stbenjam
Copy link
Member

stbenjam commented Jun 25, 2020

I think we're missing a description of how exactly openshift-install will be used in an assisted installation. It's touched on in this enhancement and #376, but I think anyone not close to the subject would come away quite confused.

Like, firstly an assisted install only ever uses "create ignition-configs" and not "create cluster". (I guess some would say this implies it's a UPI use case)

Perhaps we should explicitly call this out in this enhancement. The implied point of this enhancement is to facilitate the use of the baremetal platform for a UPI-like workflow. That is, allow you to generate manifests and ignitions without us actually requiring the fields needed for the automated bootstrap and master provisioning parts.

Deferring checking those fields is totally consistent with how other platforms do things. Even if we're doing IPI, Provisioning and Platform checks are clearly the right place for those validations; we made a mistake by validating everything in the installconfig validations.

This is a bit of a contrived example but imagine a user wanting to do an IPI install, but she has a bunch of manifests and ignition customizations. It makes sense to let her do that on a development laptop, and then bring that output to the provisioning host to run create cluster. That works if we rearrange the validations to be run only where needed. It also allows the UPI-workflow assisted installer wants to do.

My understanding of that intent is: (a) configuring the keepalived/haproxy API load balancer (which is IPI-ish), and (b) enable the baremetal-operator for day 2 machine API operations (usually only available for IPI). And not (c) "provision a libvirt VM with the bootstrap configuration, and a bunch of IPMI-controlled physical machines as control plane nodes using PXE" (definitely IPI).

Why are a and b separate? The way I understand things, we're not supporting separation: if you specify baremetal platform you get (a) and (b). We've deferred the epic to make Metal3 entirely optional.

Related, AIUI the "configure the baremetal-operator" intent currently requires all of the fields in the metal3.io Provisioning? But we desire that to be a day 2 "set up PXE-based machine provisioning" thing? (That topic is touched on in openshift/machine-api-operator#560)

Well, that's changing. #386 covers the optional provisioning network. It's sort of orthogonal to the assisted install case, even if they'll use it. It's valid even in IPI.

Although one note: all provisioning decisions on day 1 are permanent. We don't intend to make it possible to turn on a provisioning network later. If you install without it (either in assisted or IPI), you get Metal3 but can only do virtual media based provisioning and all the power management things. Trying to change the Provisioning configuration on day 2 is a separate epic and set of stories we could visit in the 4.7 time frame.

@stbenjam
Copy link
Member

@markmc If we instead went to a capabilities-based approach, what would the interface look like?

@markmc
Copy link
Contributor

markmc commented Jun 25, 2020

@markmc If we instead went to a capabilities-based approach, what would the interface look like?

Something like "the presence of field A implies capabilities 1 and 2, and field B implies capability 3" ?

(i.e. we want to make field B optional if you just want capability 1 and 2)

Maybe the key difference is the docs for this would describe how to use those capabilities individually rather than talking about which fields are validated by which phases

@markmc
Copy link
Contributor

markmc commented Jun 25, 2020

My understanding of that intent is: (a) configuring the keepalived/haproxy API load balancer (which is IPI-ish), and (b) enable the baremetal-operator for day 2 machine API operations (usually only available for IPI). And not (c) "provision a libvirt VM with the bootstrap configuration, and a bunch of IPMI-controlled physical machines as control plane nodes using PXE" (definitely IPI).

Why are a and b separate? The way I understand things, we're not supporting separation: if you specify baremetal platform you get (a) and (b). We've deferred the epic to make Metal3 entirely optional.

Right, but I could imagine a future enhancement making it possible to use (a) alone in UPI installs ... and then couldn't rely on the fields/phases association, so we'd have to switch then to thinking in terms of individual capabilities. Why not make that switch in approach now?

(I think I'm advocating for more of a change in documentation - this enhancement and user docs - rather than changing the mechanism)

@dhellmann
Copy link
Contributor Author

My understanding of that intent is: (a) configuring the keepalived/haproxy API load balancer (which is IPI-ish), and (b) enable the baremetal-operator for day 2 machine API operations (usually only available for IPI). And not (c) "provision a libvirt VM with the bootstrap configuration, and a bunch of IPMI-controlled physical machines as control plane nodes using PXE" (definitely IPI).

Why are a and b separate? The way I understand things, we're not supporting separation: if you specify baremetal platform you get (a) and (b). We've deferred the epic to make Metal3 entirely optional.

Right, but I could imagine a future enhancement making it possible to use (a) alone in UPI installs ... and then couldn't rely on the fields/phases association, so we'd have to switch then to thinking in terms of individual capabilities. Why not make that switch in approach now?

(I think I'm advocating for more of a change in documentation - this enhancement and user docs - rather than changing the mechanism)

How would the user express which capabilities they want?

@kirankt
Copy link

kirankt commented Jun 26, 2020

My understanding of that intent is: (a) configuring the keepalived/haproxy API load balancer (which is IPI-ish), and (b) enable the baremetal-operator for day 2 machine API operations (usually only available for IPI). And not (c) "provision a libvirt VM with the bootstrap configuration, and a bunch of IPMI-controlled physical machines as control plane nodes using PXE" (definitely IPI).

Why are a and b separate? The way I understand things, we're not supporting separation: if you specify baremetal platform you get (a) and (b). We've deferred the epic to make Metal3 entirely optional.

Right, but I could imagine a future enhancement making it possible to use (a) alone in UPI installs ... and then couldn't rely on the fields/phases association, so we'd have to switch then to thinking in terms of individual capabilities. Why not make that switch in approach now?
(I think I'm advocating for more of a change in documentation - this enhancement and user docs - rather than changing the mechanism)

How would the user express which capabilities they want?

My guess is something similar to what @stbenjam proposed in #386. Sort of choice in a IPI platform-wide profile, which enables/disables a whole set of related-features based on a said profile. Not sure about what features need to be turned on/off but couple things come to mind right away: validations (this proposal), provisioning network (@stbenjam 's) proposal, etc.

Profiles could be:
Full deploy of IPI (default)
Day 2 deploy only (UPI, Assisted, ISO-based installs, etc)
...

@dhellmann dhellmann force-pushed the assisted-installer-validations branch from e06603e to 461a137 Compare June 26, 2020 15:37
@dhellmann
Copy link
Contributor Author

@markmc I have incorporated the feedback you gave me earlier in the latest draft.

@dhellmann
Copy link
Contributor Author

I haven't changed the title. Should I rename this to incorporate "user-provisioned" instead of focusing on the assisted installer?

@stbenjam
Copy link
Member

My guess is something similar to what @stbenjam proposed in #386. Sort of choice in a IPI platform-wide profile, which enables/disables a whole set of related-features based on a said profile.

At this point, I'd prefer not to combine the provisioning network profiles with other settings -- we may need some explicit indication later that an install as assisted, but I think this proposal covers what we need now.

@dhellmann
Copy link
Contributor Author

My guess is something similar to what @stbenjam proposed in #386. Sort of choice in a IPI platform-wide profile, which enables/disables a whole set of related-features based on a said profile.

At this point, I'd prefer not to combine the provisioning network profiles with other settings -- we may need some explicit indication later that an install as assisted, but I think this proposal covers what we need now.

Yeah, I misunderstood what Mark was suggesting. After chatting with him it was clear that he meant the framing for this proposal was focused on the solution rather than the problem, and that thinking about the problem more broadly would result in a more useful problem statement and better context for reviewers not already familiar with the assisted installer. I've tried to incorporate his feedback into the latest draft (including stealing directly from some sample text he provided).

@markmc
Copy link
Contributor

markmc commented Jun 26, 2020

My guess is something similar to what @stbenjam proposed in #386. Sort of choice in a IPI platform-wide profile, which enables/disables a whole set of related-features based on a said profile.

At this point, I'd prefer not to combine the provisioning network profiles with other settings -- we may need some explicit indication later that an install as assisted, but I think this proposal covers what we need now.

Yeah, I misunderstood what Mark was suggesting. After chatting with him it was clear that he meant the framing for this proposal was focused on the solution rather than the problem, and that thinking about the problem more broadly would result in a more useful problem statement and better context for reviewers not already familiar with the assisted installer. I've tried to incorporate his feedback into the latest draft (including stealing directly from some sample text he provided).

Thanks for your patience with my feedback. I'm happy with the proposal, and I hope the new framing of the problem helps capture where it's coming from at a higher level. 👍

@stbenjam
Copy link
Member

Are we all happy with this proposal then? Feel free to remove the hold when ready.

/lgtm
/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2020
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 26, 2020
@kirankt
Copy link

kirankt commented Jun 26, 2020

/approve

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhellmann, kirankt, stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

The [connected-assisted-installer
enhancement](./connected-assisted-installer.md) describes how an agent
running on several bare-metal hosts receive a command from the
assisted installer application to begin the installation of that

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I dont like the term application. I would call it either assisted installer or assisted installer service

This installer workflow mirrors the [bare-metal user-provisioned
infrastructure
workflow](https://github.com/openshift/installer/blob/master/docs/user/metal/install_upi.md)
in that the the OpenShift installer is used to generate assets for the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the the

`pkg/types/baremetal/validation` package that can be invoked from
[PlatformProvisionCheck.Generate()](https://github.com/openshift/installer/blob/master/pkg/asset/installconfig/platformprovisioncheck.go#L37).

There are two validation rules for the `clusterOSImage` and

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are those used for? just for general knowledge

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are used to optionally specify a URL for the two RHCOS images needed for an IPI baremetal deployment - these are large and by default are downloaded every deploy from the internet, so this interface enables faster repeated testing and also deployment in disconnected environments.

@romfreiman
Copy link

Looks good. I think we also discussed BMH annotations, but not 100% it's relevant to this enhancement

installer-provisioned infrastructure installer.
- Recapitulate the test plans for the assisted installer.
- Recapitulate the [approach for making parts of the bare metal
infrastructure automation optional](./baremetal-provisioning-optional.md).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

broken link? couldn't find that doc

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found it. I guess the link will work once the enhancement will be merged.


**For non-optional features moving to GA, the graduation criteria must include
end to end tests.**

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the sections about graduation etc don't really apply here, can we just remove them?

@hardys
Copy link
Contributor

hardys commented Jun 29, 2020

Thanks for this - it lgtm and aligns well with the ideas I'd previously discussed and mentioned on openshift/installer#3794

@dhellmann
Copy link
Contributor Author

I think this is ready to merge. We can address any other issues as follow-up patches.

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 29, 2020
@openshift-merge-robot openshift-merge-robot merged commit c6daad5 into openshift:master Jun 29, 2020
dhellmann added a commit to dhellmann/openshift-enhancements that referenced this pull request Jun 29, 2020
fix some review comments missed before openshift#388 merged

Signed-off-by: Doug Hellmann <[email protected]>
dhellmann added a commit to dhellmann/openshift-enhancements that referenced this pull request Sep 1, 2020
fix some review comments missed before openshift#388 merged

Signed-off-by: Doug Hellmann <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.