Skip to content

Commit 461a137

Browse files
committed
validation changes for assisted bare metal deployments
Signed-off-by: Doug Hellmann <[email protected]>
1 parent 8ae65e0 commit 461a137

File tree

1 file changed

+280
-0
lines changed

1 file changed

+280
-0
lines changed
Lines changed: 280 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,280 @@
1+
---
2+
title: assisted-installer-bare-metal-validations
3+
authors:
4+
- "@dhellmann"
5+
- "@stbenjam"
6+
reviewers:
7+
- "@romfreiman"
8+
- "@yevgeny-shnaidman"
9+
approvers:
10+
- "@hardys"
11+
- "@crawford"
12+
- "@markmc"
13+
creation-date: 2020-06-24
14+
last-updated: 2020-06-26
15+
status: implementable
16+
---
17+
18+
# Assisted Installer Bare Metal Validations
19+
20+
## Release Signoff Checklist
21+
22+
- [x] Enhancement is `implementable`
23+
- [ ] Design details are appropriately documented from clear requirements
24+
- [ ] Test plan is defined
25+
- [ ] Graduation criteria for dev preview, tech preview, GA
26+
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)
27+
28+
## Summary
29+
30+
The assisted installer supports a user-provisioned infrastructure
31+
installation workflow for bare-metal clusters, but configures the
32+
installed cluster with certain capabilities that are currently only
33+
available using the `baremetal` platform in an installer-provisioned
34+
infrastructure workflow.
35+
36+
This enhancement describes how we can modify the installer to allow
37+
bare-metal user-provisioned infrastructure clusters, including those
38+
produced with the assisted installer, to include those
39+
installer-provisioned infrastructure capabilities.
40+
41+
## Motivation
42+
43+
The [connected-assisted-installer
44+
enhancement](./connected-assisted-installer.md) describes how an agent
45+
running on several bare-metal hosts receive a command from the
46+
assisted installer application to begin the installation of that
47+
host. One of the first tasks that each host performs upon receiving
48+
this instruction is to download, from the assisted installer, the
49+
appropriate Ignition file for that host's role. These ignition files
50+
are generated by the assisted installer earlier in the process using
51+
the `openshift-install create ignition-configs` command.
52+
53+
This installer workflow mirrors the [bare-metal user-provisioned
54+
infrastructure
55+
workflow](https://github.com/openshift/installer/blob/master/docs/user/metal/install_upi.md)
56+
in that the the OpenShift installer is used to generate assets for the
57+
cluster but the `openshift-install create cluster` command is not used
58+
to manage the deployment. However, the assisted installer does need to
59+
enable some `baremetal` installer-provisioned infrastructure platform
60+
features on the installed cluster. Therefore, we need to consider how
61+
to make those features available while using a user-provisioned
62+
infrastructure workflow. Currently, the assisted installer is using
63+
the [workaround of providing dummy
64+
data](https://github.com/filanov/bm-inventory/blob/54d56712b59d306125ee68d2014c98cf1e995e75/internal/installcfg/installcfg.go#L143-L190)
65+
for fields that are currently required by the `baremetal` platform but
66+
that are unused in this use-case.
67+
68+
### Goals
69+
70+
- Allow the use of certain installer-provisioned infrastructure
71+
features on bare-metal user-provisioned infrastructure clusters
72+
- Avoid making any incompatible changes to the bare-metal
73+
installer-provisioned infrastructure workflow
74+
75+
### Non-Goals
76+
77+
- Removing or relaxing any validation rules for the
78+
installer-provisioned infrastructure installer.
79+
- Recapitulate the test plans for the assisted installer.
80+
- Recapitulate the [approach for making parts of the bare metal
81+
infrastructure automation optional](./baremetal-provisioning-optional.md).
82+
83+
## Proposal
84+
85+
The [`baremetal.Platform` fields in
86+
`install-config.yaml`](https://github.com/openshift/installer/blob/master/docs/user/metal/install_ipi.md#install-config)
87+
can be considered to be configuring three distinct capabilities:
88+
89+
1. Provision a libvirt VM with the bootstrap configuration, and use
90+
IPMI/PXE to provision a set of hosts as control plane nodes.
91+
2. Configure the `baremetal-operator` to run on the installed cluster
92+
for day 2 machine-management operations such as power control and
93+
automated provisioning of additional workers.
94+
3. Configure [cluster-hosted control-plane
95+
load-balancing](https://github.com/openshift/installer/blob/master/docs/design/baremetal/networking-infrastructure.md)
96+
using `keepalived` and `haproxy`.
97+
98+
The proposal is to move where some of the the input validation happens
99+
to allow the latter 2 capabilities to be configured for
100+
user-provisioned infrastructure, without requiring configuration that
101+
is only relevant to the first capability.
102+
103+
### User Stories [optional]
104+
105+
#### Story 1
106+
107+
As a developer building the assisted installer tools, I want to be
108+
able to run the OpenShift installer to generate assets for a bare
109+
metal cluster without invoking validations that only apply to the
110+
installer-provisioned infrastructure workflows so I can use the assets
111+
in user-provisioned workflows.
112+
113+
#### Story 2
114+
115+
As a developer of the OpenShift installer, I want to be able to add
116+
validation rules for bare metal clusters correctly based on whether
117+
the cluster is being built using the installer-provisioned
118+
infrastructure workflow or the user-provisioned workflow so I can
119+
apply only the rules necessary for each case.
120+
121+
### Implementation Details/Notes/Constraints [optional]
122+
123+
All of the validation is currently performed during early phases of
124+
the installer for the `InstallConfig` target. The user-provisioned
125+
workflow used by the assisted installer does not run the phase for the
126+
`Cluster` target (it only ever runs up to the `IgnitionConfigs`
127+
target), so if we move some of the validation into the later phase we
128+
will have a better separation of the parts needed to build the assets
129+
for the user-provisioned workflow and the assisted installer from the
130+
parts needed to run the installer for installer-provisioned
131+
infrastructure for bare metal. A secondary benefit is that the
132+
validation for the `baremetal` platform will more closely match the
133+
validation for cloud platforms.
134+
135+
The install config inputs are currently validated as part of
136+
generating the `InstallConfig` asset in
137+
[InstallConfig.finish()](https://github.com/openshift/installer/blob/master/pkg/asset/installconfig/installconfig.go#L141),
138+
which invokes some logic that applies to all platforms in
139+
[ValidateInstallConfig()](https://github.com/openshift/installer/blob/master/pkg/types/validation/installconfig.go#L40)
140+
before running the platform-specific logic from
141+
[validatePlatform()](https://github.com/openshift/installer/blob/master/pkg/types/validation/installconfig.go#L341). For
142+
the `baremetal` platform that results in calling
143+
[ValidatePlatform()](https://github.com/openshift/installer/blob/master/pkg/types/baremetal/validation/platform.go#L218).
144+
145+
The
146+
[PlatformProvisionCheck](https://github.com/openshift/installer/blob/master/pkg/asset/installconfig/platformprovisioncheck.go)
147+
and
148+
[PlatformCredsCheck](https://github.com/openshift/installer/blob/master/pkg/asset/installconfig/platformcredscheck.go)
149+
assets are both dependencies for the `Cluster` asset, which means they
150+
are evaluated later in the installation process. Both already have
151+
platform-specific logic, although not for the `baremetal` platform.
152+
153+
The assisted installer does not need the `bootstrapProvisioningIP`,
154+
`provisioningNetworkCIDR`, `provisioningDHCPRange`, or
155+
`provisioningNetworkInterface` fields. The values in those fields are
156+
used to configure the provisioning tools used for the
157+
installer-provisioned infrastructure workflow, so those validation
158+
checks could move to a new function in the
159+
`pkg/types/baremetal/validation` package that can be invoked from
160+
[PlatformProvisionCheck.Generate()](https://github.com/openshift/installer/blob/master/pkg/asset/installconfig/platformprovisioncheck.go#L37).
161+
162+
There are two validation rules for the `clusterOSImage` and
163+
`bootstrapOSImage` fields, to ensure the value is a valid URL and to
164+
ensure the URL exists. Those checks can be split between the two
165+
phases, so that the URL format validation is done in
166+
`ValidateInstallConfig()` and the existence check is moved to
167+
`PlatformProvisionCheck`.
168+
169+
The assisted installer does not need the `libvirtURI` or host BMC
170+
credentials, so the checks for those values could move to a new public
171+
function in the `pkg/types/baremetal/validation` package that can be
172+
invoked from
173+
[PlatformCredsCheck.Generate()](https://github.com/openshift/installer/blob/master/pkg/asset/installconfig/platformcredscheck.go#L60).
174+
175+
Changing the BMC credential validation in the installer is only part
176+
of the work required to make `BareMetalHost` resources work without
177+
credentials. The `baremetal-operator` also [needs to be
178+
updated](https://github.com/metal3-io/baremetal-operator/pull/563) to
179+
not treat the missing credentials as an error.
180+
181+
### Risks and Mitigations
182+
183+
Moving some of the validation steps means that invalid data will be
184+
caught later in the process. Most users will run the `create cluster`
185+
command directly, so they will not see a significant change. When the
186+
installer runs under Hive, or if the user runs the phases individually
187+
so they can inject extra manifests, the `create cluster` step may
188+
happen much later in the process. Bad data will still be caught before
189+
the bootstrap VM is launched, though, so it should be relatively quick
190+
in the overall installation process.
191+
192+
There are no security implications for this change.
193+
194+
## Design Details
195+
196+
### Test Plan
197+
198+
Besides the existing and new unit tests, the existing e2e tests for
199+
the `baremetal` platform will continue to test the full set of
200+
validation rules. The test suite for the assisted installer will
201+
verify that no extra validation rules are applied.
202+
203+
### Graduation Criteria
204+
205+
**Note:** *Section not required until targeted at a release.*
206+
207+
Define graduation milestones.
208+
209+
These may be defined in terms of API maturity, or as something else. Initial proposal
210+
should keep this high-level with a focus on what signals will be looked at to
211+
determine graduation.
212+
213+
Consider the following in developing the graduation criteria for this
214+
enhancement:
215+
- Maturity levels - `Dev Preview`, `Tech Preview`, `GA`
216+
- Deprecation
217+
218+
Clearly define what graduation means.
219+
220+
#### Examples
221+
222+
These are generalized examples to consider, in addition to the aforementioned
223+
[maturity levels][maturity-levels].
224+
225+
##### Dev Preview -> Tech Preview
226+
227+
- Ability to utilize the enhancement end to end
228+
- End user documentation, relative API stability
229+
- Sufficient test coverage
230+
- Gather feedback from users rather than just developers
231+
232+
##### Tech Preview -> GA
233+
234+
- More testing (upgrade, downgrade, scale)
235+
- Sufficient time for feedback
236+
- Available by default
237+
238+
**For non-optional features moving to GA, the graduation criteria must include
239+
end to end tests.**
240+
241+
## Implementation History
242+
243+
Major milestones in the life cycle of a proposal should be tracked in `Implementation
244+
History`.
245+
246+
## Alternatives
247+
248+
### Add a new platform
249+
250+
We could add a new platform to the installer to provide a clear home
251+
for all of the logic that is specific to the assisted installer. This
252+
would be a more invasive change, and might end up being confusing
253+
since the installer would either still need to configure the cluster
254+
with the `baremetal` platform or the `machine-api-operator` would need
255+
to understand a new platform and configure the correct machine API
256+
implementation.
257+
258+
### Add flag(s) to install-config.yml
259+
260+
We could add a flag or other input field to `install-config.yml` to
261+
control the validation behavior. For example,
262+
<https://github.com/openshift/installer/pull/3794> proposes a list of
263+
validation rules to skip. There are two reasons not to do this. First,
264+
it would expose the validation rules through the installer's public
265+
API, even though we would only want the assisted installer to use the
266+
feature. Second, it places the burden of managing the set of
267+
validations on the wrapper in the assisted installer, instead of on
268+
the author of validation rules in the installer source code. We know
269+
which validations should apply in each case, so there is no need to
270+
make the validations configurable.
271+
272+
### Add an environment variable
273+
274+
An environment variable is a hidden API control for the installer,
275+
which makes using it and debugging its use more difficult. It also
276+
implies that there may be checks for the variable in several places in
277+
the code where the existing validation rules are, which spreads the
278+
knowledge of which validation rules apply in different scenarios
279+
throughout the installer code base, instead of using the existing
280+
dependency structure to manage them.

0 commit comments

Comments
 (0)