@@ -434,6 +434,155 @@ bare metal platforms - i.e. the `Disabled` state - there are greater
potential downsides from jumping into using cluster profiles for this
at this early stage.
+ ## Discussion
+
+ **Q: Should BMO be CVO-managed, OLM-managed, or SLO-managed?**
+
+ @smarterclayton
+
+ I believe [BMO] should be managed by the machine API operator. CVO
+ does not manage "operators", it manages resources. It does not do
+ conditional logic for operator deployment. That's the responsibility
+ of second-level operators, of which MAO is one.
+
+ I don't see much difference between the current mechanism of MAO
+ deploying an actuator (a controller, AKA an operator) and MAO
+ deploying the bare metal operator.
+
+ Why can't launching BMO under MAO be exactly like launching an
+ actuator, and then BMO manages the actuator? Or simply make the bare
+ metal actuator own the responsibility of managing the lifecycle of
+ its components?
+
+ How can we make "managing sub-operators" cheaper by reducing
+ deployment complexity?
+
+ There needs to be a second-level operator that either deploys or
+ manages the appropriate machine components for the current
+ infrastructure platform.
+
+ There appears to be a missing "machine-infrastructure" operator that
+ acts like the cluster network operator and deploys the right
+ components. I'm really confused why that wouldn't just be the
+ "machine API operator".
+
+ Having unique operators per infrastructure sounds like an
+ anti-pattern if we already have a top-level operator.
+
+ @deads2k
+
+ There are development and support benefits to being able to separate
+ the machine-api-operator, which makes calls to a cloud provider API,
+ from the mechanisms that provide those cloud provider APIs themselves
+ and the support infrastructure for the machines. Doing so forces good
+ planning and API boundaries on both the MAO and the baremetal
+ deployments. … clear separation of responsibility and failures for
+ both developers and customers.
+
+ @smarterclayton
+
+ An SLO is a "component" or "subsystem" - given what we know today,
+ bare metal feels like our one infrastructure platform that most
+ deserves to be viewed as its own subsystem.
+
+ **Q: How should BMO behave if it is SLO-managed?**
+
+ @deads2k
+
+ [Add BMO] to the payload, and then the baremetal operator would put
+ itself into a Disabled state if it was on a non-metal platform.
+
+ @smarterclayton
+
+ Disabled operators already need special treatment in the API. They
+ must be efficient and self-effacing when not used, as the image
+ registry, samples, and insights operators must be (marking themselves
+ disabled, being de-emphasized in the UI).
+
+ The baremetal-operator is installed by default. If the infrastructure
+ platform is != BareMetal on startup, then it just pauses (and does
+ nothing) and sets its clusteroperator conditions to Disabled=True,
+ Available=True, Progressing=False with appropriate messages; if the
+ infrastructure platform == BareMetal, then it runs as normal. The
+ clusteroperator object is always set, but when disabled, user
+ interfaces should convey that disabled state differently than failing
+ (by graying it out).
+
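+ As a minimal sketch of the disabled state described above (the
+ condition types follow the discussion; the exact messages and the
+ non-standard `Disabled` condition are assumptions, not settled API):
+
+ ```yaml
+ apiVersion: config.openshift.io/v1
+ kind: ClusterOperator
+ metadata:
+   name: baremetal
+ status:
+   conditions:
+   # Conveys "intentionally inactive", distinct from a failure.
+   - type: Disabled
+     status: "True"
+     message: Platform is not BareMetal; the operator has nothing to do.
+   - type: Available
+     status: "True"
+   - type: Progressing
+     status: "False"
+ ```
+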
+ BMO must fully participate in the CVO lifecycle. CVO enforces upgrade
+ rules. The BMO API must be stable.
+
+ **Q: Should bare metal specific CRDs be installed on all clusters or
+ only on bare metal clusters?**
+
+ @smarterclayton
+
+ [Bare metal specific CRDs] feel like they are part of MAO, just as
+ CNO installs CRDs for the two core platform types. In general, CNO
+ already demonstrates this pattern and is successful doing so, so the
+ default answer is that MAO should behave like CNO and any deviation
+ needs justification.
+
+ @derekwaynecarr
+
+ I think it's an error that we have namespaces and CRDs deployed to a
+ cluster for contexts that are not appropriate. We should aspire to
+ move away from that rather than continue to lean into it. For
+ example, every cluster has an openshift-kni-infra or
+ openshift-ovirt-infra namespace even where it is not appropriate.
+
+ **Q: Why not use CVO profiles to control when BMO is deployed?**
+
+ @smarterclayton
+
+ CVO profiles were not intended to be dynamic or conditional (and
+ there are substantial risks to doing that).
+
+ Profiles don't seem appropriate for conditional parameterization of
+ the payload based on global configuration.
+
+ The general problem with profiles is that they expand the scope of
+ the templating the payload provides. [..] If we expanded this to
+ include operators that are determined by infrastructure, then we're
+ potentially introducing a new variable (not just a new profile),
+ since we very well may want to deploy the bare metal operator in a
+ hypershift mode.
+
+ **Q: Why not name the new operator "metal3-operator"? Should this
+ operator come from the Metal3 upstream project?**
+
+ @markmc
+
+ Naming - in terms of what name shows up in `oc get clusteroperator`,
+ I think that should be a name reflecting the functionality in user
+ terms rather than the software project brand. And if `baremetal` is
+ the name of the clusteroperator, then I think it makes sense to
+ follow that through and treat metal3 as an implementation detail.
+
+ Scope - if we imagine other bare metal related functionality in
+ OpenShift that isn't directly related to the Metal3 project, do we
+ think that should fall under another SLO, or this one? I think it's
+ best to say this new SLO is where bare metal related functionality
+ would be managed.
+
+ Upstream project - you could imagine an upstream project which would
+ encapsulate the [kustomize-based deployment
+ scenarios](https://github.com/openshift/baremetal-operator/blob/master/docs/ironic-endpoint-keepalived-configuration.md#kustomization-structure)
+ in the metal3/baremetal-operator project. We could re-use something
+ like that, but we would also need to add OpenShift integration
+ downstream - e.g. the clusteroperator, and checking the platform type
+ in the infrastructure resource. Is there an example of another SLO
+ that is derived from an operator that is more generally applicable?
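+
+ For reference, the "checking the platform type in the infrastructure
+ resource" step mentioned above would read the cluster-scoped
+ `Infrastructure` config object; a minimal sketch of the relevant
+ fields (values illustrative):
+
+ ```yaml
+ apiVersion: config.openshift.io/v1
+ kind: Infrastructure
+ metadata:
+   name: cluster
+ status:
+   # The operator would enable itself only when this is BareMetal.
+   platformStatus:
+     type: BareMetal
+ ```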
+
+ **Q: Why not use `ownerReferences` on the `metal3` deployment to
+ indicate that it is owned by the CBO?**
+
+ (Discussion ongoing in the PR)
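+
+ For context, one plausible shape of such an owner reference is
+ sketched below; the namespace, owner kind, and UID here are
+ illustrative assumptions, not a settled design:
+
+ ```yaml
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: metal3
+   namespace: openshift-machine-api   # assumed namespace
+   ownerReferences:
+   - apiVersion: config.openshift.io/v1
+     kind: ClusterOperator            # assumed owner object for the CBO
+     name: baremetal
+     uid: 0123abcd-...                # copied from the live owner at creation time
+     controller: true
+ ```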
+
+ **Q: If the concern is there is "too much bare metal stuff" in the
+ MAO repo, wouldn't that concern also apply to the [vSphere
+ actuator](https://github.com/openshift/machine-api-operator/tree/master/pkg/controller/vsphere)?**
+
+ (Discussion ongoing in the PR)
+
## References
- "/enhancements/baremetal/baremetal-provisioning-config.md"