Skip to content

AUTOSCALE-266: Refactor karpenter-operator and karpenter into v2 components #6187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 29, 2025

Conversation

maxcao13
Copy link
Member

@maxcao13 maxcao13 commented May 22, 2025

What this PR does / why we need it:

This PR refactors the karpenter deployment and assets which are managed by the karpenter
operator to CPOv2. This removes the ability to pass in the karpenter aws image as an argument
to the karpenter-operator deployment, since we can directly get the karpenter aws image from
the payload image itself from within the karpenter-operator releaseImage lookup code. This
commit also removes the deprecated karpenter code.

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 22, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 22, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented May 22, 2025

@maxcao13: This pull request references AUTOSCALE-266 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

What this PR does / why we need it:
This PR refactors the karpenter-operator and karpenter into CPOv2 components.

As part of this PR, one of the commits allows the hcp_controller to now manage and reconcile the karpenter deployment instead of the karpenter-operator. This is needed because in order to use CPOv2, we need to pass in a cpoContext, and it is non-trivial to create a new cpoContext specially for the karpenter-operator container.

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from rtheis and sjenning May 22, 2025 23:40
@openshift-ci openshift-ci bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels May 22, 2025
@cwbotbot
Copy link

cwbotbot commented May 23, 2025

Test Results

e2e-aws

e2e-aks

Failed Tests

Total failed tests: 9

  • TestNodePool
  • TestNodePool/HostedCluster0
  • TestNodePool/HostedCluster0/Main
  • TestNodePool/HostedCluster0/Main/TestNTOMachineConfigAppliedInPlace
  • TestNodePool/HostedCluster0/Main/TestNTOMachineConfigAppliedInPlace/EnsureNoCrashingPods

... and 4 more failed tests

@maxcao13 maxcao13 force-pushed the refactor-karpenter-cpov2 branch from db7ccdf to 06a392e Compare May 27, 2025 02:40
@muraee
Copy link
Contributor

muraee commented May 27, 2025

can you please add the karpenter-operator component to TestReconcileComponents() here: https://github.com/openshift/hypershift/blob/main/hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go#L4189-L4200

@maxcao13 maxcao13 force-pushed the refactor-karpenter-cpov2 branch from 06a392e to 006e2b5 Compare May 28, 2025 00:06
@maxcao13
Copy link
Member Author

/test e2e-aws-techpreview e2e-aws-karpenter-core

@openshift-ci-robot
Copy link

openshift-ci-robot commented May 28, 2025

@maxcao13: This pull request references AUTOSCALE-266 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

What this PR does / why we need it:

This PR refactors the karpenter deployment and assets which are managed by the karpenter
operator to CPOv2. This removes the ability to pass in the karpenter aws image as an argument
to the karpenter-operator deployment, since we can directly get the karpenter aws image from
the payload image itself from within the karpenter-operator releaseImage lookup code. This
commit also removes the deprecated karpenter code.

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@maxcao13
Copy link
Member Author

This is probably no longer WIP, thanks for the review Mulham. +1

/cc @enxebre

@openshift-ci openshift-ci bot requested a review from enxebre May 28, 2025 00:14
@maxcao13 maxcao13 changed the title WIP: AUTOSCALE-266: Refactor karpenter-operator and karpenter into v2 components AUTOSCALE-266: Refactor karpenter-operator and karpenter into v2 components May 28, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 28, 2025
This commit refactors the karpenter deployment and assets which are managed by the karpenter
operator to CPOv2. This removes the ability to pass in the karpenter aws image as an argument
to the karpenter-operator deployment, since we can directly get the karpenter aws image from
the payload image itself from within the karpenter-operator releaseImage lookup code. This
commit also removes the deprecated karpenter code.

Signed-off-by: Max Cao <[email protected]>
@maxcao13 maxcao13 force-pushed the refactor-karpenter-cpov2 branch from 006e2b5 to 5d39a51 Compare May 28, 2025 20:13
@maxcao13
Copy link
Member Author

maxcao13 commented May 28, 2025

/test e2e-aws-techpreview

The core tests are timing out, I think we need to up some of the timeouts on them in our karpenter fork. After changing it and running it locally, it seems to pass after increasing the timeouts.

Copy link
Contributor

openshift-ci bot commented May 28, 2025

@maxcao13: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-karpenter-core 006e2b5 link false /test e2e-aws-karpenter-core

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@enxebre
Copy link
Member

enxebre commented May 29, 2025

/approve

Copy link
Contributor

openshift-ci bot commented May 29, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre, maxcao13

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 29, 2025
@muraee
Copy link
Contributor

muraee commented May 29, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 29, 2025
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD bb9a7a5 and 2 for PR HEAD 5d39a51 in total

@enxebre
Copy link
Member

enxebre commented May 29, 2025

/override "Red Hat Konflux / hypershift-operator-main-enterprise-contract / hypershift-operator-main"

Copy link
Contributor

openshift-ci bot commented May 29, 2025

@enxebre: Overrode contexts on behalf of enxebre: Red Hat Konflux / hypershift-operator-main-enterprise-contract / hypershift-operator-main

In response to this:

/override "Red Hat Konflux / hypershift-operator-main-enterprise-contract / hypershift-operator-main"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD bb9a7a5 and 2 for PR HEAD 5d39a51 in total

@openshift-merge-bot openshift-merge-bot bot merged commit 4c37875 into openshift:main May 29, 2025
19 of 20 checks passed
@maxcao13 maxcao13 deleted the refactor-karpenter-cpov2 branch May 29, 2025 17:02
@openshift-bot
Copy link

[ART PR BUILD NOTIFIER]

Distgit: hypershift
This PR has been included in build ose-hypershift-container-v4.20.0-202505291843.p0.g4c37875.assembly.stream.el9.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants