Skip to content

Add support for node selectors based filtering in TAS #4989

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

mwysokin
Copy link
Contributor

@mwysokin mwysokin commented Apr 15, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR adds support for node selectors in TAS.

Which issue(s) this PR fixes:

Fixes #4571

Special notes for your reviewer:

The PR has the following changes:

  1. Getting node selectors from nodes and using them to filter out non matching nodes.
  2. 2 new integrations tests for cases when a node label is either added or updated.
  3. Added additional check of ExpectClusterQueuesToBeActive for test suite Node is mutated during test cases which seemed to be missing compare to the other suites.

Does this PR introduce a user-facing change?

TAS: Add support for Node Selectors.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels Apr 15, 2025
@k8s-ci-robot k8s-ci-robot requested a review from kannon92 April 15, 2025 21:47
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 15, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 15, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @mwysokin. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 15, 2025
Copy link

netlify bot commented Apr 15, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit da959cc
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/68094c1c3b57990008dd415a

@mwysokin mwysokin changed the title Implement TAS using node selectors during node filtering Add support for node selectors based filtering in TAS Apr 15, 2025
@tenzen-y
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 15, 2025
Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick pass, leaving more detailed review to @mbobrovskyi

})
})

ginkgo.It("should admit workload when node label value is corrected", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handling node mutations is outside the scope of this PR, and it works because of some prior work.

I think it is safe to test on a static set of nodes under "Nodes are created before test with rack being the lowest level". We could simply add a label to one of the nodes and confirm the workload picks the node.

If you think there is some value in the mutating tests, I would just keep one of the two.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale was: let's have 2 tests:

  • one will check whether if a correct labels appears everything will work,
  • the other one will check whether not only the key was correct but also the value, so the key match should initially work but the values wouldn't match and the node wouldn't be selected but once the correct value is set the node would match. I guess this could be merged into a single test. I'd like to be sure that we have a proof of not matching based on the single item of key, value pair.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged 2 tests into one. I think it's beneficial to show that kueue reacts to events of the nodes. Using static integration tests would be simpler but since we already have the dynamic version I would use it since it covers a wider area already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the key match should initially work but the values wouldn't match and the node wouldn't be selected but once the correct value is set the node would match

I buy the rationale, by checking util.ExpectPendingWorkloadsMetric(clusterQueue, 0, 1) you confirm that Scheduler already processed the workload and classified as "inadmissible" without the label.

I merged 2 tests into one.

Thanks.

non-blocking (let me know what you think): I propose to drop setting the label to a wrong value (this block). The wrong label case we already cover at the unit test level in "skip node which doesn't match node selector, label exists, value doesn't match; BestFit".

@mwysokin mwysokin requested review from mimowo and PBundyra April 23, 2025 20:24
@mwysokin
Copy link
Contributor Author

@mimowo @PBundyra Please give it another pass 🙏 Hopefully all the comments have been addressed.

@mimowo
Copy link
Contributor

mimowo commented Apr 24, 2025

Awesome!
/approve
/lgtm
/hold
in case you would like to address the suggestion in #4989 (comment), otherwise feel free to /unhold

/cherry-pick release-0.11

@k8s-infra-cherrypick-robot
Copy link
Contributor

@mimowo: once the present PR merges, I will cherry-pick it on top of release-0.11 in a new PR and assign it to you.

In response to this:

Awesome!
/approve
/lgtm
/hold
in case you would like to address the suggestion in #4989 (comment), otherwise feel free to /unhold

/cherry-pick release-0.11

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 24, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 24, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 9aaa95dbbd38e6673efd367f4009815db0ed7c5f

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, mwysokin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 24, 2025
@PBundyra
Copy link
Contributor

/lgtm
On my side as well

@mimowo
Copy link
Contributor

mimowo commented Apr 24, 2025

/unhold
This suggestion in #4989 (comment) can be done in a follow up.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 24, 2025
@mimowo
Copy link
Contributor

mimowo commented Apr 24, 2025

/test pull-kueue-test-scheduling-perf-main
known flake #4851

@k8s-ci-robot k8s-ci-robot merged commit 2b7d388 into kubernetes-sigs:main Apr 24, 2025
20 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.12 milestone Apr 24, 2025
@k8s-infra-cherrypick-robot
Copy link
Contributor

@mimowo: #4989 failed to apply on top of branch "release-0.11":

Applying: Implement TAS using node selectors during node filtering
Using index info to reconstruct a base tree...
M	test/integration/singlecluster/tas/tas_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/integration/singlecluster/tas/tas_test.go
CONFLICT (content): Merge conflict in test/integration/singlecluster/tas/tas_test.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 Implement TAS using node selectors during node filtering

In response to this:

Awesome!
/approve
/lgtm
/hold
in case you would like to address the suggestion in #4989 (comment), otherwise feel free to /unhold

/cherry-pick release-0.11

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mimowo
Copy link
Contributor

mimowo commented Apr 24, 2025

@mwysokin please prepare the cherry-pick manually using hack/cherry_pick_pull.sh

k8s-ci-robot pushed a commit that referenced this pull request Apr 24, 2025
…filtering in TAS (#5079)

* Implement TAS using node selectors during node filtering

* REVIEW: Change logging in TAS filtering node logic back to level 2

* REVIEW: remove superfluous Set constructions

Co-authored-by: Mykhailo Bobrovskyi <[email protected]>

* REVIEW: remove superfluous Set construction

Co-authored-by: Mykhailo Bobrovskyi <[email protected]>

* REVIEW: remove overly verbose handling of case when there are no node selectors

Co-authored-by: Patryk Bundyra <[email protected]>

* Apply node selection logic only for the lowest level of topology

* REVIEW: add error handling for invalid node selectors

* Fix lint

* REVIEW: simplify test label value

Co-authored-by: Patryk Bundyra <[email protected]>

* REVIEW: Extract compilation of the node selector outside of the fillInCounts functions

* REVIEW: Merge 2 integration tests into 1

---------

Co-authored-by: Mykhailo Bobrovskyi <[email protected]>
Co-authored-by: Patryk Bundyra <[email protected]>
k8s-ci-robot pushed a commit that referenced this pull request Apr 24, 2025
…filtering in TAS (#5087)

* Implement TAS using node selectors during node filtering

* REVIEW: Change logging in TAS filtering node logic back to level 2

* REVIEW: remove superfluous Set constructions

Co-authored-by: Mykhailo Bobrovskyi <[email protected]>

* REVIEW: remove superfluous Set construction

Co-authored-by: Mykhailo Bobrovskyi <[email protected]>

* REVIEW: remove overly verbose handling of case when there are no node selectors

Co-authored-by: Patryk Bundyra <[email protected]>

* Apply node selection logic only for the lowest level of topology

* REVIEW: add error handling for invalid node selectors

* Fix lint

* REVIEW: simplify test label value

Co-authored-by: Patryk Bundyra <[email protected]>

* REVIEW: Extract compilation of the node selector outside of the fillInCounts functions

* REVIEW: Merge 2 integration tests into 1

---------

Co-authored-by: Mykhailo Bobrovskyi <[email protected]>
Co-authored-by: Patryk Bundyra <[email protected]>
@tenzen-y
Copy link
Member

/release-note-edit

TAS: Add support for Node Selectors. (#5087, @mwysokin)

@tenzen-y
Copy link
Member

/release-note-edit

TAS: Add support for Node Selectors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TAS: workloads with nodeSelectors are scheduled by TAS, but not kube-scheduler
7 participants