Add support for node selectors based filtering in TAS #4989

mwysokin · 2025-04-15T21:47:18Z

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR adds support for node selectors in TAS.

Which issue(s) this PR fixes:

Fixes #4571

Special notes for your reviewer:

The PR has the following changes:

Getting node selectors from nodes and using them to filter out non-matching nodes.
2 new integration tests for cases when a node label is either added or updated.
Added an ExpectClusterQueuesToBeActive check to the "Node is mutated during test cases" test suite, which seemed to be missing compared to the other suites.
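
As an illustration of the first item, here is a minimal sketch of node-selector filtering. It is not the code from this PR (that lives in pkg/cache/tas_flavor_snapshot.go and is not shown in this thread); the `node` type and the `matchesNodeSelector` and `filterNodes` helpers are hypothetical names, and plain maps stand in for the Kubernetes label types.

```go
package main

import "fmt"

// node is a hypothetical stand-in for the node data kept in a TAS snapshot.
type node struct {
	name   string
	labels map[string]string
}

// matchesNodeSelector reports whether every key/value pair in selector is
// present in nodeLabels with exactly the same value. An empty or nil
// selector matches every node.
func matchesNodeSelector(nodeLabels, selector map[string]string) bool {
	for key, want := range selector {
		if got, ok := nodeLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// filterNodes returns the names of the nodes that match selector,
// preserving the input order.
func filterNodes(nodes []node, selector map[string]string) []string {
	var names []string
	for _, n := range nodes {
		if matchesNodeSelector(n.labels, selector) {
			names = append(names, n.name)
		}
	}
	return names
}

func main() {
	nodes := []node{
		{name: "x1", labels: map[string]string{"zone": "a"}},
		{name: "x2", labels: map[string]string{"zone": "b"}},
		{name: "x3", labels: map[string]string{}},
	}
	fmt.Println(filterNodes(nodes, map[string]string{"zone": "a"})) // prints [x1]
}
```

Nodes that lack the key and nodes that carry the key with a different value are both filtered out, which is the distinction the new integration tests exercise.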

Does this PR introduce a user-facing change?

TAS: Add support for Node Selectors.

k8s-ci-robot · 2025-04-15T21:47:28Z

Hi @mwysokin. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

netlify · 2025-04-15T21:47:40Z

✅ Deploy Preview for kubernetes-sigs-kueue canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | `da959cc` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/68094c1c3b57990008dd415a |

tenzen-y · 2025-04-15T22:22:31Z

/ok-to-test

mimowo

Just a quick pass, leaving more detailed review to @mbobrovskyi

pkg/cache/tas_flavor_snapshot.go

mimowo · 2025-04-23T13:13:14Z

test/integration/singlecluster/tas/tas_test.go

+				})
+			})
+
+			ginkgo.It("should admit workload when node label value is corrected", func() {


Handling node mutations is outside the scope of this PR, and it works because of some prior work.

I think it is safe to test on a static set of nodes under "Nodes are created before test with rack being the lowest level". We could simply add a label to one of the nodes and confirm the workload picks the node.

If you think there is some value in the mutating tests, I would just keep one of the two.

My rationale was to have 2 tests:

one will check that everything works once a correct label appears,

the other one will check that not only the key but also the value is correct, so the key match should initially work but the values wouldn't match and the node wouldn't be selected but once the correct value is set the node would match. I guess this could be merged into a single test. I'd like to be sure that we have proof of not matching based on a single key-value pair.

I merged 2 tests into one. I think it's beneficial to show that Kueue reacts to node events. Using static integration tests would be simpler, but since we already have the dynamic version I would use it, as it already covers a wider area.
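
The sequence the merged test walks through can be sketched without a cluster. This is only an illustration of the matching logic under discussion, not the Ginkgo test itself; the `selectable` helper and the `instance-type` / `tas-node` label are hypothetical.

```go
package main

import "fmt"

// selectable reports whether a node with the given labels satisfies every
// key/value pair of selector.
func selectable(nodeLabels, selector map[string]string) bool {
	for key, want := range selector {
		if got, ok := nodeLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical selector requested by the workload.
	selector := map[string]string{"instance-type": "tas-node"}

	// Step 1: the node has no such label, so it is not selectable.
	labels := map[string]string{}
	fmt.Println(selectable(labels, selector)) // prints false

	// Step 2: the key is present but the value is wrong; still no match.
	labels["instance-type"] = "wrong"
	fmt.Println(selectable(labels, selector)) // prints false

	// Step 3: the value is corrected, and the node now matches.
	labels["instance-type"] = "tas-node"
	fmt.Println(selectable(labels, selector)) // prints true
}
```

Step 2 is the part that proves a key match alone is not enough.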

> so the key match should initially work but the values wouldn't match and the node wouldn't be selected but once the correct value is set the node would match

I buy the rationale: by checking util.ExpectPendingWorkloadsMetric(clusterQueue, 0, 1) you confirm that the scheduler has already processed the workload and classified it as "inadmissible" without the label.

> I merged 2 tests into one.

Thanks.

non-blocking (let me know what you think): I propose to drop setting the label to a wrong value (this block). We already cover the wrong-value case at the unit test level in "skip node which doesn't match node selector, label exists, value doesn't match; BestFit".

mwysokin · 2025-04-23T20:25:32Z

@mimowo @PBundyra Please give it another pass 🙏 Hopefully all the comments have been addressed.

mimowo · 2025-04-24T07:26:27Z

Awesome!
/approve
/lgtm
/hold
in case you would like to address the suggestion in #4989 (comment), otherwise feel free to /unhold

/cherry-pick release-0.11

k8s-infra-cherrypick-robot · 2025-04-24T07:26:30Z

@mimowo: once the present PR merges, I will cherry-pick it on top of release-0.11 in a new PR and assign it to you.

In response to this:

Awesome!
/approve
/lgtm
/hold
in case you would like to address the suggestion in #4989 (comment), otherwise feel free to /unhold

/cherry-pick release-0.11

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · 2025-04-24T07:26:35Z

LGTM label has been added.

Git tree hash: 9aaa95dbbd38e6673efd367f4009815db0ed7c5f

k8s-ci-robot · 2025-04-24T07:26:35Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, mwysokin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [mimowo]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

PBundyra · 2025-04-24T08:34:54Z

/lgtm
On my side as well

mimowo · 2025-04-24T10:00:34Z

/unhold
This suggestion in #4989 (comment) can be done in a follow up.

mimowo · 2025-04-24T10:24:31Z

/test pull-kueue-test-scheduling-perf-main
known flake #4851

k8s-infra-cherrypick-robot · 2025-04-24T10:35:15Z

@mimowo: #4989 failed to apply on top of branch "release-0.11":

Applying: Implement TAS using node selectors during node filtering
Using index info to reconstruct a base tree...
M	test/integration/singlecluster/tas/tas_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/integration/singlecluster/tas/tas_test.go
CONFLICT (content): Merge conflict in test/integration/singlecluster/tas/tas_test.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 Implement TAS using node selectors during node filtering

In response to this:

Awesome!
/approve
/lgtm
/hold
in case you would like to address the suggestion in #4989 (comment), otherwise feel free to /unhold

/cherry-pick release-0.11

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

mimowo · 2025-04-24T10:43:35Z

@mwysokin please prepare the cherry-pick manually using hack/cherry_pick_pull.sh

…filtering in TAS (#5079)

* Implement TAS using node selectors during node filtering
* REVIEW: Change logging in TAS filtering node logic back to level 2
* REVIEW: remove superfluous Set constructions
  Co-authored-by: Mykhailo Bobrovskyi <[email protected]>
* REVIEW: remove superfluous Set construction
  Co-authored-by: Mykhailo Bobrovskyi <[email protected]>
* REVIEW: remove overly verbose handling of case when there are no node selectors
  Co-authored-by: Patryk Bundyra <[email protected]>
* Apply node selection logic only for the lowest level of topology
* REVIEW: add error handling for invalid node selectors
* Fix lint
* REVIEW: simplify test label value
  Co-authored-by: Patryk Bundyra <[email protected]>
* REVIEW: Extract compilation of the node selector outside of the fillInCounts functions
* REVIEW: Merge 2 integration tests into 1

---------

Co-authored-by: Mykhailo Bobrovskyi <[email protected]>
Co-authored-by: Patryk Bundyra <[email protected]>

…filtering in TAS (#5087)

* Implement TAS using node selectors during node filtering
* REVIEW: Change logging in TAS filtering node logic back to level 2
* REVIEW: remove superfluous Set constructions
  Co-authored-by: Mykhailo Bobrovskyi <[email protected]>
* REVIEW: remove superfluous Set construction
  Co-authored-by: Mykhailo Bobrovskyi <[email protected]>
* REVIEW: remove overly verbose handling of case when there are no node selectors
  Co-authored-by: Patryk Bundyra <[email protected]>
* Apply node selection logic only for the lowest level of topology
* REVIEW: add error handling for invalid node selectors
* Fix lint
* REVIEW: simplify test label value
  Co-authored-by: Patryk Bundyra <[email protected]>
* REVIEW: Extract compilation of the node selector outside of the fillInCounts functions
* REVIEW: Merge 2 integration tests into 1

---------

Co-authored-by: Mykhailo Bobrovskyi <[email protected]>
Co-authored-by: Patryk Bundyra <[email protected]>
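
Two of the review items in the commit message above, compiling the node selector once outside the counting functions and returning an error for invalid selectors, can be sketched as follows. This is an assumption-laden illustration rather than the PR's code: `compileSelector`, `matcher`, and `countMatching` are hypothetical names, the key check is deliberately simplistic, and the value check mirrors the Kubernetes label value syntax (at most 63 characters, alphanumeric at both ends, with `-`, `_`, and `.` allowed in between).

```go
package main

import (
	"fmt"
	"regexp"
)

// labelValueRE mirrors the Kubernetes label value syntax: empty, or
// alphanumeric at both ends with '-', '_' and '.' allowed in between.
var labelValueRE = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// matcher is a compiled selector that can be applied to many nodes.
type matcher func(nodeLabels map[string]string) bool

// compileSelector validates selector once and returns a reusable matcher.
// Only empty keys are rejected here; real key validation is stricter.
func compileSelector(selector map[string]string) (matcher, error) {
	for key, value := range selector {
		if key == "" {
			return nil, fmt.Errorf("invalid node selector: empty key")
		}
		if len(value) > 63 || !labelValueRE.MatchString(value) {
			return nil, fmt.Errorf("invalid node selector value %q for key %q", value, key)
		}
	}
	return func(nodeLabels map[string]string) bool {
		for key, want := range selector {
			if got, ok := nodeLabels[key]; !ok || got != want {
				return false
			}
		}
		return true
	}, nil
}

// countMatching applies an already compiled matcher to every node, so the
// validation cost is not paid again inside the loop.
func countMatching(nodes []map[string]string, m matcher) int {
	count := 0
	for _, labels := range nodes {
		if m(labels) {
			count++
		}
	}
	return count
}

func main() {
	m, err := compileSelector(map[string]string{"zone": "a"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	nodes := []map[string]string{{"zone": "a"}, {"zone": "b"}, {}}
	fmt.Println(countMatching(nodes, m)) // prints 1

	_, err = compileSelector(map[string]string{"zone": "not valid!"})
	fmt.Println(err != nil) // prints true
}
```

Compiling up front means an invalid selector surfaces as a single error before any node is inspected, instead of being rediscovered for every node.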

tenzen-y · 2025-04-24T15:37:16Z

/release-note-edit

TAS: Add support for Node Selectors. (#5087, @mwysokin)

tenzen-y · 2025-04-24T15:39:27Z

/release-note-edit

TAS: Add support for Node Selectors.

Implement TAS using node selectors during node filtering

0250fc0

k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels Apr 15, 2025

k8s-ci-robot requested a review from kannon92 April 15, 2025 21:47

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 15, 2025

k8s-ci-robot requested a review from mbobrovskyi April 15, 2025 21:47

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 15, 2025

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 15, 2025

mwysokin mentioned this pull request Apr 15, 2025

TAS: workloads with nodeSelectors are scheduled by TAS, but not kube-scheduler #4571

Closed

mwysokin changed the title ~~Implement TAS using node selectors during node filtering~~ Add support for node selectors based filtering in TAS Apr 15, 2025

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 15, 2025

mimowo reviewed Apr 16, 2025

pkg/cache/tas_flavor_snapshot.go Outdated

PBundyra reviewed Apr 16, 2025

pkg/cache/tas_flavor_snapshot.go Outdated

mimowo reviewed Apr 16, 2025

pkg/cache/tas_flavor_snapshot.go Outdated

mbobrovskyi reviewed Apr 16, 2025

pkg/cache/tas_flavor_snapshot.go Outdated

pkg/cache/tas_flavor_snapshot.go Outdated

mwysokin and others added 7 commits April 16, 2025 21:29

REVIEW: Change logging in TAS filtering node logic back to level 2

eeeb075

REVIEW: remove superfluous Set constructions

2003fb3

Co-authored-by: Mykhailo Bobrovskyi <[email protected]>

REVIEW: remove superfluous Set construction

6f2770e

Co-authored-by: Mykhailo Bobrovskyi <[email protected]>

REVIEW: remove overly verbose handling of case when there are no node…

ccb2c40

… selectors Co-authored-by: Patryk Bundyra <[email protected]>

Apply node selection logic only for the lowest level of topology

f0b29ec

REVIEW: add error handling for invalid node selectors

dad8f05

Fix lint

fce2235

mwysokin requested review from PBundyra, mbobrovskyi and mimowo April 16, 2025 22:59

mwysokin commented Apr 16, 2025

pkg/cache/tas_flavor_snapshot.go Outdated

REVIEW: simplify test label value

d7f386d

Co-authored-by: Patryk Bundyra <[email protected]>

mimowo reviewed Apr 23, 2025

mwysokin added 2 commits April 23, 2025 19:11

REVIEW: Extract compilation of the node selector outside of the fillI…

d56c6e9

…nCounts functions

REVIEW: Merge 2 integration tests into 1

da959cc

mwysokin requested review from mimowo and PBundyra April 23, 2025 20:24

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 24, 2025

k8s-ci-robot assigned mimowo Apr 24, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 24, 2025

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 24, 2025

k8s-ci-robot assigned PBundyra Apr 24, 2025

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 24, 2025

k8s-ci-robot merged commit 2b7d388 into kubernetes-sigs:main Apr 24, 2025
20 checks passed

k8s-ci-robot added this to the v0.12 milestone Apr 24, 2025

mwysokin mentioned this pull request Apr 24, 2025

Automated cherry pick of #4989: Add support for node selectors based filtering in TAS #5079

Merged

mwysokin mentioned this pull request Apr 24, 2025

Automated cherry pick of #4989: Add support for node selectors based filtering in TAS #5087

Merged
