feat: add fallback support for value metric type #6655

Open
wants to merge 1 commit into main from support-fallback-for-value-type

Conversation

y-rabie
Contributor

@y-rabie y-rabie commented Mar 27, 2025

  • Since the constraint of having fallback only for AverageValue seems somewhat unwarranted to me, it's relaxed a bit here, and can be removed altogether if we opt for another implementation.

    The constraint now becomes that the scaleTargetRef object has a .status.readyReplicas field that its controller keeps updated with the number of ready replicas, so that we can use it directly. This is de facto the case for Deployments/StatefulSets/ReplicaSets/Argo Rollouts.

    We can then generically fetch the object as unstructured and read that field's value to divide by. A brief math illustration, starting from the HPA's equation:

    desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]

    By passing currentMetricValue = desiredMetricValue * fallbackReplicas / currentReplicas, we end up with:

    desiredReplicas = ceil[currentReplicas * ((desiredMetricValue * fallbackReplicas / currentReplicas) / desiredMetricValue)]
                    = ceil[currentReplicas * (fallbackReplicas / currentReplicas)]
                    = ceil[fallbackReplicas]
                    = fallbackReplicas

    Emphasis: currentReplicas in HPA's equation is the number of ready replicas.

    I preferred this approach to the alternative (which would remove the .status.readyReplicas constraint) of manually counting the number of ready pods (similar to what the HPA does here), since it'd be quite involved. If the latter seems a better approach to you, I can implement it.

  • For full clarity, one problematic nit with this: we're dependent on the object's controller updating .status.readyReplicas in a timely manner.

    If there's a lag, the currentReplicas the HPA multiplies by (which it gets by counting pods itself) would deviate from the currentReplicas value we divide by.

    If ours is smaller, we'd briefly scale higher than fallbackReplicas; if ours is larger, we'd scale lower. Either way, we should eventually stabilize at exactly fallbackReplicas.

  • A final, unrelated small change for correctness' sake (which I think should be fixed regardless of this PR): the line

    Value: *resource.NewMilliQuantity(normalisationValue*1000*replicas, resource.DecimalSI),

    has been changed to

    Value: *resource.NewMilliQuantity(int64(normalisationValue*1000*replicas), resource.DecimalSI),

    with normalisationValue now being a float64 instead of an int.

    This prevents casting to int too early, which would discard fractions in normalisationValue that would otherwise be preserved by the multiplication by 1000 below.

    Imagine normalisationValue = 2.5 and fallbackReplicas = 10. Previously, Value = int64(2.5)*1000*10 = 2*1000*10 = 20000.

    Now, Value = int64(2.5*1000*10) = int64(25000.0) = 25000.

    Obviously, the former would cause the HPA to not scale to exactly fallbackReplicas. (A sketch of this computation follows right after this list.)
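To make this concrete, here is a minimal sketch (hypothetical helper name, not the PR's actual code) of computing the fallback metric value so that the HPA's equation resolves to fallbackReplicas, including the float64 handling described above:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

// computeFallbackValue illustrates the math above:
// currentMetricValue = desiredMetricValue * fallbackReplicas / readyReplicas.
// normalisationValue stays a float64 until the final int64 conversion, so
// fractional values (e.g. 2.5) survive the scaling to milli-units.
func computeFallbackValue(desiredMetricValue float64, fallbackReplicas, readyReplicas int32) resource.Quantity {
	normalisationValue := desiredMetricValue / float64(readyReplicas)
	// Cast to int64 only after multiplying by 1000, so fractions are kept
	// as milli-units instead of being truncated early.
	return *resource.NewMilliQuantity(int64(normalisationValue*1000*float64(fallbackReplicas)), resource.DecimalSI)
}

func main() {
	// The example from the description: normalisationValue = 2.5, fallbackReplicas = 10.
	q := computeFallbackValue(25, 10, 10) // 25 desired across 10 ready replicas -> 2.5 per replica
	fmt.Println(q.MilliValue())           // 25000, not the 20000 an early int cast would give
}
```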

For the unchecked list item, if this looks good, I'll open a PR to the docs repo.

Checklist

@y-rabie y-rabie requested a review from a team as a code owner March 27, 2025 03:18
@y-rabie y-rabie force-pushed the support-fallback-for-value-type branch 4 times, most recently from 5592961 to aeb2c2d on March 27, 2025 03:45
@y-rabie y-rabie force-pushed the support-fallback-for-value-type branch from aeb2c2d to 3349924 on March 27, 2025 03:56

stale bot commented May 26, 2025

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label May 26, 2025
@y-rabie
Contributor Author

y-rabie commented May 28, 2025

@JorTurFer Is this something you guys would be interested in? If not, since it's stale now, I'll close it

@stale stale bot removed the stale All issues that are marked as stale due to inactivity label May 28, 2025
@JorTurFer
Member

This is interesting indeed; the limitation on the metric type is something to fix. Sorry for not reviewing the PR, it was missed in my notifications. I'm going to review it.

@JorTurFer
Member

JorTurFer commented Jun 1, 2025

/run-e2e fallback
Update: You can check the progress here

Member

@JorTurFer JorTurFer left a comment


In general, the code looks nice, and it's quite an interesting solution! I'd need to check it myself to be 100% sure that it works as expected, so let's see how the e2e test goes. I've left one small comment inline.

// Deployments/StatefulSets/ReplicaSets/Argo Rollouts all have a `.status.readyReplicas`
// field that we can use directly instead of getting their selector and listing all pods with that selector, the way HPA actually
// does it. Thus, we offset that overhead to the respective controller of the CRD.
// Any other CRD that doesn't have `.status.readyReplicas` won't support scaling on Value metrics.
func getReadyReplicasCount(ctx context.Context, client runtimeclient.Client, scaledObject *kedav1alpha1.ScaledObject) (int32, error) {
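For context, here is a minimal sketch of how such a function can generically read the field via controller-runtime's unstructured client. This is an assumption about the shape of the implementation (including the use of the ScaledObject's ScaleTargetGVKR status field), not the PR's actual body:

```go
import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	runtimeclient "sigs.k8s.io/controller-runtime/pkg/client"

	kedav1alpha1 "github.com/kedacore/keda/v2/apis/keda/v1alpha1"
)

func getReadyReplicasCount(ctx context.Context, client runtimeclient.Client, scaledObject *kedav1alpha1.ScaledObject) (int32, error) {
	// Fetch the scale target as unstructured so this works for any kind
	// whose controller maintains .status.readyReplicas.
	u := &unstructured.Unstructured{}
	u.SetGroupVersionKind(scaledObject.Status.ScaleTargetGVKR.GroupVersionKind())
	key := runtimeclient.ObjectKey{Namespace: scaledObject.Namespace, Name: scaledObject.Spec.ScaleTargetRef.Name}
	if err := client.Get(ctx, key, u); err != nil {
		return 0, err
	}
	readyReplicas, found, err := unstructured.NestedInt64(u.Object, "status", "readyReplicas")
	if err != nil {
		return 0, err
	}
	if !found {
		return 0, fmt.Errorf("scale target %s/%s has no .status.readyReplicas field", scaledObject.Namespace, scaledObject.Spec.ScaleTargetRef.Name)
	}
	return int32(readyReplicas), nil
}
```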
Member


There is already a function that you can use to retrieve the value using the scaleclient (it's based on the same principle as you're using).

A bit above there is an example:

currentReplicas, err = resolver.GetCurrentReplicas(ctx, client, scaleClient, scaledObject)
if err != nil {
	return nil, false, suppressedError
}

Contributor Author


Yeah, I came across that, but this function returns the spec/desired replicas, not the ready replicas. The HPA multiplies by ready replicas, so we need to divide by that. See the HPA code reference above.

Contributor


No, GetCurrentReplicas returns the current replica count for a ScaledObject, so the number of running pods.

Contributor Author


@rickbrouwer I'm not sure if you're in agreement with my point or @JorTurFer's 😅 but I can see it fetching .spec.replicas, which is obviously different from the number of ready replicas?

Contributor


Sorry, you are right. I read too quickly past "desired" and "ready".
I think the question is whether GetCurrentReplicas can also be used, or whether you specifically need to know that the replicas are ready.
GetCurrentReplicas determines the number of replicas (the number of pods) in the spec by retrieving it from the Deployment or StatefulSet. I also used this in #6464 to determine the current number of replicas.

Contributor Author

@y-rabie y-rabie Jun 2, 2025


> I think the question is whether GetCurrentReplicas can also be used, or whether you specifically need to know that the replicas are ready.

I don't think so, we do need the number of ready replicas for the math to work out.

As to why I didn't adjust the function or do something similar with the Scale API client: the .status.readyReplicas field is not part of the Scale subresource; it only has .status.replicas, which is the total number of all existing replicas, ready or not. See its type definition.

> I also used this in #6464 to determine the current number of replicas.

Yeah, I saw that PR; it did make sense there to use GetCurrentReplicas, since that case only required comparing the desired replica count, regardless of readiness.
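For reference, the Scale subresource's status type from k8s.io/api/autoscaling/v1 indeed carries only a total replica count and a selector, with no readiness information:

```go
// ScaleStatus represents the current status of a scale subresource.
type ScaleStatus struct {
	// Replicas is the actual number of observed instances, ready or not.
	Replicas int32 `json:"replicas"`
	// Selector is the label query over the pods, as a string.
	Selector string `json:"selector,omitempty"`
}
```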

@JorTurFer
Member

@wozniakjan PTAL too
