feat: add fallback support for value metric type #6655
base: main
Conversation
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
@JorTurFer Is this something you guys would be interested in? If not, since it's stale now, I'll close it.
This is interesting indeed; the limitation of the metric type is something to fix. Sorry for not reviewing the PR, it was missed in my notifications. I'm going to review it.
/run-e2e fallback
In general, the code looks nice, and it's quite an interesting solution! I'd need to check it myself to be 100% sure that it works as expected, so let's see how the e2e test goes. I've left one small comment inline.
```go
// field that we can use directly instead of getting their selector and listing all pods with that selector, the way HPA actually
// does it. Thus, we offset that overhead to the respective controller of the CRD.
// Any other CRD that doesn't have `.status.readyReplicas` won't support scaling on Value metrics.
func getReadyReplicasCount(ctx context.Context, client runtimeclient.Client, scaledObject *kedav1alpha1.ScaledObject) (int32, error) {
```
There is already a function that you can use to retrieve that value using the scale client (it's based on the same principle you're using). There's an example a bit above:

```go
currentReplicas, err = resolver.GetCurrentReplicas(ctx, client, scaleClient, scaledObject)
if err != nil {
	return nil, false, suppressedError
}
```
Yeah, I came across that, but this function returns the spec/desired replicas, not the ready replicas. HPA multiplies by ready replicas, so we need to divide by that. See the HPA code reference above.
No, GetCurrentReplicas returns the current replica count for a ScaledObject, so the number of running pods.
@rickbrouwer I'm not sure if you're in agreement with my point or @JorTurFer's 😅 but I can see it fetching .spec.replicas, which is obviously different from the number of ready replicas?
Sorry, you are right. I read too quickly about desired and ready.
I think the question is whether `GetCurrentReplicas` can also be used, or whether you also want to know if these are all ready.
`GetCurrentReplicas` determines the number of replicas (the number of pods) that are in the spec by retrieving it from the Deployment or StatefulSet. I also used this in #6464 to determine the current number of replicas.
> I think the question is whether `GetCurrentReplicas` can also be used, or whether you also want to know if these are all ready.

I don't think so, we do need the number of ready replicas for the math to work out.

As to why I didn't adjust the function or do something similar with the Scale API client: the `.status.readyReplicas` field is not part of the Scale subresource; it only has `.status.replicas`, which is the total number of all existing replicas, ready or not. See its type definition.
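For reference, the status of the `autoscaling/v1` Scale subresource only carries the total observed replica count and a label selector, roughly as follows (comments abridged; see `k8s.io/api/autoscaling/v1` for the exact definition):

```go
// ScaleStatus represents the current status of a scale subresource.
// Note that there is no readyReplicas field here.
type ScaleStatus struct {
	// Replicas is the actual number of observed instances of the scaled object.
	Replicas int32 `json:"replicas"`

	// Selector is the label query over pods that should match the replicas count.
	Selector string `json:"selector,omitempty"`
}
```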
> I also used this in #6464 to determine the current number of replicas.

Yeah, I saw the PR. It did make sense there to use `GetCurrentReplicas`, since that case only required comparing the desired replica count, regardless of readiness.
@wozniakjan PTAL too
Since the constraint on having fallback only for `AverageValue` seems to me kinda unwarranted, it's here relaxed a bit, and it can be removed altogether if we opt for another implementation.

The constraint now becomes that the `scaleTargetRef` object has a field `.status.readyReplicas` that its controller updates with the number of ready replicas, so that we can directly use that. This is de facto the case with `Deployments`/`StatefulSets`/`ReplicaSets`/`ArgoRollouts`.
We can then generically fetch the object as `unstructured` and access the value of the field to divide by it. A brief math illustration, starting with the HPA's equation:

`desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]`

By passing

`currentMetricValue = desiredMetricValue * fallbackReplicas / currentReplicas`

we end up with

`desiredReplicas = ceil[currentReplicas * ((desiredMetricValue * fallbackReplicas / currentReplicas) / desiredMetricValue)]`

`desiredReplicas = ceil[currentReplicas * (fallbackReplicas / currentReplicas)] = ceil[fallbackReplicas] = fallbackReplicas`

Emphasis: `currentReplicas` in HPA's equation is the number of ready replicas.
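For illustration, a rough sketch of what the ready-replica lookup could look like; the function name matches the one in the diff, but the body, the GVK handling, and the error messages here are assumptions, not the PR's exact code:

```go
import (
	"context"
	"fmt"

	kedav1alpha1 "github.com/kedacore/keda/v2/apis/keda/v1alpha1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	runtimeclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// Sketch: resolve the scale target's GVK from the ScaledObject and read
// .status.readyReplicas from the unstructured object.
// (Defaulting of an empty apiVersion/kind to apps/v1 Deployment is omitted.)
func getReadyReplicasCount(ctx context.Context, c runtimeclient.Client, scaledObject *kedav1alpha1.ScaledObject) (int32, error) {
	gv, err := schema.ParseGroupVersion(scaledObject.Spec.ScaleTargetRef.APIVersion)
	if err != nil {
		return 0, err
	}

	u := &unstructured.Unstructured{}
	u.SetGroupVersionKind(gv.WithKind(scaledObject.Spec.ScaleTargetRef.Kind))

	key := runtimeclient.ObjectKey{Namespace: scaledObject.Namespace, Name: scaledObject.Spec.ScaleTargetRef.Name}
	if err := c.Get(ctx, key, u); err != nil {
		return 0, err
	}

	readyReplicas, found, err := unstructured.NestedInt64(u.Object, "status", "readyReplicas")
	if err != nil {
		return 0, err
	}
	if !found {
		return 0, fmt.Errorf("%s %q has no .status.readyReplicas field", u.GetKind(), u.GetName())
	}
	return int32(readyReplicas), nil
}

// The fallback metric for a Value target would then be, roughly:
//   value := targetValue * float64(fallbackReplicas) / float64(readyReplicas)
// so that ceil[readyReplicas * value/targetValue] collapses to fallbackReplicas.
```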
I preferred this approach to the other one (which would remove the `.status.readyReplicas` field constraint), namely manually counting the number of ready pods (similar to what HPA does here), since it'd be quite involved. If that seems a better approach to you, I can implement it.

For full clarity, a problematic nit with this is that we're dependent on the object's controller updating `.status.readyReplicas` in a timely manner. If there is a lag, then the `currentReplicas` the HPA multiplies by (which it gets by manually counting pods) would deviate from the `currentReplicas` value we divide by. If ours is less, we'd scale higher than `fallbackReplicas` for a brief while; if ours is more, we'd scale lower than `fallbackReplicas`. Either way, we should eventually stabilize at exactly `fallbackReplicas`.

A final, unrelated small change for correctness's sake (which I think should be fixed regardless of this PR): the line
`Value: *resource.NewMilliQuantity(normalisationValue*1000*replicas, resource.DecimalSI),`

has been changed to

`Value: *resource.NewMilliQuantity(int64(normalisationValue*1000*replicas), resource.DecimalSI),`

with `normalisationValue` being a `float64` instead of an int. This prevents casting to int early, which would discard the fractional part of `normalisationValue` that the multiplication by 1000 would otherwise have preserved. Imagine `normalisationValue = 2.5` and `fallbackReplicas = 10`: previously, `Value = 2*1000*10 = 20000`; now, `Value = int64(2.5*1000*10) = int64(25000.0) = 25000`. Obviously, the former would cause HPA to not scale to exactly `fallbackReplicas`.
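A minimal, standalone sketch of the difference; the variable names mirror the PR, but the surrounding program is illustrative:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	replicas := int64(10)

	// Old behaviour: normalisationValue was already an integer, so 2.5 had been truncated to 2.
	oldNormalisation := int64(2)
	oldValue := *resource.NewMilliQuantity(oldNormalisation*1000*replicas, resource.DecimalSI)

	// New behaviour: keep normalisationValue as float64 and cast only the final product.
	newNormalisation := 2.5
	newValue := *resource.NewMilliQuantity(int64(newNormalisation*1000*float64(replicas)), resource.DecimalSI)

	fmt.Println(oldValue.String()) // 20  (20000 milli-units)
	fmt.Println(newValue.String()) // 25  (25000 milli-units)
}
```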
For the unchecked list item, if this looks good, I'll open a PR to the docs repo.
Checklist