Skip to content

Randomize endpoints function for the router template : bz1447115 #14008

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 30, 2017

Conversation

rajatchopra
Copy link
Contributor

RFE https://bugzilla.redhat.com/show_bug.cgi?id=1447115

Inspite of roundrobin/leastconn and other balancing algorithms, some pods are observed as hot. This patch allows the haproxy config to print servers in random order so that a reset spoils any hotness from the previous run.

@openshift/networking Reviews please

/cc @brenton @knobunc

@rajatchopra rajatchopra force-pushed the randomize branch 2 times, most recently from e0f32fd to 16b631c Compare May 2, 2017 18:56
@@ -318,7 +318,7 @@ backend be_edge_http:{{$cfgIdx}}
{{- range $serviceUnitName, $weight := $cfg.ServiceUnitNames }}
{{- if ne $weight 0 }}
{{- with $serviceUnit := index $.ServiceUnits $serviceUnitName }}
{{- range $idx, $endpoint := endpointsForAlias $cfg $serviceUnit }}
{{- range $idx, $endpoint := randomEndpointsForAlias $cfg $serviceUnit true }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to do this conditionally based on a variable set above when haproxy.router.openshift.io/disable_cookies is considered.

@@ -301,6 +302,17 @@ func genCertificateHostName(hostname string, wildcard bool) string {
return hostname
}

func randomEndpointsForAlias(alias ServiceAliasConfig, svc ServiceUnit, random bool) []Endpoint {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we doc the random argument (or perhaps call it shuffle?)

As something to consider, perhaps we should make it a string? And support types "sorted" as the default, and "shuffled"... but that would allow other types in the future a little more easily. Not that I can think of any off-hand.

endpoints := endpointsForAlias(alias, svc)
if random {
for i := range endpoints {
rIndex := rand.Intn(i + 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle, this isn't quite right... we need to either go backwards, or get an integer such that i ≤ j < n. It's easier to go backwards:

for i := len(slice) - 1; i > 0; i-- {
    j := rand.Intn(i + 1)
    slice[i], slice[j] = slice[j], slice[i]
}

@rajatchopra
Copy link
Contributor Author

@knobunc Thanks for the feedback. Fixed the algo. And using the env var for choice of processing. Will have to document this appropriately. Please review and I can pull the docs PR if this meets the mark.

Thanks.

@rajatchopra
Copy link
Contributor Author

[testextended][extended:networking]

Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -301,6 +302,19 @@ func genCertificateHostName(hostname string, wildcard bool) string {
return hostname
}

// Returns the list of endpoints for the given route's service
// action argument further processes the list e.g. shuffle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document what the default behavior is please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@knobunc knobunc self-assigned this May 4, 2017
@knobunc
Copy link
Contributor

knobunc commented May 4, 2017

LGTM. I'd say merge, but we are closed for features at the moment. But you can back-port this now (and clone the bug for each release we are back-porting to and set the target version appropriately, I like to put the branch at the start of the subject in the bz and PR too to avoid confusion, i.e. "Randomize endpoints ..." -> "[3.2] Randomize endpoints ...").

// the endpoints (does not change the return order if the data structure did not mutate)
func processEndpointsForAlias(alias ServiceAliasConfig, svc ServiceUnit, action string) []Endpoint {
endpoints := endpointsForAlias(alias, svc)
if strings.ToLower(action) == "shuffle" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for "shuffle" over "random" or something like that? Also are there docs somewhere that describe how you use ROUTER_BACKEND_PROCESS_ENDPOINTS, or is that just a normal pod definition environment variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a normal end var to be defined in the deployment config. No docs yet. It's likely an experiment right now.

Both random and shuffle are probably fine. I felt shuffle leaves less chance of misinterpretation.

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 15, 2017
@openshift-bot openshift-bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 23, 2017
@eparis
Copy link
Member

eparis commented May 30, 2017

[merge][severity: blocker]

@openshift-bot
Copy link
Contributor

openshift-bot commented May 30, 2017

continuous-integration/openshift-jenkins/merge Waiting: You are in the build queue at position: 5

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to 8e341a1

@openshift-bot
Copy link
Contributor

[Test]ing while waiting on the merge queue

@openshift-bot
Copy link
Contributor

Evaluated for origin testextended up to 8e341a1

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to 8e341a1

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/testextended SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_extended/521/) (Base Commit: 1c38059) (Extended Tests: networking)

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/1847/) (Base Commit: 1c38059)

@eparis eparis merged commit 196604e into openshift:master May 30, 2017
@rajatchopra rajatchopra deleted the randomize branch November 7, 2017 00:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants