Skip to content

Bug 1798282: DROP: Avoid unnecessary calls to the cloud provider #24532

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

Miciah
Copy link
Contributor

@Miciah Miciah commented Feb 13, 2020

Drop UPSTREAM: 63926, which causes deletion of "LoadBalancer"-type services to get stuck.

#19742 added the carry in order to reduce unnecessary cloud-provider API calls, but it is no longer needed and causes problems with recent changes to the clean-up logic for services with type "LoadBalancer".

At the time the PR was written, the service controller added every newly created service to its work queue, which was also used for update and delete events. When a worker for this queue subsequently processed the service, it would check if the service needed an external load-balancer, and if not, it would then check if a load balancer existed for the service (in case the service had been added to the queue by an update that changed its type from "LoadBalancer", in which case any previously provisioned load-balancer would need to be deleted). To perform this check, the worker would make a GetLoadBalancer API call using the cloud provider.

#19742 added a check to skip the GetLoadBalancer call if the service's status indicated that it had no associated load-balancer, which should always be the case for a newly created service if its type is not "LoadBalancer".

Later, upstream commit kubernetes/kubernetes@aa3f81d modified the service controller's logic to add a newly created service to the work queue only if its type were "LoadBalancer", thereby obviating the need for the check that #19742 had added. The upstream commit also modified the service controller's load-balancer clean-up logic, and the check that #19742 added breaks this new logic.

  • vendor/k8s.io/kubernetes/pkg/controller/service/controller.go (syncLoadBalancerIfNeeded): Delete check for empty load balancer status.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Feb 13, 2020
@openshift-ci-robot
Copy link

@Miciah: This pull request references Bugzilla bug 1798282, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Bug 1798282: DROP: Avoid unnecessary calls to the cloud provider

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Feb 13, 2020
@openshift-ci-robot openshift-ci-robot added the vendor-update Touching vendor dir or related files label Feb 13, 2020
@Miciah
Copy link
Contributor Author

Miciah commented Feb 17, 2020

In the e2e-gcp-upgrade CI job, both the control-plane-upgrade test and the k8s-service-upgrade test failed. The control plane uses a load balancer that is not managed by the service controller, so this change cannot be responsible for that failure; I suspect the other failure is a flake.

The other jobs failed before the cluster was up. In e2e-cmd, the installer failed with the following:

level=error msg="Error: Error applying IAM policy to project \"openshift-gce-devel-ci\": Too many conflicts.  Latest error: Error setting IAM policy for project \"openshift-gce-devel-ci\": googleapi: Error 409: There were concurrent policy changes. Please retry the whole read-modify-write with exponential backoff., aborted"

In e2e-aws-fips, the job filed with the following:

could not resolve inputs: could not determine inputs for step [input:machine-os-content-base]: could not resolve base image: imagestreamtags.image.openshift.io "4.4:machine-os-content" is forbidden: User "system:anonymous" cannot get imagestreamtags.image.openshift.io in the namespace "ocp": no RBAC policy matched

/retest

@ironcladlou
Copy link
Contributor

/approve
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 20, 2020
@knobunc
Copy link
Contributor

knobunc commented May 20, 2020

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 20, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

Drop UPSTREAM: 63926, which causes deletion of "LoadBalancer"-type services
to get stuck.

Commit aaf46c4 added the carry in order to
reduce unnecessary cloud-provider API calls, but it is no longer needed and
causes problems with recent changes to the clean-up logic for services with
type "LoadBalancer".

At the time the commit was written, the service controller added every
newly created service to its work queue, which was also used for update and
delete events.  When a worker for this queue subsequently processed the
service, it would check if the service needed an external load-balancer,
and if not, it would then check if a load balancer existed for the
service (in case the service had been added to the queue by an update that
changed its type from "LoadBalancer", in which case any previously
provisioned load-balancer would need to be deleted).  To perform this
check, the worker would make a GetLoadBalancer API call using the cloud
provider.

Commit aaf46c4 added a check to skip the
GetLoadBalancer call if the service's status indicated that it had no
associated load-balancer, which should always be the case for a newly
created service if its type is not "LoadBalancer".

Later, upstream commit aa3f81d modified
the service controller's logic to add a newly created service to the work
queue only if its type were "LoadBalancer", thereby obviating the need for
the check that commit aaf46c4 had added.
The upstream commit also modified the service controller's load-balancer
clean-up logic, and the check that commit
aaf46c4 added breaks this new logic.

This commit fixes bug 1798282.

https://bugzilla.redhat.com/show_bug.cgi?id=1798282

* vendor/k8s.io/kubernetes/pkg/controller/service/controller.go
(syncLoadBalancerIfNeeded): Delete check for empty load balancer status.
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@Miciah Miciah force-pushed the BZ1798282-drop-avoid-unnecessary-calls-to-the-cloud-provider branch from 5cd7b08 to 6e34acc Compare May 20, 2020 19:09
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label May 20, 2020
@Miciah
Copy link
Contributor Author

Miciah commented May 20, 2020

/test e2e-gcp-upgrade

@Miciah
Copy link
Contributor Author

Miciah commented May 21, 2020

/retest

@danehans
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 21, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danehans, ironcladlou, knobunc, Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

5 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit f8622a2 into openshift:master May 22, 2020
@openshift-ci-robot
Copy link

@Miciah: All pull requests linked via external trackers have merged: openshift/origin#24532. Bugzilla bug 1798282 has been moved to the MODIFIED state.

In response to this:

Bug 1798282: DROP: Avoid unnecessary calls to the cloud provider

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. vendor-update Touching vendor dir or related files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants