Skip to content

[DNM] Ovs cpu part2 saga revert set6 #2563

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

tssurya
Copy link
Contributor

@tssurya tssurya commented May 10, 2025

MISC fixes for BGP

@tssurya tssurya changed the title Ovs cpu part2 saga revert set6 [DNM] Ovs cpu part2 saga revert set6 May 10, 2025
@openshift-ci openshift-ci bot requested review from JacobTanenbaum and jcaamano May 10, 2025 18:34
Copy link
Contributor

openshift-ci bot commented May 10, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tssurya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 10, 2025
@tssurya
Copy link
Contributor Author

tssurya commented May 10, 2025

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 10, 2025
Copy link
Contributor

openshift-ci bot commented May 10, 2025

@tssurya: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-hypershift-kubevirt 35bbb5c link false /test e2e-aws-ovn-hypershift-kubevirt
ci/prow/e2e-gcp-ovn-techpreview 35bbb5c link true /test e2e-gcp-ovn-techpreview
ci/prow/e2e-aws-ovn-hypershift-conformance-techpreview 35bbb5c link false /test e2e-aws-ovn-hypershift-conformance-techpreview
ci/prow/e2e-aws-ovn-edge-zones 35bbb5c link true /test e2e-aws-ovn-edge-zones
ci/prow/security 35bbb5c link false /test security
ci/prow/e2e-aws-ovn-upgrade-local-gateway 35bbb5c link true /test e2e-aws-ovn-upgrade-local-gateway
ci/prow/e2e-vsphere-ovn-techpreview 35bbb5c link false /test e2e-vsphere-ovn-techpreview
ci/prow/e2e-metal-ipi-ovn-dualstack 35bbb5c link true /test e2e-metal-ipi-ovn-dualstack

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 11, 2025
pliurh and others added 5 commits May 12, 2025 09:47
In ovn-cluster-manager, we have some code that depends on the
gateway mode.

Signed-off-by: Peng Liu <[email protected]>
(cherry picked from commit d5aa5d5)
Signed-off-by: Peng Liu <[email protected]>
(cherry picked from commit b0f6d9f)
For a number of reasons:

1. Performance

With BGP the number of routes that can be installed on the system scales
up. Tests with > 1500 routes causes trouble. The current sync
implementation has exponential loops. On each iteration,
RouteListFiltered is called which dumps all the routes on the system and
then filters client side. netlink API is by value, and this causes high
GC crunch.

Changed sync implementation to have sequential loops and a single list
call to netlink. Changed to avoid listing routes in any other case.

sinc is still a heavy operation that keeps a lock and migh require
further optimizations. This is a first take.

2. Consistency

The used RoutePartiallyEqual had a different criteria than what was used
to actually delete a route from netlink, leaving the door open for
inconsistencies. This could be thought of as a compromise due to the fact
that routes can assume different defaults when installed than those
provided. Proposal is to do a bit better and ignore those defaults when
comparing against kernel routes as a unified criteria. Should fulfill our
use cases but might need further tweaking in the future. Potential
problem is that kernel switches bad an incorrect value to default
instead of throwing us an error which can cause a route to constantly
sync. Much of this is undocumented as far as I could tell but anyway has
not been observed with the specific routes that we install.

Also, the current implementation was validating and indexing by link index
which is actually not a required field of a route. Instead, key routes by
priority, prefix and table. Route manager assumes ownership of all the
routes with any of the provided keys and will delete unrecognized routes
on sync. This is inline to what our previous netlink delete was doing.

3. Other

* Reset ticker on each sync iteration to avoid backlogging.
* Cleaned up excessive logging

Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
(cherry picked from commit 78a94d4)
This reverts commit 0ee80bf.

Conflict in gateway_shared_intf.go because of not having
https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5153/files#diff-d3aa58d9b58a0a09264f072df46ab01d0501eb508c4656411ae2dc1ac68fb3c4

Signed-off-by: Surya Seetharaman <[email protected]>
(cherry picked from commit ebb7339)
(cherry picked from commit b3760a1)
This reverts commit b1525c3.

Signed-off-by: Surya Seetharaman <[email protected]>
(cherry picked from commit 936e621)
(cherry picked from commit def2909)
@tssurya tssurya force-pushed the OVS-CPU-part2-saga-revert-set6 branch from 35bbb5c to df92245 Compare May 12, 2025 07:49
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 12, 2025
@tssurya
Copy link
Contributor Author

tssurya commented May 12, 2025

/payload 4.20 nightly blocking

Copy link
Contributor

openshift-ci bot commented May 12, 2025

@tssurya: trigger 11 job(s) of type blocking for the nightly release of OCP 4.20

  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-aws-ovn-upgrade-fips
  • periodic-ci-openshift-release-master-ci-4.20-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.20-upgrade-from-stable-4.19-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.20-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-aws-ovn-serial
  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview
  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial
  • periodic-ci-openshift-release-master-nightly-4.20-fips-payload-scan
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/97c63030-2f07-11f0-9e4d-434818c28e50-0

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 15, 2025
@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants