Telco dataplane performance - initial commit #65266

Open: wants to merge 3 commits into base: master


@HughNhan HughNhan commented May 22, 2025

Description: Initial commit of dataplane performance test using Crucible/Regulus infra.
Design:

  1. Create a new test, [../ocp-qe-perfscale-ci/openshift-eng-ocp-qe-perfscale-ci-main__metal-dataplane-x86.yaml], with two chains: the existing "openshift-qe-installer-bm-ping" chain and the new "openshift-qe-installer-bm-day2-regulus" chain.
  2. The "regulus" chain has one Step that clones a fresh Regulus repo on the bastion machine. The Step also generates a test configuration file, lab.config, and passes it to Regulus.
  3. The Step invokes a Regulus script, "run_cpt.sh", which runs the tests.
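
The Step described above could look roughly like the following. This is a minimal sketch, not the script in this PR: the key=value layout of lab.config, the EXAMPLE_BM_HOST variable, the repo URL argument, and the location of run_cpt.sh inside the clone are all assumptions made for illustration. Only REG_SRIOV_NIC, lab.config, and run_cpt.sh are names taken from the discussion.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the Regulus Step; not the PR's actual script.
set -euo pipefail

# Write a lab.config from environment variables.
# ASSUMPTION: a flat key=value file. REG_SRIOV_NIC comes from the PR;
# EXAMPLE_BM_HOST is a placeholder name. Unset variables are written
# empty so that a later discovery phase could fill them in.
generate_lab_config() {
  local out="$1"
  {
    printf 'REG_SRIOV_NIC=%s\n' "${REG_SRIOV_NIC:-}"
    printf 'EXAMPLE_BM_HOST=%s\n' "${EXAMPLE_BM_HOST:-}"
  } > "$out"
}

# Print the commands the Step would run on the bastion, without running them.
# ASSUMPTION: run_cpt.sh sits at the top level of the cloned repo.
plan_regulus_run() {
  local repo_url="$1" workdir="$2"
  echo "git clone ${repo_url} ${workdir}/regulus"
  echo "cp lab.config ${workdir}/regulus/lab.config"
  echo "cd ${workdir}/regulus && ./run_cpt.sh"
}
```

Printing the plan instead of executing it keeps the sketch safe to run anywhere; a real Step would execute these commands over SSH on the bastion.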

Limitations/Future features not in this commit:

  • Index results to Prow/CPT common storage.
  • Remove Regulus and Crucible artifacts after test.

@openshift-ci openshift-ci bot requested review from rpattath and rsevilla87 May 22, 2025 12:12
@HughNhan
Author

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-metal-dataplane-x86-regulus

@openshift-ci-robot
Contributor

@HughNhan: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@HughNhan
Author

HughNhan commented May 22, 2025

@jtaleric @josecastillolema can you take a look? There are several issues I need help with.

  1. Should the dataplane test be a separate test, as in this PR, or part of the _main test?
  2. Which env vars should be moved to secrets?
  3. I am fuzzy on some of the target-cluster-related vars in the *ref.yaml, e.g. LAB_CLOUD (piggybacked from jetlag CI).

  default: ""
  documentation: |-
    BM host for Ingress/Egress
- name: REG_SRIOV_NIC
Contributor

I think this variable should be stored in Vault, because there we can have different values for different clouds. Happy to help you with this; ping me on Slack and I can give you Vault access. @HughNhan

Author

@HughNhan HughNhan May 29, 2025

I would like to revisit what to do with these vars later, when our CPT setup is more mature. To decide on their values, we need a domain expert to analyze the worker nodes and the state of the system after the prior Steps in the chain have finished. For now, we can rely on the Regulus "smart" config util to work out dynamically what those values should be. This smart config always runs during the Regulus setup phase.

@josecastillolema
Contributor

josecastillolema commented May 22, 2025

Thanks for the PR @HughNhan!
I made some comments; let's work together on this.

P.S. Looks like the PR needs a rebase and a run of make ci-operator-config registry-metadata jobs

@HughNhan
Author

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-metal-dataplane-x86-regulus

@openshift-ci-robot
Contributor

@HughNhan: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

rm LAB_CLOUD and a few obsoleted env's, and crucible installation logic.
Contributor

openshift-ci bot commented May 29, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: HughNhan
Once this PR has been reviewed and has the lgtm label, please assign jtaleric for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@HughNhan
Author

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-metal-dataplane-x86-regulus

@openshift-ci-robot
Contributor

@HughNhan: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@HughNhan: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
|---|---|---|---|
| pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-metal-dataplane-x86-regulus | openshift-eng/ocp-qe-perfscale-ci | presubmit | Presubmit changed |

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@HughNhan
Author

/retest

@HughNhan
Author

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-metal-dataplane-x86-regulus

@openshift-ci-robot
Contributor

@HughNhan: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

Contributor

openshift-ci bot commented May 29, 2025

@HughNhan: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/prow-config-filenames | 4eeb7e2 | link | true | /test prow-config-filenames |
| ci/prow/step-registry-shellcheck | 4eeb7e2 | link | true | /test step-registry-shellcheck |
| ci/prow/config | 4eeb7e2 | link | true | /test config |
| ci/prow/step-registry-metadata | 4eeb7e2 | link | true | /test step-registry-metadata |
| ci/prow/ordered-prow-config | 4eeb7e2 | link | true | /test ordered-prow-config |
| ci/rehearse/openshift-eng/ocp-qe-perfscale-ci/main/metal-dataplane-x86-regulus | b25b911 | link | unknown | /pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-metal-dataplane-x86-regulus |
| ci/prow/yamllint | 4eeb7e2 | link | true | /test yamllint |
| ci/prow/openshift-image-mirror-mappings | 4eeb7e2 | link | true | /test openshift-image-mirror-mappings |
| ci/prow/generated-config | 4eeb7e2 | link | true | /test generated-config |
| ci/prow/release-controller-config | 4eeb7e2 | link | true | /test release-controller-config |
| ci/prow/check-gh-automation | 4eeb7e2 | link | true | /test check-gh-automation |
| ci/prow/ci-operator-registry | 4eeb7e2 | link | true | /test ci-operator-registry |
| ci/prow/core-valid | 4eeb7e2 | link | true | /test core-valid |
| ci/prow/owners | 4eeb7e2 | link | true | /test owners |
| ci/prow/ci-operator-config | 4eeb7e2 | link | true | /test ci-operator-config |
| ci/prow/ci-operator-config-metadata | 4eeb7e2 | link | true | /test ci-operator-config-metadata |
| ci/prow/prow-config-semantics | 4eeb7e2 | link | true | /test prow-config-semantics |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci-robot
Contributor

@HughNhan, pj-rehearse: unable to set up jobs ERROR:

failed to save config to GCS bucket: couldn't upload CONFIG_SPEC to GCS: encountered errors during upload: [writer close error: googleapi: Error 403: Request is disallowed by organization's constraints/gcp.restrictServiceUsage constraint for 'projects/1043659492591' attempting to use service 'storage.googleapis.com'., forbidden]

If the problem persists, please contact Test Platform.

@josecastillolema
Contributor

@HughNhan thanks for migrating the repo to the redhat-performance org.
Some questions:

  1. Does this PR install Regulus and then run the data-plane test? If so, I would recommend splitting this into two steps, one for the installation and another for the test itself. I leave it up to you.
  2. Does this work on both cloud31 and cloud19? By "work" I mean a) install Regulus and b) run the data-plane test.
  3. Is there a successful run that we can take a look at?
