mtl: ofi change to allow cxi anywhere in provname -v5.0.x #13090

Merged
merged 1 commit into from
Feb 14, 2025

edgargabriel
Member

Note: This is needed to allow for cases when CXI
appears elsewhere in the libfabric provider name
(e.g., "shm+cxi:linkx").

Signed-off-by: Thomas Naughton [email protected]
Signed-off-by: Amir Shehata [email protected]
(cherry picked from commit 63222d3)
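The difference between matching "cxi" only at the start of the provider name and matching it anywhere can be sketched as follows. This is a minimal illustration, not the code from the commit: both helper names are hypothetical, and it assumes the earlier check was a prefix comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (assumed earlier behavior): true only when the
 * provider name begins with "cxi". */
static bool provname_starts_with_cxi(const char *prov_name)
{
    return prov_name != NULL && strncmp(prov_name, "cxi", 3) == 0;
}

/* Hypothetical helper (behavior described in this PR): true when "cxi"
 * occurs anywhere in the provider name, e.g. "shm+cxi:linkx". */
static bool provname_contains_cxi(const char *prov_name)
{
    return prov_name != NULL && strstr(prov_name, "cxi") != NULL;
}
```

With a name such as "shm+cxi:linkx", the prefix check fails because the string starts with "shm", while the substring check succeeds.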

@github-actions github-actions bot added this to the v5.0.6 milestone Feb 11, 2025
@edgargabriel
Member Author

@amirshehataornl can you please test and confirm that this fixes the issue?

@edgargabriel edgargabriel changed the title mtl: ofi change to allow cxi anywhere in provname mtl: ofi change to allow cxi anywhere in provname -v5.0.x Feb 12, 2025
@naughtont3
Contributor

@edgargabriel we are testing this with libfabric-2.0.0 locally and will verify once the tests complete. In general it looks correct, but we want to verify that nothing else is needed.

@naughtont3 naughtont3 self-requested a review February 13, 2025 16:35
@naughtont3
Contributor

For the ticket trail, here is the test run:

frontier01668: $ mpirun  --map-by ppr:1:l3cache --bind-to core --np 2 gpuwrapper.sh /ccs/home/naughton/projects.frontier/osu/osu-micro-benchmarks-7.3/BUILD-openmpi-5.0.7rc2+pr13090/_install/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bibw -d rocm D D 
[../../../../../c/mpi/pt2pt/standard/osu_bibw.c:52] TJN_DBG: NOTE - SET TJN_DBG ENVVAR TO NUM SECONDS TO SLEEP FOR ATTACH
[../../../../../c/mpi/pt2pt/standard/osu_bibw.c:52] TJN_DBG: NOTE - SET TJN_DBG ENVVAR TO NUM SECONDS TO SLEEP FOR ATTACH
# OSU MPI-ROCM Bi-Directional Bandwidth Test v7.3
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size      Bandwidth (MB/s)
# Datatype: MPI_CHAR.
1                       0.27
2                       0.53
4                       1.07
8                       2.13
16                      4.26
32                      8.52
64                     17.05
128                    33.75
256                    67.14
512                   133.60
1024                  270.41
2048                  535.92
4096                 1068.77
8192                 2109.81
16384                4117.13
32768                7899.61
65536               14470.56
131072              25361.91
262144              40604.33
524288              58185.65
1048576             74063.17
2097152             85754.04
4194304             92790.90
frontier01668: $ module list

Currently Loaded Modules:
  1) craype-x86-trento        6) cray-libsci/23.12.5  11) lfs-wrapper/0.0.1                     16) json-c/0.17
  2) perftools-base/23.12.0   7) PrgEnv-cray/8.5.0    12) DefApps                               17) libfabric/2.0.0.debug
  3) cce/17.0.0               8) Core/24.07           13) craype-accel-amd-gfx90a               18) openmpi/5.0.7rc2+pr13090
  4) craype/2.7.31.11         9) tmux/3.4             14) rocm/5.7.1
  5) cray-dsmml/0.2.2        10) hsi/default          15) xpmem/2.8.4-1.0_7.3__ga37cbd9.shasta

Inactive Modules:
  1) darshan-runtime

frontier01668: $
frontier01668: $ env | egrep "MCA|FI_LNX"
PMIX_MCA_gds=hash
OMPI_MCA_plm_slurm_args=--external-launcher
PRTE_MCA_ras_slurm_use_entire_allocation=1
PRTE_MCA_ras_base_launch_orted_on_hn=1
PRTE_MCA_plm_slurm_args=--external-launcher
FI_LNX_PROV_LINKS=shm+cxi
OMPI_MCA_opal_common_ofi_provider_include=shm+cxi:lnx
OMPI_MCA_pml=^ucx
FI_LNX_DISABLE_SHM=0
FI_LNX_USE_SRQ=1
PRTE_MCA_prte_routed_radix=128
OMPI_MCA_mtl=ofi
OMPI_MCA_btl=^tcp,openib
frontier01668: $

@janjust janjust merged commit 83829b1 into open-mpi:v5.0.x Feb 14, 2025
18 checks passed