Closed
Description
Open MPI version
Details of the problem
When running an application inside a SLURM allocation but with an explicit hostfile, the ras/slurm component breaks the launch:
$ mpirun --bind-to core --map-by node -hostfile ./hfile -np 24 -mca pml ob1 -mca btl tcp,self --mca ras_base_verbose 100 <app>
[headnode:04487] mca: base: components_register: registering framework ras components
[headnode:04487] mca: base: components_register: found loaded component slurm
[headnode:04487] mca: base: components_register: component slurm register function successful
[headnode:04487] mca: base: components_open: opening ras components
[headnode:04487] mca: base: components_open: found loaded component slurm
[headnode:04487] mca: base: components_open: component slurm open function successful
[headnode:04487] mca:base:select: Auto-selecting ras components
[headnode:04487] mca:base:select:( ras) Querying component [slurm]
[headnode:04487] mca:base:select:( ras) Query of component [slurm] set priority to 50
[headnode:04487] mca:base:select:( ras) Selected component [slurm]
====================== ALLOCATED NODES ======================
cn01: flags=0x10 slots=12 max_slots=0 slots_inuse=0 state=UP
cn02: flags=0x10 slots=12 max_slots=0 slots_inuse=0 state=UP
=================================================================
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 24 slots
that were requested by the application:
/hpc/local/benchmarks/hpcx_install_Sunday/hpcx-gcc-redhat7.2/ompi-v3.0.x/tests/osu-micro-benchmarks-5.3.2/osu_barrier
Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
[headnode:04487] mca: base: close: component slurm closed
[headnode:04487] mca: base: close: unloading component slurm
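As a sanity check on the numbers, here is a minimal sketch that sums the `slots=` values from the ALLOCATED NODES dump above. The two node lines are copied from that log; the variable names `alloc` and `total` are only illustrative.

```shell
# Node lines copied verbatim from the ALLOCATED NODES section of the log above.
alloc='cn01: flags=0x10 slots=12 max_slots=0 slots_inuse=0 state=UP
cn02: flags=0x10 slots=12 max_slots=0 slots_inuse=0 state=UP'

# Sum only the fields that start with "slots=" (this skips max_slots= and slots_inuse=).
total=$(printf '%s\n' "$alloc" | awk '
  { for (i = 1; i <= NF; i++)
      if ($i ~ /^slots=/) { split($i, kv, "="); sum += kv[2] } }
  END { print sum + 0 }')

echo "total slots: $total"
```

The allocation as reported by ras/slurm adds up to 24 slots, which equals the `-np 24` request, so the slot count in this dump does not by itself explain the "not enough slots" error.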
If I explicitly disable the slurm ras component, everything works fine:
$ mpirun --bind-to core --map-by node -hostfile ./hfile -np 24 -mca pml ob1 -mca btl tcp,self --mca ras_base_verbose 100 --mca ras '^slurm' <app>
[headnode:06721] mca: base: components_register: registering framework ras components
[headnode:06721] mca: base: components_register: found loaded component simulator
[headnode:06721] mca: base: components_register: component simulator register function successful
[headnode:06721] mca: base: components_open: opening ras components
[headnode:06721] mca: base: components_open: found loaded component simulator
[headnode:06721] mca:base:select: Auto-selecting ras components
[headnode:06721] mca:base:select:( ras) Querying component [simulator]
[headnode:06721] mca:base:select:( ras) No component selected!
====================== ALLOCATED NODES ======================
cn01: flags=0x00 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
cn02: flags=0x00 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
=================================================================
... <app output> ...
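The same workaround should also be expressible through the environment, since Open MPI reads MCA parameters from `OMPI_MCA_<name>` variables. This is a sketch I have not run on this cluster; the launch line is left as a comment and is the command from above without the `--mca ras` option.

```shell
# Exclude the slurm ras component for every mpirun started from this shell.
export OMPI_MCA_ras='^slurm'

# Then launch as before, without passing --mca ras on the command line:
# mpirun --bind-to core --map-by node -hostfile ./hfile -np 24 -mca pml ob1 -mca btl tcp,self <app>
```

Unsetting the variable (`unset OMPI_MCA_ras`) restores the default component selection.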
@rhc54 is this expected behavior?