Request fixes #1716
This fixes an error when building with --enable-static. Signed-off-by: Nathan Hjelm <[email protected]>
This fixes a hang caused by the request refactor work. The cm pml was not updated and was hanging in most cases. Signed-off-by: Nathan Hjelm <[email protected]>
@bosilca, @jladd-mlnx Two problems: one causing a compilation failure and the other causing hangs.
Hmm, yalla isn't hanging but needs an update as well. Incoming.
This commit brings the pml/yalla component up to date with the request rework changes. Signed-off-by: Nathan Hjelm <[email protected]>
@hjelmn Just so I understand: the root cause was a lack of support in the CM PML. The combination of OSHMEM with the CM PML, the MXM MTL, and the IKRIT SML just happened to trigger the hang. OSHMEM itself was not involved in the hang. Correct?
Looks like it.
Need jenkins to confirm.
@bosilca I still see a lot of references to ompi_request_lock in ompi. I think most of them can go away. Is there a reason those were left?
In all instances where the request lock protects request_complete, it is no longer necessary. @thananon started to remove them but apparently stopped after ob1.
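To illustrate why a lock around the completion flag becomes redundant, here is a minimal C11 sketch. It assumes completion is signaled through a single atomic flag; the `demo_request_t` type and the `demo_*` functions are hypothetical and do not reflect Open MPI's actual `ompi_request_t` layout or completion API. The completer publishes with a release store and the waiter polls with an acquire load, so neither side needs to take a lock such as `ompi_request_lock` just to read or set the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request object: only the completion flag matters here. */
typedef struct {
    atomic_bool complete;  /* set once by the completer, read by waiters */
    int pending_events;    /* demo-only: progress calls left before completion */
} demo_request_t;

/* A request with no pending events starts out already complete. */
static void demo_request_init(demo_request_t *req, int events)
{
    atomic_init(&req->complete, events <= 0);
    req->pending_events = events;
}

/* Mark the request complete. The release store makes every earlier write
 * the completer made to the request visible before the flag flips. */
static void demo_request_complete(demo_request_t *req)
{
    /* Completing the same request twice would be a bug. */
    assert(!atomic_load_explicit(&req->complete, memory_order_relaxed));
    atomic_store_explicit(&req->complete, true, memory_order_release);
}

/* Stand-in for a progress engine: completes the request after a fixed
 * number of calls. Demo-only and single-threaded; a real engine would
 * poll the network instead. */
static void demo_progress(demo_request_t *req)
{
    if (req->pending_events > 0 && --req->pending_events == 0) {
        demo_request_complete(req);
    }
}

/* Lock-free wait: the acquire load pairs with the release store in
 * demo_request_complete. Returns the number of progress calls made. */
static int demo_request_wait(demo_request_t *req)
{
    int spins = 0;
    while (!atomic_load_explicit(&req->complete, memory_order_acquire)) {
        demo_progress(req);
        ++spins;
    }
    return spins;
}
```

For example, after `demo_request_init(&r, 3)`, calling `demo_request_wait(&r)` returns once three simulated progress calls have completed the request. The single-threaded loop is only a stand-in; the release/acquire pairing is what would matter if `demo_request_complete` were called from a different thread than the waiter.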
@thananon should look at the ones in pml/ucx and see if they are still relevant. I got yalla and cm.
Unrelated error?
:bot:retest:
The second part of the error is more interesting: "Cannot allocate memory. size = 14".
It's an OpenIB error on PML OB1. It's a thread multiple test.
@jladd-mlnx Is that a jenkins machine failure or a legitimate one?
@hjelmn Something random. It ran once, and now it's failing repeatedly. The OSHMEM command line works, however. Jenkins is in fine health. This is a legitimate failure.
@jladd-mlnx General XRC failure or openib btl XRC failure?
@hjelmn OpenIB BTL XRC failure.
Well, since this is a different failure, I will merge this. I'll have to dissect the other failure tomorrow.
I might be able to get to it either Friday afternoon or early next week.