Description
Problem
The `makes_jobserver_used` test is occasionally hanging. (azure log).
I'm able to reproduce it on macOS with a failure rate a little under 1%. I wasn't able to reproduce it on Windows or Linux.
It seems unrelated to #7731 or any of the recent jobserver releases. I was able to reproduce it with older versions.
Steps
Repeatedly run the `makes_jobserver_used` test on macOS until it hangs.
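With a failure rate under 1%, it helps to automate the repetition. The following is a minimal sketch, not part of Cargo's test suite: `first_hanging_run` is a hypothetical helper that reruns a command and treats any run exceeding a timeout as a hang. The timeout, the run count, and the exact `cargo test` arguments are assumptions to adjust for your checkout.

```rust
use std::io;
use std::process::{Command, Stdio};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `program args...` up to `max_runs` times.
/// Returns Ok(Some(i)) for the first run that exceeds `timeout` (treated
/// as a hang), or Ok(None) if every run finished in time.
fn first_hanging_run(
    program: &str,
    args: &[&str],
    timeout: Duration,
    max_runs: usize,
) -> io::Result<Option<usize>> {
    for run in 0..max_runs {
        let mut child = Command::new(program)
            .args(args)
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .spawn()?;
        let start = Instant::now();
        loop {
            // try_wait() does not block; Some(_) means the child has exited.
            if child.try_wait()?.is_some() {
                break; // finished (pass or fail), so go on to the next run
            }
            if start.elapsed() >= timeout {
                // Only the direct child is killed here. To inspect a real
                // hang, remove these two lines and attach a debugger instead.
                let _ = child.kill();
                child.wait()?;
                return Ok(Some(run));
            }
            sleep(Duration::from_millis(50));
        }
    }
    Ok(None)
}
```

A call such as `first_hanging_run("cargo", &["test", "makes_jobserver_used"], Duration::from_secs(120), 1000)` would then stop at the first run that takes longer than two minutes; both numbers are guesses, not measured values.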
Notes
I haven't been able to make sense of what is happening. Best I can tell, the sequence of events is:
- cargo builds the 3 build scripts.
- one of the rustc jobs gets stuck after it finishes.
- one of the build scripts runs.
- the test is waiting for the second build script to run, but it never starts.
`cargo build` already has 2 jobs running (the waiting build script and the stuck rustc job), and thus can't spawn the next build script.
The stuck rustc job is a bit confusing. The rustc process has finished and is in the zombie state. On the Cargo side, it seems to be caught in a loop in `read2`. One of the fds is closed (either stdout or stderr, it seems random), yet the other one somehow still seems to be open, and every attempt to read from it returns WouldBlock.
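To make the suspected spin concrete, here is a simplified model of a two-stream read loop. It is a sketch under stated assumptions, not Cargo's actual `read2` code: `drain`, `read_both`, and `AlwaysWouldBlock` are hypothetical names, the mock stream stands in for the fd that never reports EOF, and the `max_rounds` bound exists only so the example terminates.

```rust
use std::io::{self, ErrorKind, Read};

/// Outcome of draining one stream until it ends or has nothing to give.
#[derive(Debug, PartialEq)]
enum Drain {
    Eof,        // read returned Ok(0): the write end is closed
    WouldBlock, // no data right now, but the stream is still open
}

/// Reads from `src` into `out` until EOF or WouldBlock.
fn drain<R: Read>(src: &mut R, out: &mut Vec<u8>) -> io::Result<Drain> {
    let mut buf = [0u8; 1024];
    loop {
        match src.read(&mut buf) {
            Ok(0) => return Ok(Drain::Eof),
            Ok(n) => out.extend_from_slice(&buf[..n]),
            Err(e) if e.kind() == ErrorKind::WouldBlock => return Ok(Drain::WouldBlock),
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}

/// Hypothetical two-stream loop. Returns Ok(true) once both streams have
/// reached EOF, or Ok(false) if that has not happened after `max_rounds`.
fn read_both<A: Read, B: Read>(mut a: A, mut b: B, max_rounds: usize) -> io::Result<bool> {
    let (mut out_a, mut out_b) = (Vec::new(), Vec::new());
    let (mut a_done, mut b_done) = (false, false);
    for _ in 0..max_rounds {
        // A real implementation would wait for readiness (e.g. poll) here.
        if !a_done {
            a_done = drain(&mut a, &mut out_a)? == Drain::Eof;
        }
        if !b_done {
            b_done = drain(&mut b, &mut out_b)? == Drain::Eof;
        }
        if a_done && b_done {
            return Ok(true);
        }
    }
    Ok(false)
}

/// Mock stream that never yields data and never reaches EOF.
struct AlwaysWouldBlock;

impl Read for AlwaysWouldBlock {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Err(ErrorKind::WouldBlock.into())
    }
}
```

In this model, `read_both(io::empty(), AlwaysWouldBlock, 10)` never reports completion even though one side hit EOF immediately. Without the round limit, the loop would run forever, which matches the observed behavior if something still holds the write end of the second pipe open.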