Multi-GPU setup hangs with accelerate, but torchrun works #3568

### System Info

```Shell
- `Accelerate` version: 1.6.0
- Platform: Linux-5.15.0-105-generic-x86_64-with-glibc2.39
- `accelerate` bash location: /root/miniconda3/envs/torch_env/bin/accelerate
- Python version: 3.10.16
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.7.0+cu128 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- PyTorch SDAA available: False
- PyTorch MUSA available: False
- System RAM: 2015.36 GB
- GPU type: NVIDIA H100 80GB HBM3
- `Accelerate` default config:
	Not found
```

### Information

- [ ] The official example scripts
- [x] My own modified scripts

### Tasks

- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [x] My own task or dataset (give details below)

### Reproduction

I use a simple test script from other issues:
```python
import torch
import socket

print("============", socket.gethostname(), "===========")

num = torch.cuda.device_count()
infos = [torch.cuda.get_device_properties(i) for i in range(num)]
print(infos)
```
The launch command that fails is:
`accelerate launch --num_processes=2 run.py`

At first it failed because it tried to connect to the IPv6 address of a host that does not exist:
```
[c10d] The IPv6 network addresses of (job-b1a4dd18-fb07-4177-9b83-6065e29665ac-master-0, 23456) cannot be retrieved (gai error: -2 - Name or service not known).
```
I found this hostname in the environment and replaced every related variable (`MASTER_ADDR`, `PET_MASTER_ADDR`, `HOSTNAME`) with my current hostname, then used `ping` to verify that the host I set is reachable.
After that, the launch just hangs without printing anything to stdout. I assume it would eventually hit a timeout error.
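
For anyone trying to reproduce the environment check described above, here is a minimal sketch that lists the rendezvous-related variables and tests whether `MASTER_ADDR` resolves. The variable list combines the names from this report with the standard `torch.distributed` ones; the helper names `rendezvous_env` and `can_resolve` are hypothetical, and matching on the `PET_` prefix is an assumption based on the `PET_MASTER_ADDR` variable seen here.

```python
import os
import socket

# Names from this report plus the usual torch.distributed variables (assumed relevant).
NAMES = ("MASTER_ADDR", "MASTER_PORT", "HOSTNAME", "RANK", "WORLD_SIZE", "LOCAL_RANK")


def rendezvous_env(environ=os.environ):
    """Return the rendezvous-related variables that are set, including any PET_* ones."""
    return {k: v for k, v in environ.items() if k in NAMES or k.startswith("PET_")}


def can_resolve(host):
    """True if the name resolves to at least one address (IPv4 or IPv6)."""
    try:
        return bool(socket.getaddrinfo(host, None))
    except socket.gaierror:
        return False


if __name__ == "__main__":
    env = rendezvous_env()
    for key in sorted(env):
        print(f"{key}={env[key]}")
    addr = env.get("MASTER_ADDR")
    if addr:
        print("MASTER_ADDR resolves:", can_resolve(addr))
```

Running this in the same shell before each launcher shows whether a stale hostname is still being inherited from the job environment.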

With torchrun, however, the script runs successfully. The command is:
`torchrun --nproc-per-node 2 --nnodes 1 run.py`

### Expected behavior

At a minimum, accelerate should print some information that I can use to debug the hang. I would also like to understand what differs between how torchrun and accelerate launch the script.
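
Until the launcher prints more, one way to get some output is to instrument the script itself. The sketch below is an assumption about what would help, not something accelerate provides: it prints a flushed banner as soon as the process starts, which shows whether the script is reached at all, and arms the standard-library `faulthandler` watchdog so a stuck worker dumps its tracebacks. The helper names `startup_banner` and `arm_hang_watchdog` are hypothetical, and it only helps if the worker processes actually start.

```python
import faulthandler
import os
import sys


def startup_banner(environ=os.environ):
    """Describe this process using the variables torchrun-style launchers usually set."""
    keys = ("RANK", "LOCAL_RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")
    parts = [f"{k}={environ.get(k, '<unset>')}" for k in keys]
    return f"pid={os.getpid()} " + " ".join(parts)


def arm_hang_watchdog(seconds=60):
    """Dump all thread tracebacks to stderr if the process is still running after `seconds`."""
    faulthandler.dump_traceback_later(seconds, exit=False, file=sys.stderr)


if __name__ == "__main__":
    # Flush immediately so the line is visible even if the process hangs right after.
    print(startup_banner(), flush=True)
    arm_hang_watchdog(60)
    # ... the body of run.py would go here ...
    faulthandler.cancel_dump_traceback_later()
```

If the banner never appears under `accelerate launch` but does under `torchrun`, the hang is happening before the script starts, which narrows the problem to the launcher's rendezvous step.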
