Commit af11138

Increasing the rpc timeout (#894)
- A higher timeout is required when running on ROCm because the required kernels are compiled at runtime.
1 parent 4db1116 commit af11138

1 file changed, +3 -1 lines changed

distributed/rpc/pipeline/main.py

Lines changed: 3 additions & 1 deletion
@@ -219,7 +219,9 @@ def run_master(split_size):
 def run_worker(rank, world_size, num_split):
     os.environ['MASTER_ADDR'] = 'localhost'
     os.environ['MASTER_PORT'] = '29500'
-    options = rpc.TensorPipeRpcBackendOptions(num_worker_threads=256)
+
+    # Higher timeout is added to accommodate for kernel compilation time in case of ROCm.
+    options = rpc.TensorPipeRpcBackendOptions(num_worker_threads=256, rpc_timeout=300)
 
     if rank == 0:
         rpc.init_rpc(
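
The change above raises `rpc_timeout` to 300 seconds for every run, ROCm or not. As a rough sketch of an alternative, which is not what this commit does, the longer timeout could be applied only when a ROCm build is detected. The helper names `pick_rpc_timeout` and `build_options` and the two constants are hypothetical. The 60-second figure is assumed to match PyTorch's documented default RPC timeout, and `torch.version.hip` is assumed to be `None` on non-ROCm builds.

```python
from typing import Optional

# Assumption: 60 s mirrors PyTorch's documented default RPC timeout.
DEFAULT_RPC_TIMEOUT_S = 60.0
# Value introduced by this commit for ROCm runtime kernel compilation.
ROCM_RPC_TIMEOUT_S = 300.0


def pick_rpc_timeout(hip_version: Optional[str]) -> float:
    """Hypothetical helper: use the longer timeout only on ROCm (HIP) builds."""
    return ROCM_RPC_TIMEOUT_S if hip_version else DEFAULT_RPC_TIMEOUT_S


def build_options(num_worker_threads: int = 256):
    """Hypothetical helper: build TensorPipe options with a build-aware timeout."""
    # Imported lazily so pick_rpc_timeout can be used without PyTorch installed.
    import torch
    import torch.distributed.rpc as rpc

    # torch.version.hip is a version string on ROCm builds and None otherwise.
    timeout = pick_rpc_timeout(getattr(torch.version, "hip", None))
    return rpc.TensorPipeRpcBackendOptions(
        num_worker_threads=num_worker_threads,
        rpc_timeout=timeout,  # seconds
    )
```

In `run_worker`, `options = build_options()` would then stand in for the direct constructor call. The commit's unconditional approach is simpler and has little downside for an example script, since a larger timeout only delays failure reporting on a hung RPC.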
