
Commit e9ec56c

fix docstring code example for distributed shuffle (#7166)
1 parent: 548d2d2

File tree: 1 file changed (+1 −1)

src/datasets/arrow_dataset.py

Lines changed: 1 addition & 1 deletion
@@ -5146,7 +5146,7 @@ def to_iterable_dataset(self, num_shards: Optional[int] = 1) -> "IterableDataset
 ```python
 >>> from datasets.distributed import split_dataset_by_node
 >>> ids = ds.to_iterable_dataset(num_shards=512)
->>> ids = ids.shuffle(buffer_size=10_000) # will shuffle the shards order and use a shuffle buffer when you start iterating
+>>> ids = ids.shuffle(buffer_size=10_000, seed=42) # will shuffle the shards order and use a shuffle buffer when you start iterating
 >>> ids = split_dataset_by_node(ds, world_size=8, rank=0) # will keep only 512 / 8 = 64 shards from the shuffled lists of shards when you start iterating
 >>> dataloader = torch.utils.data.DataLoader(ids, num_workers=4) # will assign 64 / 4 = 16 shards from this node's list of shards to each worker when you start iterating
 >>> for example in ids:
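
For context, below is a minimal runnable sketch of the distributed-shuffle pattern this docstring documents. It is an illustration under stated assumptions, not part of the commit: the "rotten_tomatoes" dataset and the fixed world_size=8 / rank=0 are placeholders (real jobs usually read rank and world size from the launcher, e.g. torchrun's RANK and WORLD_SIZE environment variables). Note that the sketch passes the converted `ids`, not the original `ds`, to `split_dataset_by_node`, so the node split applies to the sharded iterable, and it iterates the dataloader rather than `ids`.

```python
# Sketch of the distributed-shuffle pattern from the docstring above.
# Assumptions: the "rotten_tomatoes" dataset name and world_size=8 / rank=0
# are illustrative placeholders; real jobs typically read rank and world
# size from the launcher (e.g. torchrun's RANK / WORLD_SIZE env vars).
import torch
from datasets import load_dataset
from datasets.distributed import split_dataset_by_node

ds = load_dataset("rotten_tomatoes", split="train")  # a map-style Dataset

# Convert to an IterableDataset with enough shards that every node and
# every dataloader worker gets at least one (512 = 8 nodes * 4 workers * 16).
ids = ds.to_iterable_dataset(num_shards=512)

# Seeded shuffle: shuffles the shard order and fills a 10k-example buffer
# once iteration starts; the seed makes the shard order reproducible.
ids = ids.shuffle(buffer_size=10_000, seed=42)

# Keep only 512 / 8 = 64 shards on this node.
ids = split_dataset_by_node(ids, world_size=8, rank=0)

# Each of the 4 workers streams 64 / 4 = 16 of this node's shards.
dataloader = torch.utils.data.DataLoader(ids, num_workers=4)

for example in dataloader:
    pass  # training step goes here
```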
