docker/k8s/batch: increase /dev/shm size for larger datasets #428
Labels
aws_batch
bug (Something isn't working)
docker
good first issue (Good for newcomers)
kubernetes (kubernetes and volcano schedulers)
module: runner (issues related to the torchx.runner and torchx.scheduler modules)
🐛 Bug
When running models that load large datasets via PyTorch DataLoaders, /dev/shm must be sized large enough for data to be transferred between worker processes. Docker and Kubernetes give containers a default /dev/shm size of 64MB, which is much too small. Since increasing the size doesn't consume memory until it's actually allocated, we should be safe to set the size to the full memory allocated to the container. (A sketch of the relevant Docker/Kubernetes settings follows the module list below.)

Module (check all that apply):
torchx.spec
torchx.component
torchx.apps
torchx.runtime
torchx.cli
torchx.schedulers
torchx.pipelines
torchx.aws
torchx.examples
other
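For the Kubernetes scheduler, the usual workaround is to mount a memory-backed emptyDir volume at /dev/shm in the pod spec. Below is a minimal sketch using the kubernetes Python client; the volume name, image name, and the "16Gi" size_limit are placeholders (the proposal is to default the limit to the container's full memory allocation), not existing TorchX behavior.

```python
from kubernetes import client

# Memory-backed emptyDir mounted at /dev/shm.
# size_limit "16Gi" is a placeholder; the idea is to default it to the
# container's full memory request.
shm_volume = client.V1Volume(
    name="dshm",
    empty_dir=client.V1EmptyDirVolumeSource(medium="Memory", size_limit="16Gi"),
)
shm_mount = client.V1VolumeMount(name="dshm", mount_path="/dev/shm")

container = client.V1Container(
    name="trainer",
    image="my-training-image:latest",  # placeholder image
    volume_mounts=[shm_mount],
)
pod_spec = client.V1PodSpec(containers=[container], volumes=[shm_volume])
```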
To Reproduce
Steps to reproduce the behavior:
Expected behavior
It runs
Environment
Additional context
https://stackoverflow.com/questions/46085748/define-size-for-dev-shm-on-container-engine/46434614#46434614
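For the Docker scheduler, the equivalent knob is the container's shm_size setting (the docker run --shm-size flag). A minimal sketch with the docker Python SDK, assuming a hypothetical image and a placeholder size:

```python
import docker

client = docker.from_env()

# shm_size sets the size of /dev/shm inside the container; "16g" is a
# placeholder for the container's full memory allocation.
container = client.containers.run(
    "my-training-image:latest",  # placeholder image
    command=["python", "train.py"],
    shm_size="16g",
    detach=True,
)
```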