Commit ef02104

fco-dv, sdesrozis, and vfdev-5 authored
Docker for users with Horovod (#1248)
* [WIP] Docker for users with Horovod - base / vision / nlp - with apex build
* [WIP] Docker for users with Horovod - install horovod with .whl, add nccl in runtime image
* Docker for users with Horovod - update Readmes for horovod images and configuration
* Docker for users with Horovod - hvd tags/v0.20.0 - ignite examples with git sparse checkout
* Docker for users with Horovod - update docs

Co-authored-by: Sylvain Desroziers <[email protected]>
Co-authored-by: vfdev <[email protected]>
1 parent 28fbd2a commit ef02104

9 files changed: +223 -12 lines changed

README.md

Lines changed: 6 additions & 6 deletions
@@ -308,12 +308,12 @@ docker run --gpus all -it -v $PWD:/workspace/project --network=host --shm-size 1

 Available pre-built images are :

-- `pytorchignite/base:latest`
-- `pytorchignite/apex:latest`
-- `pytorchignite/vision:latest`
-- `pytorchignite/apex-vision:latest`
-- `pytorchignite/nlp:latest`
-- `pytorchignite/apex-nlp:latest`
+- `pytorchignite/base:latest | pytorchignite/hvd-base:latest`
+- `pytorchignite/apex:latest | pytorchignite/hvd-apex:latest`
+- `pytorchignite/vision:latest | pytorchignite/hvd-vision:latest`
+- `pytorchignite/apex-vision:latest | pytorchignite/hvd-apex-vision:latest`
+- `pytorchignite/nlp:latest | pytorchignite/hvd-nlp:latest`
+- `pytorchignite/apex-nlp:latest | pytorchignite/hvd-apex-nlp:latest`

 For more details, see [here](docker).
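The image names in this list follow a regular scheme: an optional `hvd` prefix, an optional `apex` part, then the flavor, with `base` left out once `apex` is present. The following is a minimal Python sketch of that scheme, inferred only from the twelve names listed here. The helper `image_name` is hypothetical and is not part of Ignite.

```python
ORG = "pytorchignite"  # Docker Hub organisation used by every tag in the list


def image_name(flavor: str, apex: bool = False, horovod: bool = False, tag: str = "latest") -> str:
    """Return the image reference for one flavor: "base", "vision" or "nlp"."""
    parts = []
    if horovod:
        parts.append("hvd")
    if apex:
        parts.append("apex")
    # "base" is implied once "apex" is present: apex, hvd-apex (never apex-base).
    if not (apex and flavor == "base"):
        parts.append(flavor)
    return f"{ORG}/{'-'.join(parts)}:{tag}"


if __name__ == "__main__":
    # Print each plain image next to its Horovod counterpart.
    for flavor in ("base", "vision", "nlp"):
        for apex in (False, True):
            print(image_name(flavor, apex), "|", image_name(flavor, apex, horovod=True))
```

The printed pairs can then be passed to `docker pull` as needed.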

docker/README.md

Lines changed: 34 additions & 0 deletions
@@ -2,6 +2,28 @@

 We provide Dockerfiles in order to build containerized execution environment that ease the use of Ignite for computer vision and NLP tasks.

+These images are also provided with the following Horovod configuration:
+
+```bash
+Horovod v0.20.0:
+
+Available Frameworks:
+    [ ] TensorFlow
+    [X] PyTorch
+    [ ] MXNet
+
+Available Controllers:
+    [ ] MPI
+    [X] Gloo
+
+Available Tensor Operations:
+    [X] NCCL
+    [ ] DDL
+    [ ] CCL
+    [ ] MPI
+    [X] Gloo
+```
+
 ## Installation

 - [main/Dockerfile.base](main/Dockerfile.base): latest stable PyTorch, Ignite with minimal dependencies
@@ -16,6 +38,18 @@ We provide Dockerfiles in order to build containerized execution environment tha
     * `docker pull pytorchignite/apex-vision:latest`
 - [main/Dockerfile.apex-nlp](main/Dockerfile.nlp): base apex with useful NLP libraries
     * `docker pull pytorchignite/apex-nlp:latest`
+- [hvd/Dockerfile.hvd-base](hvd/Dockerfile.hvd-base): multi-stage Horovod build with latest stable PyTorch, Ignite with minimal dependencies
+    * `docker pull pytorchignite/hvd-base:latest`
+- [hvd/Dockerfile.hvd-vision](hvd/Dockerfile.hvd-vision): base Horovod image with useful computer vision libraries
+    * `docker pull pytorchignite/hvd-vision:latest`
+- [hvd/Dockerfile.hvd-nlp](hvd/Dockerfile.hvd-nlp): base Horovod image with useful NLP libraries
+    * `docker pull pytorchignite/hvd-nlp:latest`
+- [hvd/Dockerfile.hvd-apex](hvd/Dockerfile.hvd-apex): multi-stage NVIDIA/apex and Horovod build with latest Pytorch, Ignite image with minimal dependencies
+    * `docker pull pytorchignite/hvd-apex:latest`
+- [hvd/Dockerfile.hvd-apex-vision](hvd/Dockerfile.hvd-apex-vision): base Horovod apex with useful computer vision libraries
+    * `docker pull pytorchignite/hvd-apex-vision:latest`
+- [hvd/Dockerfile.hvd-apex-nlp](hvd/Dockerfile.hvd-apex-nlp): base Horovod apex with useful NLP libraries
+    * `docker pull pytorchignite/hvd-apex-nlp:latest`

 ## How to use
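The Horovod configuration report added to `docker/README.md` in this file looks like the output of `horovodrun --check-build`. As a rough sketch of how such a report could be checked programmatically, the Python snippet below parses the text and collects the entries marked `[X]`. The function name `enabled_features` and the embedded `REPORT` string are illustrative assumptions; the report text is copied from the README addition.

```python
import re

# Report text as added to docker/README.md in this commit.
REPORT = """\
Horovod v0.20.0:

Available Frameworks:
    [ ] TensorFlow
    [X] PyTorch
    [ ] MXNet

Available Controllers:
    [ ] MPI
    [X] Gloo

Available Tensor Operations:
    [X] NCCL
    [ ] DDL
    [ ] CCL
    [ ] MPI
    [X] Gloo
"""

# One checkbox entry, e.g. "[X] PyTorch" or "[ ] MPI".
ENTRY = re.compile(r"\[(X| )\]\s+(\S+)")


def enabled_features(report: str) -> dict:
    """Map each "Available ..." section to the names marked with [X]."""
    enabled = {}
    section = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("Available ") and line.endswith(":"):
            section = line[len("Available "):-1]
            enabled[section] = []
            continue
        match = ENTRY.fullmatch(line)
        if match and section is not None and match.group(1) == "X":
            enabled[section].append(match.group(2))
    return enabled


if __name__ == "__main__":
    print(enabled_features(REPORT))
```

For this report the result contains PyTorch as the only framework, Gloo as the only controller, and NCCL plus Gloo as tensor operations, which matches the `HOROVOD_WITHOUT_MPI=1` and `HOROVOD_GPU_OPERATIONS=NCCL` build flags used in the Dockerfiles of this commit.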

docker/hvd/Dockerfile.hvd-apex

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+# Multi-stage build
+# 1/Building apex and hvd with pytorch:1.6.0-cuda10.1-cudnn7-devel
+FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel AS apex-hvd-builder
+
+ARG ARG_TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.5"
+ENV TORCH_CUDA_ARCH_LIST=$ARG_TORCH_CUDA_ARCH_LIST
+
+# Install git
+RUN apt-get update && apt-get install -y --no-install-recommends git
+
+# Build apex
+RUN echo "Setup NVIDIA Apex" && \
+    tmp_apex_path="/tmp/apex" && \
+    rm -rf $tmp_apex_path && \
+    git clone https://github.com/NVIDIA/apex $tmp_apex_path && \
+    cd $tmp_apex_path && \
+    pip wheel --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
+
+# Build Horovod
+RUN apt-get update && apt-get install -y git && \
+    git clone --recursive --depth 1 --branch v0.20.0 https://github.com/horovod/horovod.git /horovod && \
+    conda install -y cmake=3.16 nccl=2.7 -c conda-forge && \
+    cd /horovod && \
+    HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_NCCL_LINK=SHARED HOROVOD_WITHOUT_MPI=1 HOROVOD_WITH_PYTORCH=1 pip wheel --no-cache-dir . && \
+    rm -rf /var/lib/apt/lists/*
+
+# 2/ Build the runtime image
+FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime
+
+# Apex
+COPY --from=apex-hvd-builder /tmp/apex/apex-0.1-cp37-cp37m-linux_x86_64.whl apex-0.1-cp37-cp37m-linux_x86_64.whl
+RUN pip install --no-cache-dir apex-0.1-cp37-cp37m-linux_x86_64.whl && \
+    rm apex-0.1-cp37-cp37m-linux_x86_64.whl
+
+# Install tzdata / git
+RUN apt-get update && \
+    ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime && \
+    apt-get install -y tzdata && \
+    dpkg-reconfigure --frontend noninteractive tzdata && \
+    apt-get -y install --no-install-recommends git && \
+    rm -rf /var/lib/apt/lists/*
+
+# Ignite main dependencies
+RUN pip install tensorboardX tensorboard trains tqdm && \
+    # Horovod support is available in >0.4.1 and in nightly releases:
+    pip install --pre pytorch-ignite
+
+# Checkout Ignite examples only
+RUN mkdir -p pytorch-ignite-examples && \
+    cd pytorch-ignite-examples && \
+    git init && \
+    git config core.sparsecheckout true && \
+    echo examples >> .git/info/sparse-checkout && \
+    git remote add -f origin https://github.com/pytorch/ignite.git && \
+    git pull origin master
+
+# Horovod
+RUN conda install -y nccl=2.7 -c conda-forge
+ENV LD_LIBRARY_PATH=/opt/conda/lib:$LD_LIBRARY_PATH
+COPY --from=apex-hvd-builder /horovod/horovod-*.whl /horovod/
+RUN cd /horovod && \
+    pip install --no-cache-dir horovod-*.whl && \
+    rm -fr /horovod
+
+ENTRYPOINT ["/bin/bash"]
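The `ARG ARG_TORCH_CUDA_ARCH_LIST` declaration in this Dockerfile means the CUDA compute capabilities that Apex is compiled for can be overridden at build time with `--build-arg`. The commit does not show the build command itself, so the sketch below is an assumption: it supposes the build runs from the `docker/` directory with `.` as the context, and `docker_build_cmd` is a hypothetical helper that only assembles the argument list.

```python
import shlex


def docker_build_cmd(dockerfile, tag, context=".", build_args=None):
    """Assemble a `docker build` argument list without running it."""
    cmd = ["docker", "build", "-f", dockerfile, "-t", tag]
    for name, value in (build_args or {}).items():
        cmd += ["--build-arg", f"{name}={value}"]
    cmd.append(context)
    return cmd


if __name__ == "__main__":
    # Restrict the Apex build to two architectures instead of the default five.
    cmd = docker_build_cmd(
        "hvd/Dockerfile.hvd-apex",
        "pytorchignite/hvd-apex:latest",
        build_args={"ARG_TORCH_CUDA_ARCH_LIST": "7.0;7.5"},
    )
    # shlex.join quotes the value, since ";" would otherwise end the shell command.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute the build. Narrowing the architecture list is likely to shorten the Apex compilation, at the cost of a wheel that only targets those GPUs.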

docker/hvd/Dockerfile.hvd-apex-nlp

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+# Dockerfile.hvd-apex-nlp
+FROM pytorchignite/hvd-apex:latest
+
+# Ignite NLP dependencies
+RUN pip install --upgrade --no-cache-dir torchtext \
+    transformers \
+    spacy \
+    nltk
+
+ENTRYPOINT ["/bin/bash"]

docker/hvd/Dockerfile.hvd-apex-vision

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# Dockerfile.hvd-apex-vision
+FROM pytorchignite/hvd-apex:latest
+
+# Install opencv dependencies
+RUN apt-get update && \
+    apt-get -y install --no-install-recommends libglib2.0 \
+        libsm6 \
+        libxext6 \
+        libxrender-dev && \
+    rm -rf /var/lib/apt/lists/*
+
+# Ignite vision dependencies
+RUN pip install --upgrade --no-cache-dir albumentations \
+    image-dataset-viz \
+    numpy \
+    opencv-python \
+    py_config_runner \
+    pillow \
+    "trains>=0.15.0"
+
+ENTRYPOINT ["/bin/bash"]

docker/hvd/Dockerfile.hvd-base

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+# Multi-stage build
+# Dockerfile.hvd-base
+FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel as builder
+
+LABEL description="Latest PyTorch, Ignite and Horovod"
+
+# Build Horovod
+RUN apt-get update && apt-get install -y git && \
+    git clone --recursive --depth 1 --branch v0.20.0 https://github.com/horovod/horovod.git /horovod && \
+    conda install -y cmake=3.16 nccl=2.7 -c conda-forge && \
+    cd /horovod && \
+    HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_NCCL_LINK=SHARED HOROVOD_WITHOUT_MPI=1 HOROVOD_WITH_PYTORCH=1 pip wheel --no-cache-dir . && \
+    rm -rf /var/lib/apt/lists/*
+
+# Build runtime image
+FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime
+
+# Install tzdata / git
+RUN apt-get update && \
+    ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime && \
+    apt-get install -y tzdata && \
+    dpkg-reconfigure --frontend noninteractive tzdata && \
+    apt-get -y install --no-install-recommends git && \
+    rm -rf /var/lib/apt/lists/*
+
+# Ignite main dependencies
+RUN pip install tensorboardX tensorboard trains tqdm && \
+    # Horovod support is available in >0.4.1 and in nightly releases:
+    pip install --pre pytorch-ignite
+
+# Checkout Ignite examples only
+RUN mkdir -p pytorch-ignite-examples && \
+    cd pytorch-ignite-examples && \
+    git init && \
+    git config core.sparsecheckout true && \
+    echo examples >> .git/info/sparse-checkout && \
+    git remote add -f origin https://github.com/pytorch/ignite.git && \
+    git pull origin master
+
+# Horovod
+RUN conda install -y nccl=2.7 -c conda-forge
+ENV LD_LIBRARY_PATH=/opt/conda/lib:$LD_LIBRARY_PATH
+COPY --from=builder /horovod/horovod-*.whl /horovod/
+RUN cd /horovod && \
+    pip install --no-cache-dir horovod-*.whl && \
+    rm -fr /horovod
+
+WORKDIR /workspace
+
+ENTRYPOINT ["/bin/bash"]
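Because this image ends with `ENTRYPOINT ["/bin/bash"]`, anything placed after the image name on `docker run` is handed to bash as arguments, so a non-interactive command has to go through `-c`. The sketch below assembles such an invocation. It is an illustration under assumptions: `docker_run_cmd` is a hypothetical helper, `train.py` and the host path are placeholders, and the process count is arbitrary. `horovodrun -np` is Horovod's launcher, and `--gpus all` and the `/workspace/project` mount point come from the `docker run` line in the main README.

```python
import shlex


def docker_run_cmd(image, inner_command, docker_args=("--rm", "--gpus", "all")):
    """Assemble `docker run ... IMAGE -c "<inner_command>"` without running it."""
    # With ENTRYPOINT ["/bin/bash"], the trailing arguments become: bash -c "<inner_command>"
    return ["docker", "run", *docker_args, image, "-c", inner_command]


if __name__ == "__main__":
    # Hypothetical training script, mounted from a placeholder host directory.
    inner = "horovodrun -np 2 python /workspace/project/train.py"
    cmd = docker_run_cmd(
        "pytorchignite/hvd-base:latest",
        inner,
        docker_args=("--rm", "--gpus", "all", "-v", "/path/to/project:/workspace/project"),
    )
    print(shlex.join(cmd))
```

Keeping the inner command as a single string ensures bash receives it as one `-c` argument rather than as separate positional parameters.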

docker/hvd/Dockerfile.hvd-nlp

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+# Dockerfile.hvd-nlp
+FROM pytorchignite/hvd-base:latest
+
+# Ignite NLP dependencies
+RUN pip install --upgrade --no-cache-dir torchtext \
+    transformers \
+    spacy \
+    nltk
+
+ENTRYPOINT ["/bin/bash"]

docker/hvd/Dockerfile.hvd-vision

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# Dockerfile.hvd-vision
+FROM pytorchignite/hvd-base:latest
+
+# Install opencv dependencies
+RUN apt-get update && \
+    apt-get -y install --no-install-recommends libglib2.0 \
+        libsm6 \
+        libxext6 \
+        libxrender-dev && \
+    rm -rf /var/lib/apt/lists/*
+
+# Ignite vision dependencies
+RUN pip install --upgrade --no-cache-dir albumentations \
+    image-dataset-viz \
+    numpy \
+    opencv-python \
+    py_config_runner \
+    pillow \
+    "trains>=0.15.0"
+
+ENTRYPOINT ["/bin/bash"]

docs/source/index.rst

Lines changed: 6 additions & 6 deletions
@@ -78,12 +78,12 @@ Pull a pre-built docker image from `our Docker Hub <https://hub.docker.com/u/pyt

 Available pre-built images are :

-- ``pytorchignite/base:latest``
-- ``pytorchignite/apex:latest``
-- ``pytorchignite/vision:latest``
-- ``pytorchignite/apex-vision:latest``
-- ``pytorchignite/nlp:latest``
-- ``pytorchignite/apex-nlp:latest``
+- ``pytorchignite/base:latest | pytorchignite/hvd-base:latest``
+- ``pytorchignite/apex:latest | pytorchignite/hvd-apex:latest``
+- ``pytorchignite/vision:latest | pytorchignite/hvd-vision:latest``
+- ``pytorchignite/apex-vision:latest | pytorchignite/hvd-apex-vision:latest``
+- ``pytorchignite/nlp:latest | pytorchignite/hvd-nlp:latest``
+- ``pytorchignite/apex-nlp:latest | pytorchignite/hvd-apex-nlp:latest``

 For more details, see `here <https://github.com/pytorch/ignite/tree/master/docker>`_.
