@@ -79,15 +79,15 @@ python3 examples/torchvision/image_classification.py \
     -test_only
 ```
 
-#### L2 (CSE + L2)
+#### L2 (CE + L2)
 ```
 python3 examples/torchvision/image_classification.py \
-    --config configs/official/ilsvrc2012/yoshitomo-matsubara/rrpr2020/cse_l2-resnet18_from_resnet34.yaml \
+    --config configs/official/ilsvrc2012/yoshitomo-matsubara/rrpr2020/ce_l2-resnet18_from_resnet34.yaml \
     -test_only
 ```
 
 #### PAD-L2 (2nd stage)
-Note that you first need to train a model with L2 (CSE + L2), and load the ckpt file designated in the following yaml file.
+Note that you first need to train a model with L2 (CE + L2), and load the ckpt file designated in the following yaml file.
 i.e., PAD-L2 is a two-stage training method.
 
 ```
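
Side note (not part of the diff): for PAD-L2's 2nd stage, the config has to point at the checkpoint written by the 1st-stage L2 (CE + L2) run. A minimal sketch of what such an entry could look like — the key names and the ckpt path here are illustrative assumptions, not the repo's actual yaml:

```yaml
# Hypothetical excerpt of a PAD-L2 (2nd stage) config.
# Key names and the ckpt path are assumptions for illustration;
# see the yaml file referenced in the README for the real layout.
student_model:
  name: 'resnet18'
  # checkpoint saved by the 1st-stage L2 (CE + L2) training run
  ckpt: 'resource/ckpt/ilsvrc2012/ce_l2-resnet18_from_resnet34.pt'
```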
@@ -159,19 +159,19 @@ torchrun --nproc_per_node=${NUM_GPUS} examples/torchvision/image_classification
     --world_size ${NUM_GPUS}
 ```
 
-#### L2 (CSE + L2)
+#### L2 (CE + L2)
 If you use fewer or more GPUs for distributed training, you should update `batch_size: 171` in the `train_data_loader` entry
 so that (batch size) * ${NUM_GPUS} = 512. (e.g., `batch_size: 64` if you use 8 GPUs for distributed training.)
 
 ```
 torchrun --nproc_per_node=${NUM_GPUS} examples/torchvision/image_classification.py \
-    --config configs/official/ilsvrc2012/yoshitomo-matsubara/rrpr2020/cse_l2-resnet18_from_resnet34.yaml \
-    --run_log log/ilsvrc2012/cse_l2-resnet18_from_resnet34.log \
+    --config configs/official/ilsvrc2012/yoshitomo-matsubara/rrpr2020/ce_l2-resnet18_from_resnet34.yaml \
+    --run_log log/ilsvrc2012/ce_l2-resnet18_from_resnet34.log \
     --world_size ${NUM_GPUS}
 ```
 
 #### PAD-L2 (2nd stage)
-Note that you first need to train a model with L2 (CSE + L2), and load the ckpt file designated in the following yaml file.
+Note that you first need to train a model with L2 (CE + L2), and load the ckpt file designated in the following yaml file.
 i.e., PAD-L2 is a two-stage training method.
 
 If you use fewer or more GPUs for distributed training, you should update `batch_size: 171` in the `train_data_loader` entry
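
Side note (not part of the diff): the rule above keeps the global batch at 512, so pick the per-GPU `batch_size` as 512 / ${NUM_GPUS} (the config's default `171` suggests 3 GPUs, since 171 * 3 = 513 ≈ 512). A sketch of the `train_data_loader` entry under that assumption — keys other than `batch_size` are illustrative:

```yaml
# Hypothetical train_data_loader excerpt; only batch_size comes from
# the README note above, the other keys are illustrative.
train_data_loader:
  batch_size: 64   # 64 * 8 GPUs = 512
  num_workers: 16
```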