
gradient clipping doesn't work with dict params #6

Open

Description

@EdwardTyantov

When using per-layer LR I get an exception:

  File "/home/tyantov/workspace/kaggle-planet/planet/train.py", line 375, in main
    tr.run(config)
  File "/home/tyantov/workspace/kaggle-planet/planet/train.py", line 183, in run
    train_score = boilerplate.train(train_loader, self._model, criterion, optimizer, epoch)
  File "/home/tyantov/workspace/kaggle-planet/planet/boilerplate.py", line 217, in train
    optimizer.step()
  File "/home/tyantov/workspace/kaggle-planet/planet/generic_models/yellowfin.py", line 202, in step
    torch.nn.utils.clip_grad_norm(self._var_list, self._clip_thresh)
  File "/home/tyantov/anaconda2/lib/python2.7/site-packages/torch/nn/utils/clip_grad.py", line 17, in clip_grad_norm
    parameters = list(filter(lambda p: p.grad is not None, parameters))
  File "/home/tyantov/anaconda2/lib/python2.7/site-packages/torch/nn/utils/clip_grad.py", line 17, in <lambda>
    parameters = list(filter(lambda p: p.grad is not None, parameters))
AttributeError: 'dict' object has no attribute 'grad'
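For context, clip_grad_norm filters its input with "lambda p: p.grad is not None", so it expects an iterable of parameters, not optimizer-style param-group dicts. A minimal sketch of the difference (the Linear model here is only for illustration):

    import torch

    model = torch.nn.Linear(4, 2)

    # Raw parameters are fine: every element has a .grad attribute.
    torch.nn.utils.clip_grad_norm(model.parameters(), 1.0)

    # Param-group dicts (like the per-layer LR setup below) are what end up
    # in self._var_list, and the filter dies on the dict:
    groups = [{'params': model.parameters(), 'lr': 0.01}]
    torch.nn.utils.clip_grad_norm(groups, 1.0)
    # AttributeError: 'dict' object has no attribute 'grad'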

Code:

    if exact_layers:
        logger.info('Learning exact layers, number=%d', len(exact_layers))
        parameters = []
        for i, layer in enumerate(exact_layers):
            if isinstance(layer, tuple) and len(layer) == 2:
                layer, multiplier = layer
                init_multiplier = 1
            elif isinstance(layer, tuple) and len(layer) == 3:
                layer, init_multiplier, multiplier = layer
            else:
                multiplier = 1
                init_multiplier = 1
            lr = config.lr * multiplier
            init_lr = config.lr * multiplier * init_multiplier
            logger.info('Layer=%d, lr=%.5f', i, init_lr)
            parameters.append({'params': layer.parameters(), 'lr': init_lr, 'after_warmup_lr': lr})
    else:
        logger.info('Optimizing all parameters, lr=%.5f', config.lr)
        parameters = model.parameters()

Exact line: parameters.append({'params': layer.parameters(), 'lr': init_lr, 'after_warmup_lr': lr})
Standard optimizers work with dict-style param groups like this; YellowFin does not.
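A possible fix on the YellowFin side (just a sketch: _flatten_params is a hypothetical helper, and I'm assuming self._var_list still holds the param-group dicts passed to the constructor, with an iterable 'params' entry in each): flatten the groups down to raw parameters before clipping.

    def _flatten_params(var_list):
        # Yield raw parameters whether var_list holds parameters directly
        # or optimizer-style param-group dicts.
        for item in var_list:
            if isinstance(item, dict):
                for p in item['params']:
                    yield p
            else:
                yield item

    # in yellowfin.py step(), instead of
    #     torch.nn.utils.clip_grad_norm(self._var_list, self._clip_thresh)
    # do something like
    torch.nn.utils.clip_grad_norm(list(_flatten_params(self._var_list)), self._clip_thresh)

Materializing layer.parameters() into lists on the caller side would not help by itself, since the filter still touches .grad on the dict, so the change has to happen where clipping is called.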
