Firstly, many thanks for sharing this nice work! The motivation for doing this is as follows. From my experiments, …
Replies: 1 comment
My code for porting the weights is hideous, brittle, and a liability for me to share, in the sense that it is not self-explanatory and not something I want to maintain or explain. It would also take quite a bit of work to adapt it to go the other way.

I did find my PyTorch impl to be faster than the TF one when training on GPU/multi-GPU, especially with mixed-precision AMP and TorchScript (see the sketch below). I think theirs trains faster on TPU. Eager-mode PyTorch in general is pretty slow for inference in the small-batch regime. I think if you get the batch sizes up it gets close, but at batch size 1 for real-time use, PyTorch is awful. Depthwise separable convs hurt training and inference alike on GPU. The path to improving that would be ONNX export and an optimized ONNX runtime, or TensorRT for GPU. That could take some work, though some of the bits and pieces here should help with that path.
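As a rough illustration of the AMP + TorchScript combination mentioned above, here is a minimal sketch. This is not code from this repo: the tiny model and the random-data loop are placeholders, and it assumes a CUDA device is available.

```python
# Minimal sketch of AMP + TorchScript training; model and data are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(  # stand-in for the actual network
    nn.Conv2d(3, 32, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
).cuda()

# Scripting removes per-step Python overhead and enables some op fusion.
model = torch.jit.script(model)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()  # scales loss so fp16 grads don't underflow

for _ in range(10):  # stand-in training loop with random data
    images = torch.randn(8, 3, 224, 224, device="cuda")
    targets = torch.randint(0, 10, (8,), device="cuda")

    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():  # forward pass in mixed precision
        loss = F.cross_entropy(model(images), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```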
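And a sketch of the ONNX path for small-batch inference. Again, the model is a placeholder; `"model.onnx"` and the input/output names are arbitrary. It assumes the `onnxruntime` (or `onnxruntime-gpu`) package is installed; TensorRT can also consume the exported ONNX file.

```python
# Sketch: export to ONNX, then run batch-size-1 inference through ONNX Runtime
# instead of eager PyTorch.
import numpy as np
import torch
import onnxruntime as ort

model = torch.nn.Sequential(  # placeholder model
    torch.nn.Conv2d(3, 32, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(32, 10),
).eval()

dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch size
    opset_version=13,
)

# CUDAExecutionProvider is used if onnxruntime-gpu is installed; else falls back to CPU.
session = ort.InferenceSession(
    "model.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)
x = np.random.randn(1, 3, 224, 224).astype(np.float32)
logits = session.run(["logits"], {"input": x})[0]
print(logits.shape)  # (1, 10)
```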