Firstly, many thanks for sharing this nice work! The motivation for doing this is as follows. From my experiments, …
Replies: 1 comment
My code for porting the weights is hideous, brittle, and a liability for me to share, in the sense that it is not self-explanatory and not something I want to maintain or explain. It would also take quite a bit of work to adapt it to go the other way.

I did find my PyTorch impl to be faster than the TF one when training on GPU/multi-GPU, especially with mixed-precision AMP and TorchScript (see the sketch below). I think theirs trains faster on TPU. Eager-mode PyTorch in general is pretty slow for inference in the small-batch regime. I think if you get the batch sizes up it gets close, but at batch size 1 for real-time use, PyTorch is awful. Depthwise separable convs hurt training and inference alike on GPU. The path to improving that would be ONNX export and an optimized ONNX runtime, or TensorRT for GPU. That could take some work, though some of the bits and pieces here should help with that path.
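As a rough illustration of the AMP + TorchScript combination mentioned above, here is a minimal sketch. This is not code from this repo: the tiny model and the random-data loop are placeholders, and it assumes a CUDA device is available.

```python
# Minimal sketch of AMP + TorchScript training; model and data are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(  # stand-in for the actual network
    nn.Conv2d(3, 32, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
).cuda()

# Scripting removes per-step Python overhead and enables some op fusion.
model = torch.jit.script(model)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()  # scales loss so fp16 grads don't underflow

for _ in range(10):  # stand-in training loop with random data
    images = torch.randn(8, 3, 224, 224, device="cuda")
    targets = torch.randint(0, 10, (8,), device="cuda")

    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():  # forward pass in mixed precision
        loss = F.cross_entropy(model(images), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```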
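And a sketch of the ONNX path for small-batch inference. Again, the model is a placeholder; `"model.onnx"` and the input/output names are arbitrary. It assumes the `onnxruntime` (or `onnxruntime-gpu`) package is installed; TensorRT can also consume the exported ONNX file.

```python
# Sketch: export to ONNX, then run batch-size-1 inference through ONNX Runtime
# instead of eager PyTorch.
import numpy as np
import torch
import onnxruntime as ort

model = torch.nn.Sequential(  # placeholder model
    torch.nn.Conv2d(3, 32, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(32, 10),
).eval()

dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch size
    opset_version=13,
)

# CUDAExecutionProvider is used if onnxruntime-gpu is installed; else falls back to CPU.
session = ort.InferenceSession(
    "model.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)
x = np.random.randn(1, 3, 224, 224).astype(np.float32)
logits = session.run(["logits"], {"input": x})[0]
print(logits.shape)  # (1, 10)
```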