Description
I regret not being able to participate in the discussion laid out in #121 as it happened.
As it stands, the original author of TResNet, @mrT23, was present.
Since this post comes after the fact, with TResNet (from the original author, at that!) already pulled into this repo,
I would just like to lay out my opinions on it and ask some questions that hopefully @mrT23 would be able to answer.
As summarized in the paper, I view the fundamental contributions of TResNet as boiling down to the following:
- Stem: SpaceToDepth
- Block selection
- Inplace-ABN
- Dedicated SE
- Antialiasing
I will address these in order of increasing complexity, starting from the last item.
5. Antialiasing (https://github.com/adobe/antialiased-cnns) is a well-known and tested method of increasing accuracy and shift consistency. It was also used in the assembled-CNNs work.
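For reference, here is a minimal sketch of the BlurPool idea behind that repo: a fixed binomial low-pass filter applied depthwise before stride-2 subsampling. The actual adobe/antialiased-cnns implementation supports more filter sizes and padding modes; this is just the core operation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Anti-aliased downsampling sketch: blur with a fixed binomial filter,
    then subsample, instead of naively striding a conv or pool."""
    def __init__(self, channels, stride=2):
        super().__init__()
        self.stride = stride
        a = torch.tensor([1., 2., 1.])   # 1D binomial kernel
        filt = a[:, None] * a[None, :]   # separable 3x3 blur
        filt = filt / filt.sum()
        # one copy of the filter per channel -> depthwise convolution
        self.register_buffer('filt', filt.expand(channels, 1, 3, 3).contiguous())

    def forward(self, x):
        x = F.pad(x, (1, 1, 1, 1), mode='reflect')
        return F.conv2d(x, self.filt, stride=self.stride, groups=x.shape[1])
```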
4. Dedicated SE: @mrT23 made great efforts to streamline and optimize Squeeze-and-Excitation. I would like to find out how it fares against Efficient Channel Attention (ECA). In theory, ECA (or my own cECA) could be optimized similarly to show better parameter and computational efficiency as well as accuracy.
As it stands, TResNet is not amenable to drop-in replacement of its attention module, but it could easily be made so.
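For context, this is a minimal sketch of the ECA module I have in mind (https://arxiv.org/abs/1910.03151): SE's two FC layers replaced by a single 1D convolution over the channel descriptor. The kernel size is fixed here for simplicity, whereas the paper derives it adaptively from the channel count.

```python
import torch
import torch.nn as nn

class EcaModule(nn.Module):
    """Efficient Channel Attention sketch: ~k parameters total, versus the
    2*C^2/r parameters of an SE block's two FC layers."""
    def __init__(self, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # (B, C, H, W) -> (B, 1, C): per-channel global average pooling
        y = x.mean((2, 3)).unsqueeze(1)
        # 1D conv captures local cross-channel interaction
        y = self.conv(y).sigmoid()
        return x * y.transpose(1, 2).unsqueeze(-1)
```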
3. Inplace-ABN: I wonder whether such a method could be applied to EvoNorm (https://arxiv.org/pdf/2004.02967.pdf).
Considering that EvoNorm is itself an attempt to merge activation and normalization layers and search for them in an end-to-end manner, it is possible that an in-place optimization is simply unnecessary for EvoNorm.
I haven't seen, or conducted myself, enough testing on EvoNorm to tell.
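For reference, a minimal sketch of EvoNorm-S0 as I read it from the paper (y = x * sigmoid(v * x) / group_std(x), with learnable per-channel gamma, beta, and v). Whether an Inplace-ABN-style memory optimization can be applied to this fused form is exactly the open question; the group count here is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

class EvoNormS0(nn.Module):
    """EvoNorm-S0 sketch: normalization and activation fused in one layer."""
    def __init__(self, channels, groups=8, eps=1e-5):
        super().__init__()
        self.groups, self.eps = groups, eps
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.v = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):
        b, c, _, _ = x.shape
        # group standard deviation, as in GroupNorm
        var = x.view(b, self.groups, -1).var(dim=-1, unbiased=False)  # (B, G)
        std = (var + self.eps).sqrt()
        std = std.repeat_interleave(c // self.groups, dim=1).view(b, c, 1, 1)
        return x * torch.sigmoid(self.v * x) / std * self.gamma + self.beta
```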
2. Block selection: the recent RegNet paper (https://arxiv.org/abs/2003.13678) showed that even for bottleneck layers, a bottleneck ratio of 1 (no channel expansion) can be effective, and it uses such layers extensively to construct what is ostensibly a simpler ResNet that is more effective.
However, it has only been tested (as per the original paper) in a limited capacity, without all the bells and whistles of modern CNNs, so it remains to be seen what kind of performance it would show WITH all the bells and whistles.
Furthermore, while the RegNet paper compares the bottleneck block (1x1, then 3x3 with or without expansion, followed by another 1x1, with residual connections) against the vanilla block (one 3x3 with or without a residual connection), that is not a proper comparison.
The real vanilla ResNet block has TWO 3x3 layers with a residual connection, like TResNet (see the sketch below).
It might be valuable to see what TResNets could do with all stages built from basic blocks, or from bottlenecks without channel expansion. Such a comparison would require concomitant hyperparameter tuning to adjust channel widths and layer counts, but RegNet-style scaling might offer valuable pointers.
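To make the comparison concrete, here is a rough sketch of the two block shapes I mean. Layer choices (BN placement, ReLU) follow the vanilla ResNet recipe and are illustrative, not TResNet's or RegNet's exact blocks; stride and downsampling paths are omitted for brevity.

```python
import torch.nn as nn

def conv_bn(cin, cout, k):
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(cout),
    )

class BasicBlock(nn.Module):
    """The 'real' vanilla ResNet block: TWO 3x3 convs plus the residual."""
    def __init__(self, ch):
        super().__init__()
        self.conv1, self.conv2 = conv_bn(ch, ch, 3), conv_bn(ch, ch, 3)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.conv2(self.act(self.conv1(x))))

class Bottleneck1(nn.Module):
    """RegNet-style bottleneck with ratio b = 1: 1x1 -> 3x3 -> 1x1,
    with no channel expansion or reduction inside the block."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = conv_bn(ch, ch, 1)
        self.conv2 = conv_bn(ch, ch, 3)
        self.conv3 = conv_bn(ch, ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.act(self.conv2(self.act(self.conv1(x))))
        return self.act(x + self.conv3(y))
```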
1. SpaceToDepth stem: the SpaceToDepth stem is a valuable tool for increasing GPU throughput, and the fact that it maintains or even increases accuracy is a cherry on top. A sketch of the operation follows.
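For illustration, here is the core rearrangement, written in the view/permute style common in PyTorch implementations (the exact stem layers around it are my own illustrative choice, not necessarily TResNet's):

```python
import torch

def space_to_depth(x: torch.Tensor, bs: int = 4) -> torch.Tensor:
    # (B, C, H, W) -> (B, C*bs*bs, H//bs, W//bs): each bs x bs spatial patch
    # is folded into the channel dimension before the first convolution.
    b, c, h, w = x.shape
    x = x.view(b, c, h // bs, bs, w // bs, bs)
    x = x.permute(0, 3, 5, 1, 2, 4).contiguous()
    return x.view(b, c * bs * bs, h // bs, w // bs)

# e.g. a 224x224 RGB image becomes a 48-channel 56x56 tensor, so the stem
# can start with a single cheap convolution such as:
# stem = nn.Conv2d(48, 64, kernel_size=3, padding=1, bias=False)
```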
My concern is that SpaceToDepth is hard to visualize conceptually. I fear this might make the model difficult to interpret functionally: visualizing intermediate-layer activations, for example, is an important tool for understanding why a model behaves the way it does, and I am not sure how a stem that is not visually intuitive would affect the following layers from an interpretability standpoint.
In a similar vein, I am concerned that SpaceToDepth might hinder TResNet's integration into image segmentation or detection pipelines, for the same reason.
One of the reasons EfficientNets took so long to be utilized in many image detection frameworks and to displace ResNets was that a meaningful feature extractor was difficult to code.
There have been some attempts (like the version that exists in this very repo), but the difficulty of such an endeavor, alongside low GPU throughput and fragile training, made it hard to vanquish ResNets.
I would love it if @mrT23 could provide insight into these issues.
To be frank, my interest in (T)ResNets faded after seeing the RegNet paper. I hope that the eventual code and model releases will rekindle it.
In the end, TResNets are insightful, powerful, effective, and efficient models in their own right. My points come more from curiosity than criticism, and I hope @mrT23 understands my appreciation for their work and efforts (especially w.r.t. incorporating their contributions into this beneficial code base).