
Inspired by A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie from Facebook AI Research (FAIR) and UC Berkeley.
Convolutions and transformers are two approaches to designing deep neural networks. Recently, transformers seem to be becoming dominant in AI. I have been asking myself: is that really true? What makes the transformer special? I do not know the answer yet and hope to figure it out sooner or later. What I would like to share in this blog is my experience with developing deep neural networks during my Ph.D. study.
What’s the secret to designing a state-of-the-art artificial deep neural network?
- Learning Structure – You need to tell the network how to extract features from the input, layer by layer. For example,
Hierarchical representation by starting from small-sized patches and gradually increasing the size through merging to achieve scale-invariance
By Sieun Park
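To make the merging step concrete, here is a minimal numpy sketch of Swin-style patch merging: each 2 × 2 neighborhood of patch embeddings is concatenated into one embedding, halving the spatial resolution while increasing the channel count. (This is my own illustrative code, not the authors'; a real Swin block follows the concatenation with a learned linear layer that reduces 4C back to 2C.)

```python
import numpy as np

def patch_merge(x):
    """Merge each 2x2 neighborhood of patches into a single patch.

    x: (H, W, C) grid of patch embeddings.
    Returns (H/2, W/2, 4C): half the spatial resolution, four times
    the channels (Swin would then project 4C -> 2C with a linear layer).
    """
    H, W, C = x.shape
    assert H % 2 == 0 and W % 2 == 0
    x = x.reshape(H // 2, 2, W // 2, 2, C)   # group 2x2 neighborhoods
    x = x.transpose(0, 2, 1, 3, 4)           # (H/2, W/2, 2, 2, C)
    return x.reshape(H // 2, W // 2, 4 * C)  # concatenate along channels

x = np.random.rand(8, 8, 16)   # an 8x8 grid of 16-dim patch embeddings
print(patch_merge(x).shape)    # (4, 4, 64)
```

Repeating this merge stage by stage is what builds the hierarchical, increasingly coarse representation.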
- Block Design – Play with the internal representation. For example,
Achieves efficient, linear computational complexity by computing self-attention locally. (shifted window approach)
By Sieun Park
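The complexity argument is easy to see in code. Below is a toy numpy sketch (my own, with identity Q/K/V projections and no window shifting) where attention is computed independently inside fixed-size windows: global attention over N tokens costs O(N²), while windowed attention costs O(N · window), which is linear in N for a fixed window size.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def window_attention(x, window=4):
    """Self-attention computed independently inside windows of `window`
    tokens. Tokens never attend outside their own window, so the cost
    grows linearly with the number of tokens. (Identity projections
    stand in for learned Q, K, V; Swin also shifts windows between
    blocks so information can cross window boundaries.)
    """
    N, C = x.shape
    out = np.empty_like(x)
    for start in range(0, N, window):
        w = x[start:start + window]       # one window of tokens
        scores = w @ w.T / np.sqrt(C)     # attention scores within the window
        out[start:start + window] = softmax(scores) @ w
    return out

tokens = np.random.rand(16, 8)            # 16 tokens, 8 dims each
print(window_attention(tokens).shape)     # (16, 8)
```

A useful sanity check: perturbing a token in one window leaves every other window's output unchanged, which is exactly the locality the shifted-window trick is designed to compensate for.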
- Size of Convolution Kernel. For example,
The researchers observed a benefit from larger convolution kernels, which reaches a saturation point at 7 × 7
from A ConvNet for the 2020s
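For readers unfamiliar with the depthwise convolutions this finding applies to, here is a minimal numpy sketch (my own loop-based version, not an efficient implementation): each channel is filtered by its own k × k kernel, which is what makes large kernels like 7 × 7 affordable in the first place.

```python
import numpy as np

def depthwise_conv(x, kernels):
    """Depthwise convolution: every channel gets its own k x k kernel,
    with zero padding so the spatial size is preserved.
    x: (C, H, W), kernels: (C, k, k).
    """
    C, H, W = x.shape
    k = kernels.shape[1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))   # zero-pad spatial dims
    out = np.zeros_like(x)
    for c in range(C):                         # one kernel per channel
        for i in range(H):
            for j in range(W):
                out[c, i, j] = (xp[c, i:i+k, j:j+k] * kernels[c]).sum()
    return out

x = np.random.rand(3, 16, 16)
y = depthwise_conv(x, np.random.rand(3, 7, 7))  # 7x7: the paper's sweet spot
print(y.shape)                                   # (3, 16, 16)
```

Because there is no mixing across channels, the cost per pixel is only C · k², so going from 3 × 3 to 7 × 7 is far cheaper than it would be for a dense convolution.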
- Pay Attention to Temporal Learning – Play with when, in the forward pass, each representation is computed. For example,
The position of the spatial depth-wise Conv layer is moved up.
from A ConvNet for the 2020s
- Training with Fine Hyper-parameters: optimizer, learning rate, batch size, activation functions, and so on. No secrets here; just experiment.
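To give a feel for what "fine hyper-parameters" means in practice, here is a sketch of a modern recipe in the spirit of the ConvNeXt paper. The values are approximate recollections of that kind of recipe, not an exact copy of the paper's settings, so treat them as a starting point for your own sweeps.

```python
# A sketch of a ConvNeXt-style training recipe (values are my
# approximations of a typical modern recipe, not the paper's exact ones).
recipe = {
    "optimizer": "AdamW",
    "learning_rate": 4e-3,        # linear warmup, then cosine decay
    "weight_decay": 0.05,
    "batch_size": 4096,
    "epochs": 300,
    "warmup_epochs": 20,
    "activation": "GELU",         # replacing ReLU, as in transformers
    "augmentation": ["Mixup", "CutMix", "RandAugment", "Random Erasing"],
    "label_smoothing": 0.1,
    "stochastic_depth": True,
}
for key, value in recipe.items():
    print(f"{key}: {value}")
```

Much of ConvNeXt's gain over a vanilla ResNet comes from this kind of recipe alone, before any architectural change, which is exactly why there are "no secrets, but try".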
Final thoughts
Think about the following:
- Pay attention to each stage: Input -> Representation -> Output
- Spatial and Temporal Learning
Keep them in mind whenever designing a deep neural network for any task.