Inside Transformers: An In-depth Look at the Game-Changing Machine Learning Architecture — Part 5

Isaac Kargar
4 min read · Feb 14, 2024

Note: AI tools were used as an assistant in this post!

Let’s continue with the components.

Add & Norm

After the multi-head attention block in the encoder, as well as after several other sub-layers in the transformer architecture, we have a block called Add & Norm. The Add part is a residual connection, similar to ResNet: the input of a sub-layer is added to its output, x + layer(x). The Norm part then applies layer normalization to this sum, which keeps activations in a consistent range and helps stabilize training in deep stacks of layers.
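To make this concrete, here is a minimal PyTorch sketch of an Add & Norm block. The class name AddNorm, the d_model size, and the dropout rate are illustrative choices rather than anything from a specific codebase; applying dropout to the sub-layer output before the residual addition follows the recipe in the original "Attention Is All You Need" paper.

```python
import torch
import torch.nn as nn


class AddNorm(nn.Module):
    """Residual connection followed by layer normalization (a sketch)."""

    def __init__(self, d_model: int, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, sublayer_out: torch.Tensor) -> torch.Tensor:
        # Add: sum the sub-layer's input with its (dropped-out) output,
        # i.e., the residual path x + layer(x).
        # Norm: apply layer normalization to the sum.
        return self.norm(x + self.dropout(sublayer_out))


# Example usage: wrapping the output of a multi-head attention block.
add_norm = AddNorm(d_model=512)
x = torch.randn(2, 10, 512)            # (batch, seq_len, d_model)
attn_out = torch.randn(2, 10, 512)     # stand-in for multi-head attention output
y = add_norm(x, attn_out)              # same shape as x: (2, 10, 512)
```

Note that this is the "post-norm" arrangement used in the original transformer, where normalization is applied after the residual addition; many later implementations instead use "pre-norm", normalizing the input before it enters the sub-layer.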
