Inside Transformers: An In-depth Look at the Game-Changing Machine Learning Architecture — Part 1

Isaac Kargar
7 min read · May 29, 2023

Note: AI tools were used as an assistant in this post!

(Image generated by Microsoft Bing Image Creator)

As the field of artificial intelligence (AI) continues to evolve at a rapid pace, certain architectures have stood out for their outsized impact. Among them, the Transformer has been a game-changer, reshaping not only natural language processing but many other areas of machine learning as well.

In their seminal 2017 work "Attention Is All You Need," Vaswani et al. introduced the Transformer, which changed the way we model and process sequences. Its key innovation, the attention mechanism, simplified sequence-to-sequence tasks and made it far easier to capture long-range dependencies in data.
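To make that idea concrete, here is a minimal NumPy sketch of the scaled dot-product attention defined in the paper, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. The toy shapes and the self-attention call at the end are illustrative assumptions, not code from the paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V, as in Vaswani et al. (2017)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled to keep softmax stable.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of the value vectors.
    return weights @ V

# Toy example: a sequence of 3 tokens with 4-dimensional embeddings.
x = np.random.randn(3, 4)
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)  # (3, 4)
```

Because every query attends to every key directly, a token can draw on information from anywhere in the sequence in a single step, which is what makes long-range dependencies so much easier to model than in recurrent architectures.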

But what makes Transformers so powerful? How do they use attention to capture positional information and the dependencies between tokens? And why have they become the go-to architecture for so many modern machine learning tasks, even beyond natural language processing?
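As a quick preview of one of those answers: attention by itself is order-agnostic, so the original paper injects word order by adding fixed sinusoidal positional encodings to the token embeddings. A minimal sketch, with sequence length and model dimension chosen purely for illustration:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sin/cos positional encodings from 'Attention Is All You Need'."""
    pos = np.arange(seq_len)[:, None]      # token positions 0..seq_len-1
    i = np.arange(d_model // 2)[None, :]   # index of each dimension pair
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)           # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)           # odd dimensions get cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=10, d_model=16)
print(pe.shape)  # (10, 16) -- added to the embeddings before attention
```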

In this blog post, we'll take a close look at how the Transformer architecture works under the hood. We'll examine its main components, from inputs and…

