Transformer²: Self-Adaptive LLMs

Isaac Kargar
2 min read · Jan 15, 2025


It’s getting exciting 👌

Sakana AI just released Transformer² (“Transformer-squared”), a framework that allows large language models (LLMs) to adapt dynamically to various tasks in real time. Unlike traditional static fine-tuning methods, Transformer² uses a two-step process to adjust its behavior based on task requirements:

1. Task Analysis: The model first identifies the type of task (e.g., math, coding, reasoning) using a dispatch system.

2. Dynamic Adaptation: It then mixes task-specific “expert” vectors (z, a vector of weights over the experts), which are pre-trained using reinforcement learning, to adjust the model’s behavior for the task at hand.

Training Process

Singular Value Fine-tuning (SVF):

- SVF decomposes each weight matrix in the LLM into three components (using SVD):

  - U: left singular vectors.

  - V: right singular vectors.

  - Σ: singular values (diagonal matrix).

- Training modifies only the singular values (Σ) through a learnable vector z, allowing precise, targeted adjustments to the model’s weights.
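As a sketch, the SVF idea can be shown with plain SVD in NumPy. The matrix, the vector z, and its values here are invented for illustration; this is not the paper’s implementation:

```python
import numpy as np

# Hypothetical SVF sketch: decompose a weight matrix with SVD, then
# rescale only the singular values with a learnable expert vector z.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))          # a weight matrix from the base model

U, S, Vt = np.linalg.svd(W, full_matrices=False)

z = np.ones_like(S)                       # expert vector; z = 1 leaves W unchanged
z[0] = 1.2                                # pretend RL nudged the top component up

W_adapted = U @ np.diag(S * z) @ Vt       # W' = U diag(sigma * z) V^T

# Sanity check: with z = 1 everywhere we reconstruct W exactly.
assert np.allclose(U @ np.diag(S) @ Vt, W)
print(W_adapted.shape)                    # same shape as W: (8, 6)
```

Because only the vector z (one scalar per singular value) is trained, each expert is tiny compared to the full matrix, which is what makes storing many of them cheap.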

Reinforcement Learning (RL):

- The expert vectors (z) are optimized via RL using the REINFORCE algorithm.

- A reward function evaluates task-specific outputs and adjusts the vectors to maximize performance.

- Regularization is applied via a KL divergence penalty to maintain consistency with the base model and prevent overfitting.
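A toy version of this training loop might look like the following. The reward function, the penalty (a simple stand-in for the paper’s KL term), and all hyperparameters are invented; nothing here is Sakana AI’s actual code:

```python
import numpy as np

# Toy REINFORCE update for an expert vector z, with a penalty pulling the
# policy back toward the base model (z = 1). All values are illustrative.

def task_reward(z):
    # Stand-in for task performance of the model adapted with z;
    # pretend the best expert vector is z = 1.3 in every component.
    return -np.sum((z - 1.3) ** 2)

rng = np.random.default_rng(0)
z_mean = np.ones(4)                 # policy mean; z = 1 recovers the base model
lr, sigma, beta = 0.02, 0.1, 0.01   # step size, exploration noise, penalty weight
baseline = 0.0                      # running reward baseline (variance reduction)

for _ in range(2000):
    eps = rng.standard_normal(z_mean.shape)
    z_sample = z_mean + sigma * eps                       # sample z from the policy
    r = task_reward(z_sample) - beta * np.sum((z_sample - 1.0) ** 2)
    # REINFORCE: grad of log N(z_sample; z_mean, sigma^2) w.r.t. z_mean is eps/sigma
    z_mean += lr * (r - baseline) * eps / sigma
    baseline = 0.9 * baseline + 0.1 * r                   # update the baseline

print(z_mean.round(2))   # drifts toward the (penalty-shifted) optimum near 1.3
```

The penalty term keeps the learned z close to 1, i.e. close to the base model, which is the role the KL regularization plays in the paper.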

Inference Process

Two-Pass Mechanism:

- First Pass: The model observes the input prompt and identifies the task’s requirements. This involves either prompt engineering, a trained classifier, or a mixture-based approach to select or combine relevant expert vectors.

- Second Pass: Based on the identified task, the system dynamically adjusts the model’s weights using the selected expert vectors and generates the final response.
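A minimal sketch of the two-pass flow, with a made-up keyword classifier and expert table standing in for the real first pass (which uses the LLM itself or a trained classifier):

```python
# Hypothetical two-pass inference loop; names and values are illustrative.
EXPERTS = {
    "math":   [1.2, 0.9, 1.0],
    "coding": [0.8, 1.1, 1.0],
    "other":  [1.0, 1.0, 1.0],
}

def classify_task(prompt: str) -> str:
    # Pass 1: inspect the prompt and pick a task label (toy keyword rule).
    text = prompt.lower()
    if any(w in text for w in ("integral", "solve", "equation")):
        return "math"
    if any(w in text for w in ("python", "function", "bug")):
        return "coding"
    return "other"

def answer(prompt: str) -> str:
    task = classify_task(prompt)          # pass 1: identify the task
    z = EXPERTS[task]                     # select the matching expert vector
    # Pass 2: adapt the weights with z (see the SVF sketch) and generate.
    return f"[task={task}, z={z}] response to: {prompt}"

print(answer("Solve the equation x^2 = 4"))
```

The key point is that the same input is effectively seen twice: once to decide *how* to adapt, and once to actually generate with the adapted weights.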

Adaptation Strategies:

- Prompt-Based: Constructs a specific prompt to classify tasks and select pre-trained expert vectors.

- Classifier-Based: Uses a trained task classifier to identify the most relevant expert vector.

- Mixture-Based: Combines multiple expert vectors dynamically for more complex tasks.
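For the mixture-based case, the combination itself is just a weighted sum of expert vectors. The vectors and mixing weights below are invented; in the paper the weights are adapted per task:

```python
import numpy as np

# Sketch of the mixture-based strategy: interpolate several pre-trained
# expert vectors with weights alpha. All values here are made up.
Z = np.stack([
    np.array([1.2, 0.9, 1.0, 1.1]),   # hypothetical "math" expert
    np.array([0.8, 1.1, 1.0, 0.9]),   # hypothetical "coding" expert
    np.array([1.0, 1.0, 1.2, 1.0]),   # hypothetical "reasoning" expert
])
alpha = np.array([0.5, 0.3, 0.2])     # mixing weights (sum to 1)

z_mixed = alpha @ Z                   # convex combination of expert vectors
print(z_mixed.round(2))               # [1.04 0.98 1.04 1.02]
```

A task that needs both coding and reasoning, say, can then get an expert vector that no single pre-trained expert provides.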

Read more in the following links if you are interested!

Blog:

Paper:

https://arxiv.org/abs/2501.06252

Code:


Written by Isaac Kargar

Co-Founder and Chief AI Officer @ Resoniks | Ph.D. candidate at the Intelligent Robotics Group at Aalto University | https://kargarisaac.github.io/

No responses yet

Write a response