SELF-REFINE — A New Milestone in the AI Era?

Isaac Kargar
6 min read · Apr 4, 2023


Note: ChatGPT is used in this post as an assistant.

When I found this work, I got super excited! A bunch of questions came to my mind, and I knew I had to write a blog post on it. It might be a game-changer like the Transformer paper was. This could take AI to new levels. So, let’s jump in and see what this paper’s all about.

Header image (source)

Introduction

Large language models (LLMs) can produce coherent outputs, but they often struggle with more complex tasks that involve multiple objectives or less-defined goals. Current advanced techniques for refining LLM-generated text rely on external supervision and reward models, which require significant amounts of training data or costly human annotations. This highlights the need for a more flexible and effective method that can handle a range of tasks without extensive supervision.

To address these limitations, a new method called SELF-REFINE has been proposed. It better mimics the human process of iterative drafting and revision, without the need for an expensive human feedback loop. SELF-REFINE consists of an iterative loop between two components, FEEDBACK and REFINE, that work together to produce high-quality outputs. The process starts with an initial draft generated by a model, which is then passed back to the same model for feedback and refinement. This iterative process continues until the model determines that no further refinement is needed or a specified number of iterations has been reached. The same underlying language model performs both feedback and refinement in a few-shot setup.
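
To make the loop concrete, here is a minimal sketch of how the iteration could be wired up in Python. It is not the paper's code: `llm` stands in for any text-in, text-out model call, the prompt strings are placeholders, and the string-match stopping check is just one simple way to let the model signal that it is done.

```python
from typing import Callable

def self_refine(
    x: str,                          # task input, e.g. "the pizza was bad"
    llm: Callable[[str], str],       # placeholder: any text-in/text-out LLM call
    init_prompt: str,                # few-shot prompt for the initial draft
    feedback_prompt: str,            # few-shot prompt asking the model for feedback
    refine_prompt: str,              # few-shot prompt applying that feedback
    max_iters: int = 4,
) -> str:
    # Initial draft y0, produced by the same underlying model.
    y = llm(init_prompt + x)

    for _ in range(max_iters):
        # FEEDBACK: the model critiques its own output.
        fb = llm(f"{feedback_prompt}\nInput: {x}\nOutput: {y}\nFeedback:")
        # Stop when the model indicates no further refinement is needed.
        if "no further refinement" in fb.lower():
            break
        # REFINE: the model rewrites the output using its own feedback.
        y = llm(f"{refine_prompt}\nInput: {x}\nOutput: {y}\nFeedback: {fb}\nRefined:")
    return y
```

The fixed iteration budget and the self-declared stopping signal mirror the two stopping criteria described in the framework section below.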

SELF-REFINE has been applied to various tasks across diverse domains that require different feedback and revision strategies, such as review rewriting, acronym generation, constrained generation, story generation, code rewriting, response generation, and toxicity removal. By using a few-shot prompting approach, the model can learn from a small number of examples. SELF-REFINE is the first method to offer an iterative approach for improving generation using natural language feedback. It is hoped that this iterative framework will encourage further research in the area.

The contributions of this work can be summarized as follows:

  • SELF-REFINE is a novel approach allowing LLMs to iteratively refine outputs using their own feedback, improving performance on diverse tasks without requiring supervised training data or reinforcement learning and using a single LLM.
  • Extensive experiments on 7 diverse tasks demonstrate that SELF-REFINE outperforms direct generation from strong generators such as GPT-3.5 and GPT-4, with improvements ranging from about 5% to over 40% depending on the task.

Self-Refine Framework

Figure: the SELF-REFINE framework (source)

The SELF-REFINE process starts with an input (x) and an initial output (y0), and then iteratively refines the output through a FEEDBACK → REFINE → FEEDBACK loop. The initial output y0 is generated by a model, which could be a specialized fine-tuned model or a few-shot prompted model. For example, in a Sentiment Reversal task, if the input is “the pizza was bad” with a target sentiment of positive, the generator might produce “the pizza was good”. This output is then passed on for iterative refinement through the SELF-REFINE loop, which consists of feedback (FEEDBACK) and improvement (REFINE) stages.
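
As a rough illustration of how that initial draft could be obtained with few-shot prompting, here is a possible generator prompt for the Sentiment Reversal example. The exemplar I added ("the service was slow") is my own invention; the prompts actually used are in the paper's appendix.

```python
# Illustrative few-shot prompt for the initial generator in Sentiment Reversal.
# The first exemplar is made up for this sketch; the paper's appendix contains
# the prompts actually used.
INIT_PROMPT = """Rewrite the review so that its sentiment becomes positive.

Review: the service was slow
Rewritten: the service was quick and attentive

Review: {review}
Rewritten:"""

# Filling in the example from the text would yield the draft y0,
# e.g. "the pizza was good" for the input "the pizza was bad".
print(INIT_PROMPT.format(review="the pizza was bad"))
```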

FEEDBACK takes the initial output (y0) and offers suggestions to improve it based on the specific task. It usually addresses multiple aspects of the output, such as its sentiment level and vividness. The feedback is actionable: it points the model to the particular areas to refine, such as a still-neutral sentiment or the specific phrases causing that neutrality.
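
A FEEDBACK prompt is just another few-shot prompt to the same model. The wording below is my own sketch, but it shows the two properties the paper emphasizes: the feedback covers several aspects (sentiment strength, vividness) and tells the model exactly what to change.

```python
# Sketch of a FEEDBACK prompt for Sentiment Reversal (illustrative wording).
FEEDBACK_PROMPT = """Give feedback on whether the rewritten review is strongly positive and vivid.

Review: the pizza was bad
Rewritten: the pizza was good
Feedback: The sentiment is only mildly positive. The word "good" is flat and
nearly neutral; replace it with a more vivid, enthusiastic phrase.

Review: {review}
Rewritten: {rewritten}
Feedback:"""
```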

REFINE uses the previous output (yt) and the received feedback to produce an improved output. In the given example, the model might boost positivity by replacing “good” with “amazing” after the feedback identified the phrases causing the neutral sentiment.
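
REFINE can then be sketched as a third few-shot prompt that conditions on the input, the previous output, and the feedback. Again, the exemplar wording here is mine, not the paper's.

```python
# Sketch of a REFINE prompt: same model, but the feedback is now in context.
REFINE_PROMPT = """Use the feedback to improve the rewritten review.

Review: the pizza was bad
Rewritten: the pizza was good
Feedback: The sentiment is only mildly positive; replace "good" with a more
vivid, enthusiastic phrase.
Improved rewrite: the pizza was amazing

Review: {review}
Rewritten: {rewritten}
Feedback: {feedback}
Improved rewrite:"""
```

In practice, templates like these would be filled in with `.format(...)` and sent back to the same model that produced the initial draft.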

The FEEDBACK → REFINE → FEEDBACK loop in the SELF-REFINE process can be applied multiple times. The stopping criterion is determined either by setting a fixed number of iterations or by evaluating the feedback. Iterations may stop when the feedback is positive or when a numerical score, like positivity, exceeds a threshold. A key aspect of SELF-REFINE is retaining a history of past experiences by continuously appending previous outputs to the prompt, enabling the system to learn from past mistakes and prevent repetition.
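
Putting the stopping criteria and the retained history together, the refinement loop could look like the sketch below. The `feedback_fn` that returns a numeric score is an assumption made for illustration; how a score is extracted from the feedback text is task-specific.

```python
from typing import Callable, List, Tuple

def refine_with_history(
    x: str,
    y0: str,
    llm: Callable[[str], str],                             # placeholder LLM call
    feedback_fn: Callable[[str, str], Tuple[str, float]],  # returns (feedback text, score)
    refine_prompt: str,
    max_iters: int = 4,
    score_threshold: float = 0.9,
) -> str:
    y = y0
    history: List[str] = []   # past outputs and their feedback, kept in the prompt

    for _ in range(max_iters):
        fb, score = feedback_fn(x, y)
        # Stop once the task-specific score (e.g. positivity) clears the threshold.
        if score >= score_threshold:
            break
        # Append the whole trajectory so the model can avoid repeating past mistakes.
        history.append(f"Output: {y}\nFeedback: {fb}")
        prompt = refine_prompt + f"\nInput: {x}\n" + "\n".join(history) + "\nRefined:"
        y = llm(prompt)
    return y
```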

A crucial aspect of SELF-REFINE is its use of actionable and multi-aspect feedback. The feedback identifies reasons for the output’s success or failure in meeting requirements, addressing both problem localization and improvement instructions. Localization can be task-dependent, with specific tokens highlighted in tasks like Sentiment Reversal, while being less explicit in tasks like Acronym Generation. Feedback can emphasize phrases affecting sentiment or suggest optimizations for tasks like Code Optimization. The in-context examples within FEEDBACK prompts are tailored to each task, and carefully chosen for the experiments.
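
To make the distinction concrete, here is the kind of contrast the paper draws between generic and actionable feedback, in my own illustrative wording for the Sentiment Reversal example:

```python
# Illustrative contrast (my own wording, not the paper's prompts).
GENERIC_FEEDBACK = "The rewritten review could be better."

ACTIONABLE_FEEDBACK = (
    "The rewrite is only weakly positive. The phrase 'was good' keeps the "
    "sentiment close to neutral; replace it with a vivid, enthusiastic phrase "
    "to make the review clearly positive."
)
```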

Following is an example of the SELF-REFINE process:

Figure: an example of the SELF-REFINE process (source)

Results

The authors tested SELF-REFINE on many tasks; I only show some of the results here. There are many more analyses and experiments in the paper if you are interested.

The following table shows the improvement gained by using the SELF-REFINE technique on several tasks:

The following table shows the effect of iterative refinement in each iteration:

The difference between generic feedback and actionable feedback, as proposed in this paper, is shown in the table below:

The comparison between the preferred outputs of SELF-REFINE and a few powerful baseline generator models is shown in the following figure (left).

The right side of the figure shows that the output also improves between iterations: most of the gains occur in the early iterations (the starting point, i.e. the zeroth iteration, is shown on the right).

Conclusion

This work introduced SELF-REFINE, which allows large language models to perform iterative refinement and self-assessment to improve output quality. Operating within a single LLM, it requires neither extra training data nor reinforcement learning. Its simplicity and effectiveness across various tasks demonstrate its versatility and adaptability. I hope it can help reduce reliance on biased human feedback and help AI reach new heights, though I am also a bit concerned about how powerful and unconstrained such self-improving loops could become. Excited to see what happens!

Thank you for taking the time to read my post. If you found it helpful or enjoyable, please consider giving it a like and sharing it with your friends. Your support means the world to me and helps me to continue creating valuable content for you.
