In this blog post, part of my series on conversational AI and chatbots, I will review WebGPT, a system proposed by OpenAI.
The WebGPT paper tackles the challenge of long-form question answering (LFQA) in natural language processing (NLP). LFQA systems have the potential to become a primary source of learning (I myself use ChatGPT a lot these days to learn about new topics), but they still perform below human standards. This work focuses on information retrieval and synthesis: it outsources document retrieval to the Microsoft Bing Web Search API and handles synthesis by fine-tuning GPT-3, whose unsupervised pre-training already provides strong text-generation abilities. The paper also emphasizes combining these components with more faithful training objectives and optimizing answer quality through human feedback, achieving performance competitive with humans.
The main contributions of this work are as follows:
- Creating a text-based web-browsing environment that a fine-tuned language model can interact with. This makes it possible to improve both the retrieval and synthesis steps end-to-end using general methods such as imitation learning and reinforcement learning.
- Generating answers with references: passages that the model extracts from web pages while browsing. This is crucial for allowing labelers to judge the factual accuracy of answers without engaging in the difficult and subjective process of independent research.
Text-Based Web-Browsing Environment
As mentioned above, the authors developed a text-based web-browsing environment. The language model is presented with a written summary of the current state of the environment, including the question, the text of the current page at the cursor location, and other relevant information. The model then chooses from a set of commands, such as running a Bing search, clicking a link, or scrolling. While browsing, the model may quote an extract from the current page for later reference. Browsing continues until the model issues an end command, or until the maximum number of actions or the maximum total reference length is reached. Once browsing is complete, the model is prompted with the question and the collected references to compose its final answer.
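To make the interaction loop concrete, here is a minimal sketch of such an environment in Python. The command names, limits, and `BrowserState` fields are my own illustrative assumptions, not the paper's exact interface; real search and page-fetching calls are replaced with stubs.

```python
from dataclasses import dataclass, field

# Hypothetical command set, loosely modeled on the actions described in the paper.
COMMANDS = {"search", "click", "scroll_down", "scroll_up", "quote", "end"}

# Illustrative limits; the paper's actual values differ.
MAX_ACTIONS = 100
MAX_REFERENCE_CHARS = 2000


@dataclass
class BrowserState:
    """Written summary of the environment shown to the model at each step."""
    question: str
    page_text: str = ""
    cursor: int = 0
    references: list = field(default_factory=list)
    actions_taken: int = 0


def step(state: BrowserState, command: str, argument: str = "") -> bool:
    """Apply one browsing command; return True when browsing should stop."""
    state.actions_taken += 1
    if command == "search":
        state.page_text = f"(results for: {argument})"  # stand-in for a Bing API call
        state.cursor = 0
    elif command == "click":
        state.page_text = f"(contents of: {argument})"  # stand-in for fetching a page
        state.cursor = 0
    elif command == "scroll_down":
        state.cursor += 1
    elif command == "scroll_up":
        state.cursor = max(0, state.cursor - 1)
    elif command == "quote":
        state.references.append(argument)  # save an extract for the final answer
    # Browsing ends on an explicit "end" or when a limit is hit.
    ref_len = sum(len(r) for r in state.references)
    return (command == "end"
            or state.actions_taken >= MAX_ACTIONS
            or ref_len >= MAX_REFERENCE_CHARS)
```

In the real system the `step` output would be rendered as text for the model, and the model's generated command string would be parsed back into `(command, argument)` pairs; here that loop is left out for brevity.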