OpenAI Announces WebGPT Q&A AI


OpenAI has developed WebGPT, an AI model for long-form question-answering based on GPT-3. WebGPT can issue web search queries to collect references supporting its answers, and on questions drawn from Reddit its answers were preferred by human judges over the highest-voted answer 69% of the time.

The announcement was made on the OpenAI blog. WebGPT is a version of OpenAI’s pre-trained GPT-3 natural language processing (NLP) model that has been fine-tuned to use a web browser to perform search engine queries, follow links, and cite sources. The model is trained on a dataset collected from the Explain Like I’m 5 (ELI5) subreddit using a combination of supervised learning and reinforcement learning (RL) incorporating human feedback, and can generate paragraph-length responses to open-ended questions on a wide range of topics. According to OpenAI:

Human feedback and tools such as web browsers offer a promising path to general-purpose, reliable AI systems. Our current system still struggles with difficult or unfamiliar circumstances, but nevertheless represents significant progress in this direction.

Although question-answering (QA) has long been a topic of AI research, most datasets have focused on simple “trivia”-style questions with short answers. In 2019, in an effort to create smarter digital assistants, a team of researchers from Facebook and Google proposed the task of long-form question answering (LFQA), which requires AI to produce richer answers to more complex, open-ended questions. The team also collected a large dataset from the ELI5 subreddit for training and benchmarking LFQA models, consisting of questions (and associated answers) ranging from the mundane (Why do item prices always end with “.99” instead of “.00”?) to the imponderable (Why do people give Reddit Gold to admins?).

OpenAI’s GPT-3 model performed quite well when evaluated on QA benchmarks, scoring up to 71.2% on the TriviaQA benchmark without fine-tuning. However, like many language models, GPT-3 hallucinates; that is, it generates answers that seem reasonable but are factually incorrect. To address this problem, many researchers have augmented deep-learning QA models with an information retrieval mechanism that can query a knowledge base to provide additional context to the decoder that generates the model’s answers.
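The retrieval-augmentation idea described above can be sketched in a few lines. This is a toy illustration, not any specific system’s implementation: the scorer below uses simple word overlap, whereas real retrievers use sparse methods such as BM25 or dense embeddings, and the retrieved passages are prepended to the prompt to ground the decoder.

```python
def retrieve(question: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Return the k passages sharing the most words with the question.

    Toy word-overlap scorer standing in for a real retriever (e.g. BM25).
    """
    q_words = set(question.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda passage: len(q_words & set(passage.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(question: str, knowledge_base: list[str]) -> str:
    """Assemble a prompt with retrieved context for the answer-generating model."""
    context = "\n".join(retrieve(question, knowledge_base))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Tiny hypothetical knowledge base for illustration.
kb = [
    "Item prices often end in .99 for psychological pricing reasons.",
    "GPT-3 is a large language model trained by OpenAI.",
    "Reddit Gold is a way for users to reward posts.",
]
prompt = build_prompt("Why do prices end with .99 instead of .00?", kb)
```

The point of the pattern is that the generator no longer has to rely solely on facts memorized in its weights, which reduces (though does not eliminate) hallucination.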

OpenAI used a similar approach, but instead of building information retrieval into the model, they trained the model to interact directly with a web search engine: a task “that humans can do well and that a language model can imitate.” The team first developed a web browsing environment that can be controlled via text commands produced by a pre-trained GPT-3 model. The model then operates as an RL agent: given an environment consisting of a question and the current web browser page, the agent generates a command, such as issuing a search query, following a link, extracting a quote from a page, or producing a final answer. This agent is fine-tuned using a combination of supervised learning on human-generated examples and RL using a reward model.
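The agent loop described above can be sketched as follows. This is an illustrative simplification, not OpenAI’s actual environment: the command names and the trivial hard-coded policy are assumptions standing in for the fine-tuned GPT-3 model and its full action space.

```python
def run_agent(question, policy, search_engine, max_steps: int = 10):
    """Step a text-command browsing agent until it answers or runs out of steps."""
    page = ""      # current browser page (empty before the first search)
    quotes = []    # passages the agent has extracted as supporting references
    for _ in range(max_steps):
        command, arg = policy(question, page, quotes)
        if command == "search":
            page = search_engine(arg)        # issue a search query
        elif command == "quote":
            quotes.append(arg)               # collect a reference from the page
        elif command == "answer":
            return arg, quotes               # final answer with its citations
    return None, quotes                      # gave up without answering


def toy_policy(question, page, quotes):
    """Trivial stand-in for the fine-tuned language-model policy."""
    if not page:
        return "search", question
    if not quotes:
        return "quote", page
    return "answer", f"Based on: {quotes[0]}"


answer, refs = run_agent(
    "Why is the sky blue?",
    toy_policy,
    search_engine=lambda q: "Rayleigh scattering favors shorter wavelengths.",
)
```

In the real system the policy is the language model itself, and the reward model scores complete answers (with their citations) so that RL can push the policy toward responses humans prefer.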

The team evaluated WebGPT on both the ELI5 dataset and TriviaQA. For the ELI5 evaluation, OpenAI collected the top-voted response from Reddit and also had human demonstrators generate responses using the same web browsing environment as the model. The researchers hired contractors to compare WebGPT’s responses to these human-created responses; WebGPT’s responses were preferred over the Reddit responses 69% of the time and over the demonstrators’ responses 56% of the time. On the TriviaQA benchmark, WebGPT outperformed GPT-3, producing answers that were true 75% of the time and “both true and informative” 54% of the time.
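Preference figures like the 69% above come from pairwise human comparisons. A minimal sketch of the bookkeeping, assuming the common convention of counting ties as half a win (the actual tie-handling in the study is not specified here):

```python
def preference_rate(judgments: list[str]) -> float:
    """Fraction of pairwise comparisons won by the model.

    Each judgment is 'model', 'reference', or 'tie'; ties count as half a win
    (an assumed convention for this illustration).
    """
    wins = sum(1.0 for j in judgments if j == "model")
    ties = sum(0.5 for j in judgments if j == "tie")
    return (wins + ties) / len(judgments)


rate = preference_rate(["model", "model", "reference", "tie"])
```

Pairwise preference judgments are also exactly the signal used to train the reward model that guides the RL stage.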

InfoQ has previously covered other efforts to improve the performance of AI language models using external knowledge bases, including Baidu’s ERNIE 3.0, which is trained on a knowledge graph, and Facebook’s BlenderBot 2.0 chatbot, which can use internet searches for additional conversational context. More recently, DeepMind developed the Retrieval-Enhanced TRansfOrmer (RETRO), a method that augments a pre-trained Transformer model by incorporating information retrieval into the model’s attention mechanism.
