Monday, November 13, 2023

Chatbots Might Be 'Hallucinating' More Often Than Many Realize


When the San Francisco startup OpenAI debuted its online chatbot ChatGPT late last year, millions of people were captivated by the quasi-human way it answered questions, wrote poetry and conversed on almost any topic. But it took most people a while to realize that this new kind of chatbot often makes things up.

When Google introduced a similar chatbot several weeks later, it generated false information about the James Webb Space Telescope. The next day, Microsoft's new Bing chatbot offered all kinds of false information about the Gap, Mexican nightlife and the singer Billie Eilish. Then, in March, ChatGPT cited half a dozen bogus court cases while drafting a 10-page legal brief that a lawyer filed before a federal judge in Manhattan.

Now a new startup called Vectara, founded by former Google employees, is trying to find out how often chatbots stray from the truth. The company's research estimates that, even in situations designed to prevent it, chatbots invent information at least 3 percent of the time and as much as 27 percent of the time.

Experts call this chatbot behavior “hallucination.”

It may not be a problem for people who tinker with chatbots on their personal computers, but it is a serious issue for anyone who uses this technology with court documents, medical information or sensitive business data.

Because these chatbots can respond to almost any request in an unlimited number of ways, there is no way to determine with complete certainty how often they hallucinate. “You would have to look at all the information in the world,” said Simon Hughes, the Vectara researcher who led the project.

Hughes and his team asked these systems to perform a single, simple task that could be easily verified: summarize news articles. Even then, the chatbots persistently invented information.

“We provide the system with 10 to 20 pieces of data and ask for a summary of that data,” said Amr Awadallah, Vectara's chief executive and a former Google executive. “That the system can still introduce errors is a fundamental problem.”
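
To make that protocol concrete, here is a minimal sketch of the general idea: take a short source text and a model-written summary, then flag summary sentences that the source does not appear to support. This is not Vectara's actual method, which relies on a trained factual-consistency model; the simple word-overlap check, the 0.6 threshold and the example texts below are illustrative assumptions only.

```python
# Minimal sketch (illustrative only, not Vectara's method): flag summary
# sentences whose content words are poorly supported by the source text.
import re

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "for",
             "is", "are", "was", "were", "that", "with", "its"}

def content_words(text):
    """Lowercase alphabetic tokens with common stopwords removed."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def unsupported_sentences(source, summary, threshold=0.6):
    """Return summary sentences whose content words mostly never appear in the source."""
    source_vocab = content_words(source)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        words = content_words(sentence)
        if not words:
            continue
        support = len(words & source_vocab) / len(words)
        if support < threshold:
            flagged.append(sentence)
    return flagged

source = "The company reported revenue of two million dollars in March."
summary = ("The company reported revenue of two million dollars in March. "
           "Its chief executive resigned shortly afterward.")
print(unsupported_sentences(source, summary))
# -> ['Its chief executive resigned shortly afterward.']  (a claim the source never made)
```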

The researchers maintain that when these chatbots perform tasks beyond mere summarization, hallucination rates may be higher.

Their research also showed that hallucination rates vary widely among the leading AI companies. OpenAI's technologies had the lowest rate, around 3 percent. Systems from Meta, the owner of Facebook and Instagram, hovered around 5 percent. The Claude 2 system offered by Anthropic, an OpenAI rival also based in San Francisco, topped 8 percent. A Google system, Palm chat, had the highest rate, at 27 percent.

An Anthropic spokesperson, Sally Aldous, said: “Making our systems useful, honest and harmless, which includes preventing hallucinations, is one of our main goals as a company.”

Google declined to comment, and OpenAI and Meta did not immediately respond to requests for comment. With this research, Hughes and Awadallah want to show people that they should be wary of information that comes from chatbots, including the service Vectara sells to businesses. Many companies now offer this kind of technology for business use.

Based in Palo Alto, California, Vectara is a 30-person startup backed by $28.5 million in seed funding. One of its founders, Amin Ahmad, a former Google artificial intelligence researcher, has been working with this type of technology since 2017, when it was incubated within Google and a handful of other companies.

Just as Microsoft's Bing search chatbot can retrieve information from the open internet, Vectara's service can retrieve information from a company's private collection of emails, documents and other files.

The researchers also hope that their methods — which they share publicly and will continue to update — will help spur industry-wide efforts to reduce hallucinations. OpenAI, Google and others are working to minimize the problem using a variety of techniques, although it is unclear whether they will be able to eliminate it.

Chatbots like ChatGPT are powered by a technology called large language models (LLMs), which gain their skills by analyzing huge amounts of digital text, including books, Wikipedia articles and online chat logs. By identifying patterns in all that data, an LLM learns to do one thing in particular: guess the next word in a sequence of words.
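
As a rough illustration of that training objective, the toy sketch below estimates next-word probabilities from word-pair counts in a tiny made-up corpus and picks a likely continuation. Real large language models learn far subtler patterns with neural networks trained on enormous amounts of text; the corpus and the simple counting here are assumptions made purely for illustration.

```python
# Toy next-word prediction from bigram counts (not a real language model).
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count how often each word follows each preceding word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word_distribution(word):
    """Probability of each possible next word, given the previous word."""
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_distribution("the"))
# -> {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
print(max(next_word_distribution("the").items(), key=lambda kv: kv[1]))
# -> ('cat', 0.5): the model's best guess for the word after "the"
```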

Because the internet is filled with false information, these systems learn and repeat the same falsehoods. They also rely on probabilities: what is the mathematical probability that the next word is “playwright”? Sometimes the guess is simply wrong.

Vectara's new research shows how this can happen. When summarizing news articles, chatbots do not repeat falsehoods from elsewhere on the internet. They simply get the summary wrong.
