The AI revolution is already impacting productivity, with 20 percent of U.S. adults reportedly using ChatGPT for work. However, ChatGPT is only as good as it is reliable. As generative artificial intelligence is integrated into everyday use, the price of inaccuracies rises. Thankfully, by incorporating the technology into existing liability rules, regulators can protect consumers without imposing overly burdensome new regulations.
Large Language Models (LLMs), while useful in certain cases, are not infallible. Mistakes, known as hallucinations, occur when an LLM incorrectly perceives patterns in its data and produces a flawed response. The possibility of hallucinations was put on display during Google’s debut of Bard, when the chatbot made a false claim about the James Webb Space Telescope. This example illustrates that while LLMs have the potential to serve as time-saving tools, their outputs must be trustworthy.
The frequency of AI hallucinations depends on the model. Vectara’s AI hallucinations leaderboard helps track these inaccuracies. For instance, hallucinations afflict GPT-4 Turbo just 2.5 percent of the time, compared to 22.4 percent of the time for Apple’s OpenELM-3B-Instruct. Certain topics also seem to trip up LLMs more than others, such as complex legal questions.
For this reason, the risks of using LLMs should always be weighed carefully against the technology’s advantages. A failure to proceed cautiously could, in a worst-case scenario, lead people to unknowingly break the law. This very thing recently occurred in New York City, where a chatbot deployed to assist users of municipal services provided false guidance on food safety, public safety, and sexual harassment. In another disturbing example, a lawyer in British Columbia was found to have cited made-up cases produced by ChatGPT.
The reality is that using LLMs carries risk, but some of that risk can be mitigated, and there are relatively simple techniques that make LLMs more accurate. One approach is retrieval-augmented generation, or RAG. RAG compensates for some of an LLM’s shortcomings by retrieving relevant passages from authoritative sources and grounding the model’s response in them, thereby enhancing user trust.
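To make the idea concrete, here is a minimal sketch of how a RAG pipeline can work. It assumes a small set of trusted reference passages, uses a simple TF-IDF similarity search in place of the embedding models and vector databases that production systems typically rely on, and treats `call_llm` as a hypothetical stand-in for whatever model API is actually used.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# Retrieval here uses TF-IDF similarity from scikit-learn; `call_llm` is a
# hypothetical placeholder for a real model API call.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A tiny "knowledge base" of trusted reference passages (illustrative only).
DOCUMENTS = [
    "The James Webb Space Telescope launched on December 25, 2021.",
    "Hallucination rates vary widely across large language models.",
    "Retrieval-augmented generation grounds model answers in source text.",
]

vectorizer = TfidfVectorizer().fit(DOCUMENTS)
doc_vectors = vectorizer.transform(DOCUMENTS)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the question."""
    scores = cosine_similarity(vectorizer.transform([question]), doc_vectors)[0]
    top_indices = scores.argsort()[::-1][:k]
    return [DOCUMENTS[i] for i in top_indices]

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g., an API request)."""
    raise NotImplementedError

def answer_with_rag(question: str) -> str:
    # Prepend the retrieved passages so the model answers from source text
    # rather than relying solely on patterns learned during training.
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer using only the context below. If the context does not "
        "contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```

The key design choice is that the model is instructed to answer only from the retrieved context, which reduces, though does not eliminate, the chance of a fabricated response.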
However, there are limits to what such techniques can accomplish. An article in Scientific American argues that hallucinations are inevitable and are inseparable from the very capabilities that make LLMs creatively useful.
Read the full article here.
Trey Price is a policy analyst with the American Consumer Institute, a nonprofit education and research organization. For more information about the Institute, visit us at www.TheAmericanConsumer.Org or follow us on X @ConsumerPal.