What are hallucinations?
Why are even sophisticated AIs prone to them?
What are the consequences of these hallucinations?
We tell you all about it in this article, starting with what an LLM (Large Language Model) is.
A quick reminder of the nature of LLMs
So let’s start by looking at what an LLM is.
An LLM is a (very) deep specialized neural network, trained on (very) large amounts of information to make good quality predictions about “the next word”.
- It’s a marvellous machine for producing text of very high syntactic quality. In other words, its output is well formed and respects the grammar rules of the language in which it operates.
- On the other hand, it is not intelligent, if that notion can even be clearly defined. Nor does it possess knowledge. In fact, it doesn’t “know” anything; it is not a semantic machine.
LLMs are mostly trained in an unsupervised way, which requires (very) high computational capacity but few manual operations. In some cases, the LLM vendor uses semi-supervised learning, for example when mathematical accuracy is required.
In the current state of technology, training an LLM means computing between a few billion (10^9) and a few hundred billion (10^11) parameters. This requires thousands of specialized servers running for weeks or even months. In short, these are heavy processes, and not at all real-time.
Once trained, the LLM is prompted in natural language (most of the time). This prompt defines the context and the mission set by the human or software user. Clearly, the quality of this prompt has a decisive effect on the quality of the LLM’s response.
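To make this concrete, here is a minimal sketch (not a reference implementation, and not tied to any particular model API) of how a prompt can make the context and the mission explicit; `call_llm` is a hypothetical wrapper around whichever LLM you actually use.

```python
# A minimal sketch, not a reference implementation: the prompt states the
# context and the mission explicitly. `call_llm` is a hypothetical wrapper
# around whichever model API you actually use.

def build_prompt(context: str, mission: str, question: str) -> str:
    """Assemble a prompt in which the context and the mission are explicit."""
    return (
        f"Context: {context}\n"
        f"Mission: {mission}\n"
        f"Question: {question}\n"
        "If the answer is not in the context, say that you do not know."
    )

vague_prompt = "Tell me about the sales."

precise_prompt = build_prompt(
    context="Extract of the Q3 2024 sales report for the EMEA region.",
    mission="Answer only from the report and cite the figures you use.",
    question="How did EMEA sales evolve between Q2 and Q3 2024?",
)

# answer = call_llm(precise_prompt)  # hypothetical call; the precise prompt
#                                    # leaves far less room for improvisation
print(precise_prompt)
```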
To extend its competence to specific fields, the LLM can be supplemented with knowledge bases relevant to the intended use, for example user manuals or reference documents (chart of accounts, commercial code, case law, etc.).
Where does LLM “knowledge” come from?
As a joke (or a reassurance), LLMs are sometimes referred to as “stochastic parrots”.
But this is not true, as LLMs are capable of modulating their answers according to the wording of the question. What’s more, they even adapt the way they conduct dialogue with the user.
Since the LLM is a syntactic machine, how can it be that its answers are meaningful most of the time, and not just statistical babble (jabberwocky)?
Well, this knowledge comes from the corpora on which LLMs are trained.
For example, the web for ChatGPT, Mistral, Llama and many others. As it happens, web content (after a bit of cleaning up) makes sense.
In fact, it’s what we experience ourselves when we surf the web.
After all, text is a series of words and has no intrinsic knowledge. It’s the way it’s constructed that conveys meaning. In fact, you can read a previous blogpost about language on this topic.
The (almost won) bet of LLMs is that it’s possible to go the other way round: to derive meaning from text statistics, and to do so in a programmatic, unsupervised way.
Why do LLMs seem so magical?
Because they use natural language, our language, the one we’ve all been using since our early childhood.
No clicks, no forms to fill in, no “Press 1 to continue”…
We can question, converse and argue with an LLM. Even with grammatical and spelling mistakes: they come as close as possible to the human experience.
What’s more, they are sometimes “multi-modal”, able to generate sounds, images and computer code.
But it’s their linguistic capacity that really amazes most people: they “talk” to us.
What are hallucinations?
Hallucinations are false or misleading answers that seem plausible, and which LLMs present as established facts.
Some evocative examples:
- Announcing sales figures without having any data on the subject;
- Inventing fictitious quotes;
- Inventing historical facts that never happened;
- Confusing personalities and/or events;
- Creating false place names, etc.
The word “hallucination” is well chosen for what it immediately evokes, but as it refers to a human process, it contributes to the confusion.
Some prefer the terms confabulation or semantic drift.
What causes hallucinations?
According to ChatGPT:
Hallucinations often occur because language models, despite being trained on vast data sets, don't understand the world in the same way as humans and may combine information incorrectly or imaginatively.
This answer is interesting for its formal quality, but also because ChatGPT tells us that it can “understand the world”, even if incorrectly. That said, we take the liberty of contradicting it: it doesn’t understand anything, it calculates, which is quite different. It will get over it.
LLMs aren’t “responsible”: they know nothing about semantics; they’re not semantic machines. What’s more, they don’t know that they don’t know. All they do is compute optimization functions (“what is the most likely sequence of words?”), which is very different from “what is the most accurate sequence of words?”.
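A toy illustration of that difference: the model picks the most likely continuation, whether or not it is the accurate one. The probabilities below are invented for the example.

```python
# Toy illustration: the probabilities are invented for the example. A real LLM
# computes a distribution over tens of thousands of tokens, but the selection
# principle is the same: pick what is likely, not what is verified.

next_word_probs = {
    "Paris": 0.62,       # statistically frequent continuation
    "Lyon": 0.21,
    "Marseille": 0.12,
    "Grenoble": 0.05,    # might be the factually correct answer in a given context
}

most_likely = max(next_word_probs, key=next_word_probs.get)
print(most_likely)  # "Paris": the model optimizes likelihood, not truth
```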
Other causes also contribute to the production of hallucinations, including:
- insufficient, biased or even deliberately misleading training data;
- ambiguous, insufficiently precise prompting.
A kind of convergence around hallucinations
To date, hallucinations have been the subject of a great deal of scientific and technical research.
Unfortunately or fortunately, these studies all seem to converge on the idea that hallucinations are inherent to LLM technology, and that we’re going to have to learn to live with them.
Here are a few publications whose conclusions converge:
- Hallucination is Inevitable: An Innate Limitation of Large Language Models, by Ziwei Xu, Sanjay Jain and Mohan Kankanhalli
- Why Large Language Models Hallucinate, by IBM Technologies
- LLM Lies: Hallucinations Are Not Bugs, but Features as Adversarial Examples, by Jia-Yu Yao, Kun-Peng Ning, Zhen-Hui Liu, Mu-Nan Ning, Yu-Yang Liu and Li Yuan
Work-related hallucinations
Without doubt, Generative AI (i.e. LLM-based AI) is being used more and more in professional environments.
In particular, to provide simple conversational access to business knowledge bases (see RAG architectures).
But these solutions are not perfect, and mistakes can have significant consequences. Consider, for example, a study carried out in a legal context:
- Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools, by Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning and Daniel E. Ho (Stanford and Yale Universities)
- RAG juridique et hallucinations (in French), by Raphaël d’Assignies.
The study proposes a typology of the errors encountered in its comparative tests of legal RAG systems (illustrated in the sketch after this list):
- Retrieval errors: relevant documents are not retrieved.
- Interpretation errors: the model misinterprets retrieved documents and draws erroneous conclusions.
- Synthesis errors: the model incorrectly combines information from several documents, e.g. mixes facts corresponding to independent situations.
- Contextualization errors: the model lacks the necessary legal context (legal subtlety, for example).
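To see where each error class can creep in, here is a deliberately naive RAG sketch: the documents and the keyword retriever are toy stand-ins, and `call_llm` is a hypothetical generation step, not the API of any tool evaluated in the study.

```python
# A deliberately naive RAG sketch to show where each error class can appear.
# The documents and the keyword retriever are toy stand-ins; `call_llm` is a
# hypothetical model call, not the API of any tool evaluated in the study.

DOCUMENTS = {
    "doc1": "Article L110-1 of the French commercial code defines acts of commerce.",
    "doc2": "2019 case law on commercial leases.",
    "doc3": "Internal memo on the 2023 update of the chart of accounts.",
}

def retrieve(question: str, k: int = 2) -> list[str]:
    """Toy keyword retrieval. Retrieval errors: the relevant document
    may simply not be returned from here."""
    scored = sorted(
        DOCUMENTS.values(),
        key=lambda text: sum(word.lower() in text.lower() for word in question.split()),
        reverse=True,
    )
    return scored[:k]

def answer(question: str) -> str:
    passages = retrieve(question)
    prompt = (
        "Answer strictly from the passages below and cite them.\n"
        + "\n".join(f"- {p}" for p in passages)  # interpretation errors: a passage may be misread
        + f"\nQuestion: {question}"              # synthesis errors: facts from separate passages may be merged
    )                                            # contextualization errors: legal subtleties absent from the passages
    # return call_llm(prompt)                    # hypothetical generation step
    return prompt

print(answer("What does the commercial code say about acts of commerce?"))
```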
Potential consequences of hallucinations
Let’s take a look at some of the possible implications for the workplace:
- Dissemination of false information: propagation of incorrect information, problematic in fields where accuracy is crucial, such as journalism, scientific research or medicine.
- Legal risk and liability: erroneous data generated by an LLM could lead to legal action if the information provided causes damage or financial loss. For example, financial advice based on incorrect data could result in significant losses.
- Loss of trust: professionals and customers can lose confidence in an organization or tool using LLMs if they discover that it frequently generates and shares incorrect information.
- Unintentional misinformation: hallucinations can lead to misinformation, where false facts are presented as true. This can affect strategic decisions, planning, etc.
- Ethical and social impacts: the propagation of false information in sensitive contexts such as public health, politics or the social sciences can have serious ethical impacts.
To minimize these consequences, it is crucial to put in place mechanisms for checking and validating the information generated by LLMs, and to train users to identify and correct potential errors.
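As one illustration of such a checking mechanism, here is a minimal (and deliberately crude) grounding check that flags generated sentences sharing no content word with the source documents; a real validator would rely on semantic similarity, but the principle is the same.

```python
# A minimal sketch of one possible checking mechanism: flag generated sentences
# whose content words never appear in the source documents. A crude grounding
# check, not a production-grade validator.

import re

def ungrounded_sentences(generated: str, sources: list[str]) -> list[str]:
    """Return the sentences of `generated` that share no content word with any source."""
    source_words = set(re.findall(r"\w{4,}", " ".join(sources).lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", generated.strip()):
        words = set(re.findall(r"\w{4,}", sentence.lower()))
        if words and not words & source_words:
            flagged.append(sentence)
    return flagged

sources = ["EMEA sales grew by 4% between Q2 and Q3 2024."]
generated = "EMEA sales grew by 4%. The growth was driven by the Brazilian market."
print(ungrounded_sentences(generated, sources))
# ['The growth was driven by the Brazilian market.']  -> candidate hallucination
```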
Hallucinations: how to get rid of them?
The truth is, hallucinations are like bedbugs.
They’re hard to get rid of, but impossible to live with!
In the current state of LLM technology, there is unfortunately no definitive solution. Depending on the context and implementation, various mitigation approaches can be used, alone or in combination, including:
- Adding semantic competence, for example through the notion of intents, whose aim is to understand the question asked and compare it with the semantic field of the application.
- Contextual consistency analysis can help detect inconsistencies in the generated text. Hallucinations are often manifested by information that does not align with the general context or the model’s prior knowledge.
- Mixture of Experts (MoE), which divides an AI model into distinct sub-networks (or “experts”), each specialized in a subset of the input data, working jointly to perform a task.
- In a similar vein, the use of multiple LLMs makes it possible to compare outputs and identify discrepancies. If several models converge on the same response while one diverges significantly, that divergence may well be a hallucination (see the sketch after this list).
- And of course, the use of a set of human annotations to identify hallucinations in model output. These annotations can then be used to train specialized models or refine LLMs to reduce errors of this type.
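As an illustration of the multi-LLM comparison idea, here is a minimal sketch that flags the model whose answer diverges from the majority; the `model_*` answers are invented, and a real implementation would compare answers semantically rather than by exact string match.

```python
# A minimal sketch of the "multiple LLMs" idea: ask several models (or several
# samples of one model) the same question and flag the outliers. The answers
# below are invented; real comparison would use semantic similarity.

from collections import Counter

def flag_divergent(answers: dict[str, str]) -> list[str]:
    """Return the names of the models whose answer differs from the majority."""
    majority, _ = Counter(answers.values()).most_common(1)[0]
    return [name for name, ans in answers.items() if ans != majority]

answers = {
    "model_a": "The decree was published in 2019.",
    "model_b": "The decree was published in 2019.",
    "model_c": "The decree was published in 1999.",   # diverging answer
}
print(flag_divergent(answers))   # ['model_c'] -> candidate hallucination, to be checked
```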
What's in it for me?
In short, Large Language Models (LLMs) are revolutionizing conversational AI by generating fluid, relevant text. But they have their limits, notably “hallucinations”, or the production of incorrect information presented as factual.
Since the technology is based on statistics rather than understanding, it’s crucial to keep in mind that LLMs don’t possess any real “knowledge”.
To minimize risks, it is essential to implement robust verification mechanisms and to make users aware of possible errors.
And this vigilance is particularly important in sectors where the precision and accuracy of information is paramount, such as legal, finance and healthcare.
In conclusion, we can only recommend investing 30 minutes and watching this excerpt from a conference at “Collège de France”. Xavier Leroy gives numerous examples of hallucinations and biases, and shares his somewhat irritated opinion on the subject along with an original position on programmers.
_____________
Are you a software publisher? We hope this article has given you a better understanding of what hallucinations are.
Agora Software is the publisher of conversational AI solutions dedicated to software publishers. We rapidly deploy conversational application interfaces that enhance the user experience of applications and platforms. By integrating Agora Software into your application, your users benefit from rich, multilingual and omnichannel interactions.
Would you like to integrate conversational AI into your applications?
Let’s talk: contact@agora.software
If you enjoyed this article, you might like to read 8 questions to address when integrating AI into your application.
Join us on our Linkedin page to follow our news!
Want to understand how our conversational AI platform optimizes your users’ productivity and engagement by effectively complementing your business applications?