What Causes AI Hallucinations & How Vibe Bio Delivers Results You Can Trust

Artificial Intelligence (AI) is a broad umbrella of technologies, covering a huge swathe of algorithmic applications. Since 2022, companies like OpenAI and Anthropic have brought a specific subset into the mainstream through the release of their large language models, or LLMs. Companies have thrown all manner of questions or problems at these still-maturing technologies (for example: “write me code for an app to plan a party,” and, “how do I make my own cleaning solution?”), only to get back an answer that merely looks right (code that reads plausibly but is completely unusable), if not one that is obviously, painfully incorrect (“1. mix bleach and ammonia”).

What happened? The LLM in question “hallucinated.”

What are AI hallucinations?

Algorithmic word generation

AI hallucinations, in the context of LLMs, refer to incorrect answers generated in response to a question or prompt. Understanding why this happens requires knowing what LLMs really are. That topic could fill its own post, but it can be boiled down to three words: algorithmic word generators. They’re trained on huge volumes of text (books, literature, blogs, newspapers) and designed to algorithmically generate the “correct” response based on that training. But for the LLM, “correct” isn’t determined by the question itself; it’s determined by what the model has been trained on and what it’s programmatically capable of generating from that training. To an LLM never specifically trained on scientific literature, for instance, the answer to “what’s a good treatment for esophageal cancer” is built entirely from its non-scientific training data, at the user’s peril.
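To make that concrete, here is a toy sketch in Python (nothing like a production LLM, and the “training text” is made up) of word generation driven purely by training data: the generator below can only ever string together words it has already seen, no matter what it is asked.

```python
import random
from collections import defaultdict

# Toy "training data": the only text this generator will ever know about.
training_text = (
    "a good treatment plan starts with a specialist "
    "a good party starts with a plan "
    "a good plan starts with a list"
)

# Build a bigram table: each word maps to the words that followed it in training.
bigrams = defaultdict(list)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    bigrams[current_word].append(next_word)

def generate(prompt_word: str, length: int = 6) -> str:
    """Generate text by repeatedly picking a word that followed the previous one in training."""
    output = [prompt_word]
    for _ in range(length):
        candidates = bigrams.get(output[-1])
        if not candidates:
            break  # the model has never seen this word, so it has nothing to add
        output.append(random.choice(candidates))
    return " ".join(output)

print(generate("good"))  # fluent-sounding, but only as "correct" as the training text
```

The output is grammatical-sounding word salad: exactly the failure mode you see when an LLM is asked about material that isn’t well represented in its training data.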

Context windows

These algorithmic responses aren’t the only way LLMs can hallucinate, however. Another problem area is “context windows” (the maximum amount of data the LLM can “see” and “process”): the correct data may exist, but if it falls outside the window the LLM can’t see or incorporate it, so the model generates its response based solely on what is “visible.” Both of these issues are compounded by the fact that many AI companies understandably prefer to keep their algorithms and technical specifics obscured within “black boxes,” which makes it hard to know what an LLM has been trained on or how large its context window is. That, in turn, limits your ability to quickly trace and understand what’s going awry once you start getting bad data (more AI hallucinations).
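Here is a rough illustration of the context-window problem, with a made-up token budget and made-up documents: anything that doesn’t fit within the limit is silently dropped, and the model answers from whatever remains.

```python
# Documents the model "should" consider when answering.
documents = [
    "Trial A: the compound showed no benefit in esophageal cancer.",
    "Trial B: early results in a small cohort were inconclusive.",
    "Trial C (the key study): a later, larger trial reported a significant benefit.",
]

MAX_CONTEXT_TOKENS = 20  # made-up budget; real models allow thousands to millions of tokens

def build_context(docs: list[str], budget: int) -> str:
    """Pack documents into the prompt until the budget runs out; the rest is invisible."""
    included, used = [], 0
    for doc in docs:
        tokens = len(doc.split())  # crude word count standing in for real tokenization
        if used + tokens > budget:
            break  # everything from here on never reaches the model
        included.append(doc)
        used += tokens
    return "\n".join(included)

print(build_context(documents, MAX_CONTEXT_TOKENS))  # Trial C never makes it in
```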

Fast, trustworthy results

LLMs can rapidly trawl, retrieve, and summarize across large quantities of data, but that speed is worthless if you can’t trust what comes back. So how do you deploy this technology to reliably get the information and insights you need?

At Vibe Bio, LLMs are just one AI technology in our toolkit. Using them well requires understanding the pitfalls and limitations we mentioned above, as well as where they truly excel. Here are three key strategies we use to ensure our LLM tooling is accurate, reliable, and trustworthy.

Primary data sourcing

The “black box” nature of LLMs means that we, the users, often have no idea what a given model has been trained on, or whether that training data is accurate and relevant to our needs. To circumvent this problem, the team at Vibe Bio requires an LLM to retrieve from, and answer questions against, a primary data source that we’ve provided to it ourselves. As we’ll get into later, controlling exactly what an LLM is generating its response from lets us target our prompts more precisely and minimize AI hallucinations. It also makes triage much simpler when things go wrong and we see results that don’t make sense.
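As a rough sketch of the idea (the source text, prompt wording, and `llm` client below are illustrative placeholders, not our actual system), the model is instructed to answer only from the material we hand it:

```python
# Made-up primary source text we control and can inspect.
PRIMARY_SOURCE = """\
Compound X, Phase 2, esophageal cancer: objective response rate 31% (n=64).
Compound X, Phase 1, solid tumors: dose-limiting toxicity observed at 400 mg.
"""

def grounded_answer(question: str, source: str, llm) -> str:
    """Ask the model to answer only from the source we provided, never from its training data."""
    prompt = (
        "Answer the question using ONLY the source text below. "
        "If the source does not contain the answer, reply exactly 'NOT FOUND'.\n\n"
        f"SOURCE:\n{source}\n\n"
        f"QUESTION: {question}"
    )
    return llm(prompt)  # `llm` is any text-in/text-out client you plug in

# Usage: grounded_answer("What was the Phase 2 response rate?", PRIMARY_SOURCE, my_llm_client)
```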

Enforced citations

For similar reasons, we require our LLM tooling to include robust, quoted citations for its answers, providing us the exact data and context it used to arrive at a given answer. This allows us to understand upfront if a given answer was on the mark, lessening the fact-checking burden down the line. If things do go wrong, citations make triage much faster: not only do we understand what the source data is, we can also target specifically what parts of it were used, and correct as needed.
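In sketch form, this might look like the following; the JSON format, field names, and `llm` client are illustrative assumptions, but the key step is checking every quote against the source before trusting the answer:

```python
import json

def cited_answer(question: str, source: str, llm) -> dict:
    """Ask for an answer plus verbatim quotes, then check each quote against the source."""
    prompt = (
        "Answer the question using ONLY the source text below. "
        "Return JSON with keys 'answer' and 'quotes', where 'quotes' are verbatim "
        "excerpts from the source that support the answer.\n\n"
        f"SOURCE:\n{source}\n\n"
        f"QUESTION: {question}"
    )
    response = json.loads(llm(prompt))
    # Flag any "quote" that doesn't actually appear in the source we supplied.
    unverified = [quote for quote in response["quotes"] if quote not in source]
    response["verified"] = len(unverified) == 0
    response["unverified_quotes"] = unverified
    return response
```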

Human-in-the-loop

Finally, we never let our LLM tools (or any capability) operate unmonitored. From data setup through extraction and generation, we have a series of touch points along the process where an appropriate expert reviews the work to ensure quality and credibility. This can mean an engineer reviewing a chain of processes to verify every step has executed properly, or a drug development expert reading the output at different stages to ensure it’s accurate. If anything is amiss, our engineers and drug development experts work together to discover what failed, correct it, and incorporate those learnings into our product development.
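A highly simplified sketch of what a review gate between pipeline stages could look like (the stage names and approval mechanism here are illustrative, not our actual workflow):

```python
def human_approves(stage: str, artifact: str) -> bool:
    """Placeholder for an expert review step (an engineer or a drug development expert)."""
    reply = input(f"[{stage}] Approve this output? (y/n)\n{artifact}\n> ")
    return reply.strip().lower() == "y"

def run_pipeline(raw_data: str):
    """Run each stage, but stop and investigate the moment a reviewer rejects the output."""
    stages = [
        ("data setup", lambda text: text.strip()),
        ("extraction", lambda text: text.upper()),         # stand-in for real extraction logic
        ("generation", lambda text: f"SUMMARY: {text}"),   # stand-in for LLM generation
    ]
    artifact = raw_data
    for name, step in stages:
        artifact = step(artifact)
        if not human_approves(name, artifact):
            return None  # stop here, find the failure, and fold the learnings back in
    return artifact
```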

AI you can trust, delivering every cure

Incorporating AI into drug development has the potential to transform how we make decisions in pharma, but harnessing it effectively requires a deep understanding of both its strengths and limitations. At Vibe Bio, we recognize the ability of LLMs to process and analyze vast amounts of data quickly, as well as the need for a structured approach to ensure accuracy and reliability. By grounding our LLMs with primary data sourcing, enforcing strict citation requirements, and maintaining a human-in-the-loop process, we minimize the risk of misleading “AI hallucinations” and increase the trustworthiness of AI outputs. This approach allows us to provide deep insights into drug pipeline analysis and push the boundaries of drug development, especially for rare diseases, where every insight can make a critical difference.
