
The next step in Generative AI: Retrieval Augmented Generation using Graph Database

Since its inception, Generative AI has intrigued its users with its ability to respond fluently and in a human-like way to complex questions on various topics. People now ask whether the Turing Test for machine intelligence (see here) has been passed, or whether it soon will be.


The challenge, however, is that fluency often does not equate to accuracy. These days, one of the easier ways to tell whether you are talking to a computer (GenAI or not) or a human being is to check the accuracy of the answers provided. Is your question actually being answered, and is the answer based on current, correct facts and information? Or are you being given an answer that "sounds good" but rests on fabricated facts and is entirely wrong?


Early versions of GenAI agents were often inaccurate or completely wrong, even though their responses were fluent, confident-sounding, and human-like. As a result, inexperienced GenAI users were beguiled into using GenAI for business-critical activities, assuming what the bot said was always true when, much of the time, it wasn't (see here). 


Undeterred, technology companies, researchers, and everyday businesses have been experimenting with various ways to improve the accuracy of Generative AI, or, failing that, adapting their business processes so that they can still realise benefits from GenAI while allowing for the bot to not always be correct. This usually involves keeping a human in the loop to supervise the bot. There is a reason Microsoft refers to its GenAI product line as Copilot.


Retrieval Augmented Generation (RAG) is one method used to improve the accuracy of GenAI responses without having to modify business processes. Its use has grown considerably over the last year on the back of a large stream of ongoing research and experimentation, and many, if not most, commercial deployments of GenAI today involve some form of RAG.


One of the newer forms of RAG is something called Graph RAG. Graph RAG recognises that answering a question correctly often requires connecting multiple complex concepts together to form a single answer. Until recently, this was an ability reserved for humans. 

The following article attempts to explain, in a non-technical way, the concept of Graph RAG and demonstrate why businesses should consider using both RAG and Graph RAG to improve the benefits realisation of their GenAI programs. 


The Challenges with Large Language Models and GenAI in General:


Generative AI and the LLMs that power it face many challenges. In no particular order, some of the challenges are... 


  1. Hallucination - When the model generates information that wasn’t present in the input or its training data, leading to potentially inaccurate or misleading outputs.

  2. Lack of Explainability - Refers to the difficulty in understanding why a model made a particular prediction or decision, making it challenging to diagnose errors, understand biases, or predict future behaviour.

  3. Unvetted and Biased Training Data - Language models are trained on diverse data (web content, social media, books, open-source datasets, and scientific literature) that has often not been fact-checked. They hold only general knowledge and may generate outputs that reflect the biases in their training data. These days, much useful and relevant data sits behind the corporate firewall and, therefore, hasn't been made available to train the more popular LLMs.

  4. Knowledge Cut-off - Language models have a data cut-off date (e.g. GPT-4o – Nov 2023). Any events or information that come to light after the cut-off date will not be known to the model.

  5. Lack of Domain Knowledge - Language models do not know your data. Specific use cases require specific datasets (corporate/private data). Also, LLMs cannot provide responses based on data from real-time sources.


All these critical limitations of GenAI lead it to provide answers that humans would often consider incorrect or inaccurate. Until recently, this lack of accuracy has slowed GenAI's adoption and limited the realisation of business benefits from this new technology.

There are now several ways that organisations can improve the accuracy of GenAI responses. One is fine-tuning, which is the topic of other blog articles. Another is Retrieval Augmented Generation, which I'm discussing here.

 

Retrieval Augmented Generation (RAG) as a Solution to the Challenges with GenAI:


Retrieval Augmented Generation (RAG) is an architecture that augments the capabilities of a large language model (LLM) like ChatGPT by adding an information retrieval system that provides data to "ground" the LLM's responses in factual information.

In essence, when we ask the LLM a question, we simultaneously give it the information (context) it needs to answer the question correctly. This process is called "grounding" and helps ensure that the LLM's answers are correct, assuming, of course, that the information provided to the LLM has been vetted for accuracy first. 

Adding an information retrieval system allows you to control the grounding data used by an LLM when it formulates a response. For an enterprise solution, RAG architecture means that you can constrain generative AI to your internal enterprise content sourced from curated documents and images, thereby improving the accuracy of the LLM's responses and controlling how it responds to questions. 
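To make the grounding idea concrete, here is a minimal sketch in Python. The document store, the naive retrieve() function, and the prompt wording are all illustrative assumptions rather than any particular product's implementation; a real system would use a proper search index and an actual LLM call.

```python
# A minimal sketch of RAG "grounding": retrieve vetted text, then prepend it
# to the user's question so the LLM answers from that context.
# The document store, retrieve() logic, and prompt wording are illustrative
# assumptions, not part of any specific product.

DOCUMENTS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am-5pm AEST, Monday to Friday.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Naive keyword-overlap retrieval; real systems use vector search."""
    def score(doc: str) -> int:
        return len(set(question.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

def build_grounded_prompt(question: str) -> str:
    context = "\n".join(retrieve(question, DOCUMENTS))
    return (
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(build_grounded_prompt("What is the refund policy?"))
```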



Retrieval Augmented Generation as a way of improving LLM accuracy


However, Retrieval Augmented Generation Itself Has Its Own Challenges:


For a RAG to work, not only does it need to provide the LLM with information to answer the question, it also has to provide the correct information, often drawn from a potentially large pool of information. Traditional (vector) RAG systems take a "best answer first" approach: they break all the available data into chunks, then search for and select the chunks of information that semantically best match the question being asked and pass them to the LLM. This process attempts to choose the "best" information with which to answer a question.
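As a rough illustration of this "best answer first" retrieval step, the sketch below chunks a corpus, embeds the chunks and the question, and keeps the closest matches by cosine similarity. The embed() function here is a toy bag-of-words stand-in so the example runs on its own; a real vector RAG would use an embedding model and a vector database.

```python
import math

# Sketch of "best answer first" vector retrieval: chunk the corpus, embed each
# chunk and the question, and keep the chunks with the highest cosine
# similarity to the question. embed() is a stand-in for a real embedding model.

def embed(text: str) -> dict[str, float]:
    """Toy bag-of-words 'embedding'; swap in a real embedding model."""
    vec: dict[str, float] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(text: str, size: int = 12) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

corpus = ("John works at IBM in Sydney. Ray is a friend of John. "
          "IBM sells enterprise software and consulting services.")
chunks = chunk(corpus)
question = "Where does John work?"
q_vec = embed(question)
top = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)[:2]
print(top)  # the chunks passed to the LLM as grounding context
```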


However, these traditional RAG systems struggle to connect the dots. When answering a question effectively requires traversing disparate pieces of information through their shared attributes to provide new synthesised insights, traditional (vector) RAG architecture tends to offer a substandard result. Vector RAG also performs poorly when asked to holistically understand summarised semantic concepts over extensive data collections or even singular large documents.


The solution is to use a Graph RAG. Graph RAG uses LLM-generated knowledge graphs to substantially improve question-and-answer performance when conducting document analysis of complex information. This is often called a "breadth-first" approach.


Graph Database in Retrieval Augmented Generation:


Unstructured data can vary significantly in format, content, and quality, posing challenges for consistent understanding and extraction by an LLM. By definition, unstructured data, such as free-form text, also lacks a predefined structure, making it difficult for LLMs to process and establish relationships amongst the data. The Graph RAG process uses an LLM to convert inconsistent unstructured data into a structured, consistent format called a Knowledge Graph that can be used to feed relevant information into the LLM. 
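The extraction step can be pictured roughly as follows. This is only a sketch: the prompt wording is an illustrative assumption, and call_llm() is a placeholder for whichever LLM client you use; in practice the model's output would also need validation before being loaded into the graph.

```python
# Sketch of the extraction step in Graph RAG: prompt an LLM to turn free-form
# text into (subject, RELATIONSHIP, object) triples that can be loaded into a
# knowledge graph. call_llm() is a placeholder for a real LLM API call, and
# the prompt wording is an illustrative assumption.

EXTRACTION_PROMPT = """Extract entities and relationships from the text.
Return one triple per line in the form: subject | RELATIONSHIP | object.

Text: {text}
"""

def call_llm(prompt: str) -> str:
    # Placeholder: in practice this would call your LLM provider's API.
    return "John | WORKS_AT | IBM\nRay | FRIEND_OF | John"

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    raw = call_llm(EXTRACTION_PROMPT.format(text=text))
    triples = []
    for line in raw.strip().splitlines():
        subject, rel, obj = (part.strip() for part in line.split("|"))
        triples.append((subject, rel, obj))
    return triples

print(extract_triples("John works at IBM. Ray is John's friend."))
```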


Conceptually, a graph data structure consists of nodes (discrete objects) that can be connected by relationships. Nodes describe the entities of a domain and can have zero or more labels to define (classify) what kind of nodes they are. Relationships describe a connection between a source node and a target node. Relationships always have a direction (one direction) and must have a type (one type) to define (classify) the kind of relationship. Nodes and relationships can have properties (key-value pairs), which further describe them.


The following is a graph representation of an unstructured piece of text. In this example, John is identified as an entity or node with the label "person". He has an "employee of" relationship with IBM. IBM is another node, with the label "company". Ray is a further node that also has the label "person" and has a "friend" relationship with John.


 

Converting Unstructured Data to Structured Data 
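Expressed as plain data structures (rather than in any particular graph database), the example above might be captured like this:

```python
# The John / IBM / Ray example as plain data: nodes carry labels and
# properties, relationships are directed and typed. This mirrors the property
# graph model described above without committing to any particular database.

nodes = {
    "john": {"labels": ["Person"], "properties": {"name": "John"}},
    "ibm":  {"labels": ["Company"], "properties": {"name": "IBM"}},
    "ray":  {"labels": ["Person"], "properties": {"name": "Ray"}},
}

relationships = [
    {"source": "john", "type": "EMPLOYEE_OF", "target": "ibm"},
    {"source": "ray",  "type": "FRIEND_OF",   "target": "john"},
]

# In a graph database such as Neo4j, the equivalent Cypher would look roughly
# like: CREATE (:Person {name:'John'})-[:EMPLOYEE_OF]->(:Company {name:'IBM'})
```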


The advantage of representing information in the form of a graph is that the structure is very flexible. It can also represent very complex data sets containing thousands, if not hundreds of thousands, of nodes with hundreds of thousands of relationships between them.


Some Technical Bits: Graph Database Options and Querying the Graph for the LLM


To organise and store data in a graph format that can be used in a Graph RAG, a specialised graph database is often preferable. Neo4j is probably the most popular graph database option. There are many others, including but not limited to ArangoDB, Amazon Neptune, OrientDB, GraphDB, and TigerGraph. However, you don't necessarily need a specialised graph database, and it is not unusual for structured graph data to be stored in simple Parquet files.
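As a sketch of the simpler option, an edge list can be written to and read from Parquet files with pandas. The column names here are illustrative choices, and the snippet assumes the pyarrow (or fastparquet) package is installed.

```python
import pandas as pd

# Sketch of storing graph data without a dedicated graph database: an edge
# list (and, similarly, a node list) written to Parquet files.
# Requires pyarrow or fastparquet; column names are illustrative choices.

edges = pd.DataFrame(
    [
        {"source": "John", "relationship": "EMPLOYEE_OF", "target": "IBM"},
        {"source": "Ray",  "relationship": "FRIEND_OF",   "target": "John"},
    ]
)

edges.to_parquet("edges.parquet", index=False)
print(pd.read_parquet("edges.parquet"))
```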


How you query the Graph RAG database to find the appropriate information to feed the LLM when attempting to answer a question can also be important. In traditional vector RAG, it has been shown that the effective use of search is critically important to producing a usable result from the RAG. The same is the case for Graph RAG. Several new search methods specifically for Graph RAG have recently been released that you should consider as part of testing a new Graph RAG. These include Dynamic Global Search and Drift Search.
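As a much-simplified stand-in for these search methods (it is not an implementation of Dynamic Global Search or Drift Search), the sketch below finds entities from the question in an edge list, walks their relationships for a couple of hops, and turns the resulting edges into sentences that could be fed to the LLM as context:

```python
# A much-simplified stand-in for graph retrieval: find entities from the
# question in the graph, walk their relationships hop by hop, and turn the
# edges found into sentences for the LLM's grounding context.

EDGES = [
    ("John", "EMPLOYEE_OF", "IBM"),
    ("Ray", "FRIEND_OF", "John"),
]

def local_context(question: str, edges, hops: int = 1) -> list[str]:
    # Seed with entities mentioned in the question, then expand outwards.
    entities = {e for edge in edges for e in (edge[0], edge[2])
                if e.lower() in question.lower()}
    facts = []
    for _ in range(hops):
        new_entities = set()
        for source, rel, target in edges:
            if source in entities or target in entities:
                facts.append(f"{source} {rel.replace('_', ' ').lower()} {target}")
                new_entities.update({source, target})
        entities |= new_entities
    return sorted(set(facts))

# Answering this question requires connecting two facts across the graph.
print(local_context("Who does Ray's friend work for?", EDGES, hops=2))
```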


Vector Database vs. Graph Database - Best Answer vs. Breadth Answer


So, to recap, I've talked about the need to address the issues of accuracy and hallucination with Generative AI and the large language models used in Generative AI. One way of doing this is by grounding the LLMs with information that we know is accurate and relevant to the question or task the LLM is being asked to address. This improves the accuracy of the response and is known as Retrieval Augmented Generation (RAG). 


I have highlighted that traditional (vector) RAG breaks the data into chunks. Those chunks are embedded to identify their semantic meaning. The question that the LLM is asked is also embedded to infer its semantic meaning. The vector database is then searched to find the chunks of information whose semantic meaning is closest to that of the question being asked, and that information is passed to the LLM, along with the question, for the LLM to base its answer on. This is often referred to as a "best answer first" approach to RAG.


I've further described the concept of Graph RAG. In this case, an LLM is used to convert the unstructured information into structured information in a graph format. Like traditional (vector) RAG, the Graph RAG is then searched for information relevant to answering the question posed to the LLM. But, because the information has been stored in a Graph structure, the relationships within the data are exposed and information more relevant to the question being asked is retrieved. Consequently, the accuracy and quality of the responses of a Graph RAG may be better depending on the complexity of the data. This is often referred to as a "breadth-first" approach. 


Both approaches are illustrated in the diagram below....



Vector RAG vs. Graph RAG

 


The Hybrid Solution: Vector Database plus Graph Database:


Now, Vector RAG and Graph RAG are not mutually exclusive. One can combine the two techniques, storing the same data in both a vector database and a graph database. When a question is asked of an LLM, both databases are queried, and both sets of information are combined and fed to the LLM for it to base its answer on. You can think of this as a "best + breadth" approach; it is commonly referred to as Hybrid RAG. An example of this approach is illustrated below...
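In addition to the diagram, a minimal code sketch of the Hybrid RAG idea might look like the following, where vector_search() and graph_search() are placeholders for the two retrieval systems described above:

```python
# Sketch of the "best + breadth" Hybrid RAG idea: query a vector index and a
# knowledge graph for the same question, then merge both result sets into one
# grounding context for the LLM. vector_search() and graph_search() are
# placeholders for the two retrieval systems described above.

def vector_search(question: str) -> list[str]:
    return ["John works at IBM in the Sydney office."]      # top-k text chunks

def graph_search(question: str) -> list[str]:
    return ["Ray FRIEND_OF John", "John EMPLOYEE_OF IBM"]   # graph facts

def hybrid_prompt(question: str) -> str:
    # De-duplicate while preserving order, then build one grounded prompt.
    context_lines = dict.fromkeys(vector_search(question) + graph_search(question))
    context = "\n".join(context_lines)
    return (
        "Using the context below, answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(hybrid_prompt("Where does Ray's friend work?"))
```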



The selection of the RAG approach, whether vector-based RAG, Graph RAG, or Hybrid, will depend on many factors and may require some experimentation to find the right solution. 


Microsoft Azure Graph RAG Accelerator:


Given the growing importance of RAG techniques, and Graph RAG in particular, in March 2024 Neo4j announced a collaboration with Microsoft Azure OpenAI. Microsoft developed an open-source project called the GraphRAG Accelerator to facilitate working with Neo4j on Azure. This accelerator builds on top of the GraphRAG Python package. It exposes API endpoints hosted on Azure, which can be used to trigger indexing pipelines and enable querying of the GraphRAG knowledge graph. The repository presents a methodology for running a hosted service that uses knowledge graph memory structures to enhance LLM outputs. The provided code serves as a demonstration and is not an officially supported Microsoft offering. You can find more information here.


Conclusions:


So, it is clear that the use of RAG and Graph RAG is of growing importance to the successful use of GenAI. Without these techniques, GenAI has challenges with hallucination and inaccuracy, and the benefits realisation achieved through GenAI will be limited. 

It's important for organisations serious about their GenAI investments to understand these GenAI techniques and to start learning and experimenting now with RAG and Graph RAG as they begin to develop their GenAI programs of work. 

If you have questions about this topic or would like further information or assistance, please contact me at david@ai-savvy.com.au.

 
 
 
