Dummies Guide to RAG: Minimum technical jargon
If you are like me and every new concept or research paper terrifies you, then this guide to RAG, i.e., Retrieval Augmented Generation, is for you. The technical jargon can be overwhelming, so in this blog we will build the intuition behind RAG using minimum technical jargon.
Due to the popularity of OpenAI’s ChatGPT, almost everyone now knows a little about LLMs (Large Language Models). We also know how expensive and time-consuming training an LLM is, and that the model only knows its training data. When a query touches on new data the model was not trained on, it performs poorly and hallucinates, which means we get output unrelated to what we asked.
RAG was introduced to solve these issues: it lets us augment the capabilities of an LLM with our own data.
Let’s understand it with an analogy. You want to make an Italian dish for dinner, but you are facing piles of cookbooks and recipes, and skimming through all those books for the right recipe would take a lot of time. You are left with thousands of recipes but no dinner, hungry and overwhelmed. But don’t worry, we have Chef RAG!
The first thing RAG does is create “chunks”, which here are the individual recipes in all the books (the corpus). It then doesn’t just pick any chunk for the recipe; it looks for chunks containing keywords related to Italian cuisine, such as “pasta”, “pizza”, and “Parmesan”, and assigns a high similarity score to the chunks that have them. A minimal sketch of this step follows below.
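To make this concrete, here is a toy sketch of the chunk-and-score step in Python. It uses scikit-learn’s TF-IDF vectors as a simple keyword-flavored stand-in for the dense embeddings real RAG systems typically use, and the recipe chunks and query are made up for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Our "cookbooks", already split into chunks (one recipe per chunk).
chunks = [
    "Classic margherita pizza with tomato, basil, and mozzarella.",
    "Creamy pasta carbonara topped with Parmesan cheese.",
    "Spicy Szechuan chicken stir-fry with peanuts.",
    "Slow-cooked beef chili with kidney beans.",
]

query = "vegetarian Italian dinner with pasta and Parmesan"

# Turn the chunks and the query into vectors in the same space,
# then score every chunk against the query with cosine similarity.
vectorizer = TfidfVectorizer()
chunk_vectors = vectorizer.fit_transform(chunks)
query_vector = vectorizer.transform([query])
scores = cosine_similarity(query_vector, chunk_vectors)[0]

# Keep only the highest-scoring chunks as the retrieved context.
top_k = 2
top_indices = scores.argsort()[::-1][:top_k]
retrieved = [chunks[i] for i in top_indices]
print(retrieved)  # the two Italian recipes win; the stir-fry and chili don't
```

Notice that Chef RAG never reads every cookbook for you; it only hands the LLM the few recipes that actually match your craving.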
These chunks are then fed to the LLM. There are multiple ways the LLM can generate its response; to help tailor the response, a concept called prompt engineering is used. Prompt engineering lets us add context to our query, which in this case would be “You are an Italian chef who only prepares vegetarian dishes”. The LLM then produces a concise and coherent response, and thus we can have our delicious dinner. A sketch of this step follows below.
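Here is a minimal sketch of that augmentation-and-generation step, assuming the OpenAI Python SDK (v1.x) with an API key in the environment; the model name is a placeholder, and `retrieved` is the list from the previous sketch:

```python
from openai import OpenAI  # assumes the OpenAI Python SDK (v1.x) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# `retrieved` comes from the retrieval sketch above.
retrieved = [
    "Creamy pasta carbonara topped with Parmesan cheese.",
    "Classic margherita pizza with tomato, basil, and mozzarella.",
]

# Prompt engineering: a system message sets the persona, and the
# retrieved chunks are stitched into the user message as context.
system_prompt = "You are an Italian chef who only prepares vegetarian dishes."
context = "\n".join(f"- {chunk}" for chunk in retrieved)
user_prompt = (
    f"Using only these recipes as context:\n{context}\n\n"
    "Suggest a vegetarian Italian dinner I can cook tonight."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
)
print(response.choices[0].message.content)
```

Because the prompt tells the model to use only the provided context, its answer stays grounded in our recipes instead of whatever it memorized during training.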
Conclusion
- Retrieval: RAG first retrieves relevant passages or documents from a large corpus using a retrieval model (like a Dense Passage Retriever or other embedding-based methods). The retrieval model finds the most similar chunks of text to the query.
- Augmentation: The retrieved information is then passed to a generative model. This model uses the retrieved chunks as context to generate a response. The augmentation helps ensure that the generated content is grounded in actual data, making the output more accurate and contextually relevant.
- Generation: The generative model (like a GPT-based model) creates the final response using the retrieved data as context, ensuring that the output is both coherent and relevant to the user’s query. The end-to-end sketch below ties all three steps together.
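Putting it all together, here is a hedged end-to-end sketch of the whole pipeline under the same assumptions as before (TF-IDF standing in for dense embeddings, the OpenAI SDK, and a placeholder model name):

```python
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def answer(query: str, chunks: list[str], top_k: int = 2) -> str:
    """A toy end-to-end RAG pipeline: retrieve, augment, generate."""
    # Retrieval: score every chunk against the query and keep the
    # top_k matches (TF-IDF here; real systems use dense embeddings).
    vectorizer = TfidfVectorizer()
    chunk_vectors = vectorizer.fit_transform(chunks)
    scores = cosine_similarity(vectorizer.transform([query]), chunk_vectors)[0]
    retrieved = [chunks[i] for i in scores.argsort()[::-1][:top_k]]

    # Augmentation: ground the prompt in the retrieved chunks.
    context = "\n".join(f"- {c}" for c in retrieved)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"

    # Generation: hand the augmented prompt to the LLM.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

This is a sketch, not a production recipe: real systems add smarter chunking, a vector database for the embeddings, and re-ranking, but the retrieve-augment-generate skeleton stays the same.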
To learn about the more in-depth concepts and math behind RAG, click here.