What is RAG?

Hashir Nouman
2 min read · Jun 9, 2024


Generative AI is revolutionizing almost every industry. Almost every day we see a new AI product launch or a startup raising millions of dollars in funding.

In industries like technical support, customer support, and medicine, we are seeing new AI-based chatbots that are not traditional chatbots with premade responses; instead, they generate valuable information based on the use case they were developed for.

So what’s the technology behind these chatbots that are not generic but domain-specific, answering only the queries relevant to their domain? For example, you can’t get information about auto parts from a chatbot developed to answer medical queries. That technology is called Retrieval-Augmented Generation, or RAG.

Suppose a company has various catalogues and guidelines that it uses to train its employees, and staff rely on information from these guides daily. The data is private and can’t be found on the internet. Whenever employees need guidance, they have to dig through the documents, guides, and catalogues, which takes a lot of time. They need an intelligent chatbot that understands their private data and can answer user queries. For this case, we would build a RAG system: a generative chatbot that answers only from the data it is given, whether that data is public or private. This application of generative AI is becoming popular.

How can you develop your own?

Here is a basic overview of how you can develop your own RAG system. You will need:

  1. An LLM
  2. An embedding model
  3. A vector database
  4. A data framework like LangChain or LlamaIndex (recommended)

Load your data, create embeddings for it, and store them in a vector database. Embedding models like Cohere’s and vector databases like Weaviate are good choices. After that, we need to do prompt engineering for the LLM. Here we define rules like “don’t make up answers”, “respond in a certain way”, or “if you don’t find an answer, just write I don’t know”. We can also assign a role to the LLM, like “you are a medical practitioner and your job is to give medical guidance to the user”.
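To make the ingestion step concrete, here is a minimal sketch in plain Python. The `embed` function and the in-memory `index` list are toy stand-ins for a real embedding model (e.g. Cohere’s) and a real vector database (e.g. Weaviate); the chunk size and all names here are illustrative assumptions, not any framework’s actual API.

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: a normalized
    # character-frequency vector. A real model returns a dense
    # semantic vector with hundreds of dimensions.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunk(document: str, size: int = 500) -> list[str]:
    # Naive fixed-size splitting; frameworks like LangChain and
    # LlamaIndex provide smarter, overlap-aware text splitters.
    return [document[i:i + size] for i in range(0, len(document), size)]

# Toy "vector database": a list of (embedding, chunk_text) pairs.
# In practice this would be a store like Weaviate.
index: list[tuple[list[float], str]] = []

def ingest(document: str) -> None:
    # Load data, embed each chunk, and store it in the vector database.
    for piece in chunk(document):
        index.append((embed(piece), piece))
```

With the data indexed, the prompt is where the rules live, as the next sketch shows.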
Now how do these pieces work together? The user’s question is converted into an embedding, a search is performed against the vector database (e.g. a similarity search), and the most relevant chunks are fed to the LLM together with the question. The LLM then produces a human-understandable response as the final output.
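Continuing the same sketch, here is the query-time flow: embed the question, run a similarity search over the index, and feed the retrieved chunks to the LLM inside a rule-laden prompt. `call_llm` is a placeholder for whatever LLM API you use, and the prompt text simply mirrors the rules described above.

```python
PROMPT_TEMPLATE = """You are a medical practitioner and your job is to give
medical guidance to the user. Answer ONLY from the context below.
If you don't find an answer in the context, just write "I don't know".
Do not make up answers.

Context:
{context}

Question: {question}
Answer:"""

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors from embed() are already normalized, so the dot
    # product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real LLM API call here.
    return f"[LLM response for a {len(prompt)}-character prompt]"

def answer(question: str, k: int = 3) -> str:
    # 1. Embed the user's question with the same embedding model.
    q_vec = embed(question)
    # 2. Similarity search: rank stored chunks against the query.
    top = sorted(index, key=lambda pair: cosine(q_vec, pair[0]), reverse=True)[:k]
    # 3. Stuff the retrieved chunks and the question into the prompt
    #    and let the LLM generate the final, human-readable answer.
    context = "\n---\n".join(text for _, text in top)
    return call_llm(PROMPT_TEMPLATE.format(context=context, question=question))
```

Calling `ingest()` once on your documents and then `answer()` for each user question is the whole loop; a production system would swap each toy piece for the real component from the list above.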
In the next blog, I will explain the development of RAG in depth.
Stay tuned for more :-).

