How to set up a RAG pipeline with LangChain and a vector database

Introduction

If you have ever used tools like ChatGPT, you may have noticed that they know nothing about your private data or company documents. That’s where RAG (Retrieval-Augmented Generation) comes in.

RAG helps an AI system answer questions using your own data.

In simple words:

  • The AI does not rely only on its training
  • It first retrieves relevant data
  • Then it generates an answer based on that data

This makes responses more accurate and useful.


What You Will Build

In this guide, you will build a simple RAG pipeline that:

  1. Loads your documents
  2. Converts them into embeddings
  3. Stores them in a vector database
  4. Retrieves relevant chunks
  5. Uses an LLM to generate answers

Basic Concepts (Simple Explanation)

What is RAG?

RAG = Retrieval + Generation

  • Retrieval → find relevant data
  • Generation → create answer using that data

What is LangChain?

LangChain is a framework that helps you connect:

  • LLMs (like OpenAI)
  • Data sources
  • Tools

It makes building AI pipelines easier.

What is a Vector Database?

A vector database stores text as embeddings: lists of numbers that capture meaning.

Why?
Because similar meanings produce similar vectors, so the database can find chunks related to your question even when they share no exact keywords.

Example vector DBs:

  • FAISS (local)
  • Chroma (local)
  • Pinecone (cloud)
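To build intuition for what "similar vectors" means, here is a tiny pure-Python sketch of cosine similarity, the measure most vector databases use. The three-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: close to 1.0 means same direction (similar meaning),
    # close to 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (made up for illustration).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.0]
car = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))  # high: similar meaning
print(cosine_similarity(cat, car))     # low: unrelated
```

A vector database does essentially this comparison, but at scale and with indexes that make it fast over millions of vectors.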

Step 1: Set Up Your Environment

Install required libraries:

pip install langchain openai chromadb tiktoken

(Optional but recommended)

pip install python-dotenv

Step 2: Set Your API Key

Create a .env file:

OPENAI_API_KEY=your_api_key_here

Load it in Python:

from dotenv import load_dotenv
load_dotenv()
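After loading the .env file, it can help to verify that the key is actually present before running the pipeline. require_key below is a hypothetical helper, not part of any library:

```python
import os

def require_key(name="OPENAI_API_KEY"):
    # Fail fast with a clear error if the key was not loaded.
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value
```

A missing key otherwise surfaces later as a confusing authentication error deep inside the chain.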

Step 3: Load Your Data

Let’s load a simple text file.

from langchain.document_loaders import TextLoader
# In newer LangChain versions:
# from langchain_community.document_loaders import TextLoader

loader = TextLoader("data.txt")
documents = loader.load()

Step 4: Split Documents into Chunks

Long documents are split into smaller chunks because retrieval happens at the chunk level and LLMs have limited context windows.

from langchain.text_splitter import CharacterTextSplitter
# In newer versions: from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)

texts = text_splitter.split_documents(documents)
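To see what chunk_size and chunk_overlap control, here is a simplified character-window sketch. The real CharacterTextSplitter prefers to break on separators such as newlines, so this is only an approximation of its behavior:

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    # Slide a window of chunk_size characters; each step moves forward by
    # chunk_size - chunk_overlap, so neighboring chunks share some context.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_with_overlap("x" * 1200, chunk_size=500, chunk_overlap=50)
print(len(chunks))     # 3
print(len(chunks[0]))  # 500
print(len(chunks[-1])) # 300 (the leftover tail)
```

The overlap is what keeps a sentence that straddles a chunk boundary from being cut off in both chunks.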

Step 5: Convert Text into Embeddings

from langchain.embeddings import OpenAIEmbeddings
# In newer versions: from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

Step 6: Store in Vector Database

Using Chroma (simple and local):

from langchain.vectorstores import Chroma
# In newer versions: from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(texts, embeddings)

Step 7: Create Retriever

The retriever finds the chunks most relevant to a query.

retriever = vectorstore.as_retriever()

Step 8: Create RAG Chain

from langchain.chat_models import ChatOpenAI
# In newer versions: from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI()

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever
)

Step 9: Ask Questions

query = "What is this document about?"
response = qa_chain.run(query)
# In newer versions: response = qa_chain.invoke({"query": query})["result"]

print(response)

How It Works (Simple Flow)

  1. You ask a question
  2. Retriever finds similar text chunks
  3. Those chunks are passed to LLM
  4. LLM generates final answer
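The four steps above can be sketched in plain Python. Here embed() is a fake word-count "embedding" and the final f-string stands in for the LLM; only the shape of the pipeline is real:

```python
# Toy sketch of the retrieve-then-generate flow.

def embed(text):
    # Fake "embedding": word counts over a tiny made-up vocabulary.
    vocab = ["refund", "policy", "shipping", "days"]
    words = text.lower().replace("?", "").replace(".", "").split()
    return [words.count(w) for w in vocab]

def similarity(a, b):
    return sum(x * y for x, y in zip(a, b))

chunks = [
    "Our refund policy allows returns within 30 days.",
    "Shipping takes 5 business days.",
]

def answer(question, k=1):
    q_vec = embed(question)                          # 1. embed the question
    ranked = sorted(                                 # 2. rank chunks by similarity
        chunks,
        key=lambda c: similarity(q_vec, embed(c)),
        reverse=True,
    )
    context = " ".join(ranked[:k])                   # 3. collect the top-k chunks
    return f"Based on the documents: {context}"      # 4. a real LLM would answer from this

print(answer("What is the refund policy?"))
```

In the real pipeline, steps 1-3 are handled by the embedding model and the vector database, and step 4 is the LLM writing an answer grounded in the retrieved context.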

Improving Your RAG Pipeline

Once the basic pipeline works, you can improve it:

Better Chunking

  • Try different chunk sizes
  • Maintain context with overlap

Better Embeddings

  • Use advanced embedding models

Better Retrieval

  • Tune top-k (how many chunks are retrieved)
  • Filter results by metadata
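The top-k idea is simply "keep the k best-scoring chunks." A minimal sketch, with made-up scores:

```python
import heapq

def top_k(scored_chunks, k):
    # scored_chunks: (similarity, chunk_text) pairs; keep the k highest scores.
    return heapq.nlargest(k, scored_chunks)

scored = [(0.91, "chunk A"), (0.40, "chunk B"), (0.77, "chunk C")]
print(top_k(scored, k=2))  # [(0.91, 'chunk A'), (0.77, 'chunk C')]
```

In LangChain this is typically configured on the retriever, e.g. vectorstore.as_retriever(search_kwargs={"k": 3}).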

Add Metadata

  • Source tracking
  • Document tags
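To illustrate why metadata helps, here is a toy sketch of chunks carrying a source and a tag (the field names are made up; real vector stores attach metadata dicts to each chunk):

```python
# Toy chunks carrying metadata, so results can be filtered and cite sources.
chunks = [
    {"text": "Returns are accepted within 30 days.", "source": "policy.txt", "tag": "refunds"},
    {"text": "Shipping takes 5 business days.", "source": "faq.txt", "tag": "shipping"},
]

def filter_by_tag(chunks, tag):
    # Restrict retrieval to chunks with a matching tag.
    return [c for c in chunks if c["tag"] == tag]

for c in filter_by_tag(chunks, "refunds"):
    print(f"{c['text']} (source: {c['source']})")
```

Source tracking also lets the final answer cite which document it came from, which makes the system much easier to trust and debug.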

Common Mistakes

  • Using very large chunks
  • Not cleaning data
  • Ignoring overlap
  • Using weak embeddings

When Should You Use RAG?

Use RAG when:

  • You want AI to use your own data
  • Data changes frequently
  • You don’t want to retrain models

Conclusion

RAG is one of the most practical ways to build useful AI applications today.

With just a few steps, you can connect your data with an LLM and create a smart system that answers questions based on real information.

Start small, experiment with different configurations, and gradually improve your pipeline.

This is the foundation of many modern AI applications like chatbots, knowledge assistants, and internal tools.
