How to set up a RAG pipeline with LangChain and a vector database

Introduction

If you have ever used tools like ChatGPT, you may have noticed that they know nothing about your private data or company documents. That’s where RAG (Retrieval-Augmented Generation) comes in.

RAG helps an AI system answer questions using your own data.

In simple words:

  • The AI does not rely only on its training
  • It first retrieves relevant data
  • Then it generates an answer based on that data

This makes responses more accurate and useful.


What You Will Build

In this guide, you will build a simple RAG pipeline that:

  1. Loads your documents
  2. Converts them into embeddings
  3. Stores them in a vector database
  4. Retrieves relevant chunks
  5. Uses an LLM to generate answers

Basic Concepts (Simple Explanation)

What is RAG?

RAG = Retrieval + Generation

  • Retrieval → find relevant data
  • Generation → create answer using that data

What is LangChain?

LangChain is a framework that helps you connect:

  • LLMs (like OpenAI)
  • Data sources
  • Tools

It makes building AI pipelines easier.

What is a Vector Database?

A vector database stores text as embeddings: lists of numbers that capture meaning.

Why?
Because similar meanings produce similar vectors, so the database can find chunks related to your question even when they share no exact keywords.

Example vector DBs:

  • FAISS (local)
  • Chroma (local)
  • Pinecone (cloud)
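To build intuition for what "similar vectors" means, here is a tiny pure-Python sketch of cosine similarity, the measure most vector databases use. The three-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: close to 1.0 means same direction (similar meaning),
    # close to 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (made up for illustration).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.0]
car = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))  # high: similar meaning
print(cosine_similarity(cat, car))     # low: unrelated
```

A vector database does essentially this comparison, but at scale and with indexes that make it fast over millions of vectors.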

Step 1: Set Up Your Environment

Install required libraries:

pip install langchain openai chromadb tiktoken

(Optional but recommended)

pip install python-dotenv

Step 2: Set Your API Key

Create a .env file:

OPENAI_API_KEY=your_api_key_here

Load it in Python:

from dotenv import load_dotenv
load_dotenv()
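After loading the .env file, it can help to verify that the key is actually present before running the pipeline. require_key below is a hypothetical helper, not part of any library:

```python
import os

def require_key(name="OPENAI_API_KEY"):
    # Fail fast with a clear error if the key was not loaded.
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value
```

A missing key otherwise surfaces later as a confusing authentication error deep inside the chain.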

Step 3: Load Your Data

Let’s load a simple text file.

from langchain.document_loaders import TextLoader
# In newer LangChain versions:
# from langchain_community.document_loaders import TextLoader

loader = TextLoader("data.txt")
documents = loader.load()

Step 4: Split Documents into Chunks

Long documents are split into smaller chunks because retrieval happens at the chunk level and LLMs have limited context windows.

from langchain.text_splitter import CharacterTextSplitter
# In newer versions: from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)

texts = text_splitter.split_documents(documents)
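To see what chunk_size and chunk_overlap control, here is a simplified character-window sketch. The real CharacterTextSplitter prefers to break on separators such as newlines, so this is only an approximation of its behavior:

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    # Slide a window of chunk_size characters; each step moves forward by
    # chunk_size - chunk_overlap, so neighboring chunks share some context.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_with_overlap("x" * 1200, chunk_size=500, chunk_overlap=50)
print(len(chunks))     # 3
print(len(chunks[0]))  # 500
print(len(chunks[-1])) # 300 (the leftover tail)
```

The overlap is what keeps a sentence that straddles a chunk boundary from being cut off in both chunks.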

Step 5: Convert Text into Embeddings

from langchain.embeddings import OpenAIEmbeddings
# In newer versions: from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

Step 6: Store in Vector Database

Using Chroma (simple and local):

from langchain.vectorstores import Chroma
# In newer versions: from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(texts, embeddings)

Step 7: Create Retriever

The retriever finds the chunks most relevant to a query.

retriever = vectorstore.as_retriever()

Step 8: Create RAG Chain

from langchain.chat_models import ChatOpenAI
# In newer versions: from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI()

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever
)

Step 9: Ask Questions

query = "What is this document about?"
response = qa_chain.run(query)
# In newer versions: response = qa_chain.invoke({"query": query})["result"]

print(response)

How It Works (Simple Flow)

  1. You ask a question
  2. Retriever finds similar text chunks
  3. Those chunks are passed to LLM
  4. LLM generates final answer
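The four steps above can be sketched in plain Python. Here embed() is a fake word-count "embedding" and the final f-string stands in for the LLM; only the shape of the pipeline is real:

```python
# Toy sketch of the retrieve-then-generate flow.

def embed(text):
    # Fake "embedding": word counts over a tiny made-up vocabulary.
    vocab = ["refund", "policy", "shipping", "days"]
    words = text.lower().replace("?", "").replace(".", "").split()
    return [words.count(w) for w in vocab]

def similarity(a, b):
    return sum(x * y for x, y in zip(a, b))

chunks = [
    "Our refund policy allows returns within 30 days.",
    "Shipping takes 5 business days.",
]

def answer(question, k=1):
    q_vec = embed(question)                          # 1. embed the question
    ranked = sorted(                                 # 2. rank chunks by similarity
        chunks,
        key=lambda c: similarity(q_vec, embed(c)),
        reverse=True,
    )
    context = " ".join(ranked[:k])                   # 3. collect the top-k chunks
    return f"Based on the documents: {context}"      # 4. a real LLM would answer from this

print(answer("What is the refund policy?"))
```

In the real pipeline, steps 1-3 are handled by the embedding model and the vector database, and step 4 is the LLM writing an answer grounded in the retrieved context.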

Improving Your RAG Pipeline

Once the basic pipeline works, you can improve it:

Better Chunking

  • Try different chunk sizes
  • Maintain context with overlap

Better Embeddings

  • Use advanced embedding models

Better Retrieval

  • Tune top-k (how many chunks are retrieved)
  • Filter results by metadata
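The top-k idea is simply "keep the k best-scoring chunks." A minimal sketch, with made-up scores:

```python
import heapq

def top_k(scored_chunks, k):
    # scored_chunks: (similarity, chunk_text) pairs; keep the k highest scores.
    return heapq.nlargest(k, scored_chunks)

scored = [(0.91, "chunk A"), (0.40, "chunk B"), (0.77, "chunk C")]
print(top_k(scored, k=2))  # [(0.91, 'chunk A'), (0.77, 'chunk C')]
```

In LangChain this is typically configured on the retriever, e.g. vectorstore.as_retriever(search_kwargs={"k": 3}).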

Add Metadata

  • Source tracking
  • Document tags
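To illustrate why metadata helps, here is a toy sketch of chunks carrying a source and a tag (the field names are made up; real vector stores attach metadata dicts to each chunk):

```python
# Toy chunks carrying metadata, so results can be filtered and cite sources.
chunks = [
    {"text": "Returns are accepted within 30 days.", "source": "policy.txt", "tag": "refunds"},
    {"text": "Shipping takes 5 business days.", "source": "faq.txt", "tag": "shipping"},
]

def filter_by_tag(chunks, tag):
    # Restrict retrieval to chunks with a matching tag.
    return [c for c in chunks if c["tag"] == tag]

for c in filter_by_tag(chunks, "refunds"):
    print(f"{c['text']} (source: {c['source']})")
```

Source tracking also lets the final answer cite which document it came from, which makes the system much easier to trust and debug.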

Common Mistakes

  • Using very large chunks
  • Not cleaning data
  • Ignoring overlap
  • Using weak embeddings

When Should You Use RAG?

Use RAG when:

  • You want AI to use your own data
  • Data changes frequently
  • You don’t want to retrain models

Conclusion

RAG is one of the most practical ways to build useful AI applications today.

With just a few steps, you can connect your data with an LLM and create a smart system that answers questions based on real information.

Start small, experiment with different configurations, and gradually improve your pipeline.

This is the foundation of many modern AI applications like chatbots, knowledge assistants, and internal tools.
