    Building a RAG System with DeepSeek R1, Ollama and LangChain

    Overview

    A step-by-step guide to setting up a local Retrieval-Augmented Generation (RAG) system using DeepSeek R1 as the LLM, Ollama as the model server, and LangChain for retrieval.

     

    RAG (Retrieval-Augmented Generation) enhances LLMs by integrating a document retrieval mechanism, allowing them to generate more accurate and context-aware responses. In this guide, we will:

    • Load DeepSeek R1 using Ollama.
    • Process and store document embeddings.
    • Retrieve relevant documents based on user queries.
    • Generate responses using retrieved context.

     

    Step 1: Install Required Dependencies

    Before setting up the system, install the necessary dependencies:

    pip install langchain langchain-community chromadb pypdf streamlit ollama
    • LangChain: Framework for building retrieval-based LLM applications.
    • ChromaDB: Vector database for storing and searching embeddings.
    • PyPDF: Loads and parses PDF documents.
    • Ollama: Python client for the locally running Ollama server that serves the DeepSeek R1 model.
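
    The same packages can also be captured in the requirements.txt file referenced in the project structure below. A minimal, unpinned sketch (pin versions as needed for your environment):

    langchain
    langchain-community
    chromadb
    pypdf
    streamlit
    ollama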

    Installing DeepSeek R1 in Ollama

    Run the following command to download DeepSeek R1 to your machine:

    ollama pull deepseek-r1
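
    Assuming the Ollama server is installed and running, you can confirm the model downloaded correctly before wiring up the pipeline:

    # List locally available models; deepseek-r1 should appear
    ollama list

    # Quick interactive smoke test (type /bye to exit)
    ollama run deepseek-r1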

     

    Step 2: Project Structure

    Below is the recommended project structure:

    rag-system/
    │── embeddings/
    │   ├── __init__.py
    │   ├── text_splitter.py     # Splits documents into smaller chunks
    │   ├── vector_store.py      # Handles embeddings and storage
    │── ollama_model/
    │   ├── __init__.py
    │   ├── deepseek_r1.py       # Loads DeepSeek R1 with Ollama
    │── app/
    │   ├── __init__.py
    │   ├── retriever.py         # Retrieves relevant document chunks
    │   ├── rag_chain.py         # Generates final response
    │   ├── streamlit_app.py     # Web UI for interaction
    │── data/
    │   ├── sample.pdf           # Example document for testing
    │── requirements.txt         # Required dependencies
    │── .env                     # API keys (if needed)
    │── main.py                  # Main entry point

    Step 3: Load and Process Documents

    To ensure efficient retrieval, we need to split large documents into small chunks before storing embeddings.

    File: “embeddings/text_splitter.py”

    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.document_loaders import PyPDFLoader

    def split_text(file_path):
        loader = PyPDFLoader(file_path)
        documents = loader.load()
        splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
        return splitter.split_documents(documents)

     

    This script reads a PDF file, extracts its text, and splits it into chunks of roughly 500 characters with a 50-character overlap between consecutive chunks.
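
    As a quick check, you can run the splitter on the sample document and inspect the result (a hypothetical snippet, assuming data/sample.pdf exists):

    from embeddings.text_splitter import split_text

    chunks = split_text("data/sample.pdf")
    print(f"Created {len(chunks)} chunks")
    print(chunks[0].page_content[:200])  # preview the first chunk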

    Step 4: Generate and Store Embeddings

    Now, we need to convert the text chunks into embeddings and store them in a vector database.

    File: “embeddings/vector_store.py”

    from langchain.vectorstores import Chroma
    from langchain.embeddings import OllamaEmbeddings

    def store_embeddings(chunks):
        embeddings = OllamaEmbeddings(model="deepseek-r1")
        vector_store = Chroma.from_documents(chunks, embeddings, persist_directory="./vector_db")
        vector_store.persist()

     

    Uses ChromaDB to store text embeddings.

    DeepSeek R1 is used to generate embeddings via Ollama.
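
    To verify that persistence worked, you can reopen the store with the same embedding model and run a throwaway search (a minimal sketch; the query text is arbitrary):

    from langchain.vectorstores import Chroma
    from langchain.embeddings import OllamaEmbeddings

    embeddings = OllamaEmbeddings(model="deepseek-r1")
    vector_store = Chroma(persist_directory="./vector_db", embedding_function=embeddings)
    print(vector_store.similarity_search("test query", k=1))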

    Step 5: Retrieve Relevant Information

    When a user asks a question, we retrieve the most relevant text chunks from the vector database.

    File: “app/retriever.py”

    from langchain.vectorstores import Chroma
    from langchain.embeddings import OllamaEmbeddings

    def retrieve_chunks(query):
        # Reopen the persisted store with the same embedding model used to build it
        embeddings = OllamaEmbeddings(model="deepseek-r1")
        vector_store = Chroma(persist_directory="./vector_db", embedding_function=embeddings)
        return vector_store.similarity_search(query, k=3)

     

    Reopens the persisted vector store with the same embedding model and runs a similarity search to return the top 3 most relevant text chunks.
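
    A short usage example (with a hypothetical query) prints the retrieved chunks so you can sanity-check retrieval quality before involving the LLM:

    from app.retriever import retrieve_chunks

    for i, chunk in enumerate(retrieve_chunks("What is this document about?"), start=1):
        print(f"--- Chunk {i} ---")
        print(chunk.page_content[:200])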

     

    Step 6: Load DeepSeek R1 in Ollama

    To process user queries, we need to load the DeepSeek R1 model using Ollama.

    File: “ollama_model/deepseek_r1.py”

    from langchain_community.llms import Ollama

    def load_llm():
        # Returns a LangChain LLM wrapper around the local Ollama server
        return Ollama(model="deepseek-r1")

     

    Initializes DeepSeek R1 as the primary language model.
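
    Before wiring it into the retrieval chain, a one-line smoke test confirms the model responds (assumes the Ollama server is running and deepseek-r1 has been pulled):

    from ollama_model.deepseek_r1 import load_llm

    llm = load_llm()
    print(llm.invoke("Reply with the single word: ready"))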

    Step 7: RAG Chain – Combining Retrieval with LLM

    Once we retrieve the relevant chunks, we pass them to the LLM to generate a response.

    File: “app/rag_chain.py”

    from ollama_model.deepseek_r1 import load_llm
    from app.retriever import retrieve_chunks

    def get_rag_response(query):
        retrieved_chunks = retrieve_chunks(query)
        context = "\n".join([chunk.page_content for chunk in retrieved_chunks])
        llm = load_llm()
        response = llm.invoke(f"Use the following context to answer:\n{context}\n\nQuestion: {query}")
        return response

     

    This function retrieves relevant text chunks and uses them as context for DeepSeek R1 to generate a response.
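
    You can exercise the full chain from a Python shell before building the UI (with a hypothetical question about the sample document):

    from app.rag_chain import get_rag_response

    answer = get_rag_response("Summarize the main points of the document.")
    print(answer)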

    Step 8: Create a Web UI with Streamlit

    To allow users to interact with the system, we use Streamlit for a simple web interface.

    File: “app/streamlit_app.py”

    import streamlit as st
    from app.rag_chain import get_rag_response

    st.title("RAG System with DeepSeek R1")

    query = st.text_input("Ask a question:")
    if query:
        response = get_rag_response(query)
        st.write("### Response:")
        st.write(response)

     

    The app provides a text input for user queries and displays responses.

    Run the UI:

    streamlit run app/streamlit_app.py

     

    Step 9: Running the Complete RAG System

    File: “main.py”

    from embeddings.text_splitter import split_text
    from embeddings.vector_store import store_embeddings

    def main():
        print("[1/2] Splitting and processing documents...")
        chunks = split_text("data/sample.pdf")
        print("[2/2] Generating and storing embeddings...")
        store_embeddings(chunks)
        print("Embeddings stored. You can now run the Streamlit app with:\n")
        print("   streamlit run app/streamlit_app.py")

    if __name__ == "__main__":
        main()

    Once all components are ready, follow these steps to run the full system.

    Start Ollama and Ensure DeepSeek R1 is Available

    ollama pull deepseek-r1

    Run the Main Pipeline

    python main.py

    Launch the Web UI

    streamlit run app/streamlit_app.py

    System Requirements

    • CPU: 8-core processor (Intel/AMD)
    • RAM: 16GB+
    • GPU: NVIDIA RTX 3090+ (for faster inference)
    • Disk Space: 20GB+ (for model and embeddings)
    • OS: Ubuntu 20.04 / 22.04
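
    If your hardware is below these specifications, Ollama also hosts smaller distilled variants of DeepSeek R1 that can be pulled by tag; the exact tags may change, so verify them in the Ollama model library first:

    # Example smaller variants (verify availability before relying on them)
    ollama pull deepseek-r1:1.5b
    ollama pull deepseek-r1:7b

    If you use a tagged variant, update the model name passed to OllamaEmbeddings and load_llm accordingly.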

    Summary

    • Documents are split into smaller chunks.
    • Embeddings are stored using ChromaDB.
    • User queries retrieve relevant document chunks.
    • DeepSeek R1 generates answers using the retrieved context.
    • A Streamlit UI enables user interaction.

    This completes the setup of a RAG system with DeepSeek R1 using Ollama and LangChain.
