AI/ML

    Building a RAG System with DeepSeek R1, Ollama and LangChain


    Overview

    A step-by-step guide to setting up a local Retrieval-Augmented Generation (RAG) system using DeepSeek R1 as the LLM, Ollama as the model server, and LangChain for retrieval.

     

    RAG (Retrieval-Augmented Generation) enhances LLMs by integrating a document retrieval mechanism, allowing them to generate more accurate, context-aware responses. In this guide, we will:

    • Load DeepSeek R1 using Ollama.
    • Process and store document embeddings.
    • Retrieve relevant documents based on user queries.
    • Generate responses using retrieved context.

     

    Step 1: Install Required Dependencies

    Before setting up the system, install the necessary dependencies:

    pip install langchain langchain-community chromadb pypdf streamlit ollama
    • LangChain / langchain-community: Core framework and integrations for retrieval-based LLM applications.
    • ChromaDB: Vector database for storing and searching embeddings.
    • PyPDF: Loads and parses PDF documents.
    • Ollama: Runs the DeepSeek R1 model locally.

    Installing DeepSeek R1 in Ollama

    Run the following command to download DeepSeek R1 to your machine:

    ollama pull deepseek-r1
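
    Once the pull completes, you can confirm the model is available locally:

    ollama list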

     

    Step 2: Project Structure

    Below is the recommended project structure:

    rag-system/
    │── embeddings/
    │   ├── __init__.py
    │   ├── text_splitter.py      # Splits documents into smaller chunks
    │   ├── vector_store.py       # Handles embeddings and storage
    │── ollama_model/
    │   ├── __init__.py
    │   ├── deepseek_r1.py        # Loads DeepSeek R1 with Ollama
    │── app/
    │   ├── __init__.py
    │   ├── retriever.py          # Retrieves relevant document chunks
    │   ├── rag_chain.py          # Generates final response
    │   ├── streamlit_app.py      # Web UI for interaction
    │── data/
    │   ├── sample.pdf            # Example document for testing
    │── requirements.txt          # Required dependencies
    │── .env                      # API keys (if needed)
    │── main.py                   # Main entry point
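
    For reference, the requirements.txt in this structure can simply mirror the packages from Step 1 (pin versions to whatever you have tested against):

    langchain
    langchain-community
    chromadb
    pypdf
    streamlit
    ollama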

    Step 3: Load and Process Documents

    To ensure efficient retrieval, we need to split large documents into small chunks before storing embeddings.

    File: “embeddings/text_splitter.py”

    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.document_loaders import PyPDFLoader

    def split_text(file_path):
        # Load the PDF and extract its pages as documents
        loader = PyPDFLoader(file_path)
        documents = loader.load()
        # Split into overlapping chunks so retrieval stays precise
        splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
        return splitter.split_documents(documents)

     

    This script reads a PDF file, extracts its text, and splits it into chunks of roughly 500 characters with a 50-character overlap between chunks.
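
    As a quick sanity check, you can run the splitter against the sample document from the project structure (this assumes data/sample.pdf exists) and inspect the output:

    from embeddings.text_splitter import split_text

    chunks = split_text("data/sample.pdf")
    print(f"Created {len(chunks)} chunks")   # number of chunks produced
    print(chunks[0].page_content[:200])      # preview the first chunk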

    Step 4: Generate and Store Embeddings

    Now, we need to convert the text chunks into embeddings and store them in a vector database.

    File: “embeddings/vector_store.py”

    from langchain.vectorstores import Chroma
    from langchain.embeddings import OllamaEmbeddings

    def store_embeddings(chunks):
        embeddings = OllamaEmbeddings(model="deepseek-r1")
        vector_store = Chroma.from_documents(chunks, embeddings, persist_directory="./vector_db")
        vector_store.persist()

     

    ChromaDB stores the text embeddings, which are generated by DeepSeek R1 through Ollama, and persists them to the ./vector_db directory.
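
    To see what this step produces under the hood, you can embed a single string directly with the same embedding class (the vector length depends on the model):

    from langchain.embeddings import OllamaEmbeddings

    embeddings = OllamaEmbeddings(model="deepseek-r1")
    vector = embeddings.embed_query("What is retrieval augmented generation?")
    print(len(vector))   # dimensionality of the embedding vector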

    Step 5: Retrieve Relevant Information

    When a user asks a question, we retrieve the most relevant text chunks from the vector database.

    File: “app/retriever.py”

    from langchain.vectorstores import Chroma
    from langchain.embeddings import OllamaEmbeddings

    def retrieve_chunks(query):
        # Reopen the persisted store with the same embedding model used to build it
        embeddings = OllamaEmbeddings(model="deepseek-r1")
        vector_store = Chroma(persist_directory="./vector_db", embedding_function=embeddings)
        return vector_store.similarity_search(query, k=3)

     

    Performs a vector similarity search against the stored embeddings and returns the three most relevant text chunks.
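
    For example, once embeddings have been stored you can query the retriever directly (the question below is purely illustrative):

    from app.retriever import retrieve_chunks

    results = retrieve_chunks("What is the main topic of the document?")
    for doc in results:
        print(doc.page_content[:100])   # preview each retrieved chunk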

     

    Step 6: Load DeepSeek R1 in Ollama

    To process user queries, we need to load the DeepSeek R1 model using Ollama.

    File: “ollama_model/deepseek_r1.py”

    from langchain_community.llms import Ollama

    def load_llm():
        # LangChain's Ollama wrapper, so the rest of the chain can call llm.invoke()
        return Ollama(model="deepseek-r1")

     

    Initializes DeepSeek R1 as the primary language model via LangChain's Ollama wrapper.
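
    A minimal check that the model loads and responds (the prompt is illustrative):

    from ollama_model.deepseek_r1 import load_llm

    llm = load_llm()
    print(llm.invoke("Summarise retrieval augmented generation in one sentence."))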

    Step 7: RAG Chain – Combining Retrieval with LLM

    Once we retrieve the relevant chunks, we pass them to the LLM to generate a response.

    File: “app/rag_chain.py”

    from ollama_model.deepseek_r1 import load_llm
    from app.retriever import retrieve_chunks

    def get_rag_response(query):
        # Fetch the most relevant chunks and join them into a single context block
        retrieved_chunks = retrieve_chunks(query)
        context = "\n".join(chunk.page_content for chunk in retrieved_chunks)
        # Ask DeepSeek R1 to answer the question using that context
        llm = load_llm()
        prompt = f"Use the following context to answer:\n{context}\n\nQuestion: {query}"
        return llm.invoke(prompt)

     

    This function retrieves relevant text chunks and uses them as context for DeepSeek R1 to generate a response.
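
    Called on its own, the chain looks like this (the example question is illustrative and assumes embeddings have already been stored):

    from app.rag_chain import get_rag_response

    answer = get_rag_response("What are the key points of the sample document?")
    print(answer)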

    Step 8: Create a Web UI with Streamlit

    To allow users to interact with the system, we use Streamlit for a simple web interface.

    File: “app/streamlit_app.py”

    import streamlit as st
    from app.rag_chain import get_rag_response

    st.title("RAG System with DeepSeek R1")

    query = st.text_input("Ask a question:")
    if query:
        response = get_rag_response(query)
        st.write("### Response:")
        st.write(response)

     

    The app provides a text input for user queries and displays responses.

    Run the UI:

    streamlit run app/streamlit_app.py

     

    Step 9: Running the Complete RAG System

    File: “main.py”

    from embeddings.text_splitter import split_text
    from embeddings.vector_store import store_embeddings

    def main():
        print("[1/2] Splitting and processing documents...")
        chunks = split_text("data/sample.pdf")
        print("[2/2] Generating and storing embeddings...")
        store_embeddings(chunks)
        print("Embeddings stored. You can now run the Streamlit app with:\n")
        print("   streamlit run app/streamlit_app.py")

    if __name__ == "__main__":
        main()

    Once all components are ready, follow these steps to run the full system.

    Start Ollama and Ensure DeepSeek R1 is Available

    ollama pull deepseek-r1

    Run the Main Pipeline

    python main.py

    Launch the Web UI

    streamlit run app/streamlit_app.py

    System Requirements

    • CPU: 8-core processor (Intel/AMD)
    • RAM: 16GB+
    • GPU: NVIDIA RTX 3090+ (for faster inference)
    • Disk Space: 20GB+ (for model and embeddings)
    • OS: Ubuntu 20.04 / 22.04
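
    If your hardware falls short of these specs, you can pull a smaller distilled variant of DeepSeek R1 instead, for example the 7B tag (check the Ollama model library for the tags currently available):

    ollama pull deepseek-r1:7b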

    Summary

    • Documents are split into smaller chunks.
    • Embeddings are stored using ChromaDB.
    • User queries retrieve relevant document chunks.
    • DeepSeek R1 generates answers grounded in the retrieved context.
    • A Streamlit UI enables user interaction.

    This completes the setup of a RAG system with DeepSeek R1 using Ollama and LangChain.

     Ready to transform your business with our technology solutions? Contact us today to leverage our AI/ML expertise. 
