How to Run Kimi K2 on RunPod - Step-by-Step Setup Guide


    Why Choose RunPod?

RunPod is a cloud GPU platform trusted by the open-source AI community. It's:

    • Affordable (per-minute GPU pricing)
    • Simple to launch (no DevOps needed)
    • Perfect for LLMs like Kimi K2, Mistral, Mixtral, LLaMA, etc.

    RunPod offers prebuilt templates, JupyterLab and Docker container runtimes, making it ideal for developers and researchers.

Prerequisites

• Free RunPod account: https://runpod.io
• Hugging Face account (to accept the Kimi K2 license)
• Basic familiarity with Docker or the Python CLI (optional)

    Step 1: Log in to RunPod & Choose a GPU

    1. Go to https://runpod.io/console 

    2. Click on 'Deploy a Pod'

    3. Under Template, choose:

    • Container: Custom Image OR
    • Prebuilt: Hugging Face Text Generation

4. Select a GPU type (suggested: A100, RTX 4090, or RTX 3090)

5. Choose Storage (at least 40–80 GB)

    Step 2: Configure Your Pod

    If using Custom Container:

    Container Image:

    nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04

Add a CMD if needed (this keeps the container alive so you can attach a terminal):

    sleep infinity

    Enable:

    • Public IP
    • Docker Support
    • Volume Persistence

    Click 'Deploy Pod'

    Step 3: Access the Pod Terminal

    Once the pod is running:

    1. Click 'Connect → Terminal'

    2. Update the system:

apt update && apt install -y git-lfs python3-pip git

    3. Install libraries:

    pip3 install torch torchvision transformers accelerate huggingface_hub
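
Optional sanity check: before pulling a large model, confirm that PyTorch can see the pod's GPU. A minimal sketch using standard torch calls:

import torch

# Confirm PyTorch sees the pod's GPU before downloading model weights
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))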

    Step 4: Clone & Load Kimi K2

    1. Clone the Kimi K2 instruct repo:

git lfs install
git clone https://huggingface.co/moonshotai/Kimi-K2-Instruct
cd Kimi-K2-Instruct
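
If you'd rather skip the git-lfs clone, a minimal alternative sketch downloads the weights with huggingface_hub's snapshot_download (the local_dir below is just an example path):

from huggingface_hub import snapshot_download

# Download the repo without git-lfs; local_dir is an example path
snapshot_download(
    repo_id="moonshotai/Kimi-K2-Instruct",
    local_dir="Kimi-K2-Instruct",
)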

    2. Accept the model license at: https://huggingface.co/moonshotai/Kimi-K2-Instruct 

3. Test loading the model with a short script:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "moonshotai/Kimi-K2-Instruct"

# Load the tokenizer and model (fp16 halves memory use vs. fp32)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.float16
).cuda()

# Run a quick generation to confirm everything works
prompt = "What is quantum computing?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
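
Since Kimi K2 Instruct is a chat-tuned model, you'll usually get better output by formatting the prompt with the tokenizer's chat template. A minimal sketch that continues from the script above (it reuses tokenizer and model, and assumes the repo ships a chat template, as most instruct repos do):

# Reuses `tokenizer` and `model` from the script above.
# Assumes the repo ships a chat template (typical for instruct models).
messages = [{"role": "user", "content": "What is quantum computing?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))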

Step 5 (Optional): Use vLLM as a Fast Inference Server

    Install vLLM:

    pip install vllm

Run the OpenAI-compatible server:

python3 -m vllm.entrypoints.openai.api_server \
  --model moonshotai/Kimi-K2-Instruct \
  --tokenizer moonshotai/Kimi-K2-Instruct
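
Once the server is up, you can query it with the standard openai Python client (pip install openai). A minimal sketch, assuming the server runs locally on vLLM's default port 8000; the api_key value is a placeholder, since vLLM doesn't verify it unless you configure one:

from openai import OpenAI

# Point the client at the local vLLM server (default port 8000)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder key

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",
    messages=[{"role": "user", "content": "What is quantum computing?"}],
    max_tokens=200,
)
print(response.choices[0].message.content)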

    Bonus: JupyterLab UI

    Use RunPod's Jupyter template.

Paste your Hugging Face token into .env, or log in with:

    huggingface-cli login
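
In a notebook, you can also log in programmatically. A minimal sketch using huggingface_hub:

from huggingface_hub import login

# Notebook-friendly alternative to the CLI login; prompts for your token
login()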

Load and run Kimi K2 from a notebook (great for rapid prototyping).

    Final Thoughts

Running Kimi K2 on RunPod gives you a blazing-fast, budget-friendly setup for experimenting with one of the most powerful open-source LLMs, without needing DevOps expertise or your own hardware. Whether you're building AI tools, researching language models, or just exploring prompts, RunPod + Kimi K2 is a perfect match.

Need enterprise-grade deployment or DevOps help? Contact our expert AI Squad at OneClick IT Consultancy and let's build something powerful together.
