AI/ML

    How to Run Kimi K2 on RunPod - Step by Step Setup Guide


    Why Choose RunPod?

    RunPod is a cloud GPU platform trusted by the open source AI community. It’s:

    • Affordable (per-minute GPU pricing)
    • Simple to launch (no DevOps needed)
    • Perfect for LLMs like Kimi K2, Mistral, Mixtral, LLaMA, etc.

    RunPod offers prebuilt templates, JupyterLab and Docker container runtimes, making it ideal for developers and researchers.

    • PrerequisitesFree RunPod account: https://runpod.io 
    • Hugging Face account (to accept Kimi K2 license)
    • Basic familiarity with Docker or Python CLI (optional)

    Step 1: Log in to RunPod & Choose a GPU

    1. Go to https://runpod.io/console 

    2. Click on 'Deploy a Pod'

    3. Under Template, choose:

    • Container: Custom Image OR
    • Prebuilt: Hugging Face Text Generation

    4. Select a GPU type (suggested: A100, RTX 4090, 3090)

    5. Choose Storage (at least 40 - 80 GB)

    Step 2: Configure Your Pod

    If using Custom Container:

    Container Image:

    nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04

    Add CMD if needed:

    sleep infinity

    Enable:

    • Public IP
    • Docker Support
    • Volume Persistence

    Click 'Deploy Pod'

    Step 3: Access the Pod Terminal

    Once the pod is running:

    1. Click 'Connect → Terminal'

    2. Update the system:

    apt update && apt install -y git-lfs python3-pip git

     

    3. Install libraries:

    pip3 install torch torchvision transformers accelerate huggingface_hub

    Step 4: Clone & Load Kimi K2

    1. Clone the Kimi K2 instruct repo:

    git lfs installgit clone https://huggingface.co/moonshotai/Kimi-K2-Instructcd Kimi-K2-Instruct

    2. Accept the model license at: https://huggingface.co/moonshotai/Kimi-K2-Instruct 

    3. Test model load script:

    from transformers import AutoModelForCausalLM, AutoTokenizerimport torchmodel_id = "moonshotai/Kimi-K2-Instruct"tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True,torch_dtype=torch.float16).cuda()prompt = "What is quantum computing?"inputs = tokenizer(prompt, return_tensors="pt").to("cuda")outputs = model.generate(**inputs, max_new_tokens=200)print(tokenizer.decode(outputs[0], skip_special_tokens=True)) 

    Step 5 (Optional): Use vLLM for Fast Inference Server

    Install vLLM:

    pip install vllm

    Run OpenAI-compatible server:

    python3 -m vllm.entrypoints.openai.api_server \ --model moonshotai/Kimi-K2-Instruct \ --tokenizer moonshotai/Kimi-K2-Instruct

    Bonus: JupyterLab UI

    Use RunPod's Jupyter template.

    Paste your Hugging Face token in .env or login with:

    huggingface-cli login

    Load and run Kimi K2 from a notebook (great for rapid prototyping)

    Final Thoughts

    Running Kimi K2 on RunPod gives you a blazing fast, budget-friendly setup to experiment with one of the most powerful open-source LLMs without needing DevOps or hardware. Whether you’re building AI tools, researching language models, or just exploring prompts, RunPod + Kimi K2 is a perfect match.

    Need enterprise grade deployment or DevOps help? Contact our expert AI Squad at OneClick IT Consultancy and let’s build something powerful together.

    Contact Us

    Comment

    Share

    facebook
    LinkedIn
    Twitter
    Mail
    AI/ML

    Related Center Of Excellence