

    How to Install Kimi K2 Locally via Hugging Face (Step-by-Step Guide)

    Kimi K2, the trillion-parameter open-source LLM by Moonshot AI, is a game changer for developers, researchers, and startups looking to deploy self-hosted AI models. You can run it locally using Hugging Face Transformers or high-performance inference backends like vLLM or SGLang.

    Prerequisites

    • Python 3.10+

    • CUDA-compatible GPU(s) (NVIDIA A100, 3090, 4090 or better recommended; note that a trillion-parameter model generally requires multiple high-memory GPUs for the full checkpoint)

    • Linux or WSL2 (for Windows users)

    • Git and pip

    Optional but recommended

    • Docker

    • Conda environment

    Step 1: Create a Python Environment

    Using conda:

    conda create -n kimi-k2 python=3.10 -y
    conda activate kimi-k2

    OR using venv:

    python3 -m venv kimi-k2
    source kimi-k2/bin/activate

    Step 2: Install Required Libraries

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
    pip install transformers accelerate huggingface_hub
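    Before moving on, it is worth confirming that PyTorch can actually see your GPU. A quick sanity check (the printed device name will vary with your hardware):

    python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"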

    Step 3: Clone the Model Repo from Hugging Face

    git lfs install

    git clone https://huggingface.co/moonshotai/Kimi-K2-Instruct

    cd Kimi-K2-Instruct

    Make sure you have a Hugging Face account and have accepted the model license on the model page: https://huggingface.co/moonshotai/Kimi-K2-Instruct
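    Alternatively, if you prefer not to clone the full repo with Git LFS, the huggingface_hub library installed in Step 2 can fetch the weights programmatically. A minimal sketch (the local_dir path is just an example; run huggingface-cli login first if the repo requires authentication):

    from huggingface_hub import snapshot_download

    # Downloads all model files into a local directory; resumes if interrupted.
    snapshot_download(
        repo_id="moonshotai/Kimi-K2-Instruct",
        local_dir="./Kimi-K2-Instruct",  # example path; needs plenty of disk space
    )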

    Step 4: Load the Model with Transformers

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    model_id = "moonshotai/Kimi-K2-Instruct"

    # trust_remote_code is required because Kimi K2 ships custom modeling code
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        torch_dtype=torch.float16,
    ).cuda()

    prompt = "Explain the theory of relativity in simple terms."
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=200)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
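    Note that the single .cuda() call above assumes the entire model fits on one GPU, which is unrealistic for a trillion-parameter checkpoint. If you have multiple GPUs, a variant using Accelerate's automatic device placement (installed in Step 2) could look like this sketch:

    # Sketch: shard the model across all visible GPUs instead of one device.
    # device_map="auto" lets accelerate assign layers to the available hardware.
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)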

    Step 5 (Optional): Use vLLM for Faster Inference

    pip install vllm

    python3 -m vllm.entrypoints.openai.api_server \
        --model moonshotai/Kimi-K2-Instruct \
        --tokenizer moonshotai/Kimi-K2-Instruct
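    Depending on your hardware, you will likely also need extra flags, for example --trust-remote-code (Kimi K2 ships custom model code) and --tensor-parallel-size to shard the model across GPUs. A sketch assuming an 8-GPU node:

    python3 -m vllm.entrypoints.openai.api_server \
        --model moonshotai/Kimi-K2-Instruct \
        --trust-remote-code \
        --tensor-parallel-size 8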

    Step 6: Test Your Local Setup

    curl http://localhost:8000/v1/completions \
        -H "Content-Type: application/json" \
        -d '{
            "model": "moonshotai/Kimi-K2-Instruct",
            "prompt": "Write a summary of quantum physics.",
            "max_tokens": 200
        }'
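    Because the vLLM server exposes an OpenAI-compatible API, you can also test it from Python. A minimal sketch using the openai client package (pip install openai; the api_key value is a placeholder, since the local server does not check it):

    from openai import OpenAI

    # Point the client at the local vLLM server instead of api.openai.com.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    response = client.completions.create(
        model="moonshotai/Kimi-K2-Instruct",
        prompt="Write a summary of quantum physics.",
        max_tokens=200,
    )
    print(response.choices[0].text)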

     

    Bonus: Run with SGLang or Docker (Advanced Users)
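    SGLang exposes an OpenAI-compatible server much like vLLM. A hedged sketch of launching it (flag names may vary between SGLang versions, and the tensor-parallel degree assumes a multi-GPU node):

    pip install "sglang[all]"

    python3 -m sglang.launch_server \
        --model-path moonshotai/Kimi-K2-Instruct \
        --trust-remote-code \
        --tp 8 \
        --port 8000

    For Docker-based deployment, check the Kimi K2 model card and Moonshot AI's documentation on Hugging Face for the current recommendations.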

    Final Thoughts

    With this guide, you’ve now installed Kimi K2 locally and have a powerful trillion-parameter LLM at your fingertips without relying on paid APIs. Whether you're building a chatbot, an AI coding assistant, or an NLP application, Kimi K2 is ready to power your ideas.

    Need help with production deployment or integration into your product? Contact our expert AI Squad at OneClick IT Consultancy and let’s build something powerful together.
