Kimi K2, the trillion-parameter open-source LLM from Moonshot AI, is a game changer for developers, researchers, and startups looking to deploy self-hosted AI models. You can run it locally using Hugging Face Transformers or high-performance backends like vLLM or SGLang.
Prerequisites
Python 3.10+
CUDA-compatible GPU (NVIDIA A100, RTX 3090, RTX 4090, or better recommended)
Linux or WSL2 (for Windows users)
Git and pip
Optional but recommended:
Docker
Conda environment
Using conda:
conda create -n kimi-k2 python=3.10 -y
conda activate kimi-k2
Or using venv:
python3 -m venv kimi-k2
source kimi-k2/bin/activate
Install PyTorch with CUDA 12.1 support, then the Hugging Face libraries:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate huggingface_hub
Download the model weights with Git LFS (the full checkpoint is enormous, so make sure you have plenty of free disk space):
git lfs install
git clone https://huggingface.co/moonshotai/Kimi-K2-Instruct
cd Kimi-K2-Instruct
Make sure you have a Hugging Face account and have accepted the model license on the model page: https://huggingface.co/moonshotai/Kimi-K2-Instruct
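If the repository is gated, you'll need to authenticate before the download will work. One way is the Hugging Face CLI, which is installed alongside huggingface_hub; create a read-scoped token at https://huggingface.co/settings/tokens and paste it when prompted:
huggingface-cli login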
Load the model and run a quick test. Note that device_map="auto" (via Accelerate) shards the weights across all available GPUs, which a trillion-parameter model will almost certainly require:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "moonshotai/Kimi-K2-Instruct"

# trust_remote_code is required because Kimi K2 ships custom model code
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",  # shard across available GPUs rather than a single .cuda() call
)

prompt = "Explain the theory of relativity in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
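Since Kimi-K2-Instruct is a chat-tuned model, you'll usually get better results by formatting the prompt through the tokenizer's chat template rather than passing raw text. A minimal sketch using the standard Transformers API:
# Build a chat-formatted prompt and generate from it
messages = [{"role": "user", "content": "Explain the theory of relativity in simple terms."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids, max_new_tokens=200)
# Slice off the prompt tokens so only the model's reply is printed
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))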
For higher-throughput serving, vLLM exposes an OpenAI-compatible HTTP server:
pip install vllm
# --served-model-name lets clients refer to the model as "kimi-k2", matching the request below
python3 -m vllm.entrypoints.openai.api_server \
    --model moonshotai/Kimi-K2-Instruct \
    --tokenizer moonshotai/Kimi-K2-Instruct \
    --served-model-name kimi-k2 \
    --trust-remote-code
Once the server is running, send a completion request from another terminal:
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2",
"prompt": "Write a summary of quantum physics.",
"max_tokens": 200
}'
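Because vLLM exposes an OpenAI-compatible API, you can also call the server from Python with the official openai client. A minimal sketch (the api_key is a placeholder, since vLLM does not check it by default):
from openai import OpenAI

# Point the client at the local vLLM server instead of api.openai.com
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="kimi-k2",
    prompt="Write a summary of quantum physics.",
    max_tokens=200,
)
print(response.choices[0].text)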
Run with SGLang or Docker (Advanced Users)
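Both options follow the same pattern as vLLM. A minimal SGLang launch sketch (flags follow SGLang's standard launch_server interface; check its docs for any Kimi K2-specific options):
pip install "sglang[all]"
python3 -m sglang.launch_server \
    --model-path moonshotai/Kimi-K2-Instruct \
    --trust-remote-code \
    --port 30000
And a Docker sketch using vLLM's official image (assumes the NVIDIA Container Toolkit is installed; the volume mount reuses your local Hugging Face cache so weights aren't re-downloaded):
docker run --gpus all --ipc=host -p 8000:8000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    vllm/vllm-openai:latest \
    --model moonshotai/Kimi-K2-Instruct \
    --trust-remote-code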
That's it: you've now installed Kimi K2 locally and have a powerful trillion-parameter LLM at your fingertips, with no reliance on paid APIs. Whether you're building a chatbot, an AI coding assistant, or an NLP application, Kimi K2 is ready to power your ideas.
Need help with production deployment or integration into your product? Contact our expert AI Squad at OneClick IT Consultancy and let’s build something powerful together.