Kimi K2, the trillion-parameter open-source LLM from Moonshot AI, is a game changer for developers, researchers, and startups looking to deploy self-hosted AI models. You can run it locally using Hugging Face Transformers or high-performance backends like vLLM or SGLang.
Prerequisites
Python 3.10+
CUDA-compatible GPU (NVIDIA A100, RTX 3090, RTX 4090, or better recommended)
Linux or WSL2 (for Windows users)
Git and pip
Optional but recommended:
Docker
Conda environment
Using conda:
conda create -n kimi-k2 python=3.10 -y
conda activate kimi-k2
Or using venv:
python3 -m venv kimi-k2
source kimi-k2/bin/activate
Install PyTorch with CUDA 12.1 support, then the Hugging Face libraries:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate huggingface_hub
Download the model weights with Git LFS (the full checkpoint is enormous, so make sure you have plenty of free disk space):
git lfs install
git clone https://huggingface.co/moonshotai/Kimi-K2-Instruct
cd Kimi-K2-Instruct
Make sure you have a Hugging Face account and have accepted the model license on the model page: https://huggingface.co/moonshotai/Kimi-K2-Instruct
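If the repository is gated, you'll need to authenticate before the download will work. One way is the Hugging Face CLI, which is installed alongside huggingface_hub; create a read-scoped token at https://huggingface.co/settings/tokens and paste it when prompted:
huggingface-cli login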
Load the model and run a quick test. Note that device_map="auto" (via Accelerate) shards the weights across all available GPUs, which a trillion-parameter model will almost certainly require:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "moonshotai/Kimi-K2-Instruct"

# trust_remote_code is required because Kimi K2 ships custom model code
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",  # shard across available GPUs rather than a single .cuda() call
)

prompt = "Explain the theory of relativity in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
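Since Kimi-K2-Instruct is a chat-tuned model, you'll usually get better results by formatting the prompt through the tokenizer's chat template rather than passing raw text. A minimal sketch using the standard Transformers API:
# Build a chat-formatted prompt and generate from it
messages = [{"role": "user", "content": "Explain the theory of relativity in simple terms."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids, max_new_tokens=200)
# Slice off the prompt tokens so only the model's reply is printed
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))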
For higher-throughput serving, vLLM exposes an OpenAI-compatible HTTP server:
pip install vllm
# --served-model-name lets clients refer to the model as "kimi-k2", matching the request below
python3 -m vllm.entrypoints.openai.api_server \
    --model moonshotai/Kimi-K2-Instruct \
    --tokenizer moonshotai/Kimi-K2-Instruct \
    --served-model-name kimi-k2 \
    --trust-remote-code
Once the server is running, send a completion request from another terminal:
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2",
"prompt": "Write a summary of quantum physics.",
"max_tokens": 200
}'
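Because vLLM exposes an OpenAI-compatible API, you can also call the server from Python with the official openai client. A minimal sketch (the api_key is a placeholder, since vLLM does not check it by default):
from openai import OpenAI

# Point the client at the local vLLM server instead of api.openai.com
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="kimi-k2",
    prompt="Write a summary of quantum physics.",
    max_tokens=200,
)
print(response.choices[0].text)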
Run with SGLang or Docker (Advanced Users)
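Both options follow the same pattern as vLLM. A minimal SGLang launch sketch (flags follow SGLang's standard launch_server interface; check its docs for any Kimi K2-specific options):
pip install "sglang[all]"
python3 -m sglang.launch_server \
    --model-path moonshotai/Kimi-K2-Instruct \
    --trust-remote-code \
    --port 30000
And a Docker sketch using vLLM's official image (assumes the NVIDIA Container Toolkit is installed; the volume mount reuses your local Hugging Face cache so weights aren't re-downloaded):
docker run --gpus all --ipc=host -p 8000:8000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    vllm/vllm-openai:latest \
    --model moonshotai/Kimi-K2-Instruct \
    --trust-remote-code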
That's it: you've now installed Kimi K2 locally and have a powerful trillion-parameter LLM at your fingertips, with no reliance on paid APIs. Whether you're building a chatbot, an AI coding assistant, or an NLP application, Kimi K2 is ready to power your ideas.
Need help with production deployment or integration into your product? Contact our expert AI Squad at OneClick IT Consultancy and let’s build something powerful together.