Kimi K2, the trillion-parameter open-source LLM from Moonshot AI, is a game changer for developers, researchers, and startups looking to deploy self-hosted AI models. You can run it locally using Hugging Face Transformers or high-performance inference backends like vLLM or SGLang.
Prerequisites
Python 3.10+
CUDA-compatible NVIDIA GPU (A100/H100-class hardware recommended; Kimi K2 is a ~1T-parameter mixture-of-experts model, so the full checkpoint needs a multi-GPU server, and consumer cards like the 3090 or 4090 are realistic only for heavily quantized variants)
Linux or WSL2 (for Windows users)
Git and pip
Optional but recommended
Docker
Conda environment
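Before setting anything up, you can quickly confirm that the NVIDIA driver and the Git LFS binary (needed later for the weight download) are in place:

nvidia-smi
git lfs version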
Set Up the Environment

Using conda:
conda create -n kimi-k2 python=3.10 -y
conda activate kimi-k2

Or, using venv:

python3 -m venv kimi-k2
source kimi-k2/bin/activate

Install the core dependencies:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate huggingface_hub

Download the Model Weights

git lfs install
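Before cloning the checkpoint (the full repository is on the order of a terabyte), a quick sanity check that this PyTorch build actually sees your GPUs can save a long wasted download:

python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"

If that prints True and the expected device count, proceed with the clone.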
git clone https://huggingface.co/moonshotai/Kimi-K2-Instruct
cd Kimi-K2-Instruct
Make sure you have a Hugging Face account and have accepted the model license on the Kimi-K2-Instruct model page on Hugging Face.
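If the clone prompts for credentials or fails on gated files, log in with the Hugging Face CLI first. As an alternative to git, the huggingface_hub package installed earlier can fetch the repository directly; a minimal sketch, with local_dir as an assumed target path:

huggingface-cli login

from huggingface_hub import snapshot_download

# Resumable download of every file in the repo into ./Kimi-K2-Instruct.
snapshot_download(
    repo_id="moonshotai/Kimi-K2-Instruct",
    local_dir="Kimi-K2-Instruct",
)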
Run with Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "moonshotai/Kimi-K2-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# device_map="auto" lets accelerate shard the weights across all visible GPUs;
# a trillion-parameter checkpoint will not fit on a single card.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Explain the theory of relativity in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Run with vLLM (OpenAI-Compatible Server)

Install vLLM and start the server. The model ships custom code, so pass --trust-remote-code; --served-model-name sets the short alias that API requests use:

pip install vllm

python3 -m vllm.entrypoints.openai.api_server \
  --model moonshotai/Kimi-K2-Instruct \
  --tokenizer moonshotai/Kimi-K2-Instruct \
  --served-model-name kimi-k2 \
  --trust-remote-code

Query it with curl:

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2",
    "prompt": "Write a summary of quantum physics.",
    "max_tokens": 200
  }'
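If you prefer calling the server from Python, the standard openai client works against vLLM's OpenAI-compatible endpoint. A minimal sketch, assuming the server above is running on localhost:8000 and the openai package is installed (pip install openai):

from openai import OpenAI

# vLLM does not validate the API key, but the client requires one to be set.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="kimi-k2",  # must match --served-model-name used at launch
    prompt="Write a summary of quantum physics.",
    max_tokens=200,
)
print(response.choices[0].text)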
Run with SGLang or Docker (Advanced Users)
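SGLang exposes its own OpenAI-compatible server. A minimal sketch, assuming an sglang install with its server extras and a --tp value matching your GPU count (verify the flags against your installed version):

pip install "sglang[all]"

python3 -m sglang.launch_server \
  --model-path moonshotai/Kimi-K2-Instruct \
  --trust-remote-code \
  --tp 8 \
  --port 30000

If you would rather not manage a local Python environment, vLLM also publishes a prebuilt Docker image. A sketch assuming the vllm/vllm-openai image and a shared Hugging Face cache:

docker run --gpus all -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai \
  --model moonshotai/Kimi-K2-Instruct \
  --trust-remote-code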
With this guide, you've installed Kimi K2 locally and have a powerful trillion-parameter LLM at your fingertips, without relying on paid APIs. Whether you're building a chatbot, an AI coding assistant, or an NLP application, Kimi K2 is ready to power your ideas.
Need help with production deployment or integration into your product? Contact our expert AI Squad at OneClick IT Consultancy and let’s build something powerful together.
Contact Us