RunPod is a cloud GPU platform trusted by the open-source AI community. It offers prebuilt templates, JupyterLab, and Docker container runtimes, making it ideal for developers and researchers.
1. Go to https://runpod.io/console
2. Click on 'Deploy a Pod'
3. Under Template, choose either a prebuilt PyTorch/Jupyter template or a Custom Container (configured below)
4. Select a GPU type (suggested: A100, RTX 4090, or RTX 3090)
5. Choose storage (at least 40-80 GB)
If using Custom Container:
Container Image:
nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04
Add a CMD to keep the container running, if needed:
sleep infinity
Enable SSH terminal access (and Jupyter, if you want notebook access)
Click 'Deploy Pod'
Once the pod is running:
1. Click 'Connect → Terminal'
2. Update packages and install Git, Git LFS, and pip:
apt update && apt install -y git-lfs python3-pip git
3. Install libraries:
pip3 install torch torchvision transformers accelerate huggingface_hub
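Optionally, confirm that PyTorch can see the pod's GPU before downloading anything large; a quick sanity check you can run in python3:
import torch

# Confirm CUDA is available and show which GPU the pod exposes
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("VRAM (GB):", round(torch.cuda.get_device_properties(0).total_memory / 1024**3, 1))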
1. Clone the Kimi K2 Instruct repo:
git lfs install
git clone https://huggingface.co/moonshotai/Kimi-K2-Instruct
cd Kimi-K2-Instruct
2. Accept the model license at: https://huggingface.co/moonshotai/Kimi-K2-Instruct
3. Test model load script:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "moonshotai/Kimi-K2-Instruct"

# Kimi K2 ships custom modeling code, so trust_remote_code=True is required
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",  # let Accelerate place the weights instead of a single .cuda() call
)

# Quick generation to confirm the model loads and responds
prompt = "What is quantum computing?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
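Since Kimi-K2-Instruct is a chat-tuned model, prompts generally work better when formatted with the tokenizer's chat template rather than passed as raw text. A minimal sketch that reuses the model and tokenizer loaded above:
# Build a chat-formatted prompt from a message list
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is quantum computing?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))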
Install vLLM:
pip install vllm
Run OpenAI-compatible server:
python3 -m vllm.entrypoints.openai.api_server \
  --model moonshotai/Kimi-K2-Instruct \
  --tokenizer moonshotai/Kimi-K2-Instruct \
  --trust-remote-code
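Once the server is up (it listens on port 8000 by default), any OpenAI-compatible client can talk to it. A minimal sketch using the requests library from the same pod:
import requests

# Send a chat completion request to vLLM's OpenAI-compatible endpoint
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "moonshotai/Kimi-K2-Instruct",
        "messages": [{"role": "user", "content": "What is quantum computing?"}],
        "max_tokens": 200,
    },
)
print(response.json()["choices"][0]["message"]["content"])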
Use RunPod's Jupyter template.
Paste your Hugging Face token into a .env file, or log in with:
huggingface-cli login
Load and run Kimi K2 from a notebook (great for rapid prototyping)
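From a notebook cell, you can also authenticate programmatically instead of using the CLI; a minimal sketch, assuming you exported your token as an HF_TOKEN environment variable (for example, from the .env file mentioned above):
import os
from huggingface_hub import login

# Authenticate to Hugging Face so licensed models like Kimi-K2-Instruct can be downloaded
login(token=os.environ["HF_TOKEN"])
After logging in, the same load-and-generate script from the previous section works unchanged inside the notebook.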
Running Kimi K2 on RunPod gives you a fast, budget-friendly way to experiment with one of the most powerful open-source LLMs without managing your own hardware or DevOps. Whether you're building AI tools, researching language models, or just exploring prompts, RunPod + Kimi K2 is a perfect match.
Need enterprise-grade deployment or DevOps help? Contact our expert AI Squad at OneClick IT Consultancy and let's build something powerful together.