Kimi K2, a trillion-parameter open-source LLM from Moonshot AI, is making waves among developers seeking privacy, performance, and cost efficiency. With Docker on AWS EC2, you can deploy Kimi K2 with GPU acceleration for a high-performance, cloud-hosted AI setup.
Prerequisites
1. AWS account with EC2 access
2. Create an EC2 instance (recommended: g5.2xlarge, p3.2xlarge, or p4d)
3. SSH access to the instance
4. Security group with ports 22, 8000, 7860 open
5. NVIDIA GPU support with NVIDIA drivers installed
6. Docker & NVIDIA Container Toolkit
Use the Deep Learning Base AMI (Ubuntu 20.04, CUDA 12 pre-installed). Choose an instance type: g5.2xlarge, p3.2xlarge, or p4d. Add security group rules for SSH (22), HTTP (80), and custom TCP (8000, 7860). Launch the instance and connect via SSH.
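If you manage infrastructure in code, the ingress rules above can be built programmatically. A minimal sketch of the `IpPermissions` payload that boto3's `authorize_security_group_ingress` expects (the group ID is a placeholder, and the open 0.0.0.0/0 CIDR should be tightened for production):

```python
def ingress_permissions(ports=(22, 80, 8000, 7860)):
    """Build the IpPermissions list for boto3's
    authorize_security_group_ingress, opening each TCP port in `ports`."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # 0.0.0.0/0 opens the port to the world; restrict in production.
            "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
        }
        for port in ports
    ]

# Usage (requires boto3 and AWS credentials; the group ID is a placeholder):
# import boto3
# ec2 = boto3.client("ec2")
# ec2.authorize_security_group_ingress(
#     GroupId="sg-0123456789abcdef0",
#     IpPermissions=ingress_permissions(),
# )
```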
Install Docker, the NVIDIA driver, and the NVIDIA Container Toolkit, then pull the model weights:

```
sudo apt update
sudo apt install -y docker.io nvidia-driver-525
sudo systemctl start docker
sudo systemctl enable docker
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update
sudo apt install -y nvidia-docker2
sudo systemctl restart docker
sudo apt install git-lfs
git lfs install
git clone https://huggingface.co/moonshotai/Kimi-K2-Instruct
cd Kimi-K2-Instruct
```
Make sure you have accepted the model license at: https://huggingface.co/moonshotai/Kimi-K2-Instruct
Create a file named Dockerfile in your project:
```
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04
RUN apt update && apt install -y python3 python3-pip git
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
RUN pip3 install transformers accelerate huggingface_hub
WORKDIR /app
COPY . .
CMD ["python3", "app.py"]
```

Build and run the image:

```
sudo docker build -t kimi-k2 .
sudo docker run --gpus all -p 7860:7860 kimi-k2
```

Test the endpoint:

```
curl http://<your-ec2-ip>:7860
```

Or run an example Python script from within the container.
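The Dockerfile's CMD expects an app.py that is not shown above. Here is a minimal sketch of what it could look like, using the transformers pipeline API. Note that Kimi-K2-Instruct is a roughly trillion-parameter MoE model, so a single-GPU instance such as g5.2xlarge cannot hold the full weights; treat this as an illustration of the pattern rather than a guaranteed fit for that hardware.

```python
# app.py -- hypothetical inference entry point for the Dockerfile above.
import os

def build_generation_kwargs(max_new_tokens=256, temperature=0.7):
    """Sampling settings forwarded to the text-generation pipeline."""
    return {
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "do_sample": True,
    }

def run_inference(prompt):
    # Import here so the helper above stays importable without the ML stack.
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="moonshotai/Kimi-K2-Instruct",
        device_map="auto",        # shard layers across all visible GPUs
        trust_remote_code=True,   # the repo ships custom modeling code
    )
    return pipe(prompt, **build_generation_kwargs())[0]["generated_text"]

# Only load the model when explicitly requested, e.g. inside the container:
#   RUN_MODEL=1 python3 app.py
if os.environ.get("RUN_MODEL"):
    print(run_inference("Explain Docker in one sentence."))
```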
Alternatively, serve the model with vLLM's OpenAI-compatible API server:

```
pip install vllm
python3 -m vllm.entrypoints.openai.api_server --model moonshotai/Kimi-K2-Instruct
```

By following this guide, you now have a cloud-hosted, GPU-accelerated, self-hosted Kimi K2 model running on AWS EC2 with Docker. This setup is ideal for building chatbots, coding assistants, semantic search engines, and more without relying on paid APIs or closed platforms.
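Once the vLLM server is running, any OpenAI-compatible client can talk to it. A stdlib-only sketch (the host, port 8000, and sampling values are assumptions; adjust them to your deployment):

```python
import json
import urllib.request

# Assumed endpoint: vLLM's OpenAI-compatible server on its default port 8000.
API_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt, model="moonshotai/Kimi-K2-Instruct",
                       max_tokens=256, temperature=0.7):
    """Assemble an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def query(prompt):
    """POST the prompt to the server and return the model's reply text."""
    data = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# From the instance: query("Write a haiku about GPUs.")
```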
Need enterprise-grade deployment or DevOps help? Contact our expert AI Squad at OneClick IT Consultancy and let's build something powerful together.
Contact Us