Kimi K2, a trillion-parameter open-weight LLM from Moonshot AI, is making waves among developers who want privacy, performance, and cost efficiency. With Docker on AWS EC2, you can deploy Kimi K2 with GPU acceleration for a high-performance, cloud-hosted AI setup.
Prerequisites
1. AWS account with EC2 access
2. A GPU-backed EC2 instance (recommended: g5.2xlarge, p3.2xlarge, or p4d)
3. SSH access to the instance
4. Security group with ports 22, 8000, and 7860 open
5. NVIDIA drivers installed on the instance
6. Docker and the NVIDIA Container Toolkit
Use the Deep Learning Base AMI (Ubuntu 20.04 with CUDA 12 pre-installed). Choose an instance type: g5.2xlarge, p3.2xlarge, or p4d. Add security group rules for SSH (22), HTTP (80), and custom TCP (8000, 7860). Launch the instance and connect via SSH.
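The console steps above can also be scripted. A minimal boto3 sketch, assuming your AWS credentials are configured; the AMI ID, key pair name, and security group ID are placeholders to replace with your own:

```python
# launch_instance.py - sketch of the EC2 launch step using boto3.
# All resource IDs below are placeholders, not real values.

def build_launch_params(ami_id, key_name, sg_id, instance_type="g5.2xlarge"):
    """Assemble the run_instances arguments used in this guide."""
    return {
        "ImageId": ami_id,              # Deep Learning Base AMI in your region
        "InstanceType": instance_type,  # g5.2xlarge, p3.2xlarge, or p4d
        "KeyName": key_name,
        "SecurityGroupIds": [sg_id],    # must open ports 22, 8000, 7860
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [{
            "DeviceName": "/dev/sda1",
            # Size generously: the full Kimi K2 checkpoint is roughly 1 TB.
            "Ebs": {"VolumeSize": 1500, "VolumeType": "gp3"},
        }],
    }

def launch():
    import boto3  # deferred so the helper can be tested without boto3/credentials
    ec2 = boto3.client("ec2", region_name="us-east-1")
    return ec2.run_instances(**build_launch_params(
        "ami-xxxxxxxxxxxx", "my-key-pair", "sg-xxxxxxxxxxxx"))

# launch()  # uncomment once the placeholder IDs are filled in
```

The large root volume matters here: the checkpoint clone later in this guide will not fit on the default 100 GB disk.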
sudo apt update
sudo apt install -y docker.io   # nvidia-driver-525 is only needed on a plain Ubuntu AMI; the Deep Learning AMI already ships NVIDIA drivers
sudo systemctl start docker
sudo systemctl enable docker
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update
sudo apt install -y nvidia-docker2
sudo systemctl restart docker
sudo apt install -y git-lfs
git lfs install
git clone https://huggingface.co/moonshotai/Kimi-K2-Instruct
cd Kimi-K2-Instruct
Make sure you have accepted the model license at: https://huggingface.co/moonshotai/Kimi-K2-Instruct. Note that the full checkpoint is roughly 1 TB of weights, so attach enough EBS storage before cloning.
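If you prefer not to use git-lfs, the same checkpoint can be fetched with huggingface_hub. A small sketch; the local_dir default is an assumption matching the clone directory above, and a prior `huggingface-cli login` may be needed if the repo gates downloads:

```python
# download_model.py - fetch the Kimi-K2-Instruct checkpoint via huggingface_hub.

REPO_ID = "moonshotai/Kimi-K2-Instruct"

def download_kwargs(local_dir="Kimi-K2-Instruct"):
    """Arguments for snapshot_download, kept separate so they are easy to tweak."""
    return {"repo_id": REPO_ID, "local_dir": local_dir}

def download():
    # Deferred heavy import; interrupted transfers resume automatically.
    from huggingface_hub import snapshot_download
    return snapshot_download(**download_kwargs())

# download()  # uncomment to start the (multi-hundred-GB) transfer
```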
Create a file named Dockerfile in your project:
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && apt install -y python3 python3-pip git
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
RUN pip3 install transformers accelerate huggingface_hub gradio
WORKDIR /app
COPY . .
EXPOSE 7860
CMD ["python3", "app.py"]
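The CMD above expects an app.py, which the guide does not show. Below is a hedged sketch of one: a minimal Gradio chat UI on port 7860 (the port opened in the security group). The helper and function names are illustrative, and loading the full 1T-parameter checkpoint through a plain transformers pipeline needs far more memory than a single GPU provides, so treat this as a template:

```python
# app.py - minimal Gradio chat sketch (illustrative, not a tuned deployment).
import os

MODEL_ID = "moonshotai/Kimi-K2-Instruct"

def build_messages(history, user_msg):
    """Turn Gradio chat history into chat-template messages.

    Accepts either (user, assistant) pairs or messages-style dicts."""
    messages = []
    for turn in history:
        if isinstance(turn, dict):          # messages-style history
            messages.append(turn)
        else:                               # (user, assistant) pair
            user_turn, assistant_turn = turn
            messages.append({"role": "user", "content": user_turn})
            messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": user_msg})
    return messages

def main():
    # Heavy imports deferred so build_messages stays importable on its own.
    import gradio as gr
    from transformers import pipeline

    chat = pipeline("text-generation", model=MODEL_ID,
                    device_map="auto", trust_remote_code=True)

    def respond(user_msg, history):
        out = chat(build_messages(history, user_msg), max_new_tokens=512)
        # Chat-mode pipelines return the full conversation; take the reply.
        return out[0]["generated_text"][-1]["content"]

    gr.ChatInterface(respond).launch(server_name="0.0.0.0", server_port=7860)

# Guarded so the module can be imported for testing without loading the model.
if __name__ == "__main__" and os.environ.get("SERVE") == "1":
    main()
```

Inside the container you would start it with SERVE=1 python3 app.py (or bake ENV SERVE=1 into the Dockerfile); the guard exists only to keep the heavy model load out of imports and tests.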
sudo docker build -t kimi-k2 .
sudo docker run --gpus all -p 7860:7860 kimi-k2
curl http://<your-ec2-ip>:7860
Or run an example Python script from within the container.
Alternatively, serve the model through vLLM's OpenAI-compatible API (it listens on port 8000 by default, the other port opened in the security group):
pip install vllm
python3 -m vllm.entrypoints.openai.api_server --model moonshotai/Kimi-K2-Instruct --trust-remote-code
Note that a trillion-parameter model will not fit on a single GPU; on a multi-GPU instance, add --tensor-parallel-size <number of GPUs>.
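Once the vLLM server is up, any OpenAI-style client can talk to it. A standard-library-only sketch; the localhost URL assumes you are calling from the instance itself (swap in your EC2 IP otherwise), and the helper names are illustrative:

```python
# client.py - query the vLLM OpenAI-compatible endpoint with stdlib only.
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"

def build_payload(prompt, model="moonshotai/Kimi-K2-Instruct"):
    """Chat-completions request body in the OpenAI wire format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.6,
    }

def ask(prompt):
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# print(ask("Explain what vLLM does."))  # requires the server to be running
```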
By following this guide, you now have a self-hosted, GPU-accelerated Kimi K2 model running on AWS EC2 with Docker. This is perfect for building chatbots, coding assistants, semantic search engines, and more without relying on paid APIs or closed platforms.
Need enterprise-grade deployment or DevOps help? Contact our expert AI Squad at OneClick IT Consultancy and let’s build something powerful together.