Prerequisites:
AWS account
IAM role with EKS and EC2 permissions
GPU-enabled EC2 instance types (e.g., p3.2xlarge)
Kubectl, eksctl, Helm installed
Docker & Git installed locally
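Before starting, you can verify that all of the required tooling is on your PATH with a quick shell check:

```shell
#!/usr/bin/env bash
# Check that each required CLI tool is installed and report its status.
for tool in aws kubectl eksctl helm docker git; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "OK: $tool found"
  else
    echo "MISSING: $tool is not installed"
  fi
done
```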
Step 1: Create an EKS Cluster with GPU Nodes
eksctl create cluster \
--name kimi-k2-cluster \
--region us-east-1 \
--nodegroup-name gpu-nodes \
--node-type p3.2xlarge \
--nodes 2 \
--nodes-min 1 \
--nodes-max 3 \
--managed
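Once the cluster is up, the NVIDIA device plugin must be running before Kubernetes can schedule nvidia.com/gpu resources; depending on the node AMI it may not be preinstalled. A typical install looks like this (v0.14.1 is shown as an example version; check the NVIDIA k8s-device-plugin releases for the current one):

```shell
# Deploy the NVIDIA device plugin DaemonSet so the scheduler can see GPUs
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/deployments/static/nvidia-device-plugin.yml

# Confirm each node now advertises allocatable nvidia.com/gpu capacity
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
```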
Step 2: Build and Push Docker Image for Kimi K2
The model weights are stored with Git LFS, so enable it before cloning:
git lfs install
git clone https://huggingface.co/moonshotai/Kimi-K2-Instruct
cd Kimi-K2-Instruct
Dockerfile:
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04
RUN apt-get update && apt-get install -y python3 python3-pip git
RUN pip3 install torch torchvision transformers accelerate huggingface_hub
WORKDIR /app
COPY . .
CMD ["python3", "app.py"]
Here app.py is your own inference server (for example, a Gradio or FastAPI app listening on port 7860); the Hugging Face repository contains only the model files, so add a serving script before building.
aws ecr create-repository --repository-name kimi-k2
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com
docker build -t kimi-k2 .
docker tag kimi-k2:latest <account-id>.dkr.ecr.us-east-1.amazonaws.com/kimi-k2:latest
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/kimi-k2:latest
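To confirm the push succeeded, you can list the image tags now stored in the repository:

```shell
# List images in the kimi-k2 repository; the pushed tag should appear
aws ecr describe-images \
  --repository-name kimi-k2 \
  --region us-east-1 \
  --query 'imageDetails[*].imageTags' \
  --output table
```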
Step 3: Create Kubernetes YAML Deployment Files
deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kimi-k2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kimi-k2
  template:
    metadata:
      labels:
        app: kimi-k2
    spec:
      containers:
        - name: kimi-k2
          image: <your-ecr-repo-url>
          resources:
            limits:
              nvidia.com/gpu: 1
          ports:
            - containerPort: 7860
service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: kimi-k2-service
spec:
  type: LoadBalancer
  selector:
    app: kimi-k2
  ports:
    - protocol: TCP
      port: 80
      targetPort: 7860
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
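After applying both manifests, you can watch the rollout and inspect the pod before moving on:

```shell
# Wait for the Deployment to become available, then check the pod and its logs
kubectl rollout status deployment/kimi-k2
kubectl get pods -l app=kimi-k2
kubectl logs -l app=kimi-k2 --tail=50
```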
Step 4: Access Kimi K2 on Public IP
kubectl get svc kimi-k2-service
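The EXTERNAL-IP column may show <pending> for a minute or two while AWS provisions the load balancer. Once a hostname appears, you can probe the service on port 80 (this assumes your app.py serves HTTP on port 7860, as in the deployment above):

```shell
# Fetch the load balancer hostname and send a test request
EXTERNAL_HOST=$(kubectl get svc kimi-k2-service -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
curl -s "http://${EXTERNAL_HOST}/"
```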
Option 2: Deploy Kimi K2 on Amazon ECS
Prerequisites:
AWS CLI configured
IAM roles for ECS + ECR
Docker installed
ECS cluster with GPU-enabled EC2 container instances (Fargate does not support GPUs, so this guide uses the EC2 launch type)
ECR repository created
Step 1: Build Docker Image
Same steps as above. Push image to Amazon ECR.
Step 2: Create ECS Task Definition
task-definition.json:
{
  "family": "kimi-k2-task",
  "containerDefinitions": [
    {
      "name": "kimi-k2",
      "image": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/kimi-k2:latest",
      "memory": 30720,
      "cpu": 2048,
      "essential": true,
      "resourceRequirements": [
        {
          "type": "GPU",
          "value": "1"
        }
      ],
      "portMappings": [
        {
          "containerPort": 7860,
          "hostPort": 7860
        }
      ]
    }
  ],
  "requiresCompatibilities": ["EC2"],
  "networkMode": "bridge",
  "cpu": "2048",
  "memory": "30720"
}
aws ecs register-task-definition --cli-input-json file://task-definition.json
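You can confirm the registration took effect by describing the task definition:

```shell
# Should print "ACTIVE" for a successfully registered task definition
aws ecs describe-task-definition \
  --task-definition kimi-k2-task \
  --query 'taskDefinition.status'
```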
Step 3: Run Task on ECS Cluster
aws ecs run-task \
--cluster kimi-k2-cluster \
--launch-type EC2 \
--task-definition kimi-k2-task \
--count 1
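To verify the task started, list the running tasks and check the status of the first one:

```shell
# List running tasks in the cluster, then inspect the latest task's status
aws ecs list-tasks --cluster kimi-k2-cluster
aws ecs describe-tasks \
  --cluster kimi-k2-cluster \
  --tasks "$(aws ecs list-tasks --cluster kimi-k2-cluster --query 'taskArns[0]' --output text)" \
  --query 'tasks[0].lastStatus'
```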
Deploying Kimi K2 on EKS or ECS gives you the power to scale open-source LLMs efficiently in the cloud.
Kubernetes adds autoscaling, GPU scheduling, and production-grade LLM APIs, all while keeping you in control.
Need enterprise-grade deployment or DevOps help? Contact our AI DevOps experts at OneClick IT Consultancy.
Contact Us