Prerequisites:
AWS account
IAM role with EKS and EC2 permissions
GPU-enabled EC2 instance types (e.g., p3.2xlarge)
Kubectl, eksctl, Helm installed
Docker & Git installed locally
Step 1: Create an EKS Cluster with GPU Nodes
eksctl create cluster \
  --name kimi-k2-cluster \
  --region us-east-1 \
  --nodegroup-name gpu-nodes \
  --node-type p3.2xlarge \
  --nodes 2 \
  --nodes-min 1 \
  --nodes-max 3 \
  --managed
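Before pods can request nvidia.com/gpu resources, the NVIDIA device plugin must be running on the GPU nodes (eksctl prints a reminder about this when it creates a GPU node group). A typical install looks like the following; the version tag is an assumption, so check the k8s-device-plugin releases for the current one:

```shell
# Deploy the NVIDIA device plugin DaemonSet so Kubernetes can schedule GPUs.
# v0.14.1 is an assumed version; use the latest release for your cluster.
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml

# Confirm the GPUs show up as allocatable node resources:
kubectl get nodes -o jsonpath='{.items[*].status.capacity.nvidia\.com/gpu}'
```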
Step 2: Build and Push Docker Image for Kimi K2
git clone https://huggingface.co/moonshotai/Kimi-K2-Instruct
cd Kimi-K2-Instruct
Dockerfile:
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04
RUN apt update && apt install -y python3 python3-pip git
RUN pip3 install torch torchvision transformers accelerate huggingface_hub
WORKDIR /app
COPY . .
CMD ["python3", "app.py"]

Build and push to ECR:

aws ecr create-repository --repository-name kimi-k2
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com
docker build -t kimi-k2 .
docker tag kimi-k2:latest <account-id>.dkr.ecr.us-east-1.amazonaws.com/kimi-k2:latest
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/kimi-k2:latest
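The Dockerfile's CMD runs an app.py that the Kimi-K2-Instruct repository does not ship, so you need to supply one. A minimal sketch of what it could look like, serving JSON chat requests on port 7860; the model ID, request shape, and stdlib HTTP server are illustrative assumptions, and a production deployment would more likely use a serving stack such as vLLM:

```python
"""Minimal app.py sketch (hypothetical -- the Kimi-K2-Instruct repo does not
ship this file). Serves JSON chat requests on port 7860 to match the
container port used throughout this guide."""
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL_ID = "moonshotai/Kimi-K2-Instruct"
PORT = 7860

def build_prompt(messages):
    # Flatten chat messages into one prompt string. Simplified: the real
    # model applies its own chat template via the tokenizer.
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        prompt = build_prompt(payload.get("messages", []))
        reply = self.server.generate(prompt)
        body = json.dumps({"reply": reply}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Heavy imports stay here so the module is importable without a GPU.
    from transformers import pipeline
    pipe = pipeline("text-generation", model=MODEL_ID, device_map="auto")
    server = HTTPServer(("0.0.0.0", PORT), ChatHandler)
    server.generate = lambda p: pipe(p, max_new_tokens=256)[0]["generated_text"]
    server.serve_forever()
```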
Step 3: Create Kubernetes YAML Deployment Files
deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kimi-k2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kimi-k2
  template:
    metadata:
      labels:
        app: kimi-k2
    spec:
      containers:
        - name: kimi-k2
          image: <your-ecr-repo-url>
          resources:
            limits:
              nvidia.com/gpu: 1
          ports:
            - containerPort: 7860
service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: kimi-k2-service
spec:
  type: LoadBalancer
  selector:
    app: kimi-k2
  ports:
    - protocol: TCP
      port: 80
      targetPort: 7860

Apply both manifests:

kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
Step 4: Access Kimi K2 on Public IP
kubectl get svc kimi-k2-service

The EXTERNAL-IP (or hostname) column shows the LoadBalancer address. The app is reachable there on port 80, which forwards to the container's port 7860.
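Once the service reports an external address, you can call it from any HTTP client. A small client sketch; the root path and the {"messages": ...} payload shape are assumptions about what the container serves, so adjust them to your actual API:

```python
"""Client sketch for the deployed service. The URL path and payload shape
are assumptions -- adapt them to whatever API your container exposes."""
import json
import urllib.request

def make_request(host: str, messages: list) -> urllib.request.Request:
    """Build a POST request against the LoadBalancer hostname (port 80)."""
    payload = json.dumps({"messages": messages}).encode()
    return urllib.request.Request(
        f"http://{host}/",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# Usage, once `kubectl get svc kimi-k2-service` shows an external hostname:
# req = make_request("<elb-hostname>", [{"role": "user", "content": "Hello"}])
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["reply"])
```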
Deploying Kimi K2 on Amazon ECS

Prerequisites:
AWS CLI configured
IAM roles for ECS + ECR
Docker installed
ECS cluster created (use the EC2 launch type; Fargate does not offer GPU support)
ECR repository created
Step 1: Build Docker Image
Same steps as above. Push image to Amazon ECR.
Step 2: Create ECS Task Definition
task-definition.json:
{
  "family": "kimi-k2-task",
  "containerDefinitions": [
    {
      "name": "kimi-k2",
      "image": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/kimi-k2:latest",
      "memory": 30720,
      "cpu": 2048,
      "essential": true,
      "portMappings": [
        { "containerPort": 7860, "hostPort": 7860 }
      ]
    }
  ],
  "requiresCompatibilities": ["EC2"],
  "networkMode": "bridge",
  "cpu": "2048",
  "memory": "30720"
}

Register the task definition:

aws ecs register-task-definition --cli-input-json file://task-definition.json
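ECS sizing units are easy to get wrong: cpu is expressed in units of 1024 per vCPU and memory in MiB, so the task above reserves 2 vCPUs and 30 GiB. Also note that for the container to actually see a GPU on ECS, the container definition needs a resourceRequirements entry of type GPU, which the JSON above omits. A small helper (hypothetical, for illustration) converts friendly sizes into the strings and fields the task definition expects:

```python
def task_size(vcpus: int, memory_gib: int, gpus: int = 0) -> dict:
    """Convert human-friendly sizes into ECS task-definition units.

    ECS measures cpu in units of 1024 per vCPU and memory in MiB;
    GPUs are requested via a resourceRequirements entry.
    """
    size = {"cpu": str(vcpus * 1024), "memory": str(memory_gib * 1024)}
    if gpus:
        size["resourceRequirements"] = [{"type": "GPU", "value": str(gpus)}]
    return size

# The values used in task-definition.json above:
print(task_size(2, 30))
```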
Step 3: Run Task on ECS Cluster
aws ecs run-task \
  --cluster kimi-k2-cluster \
  --launch-type EC2 \
  --task-definition kimi-k2-task \
  --count 1

Deploying Kimi K2 on EKS or ECS gives you the power to scale open-source LLMs efficiently in the cloud.
Kubernetes allows for autoscaling, GPU scheduling, and production-grade LLM APIs, all while keeping you in control.
Need enterprise-grade deployment or DevOps help? Contact our AI DevOps experts at OneClick IT Consultancy.
Contact Us