Kubernetes Cost Optimization: Cutting Your Cloud Bill Without Cutting Corners
Kubernetes is phenomenal at abstracting away infrastructure. It’s also phenomenal at running up your cloud bill.
The average Kubernetes-powered organization wastes 30-45% of its cloud spend. Not because Kubernetes is inefficient, but because teams don't configure it with cost in mind. The defaults optimize for simplicity, not cost. Over-provisioned nodes, idle replicas, oversized requests, forgotten staging clusters still running on a Friday afternoon: it all adds up fast.
This post is the practical guide to finding and eliminating that waste.
Step 1: Know Where Your Money Is Going
You can’t optimize what you can’t measure. Before changing anything, get visibility.
Kubecost (or OpenCost)
OpenCost is the CNCF open standard for Kubernetes cost measurement. Kubecost builds on it with a production-grade UI.
helm install kubecost cost-analyzer \
--repo https://kubecost.github.io/cost-analyzer/ \
--namespace kubecost --create-namespace \
--set kubecostToken="your-token"
Within hours, you’ll have cost breakdowns by namespace, deployment, label, and team. The most common finding: 20% of workloads account for 80% of cost.
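To browse the breakdowns, port-forward to the dashboard (the deployment name below matches the default Helm install above; adjust it if you renamed the release):
# Access the Kubecost UI locally
kubectl port-forward -n kubecost deployment/kubecost-cost-analyzer 9090
# Then open http://localhost:9090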
Cloud Native FinOps Labels
Establish a labeling taxonomy before you go further. Cost allocation is only useful if you can attribute costs to teams and products.
metadata:
  labels:
    team: platform-engineering
    product: api-gateway
    environment: production
    cost-center: "1234"
Step 2: Fix Resource Requests and Limits
This is the highest-leverage change most teams can make.
The problem: Teams set requests based on peaks or guesses, not actual usage. Kubernetes schedules pods based on requests, so over-requesting means nodes fill up with “reserved” capacity that’s never actually used.
Typical findings from VPA analysis:
- CPU requests 3-5x higher than actual usage
- Memory requests 2-3x higher than P99 usage
Use Vertical Pod Autoscaler (VPA) in Recommendation Mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommendation only — don't auto-apply yet
# See recommendations after a week of data
kubectl describe vpa my-app-vpa
# Output shows:
# Container: app
# Recommended:
# Lower Bound: cpu: 50m, memory: 128Mi
# Target: cpu: 100m, memory: 256Mi
# Upper Bound: cpu: 500m, memory: 512Mi
#
# Your current request: cpu: 2000m, memory: 2Gi ← 20x over-provisioned
Right-size based on recommendations. This alone commonly reduces cluster resource consumption by 40-60%.
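Applied to the Deployment, the right-sized spec might look like this (values taken from the hypothetical VPA target above, with extra memory headroom on the limit):
# Container resources after right-sizing
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    memory: 512Mi  # headroom above the target; CPU limit omitted to avoid throttling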
Step 3: Horizontal Pod Autoscaler Tuning
The default HPA scales on CPU utilization at 50% target. For most applications, this is too conservative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Increased from default 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Don't scale down too aggressively
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
For workloads driven by queues or request traffic, consider KEDA (Kubernetes Event-driven Autoscaling), which scales on queue depth, request rate, or custom metrics rather than CPU alone.
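A minimal KEDA sketch scaling on a Prometheus request-rate query (the server address, metric query, and threshold are placeholders for your environment):
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
spec:
  scaleTargetRef:
    name: my-app               # Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090   # placeholder
      query: sum(rate(http_requests_total{app="my-app"}[2m]))
      threshold: "100"         # target requests/sec per replica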
Step 4: Cluster Autoscaler and Node Selection
Use Spot/Preemptible Instances
Spot Instances (AWS), Preemptible VMs (GCP), and Spot VMs (Azure) offer 60-90% discounts compared to on-demand pricing. The catch: they can be terminated with roughly two minutes’ notice.
Architecture for spot tolerance:
- Run stateless workloads on spot nodes
- Graceful shutdown handling with preStop hooks and terminationGracePeriodSeconds (see the sketch after the node selector example below)
- Never run databases or stateful workloads on spot
# Node selector and toleration for spot instances. The label and taint keys
# are cloud-specific; this example uses AKS's spot keys. On EKS managed node
# groups, the label is eks.amazonaws.com/capacityType: SPOT.
spec:
  nodeSelector:
    kubernetes.azure.com/scalesetpriority: spot
  tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
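For the graceful-shutdown piece, a minimal sketch (the sleep duration is an assumption; tune it to your load balancer's connection-drain time):
# Give in-flight requests time to drain before a spot node disappears
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 15"]  # keep serving while endpoints deregister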
For AWS, Karpenter has replaced the Cluster Autoscaler as the recommended approach to node provisioning. It provisions nodes in seconds rather than minutes, selects optimal instance types automatically, and handles spot fallback gracefully.
# Karpenter NodePool — automatically chooses cheapest compatible instance
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:            # required in v1; references an EC2NodeClass defined separately
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64", "arm64"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
ARM64 / Graviton Instances
AWS Graviton (arm64) instances offer 20-40% better price/performance than x86 equivalents. Most containerized workloads run on arm64 without code changes.
spec:
  nodeSelector:
    kubernetes.io/arch: arm64
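The main prerequisite is publishing a multi-arch image. Assuming your CI uses Docker Buildx (the image name is a placeholder):
# Build and push an image for both architectures
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t registry.example.com/my-app:1.0 \
  --push .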
Step 5: Namespace-Level Policies
Resource Quotas
Prevent any single team from accidentally consuming unbounded cluster resources.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    count/pods: "50"
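To check a team's consumption against the quota:
kubectl describe resourcequota team-quota -n team-alpha
# Shows Used vs Hard for each tracked resource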
LimitRange
Set default requests and limits for pods that don’t specify them, so unconfigured pods can’t consume unbounded resources.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha
spec:
  limits:
  - default:
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    type: Container
Step 6: Kill Idle Resources
Environments that run 24/7 when they should run 8/5:
- Dev clusters running on weekends
- Staging environments idle overnight
# Scale down non-production workloads outside business hours
# Using Kubernetes CronJob or external scheduler
# Scale down
kubectl scale deployment --all --replicas=0 -n staging
# Scale up
kubectl scale deployment --all --replicas=1 -n staging
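Wired into a CronJob, the scale-down half might look like this. This is a sketch: it assumes a scaler ServiceAccount with RBAC permission to scale deployments in staging, and uses the bitnami/kubectl image.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-staging
spec:
  schedule: "0 20 * * 1-5"          # 20:00 on weekdays
  timeZone: "Asia/Seoul"            # spec.timeZone requires Kubernetes 1.27+
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler   # assumed: allowed to scale deployments in this namespace
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command: ["kubectl", "scale", "deployment", "--all", "--replicas=0", "-n", "staging"]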
Or use Kube Downscaler — annotate deployments with their schedule:
metadata:
  annotations:
    downscaler/uptime: "Mon-Fri 08:00-20:00 Asia/Seoul"
Step 7: Storage Cost Reduction
PersistentVolumeClaims are often forgotten. Teams delete applications but leave PVCs (and their associated cloud disks) running.
# Find unbound PVCs — likely orphaned
kubectl get pvc --all-namespaces | grep -v Bound
# Find PVs in Released state (disk exists but no claim)
kubectl get pv | grep Released
Set up regular audits. In large clusters, orphaned storage can add up to hundreds of dollars monthly.
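Once you’ve confirmed a Released volume is truly orphaned, delete it, or patch the reclaim policy so future PVC deletions remove the backing disk automatically:
# Delete an orphaned PV after verifying nothing on the disk is needed
kubectl delete pv <pv-name>
# Or make future deletions clean up the cloud disk as well
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'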
The Cost Optimization Roadmap
| Priority | Action | Typical Savings |
|---|---|---|
| 1 | Right-size resource requests (VPA) | 30-50% |
| 2 | Enable spot/preemptible instances | 40-70% of node costs |
| 3 | Fix HPA targets | 10-20% |
| 4 | Kill idle environments (nights/weekends) | 30-60% on non-prod |
| 5 | ARM64 migration | 20-40% on affected nodes |
| 6 | Audit orphaned storage | 5-15% |
Conclusion
Kubernetes cost optimization isn’t one big fix — it’s a collection of incremental improvements that compound. Teams that implement these strategies systematically typically reduce cloud spend by 35-55% without impacting reliability.
The foundational principle: measure first. Kubecost or OpenCost will show you exactly where money is going. From there, the optimizations follow logically.
The cloud bill doesn’t have to be a mystery. Make it a managed metric like any other engineering KPI.