Kubernetes Cost Optimization in 2026: Karpenter, Spot Instances, and Right-Sizing at Scale
Kubernetes unlocks incredible operational power, but it also unlocks incredible waste. Teams that deploy Kubernetes often see their cloud bills double — not because the platform is expensive, but because the default configuration is designed for reliability, not efficiency.
The engineers who master Kubernetes cost optimization in 2026 treat compute as a fluid resource, not a static allocation. They use autoscaling aggressively, mix instance types intelligently, and measure utilization obsessively. Here’s how.
The Hidden Cost Drivers in Kubernetes Clusters
Before optimizing, understand where money actually goes:
| Category | Typical Waste | Root Cause |
|---|---|---|
| Over-provisioned requests | 40-60% of compute | Engineers pad requests “just in case” |
| Idle node capacity | 20-30% overhead | Bin-packing inefficiency |
| Always-on non-prod clusters | 30% of total spend | Clusters running nights/weekends |
| Data transfer | Often invisible | Cross-AZ communication, egress |
| Persistent volumes | Orphaned PVCs | No cleanup policy |
Most teams are wasting 50-70% of their Kubernetes compute spend. The fix is systematic.
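Orphaned storage is the easiest of these line items to surface. A quick check, assuming kubectl access to the cluster, is to list PersistentVolumes stuck in the Released phase (claim deleted, disk still billed):

```shell
# List PersistentVolumes whose claim was deleted but whose disk still exists
# (phase "Released"); each one is storage you are paying for with no consumer.
kubectl get pv -o jsonpath='{range .items[?(@.status.phase=="Released")]}{.metadata.name}{"\t"}{.spec.capacity.storage}{"\n"}{end}'
```

Review the list before deleting anything; a Retain reclaim policy often exists for a reason.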
Karpenter: The Modern Node Autoscaler
Karpenter replaced Cluster Autoscaler as the default choice for most teams by 2025. The architectural difference is fundamental:
- Cluster Autoscaler: scales node groups (pre-defined instance types, AZs)
- Karpenter: provisions individual nodes based on exact pod requirements
This means Karpenter can bin-pack pods much more efficiently and react to scheduling demands in seconds rather than minutes.
Basic Karpenter Setup
```yaml
# NodePool — defines what kind of nodes Karpenter can provision
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m5.large
            - m5.xlarge
            - m5a.large
            - m5a.xlarge
            - m6i.large
            - m6i.xlarge
            - m6a.large
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
      nodeClassRef:
        group: karpenter.k8s.aws  # v1 API uses group/kind/name
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # renamed from WhenUnderutilized in v1
    consolidateAfter: 30s
  limits:
    cpu: 1000
    memory: 4000Gi
---
# EC2NodeClass — AWS-specific configuration
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true
```
Spot-First Strategy with Fallback
The key to safe spot usage is defining fallback priority:
```yaml
# High-priority NodePool: spot first, on-demand fallback
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-optimized
spec:
  template:
    metadata:
      labels:
        billing/class: "spot"
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          # Wide selection = better spot availability and lower interruption rate
          values:
            - m5.2xlarge
            - m5a.2xlarge
            - m5d.2xlarge
            - m6i.2xlarge
            - m6a.2xlarge
            - m6id.2xlarge
            - m7i.2xlarge
            - m7a.2xlarge
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  weight: 100  # Prefer this pool
---
# Fallback NodePool: on-demand for when spot capacity isn't available
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand-fallback
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m6i.2xlarge
            - m6a.2xlarge
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  weight: 1  # Only used when the spot pool can't schedule
```
Right-Sizing: The Foundation of Cost Efficiency
Karpenter can only provision the right-sized node if your pod requests are accurate. Most production clusters have severely over-provisioned requests.
VPA (Vertical Pod Autoscaler) for Recommendations
Run VPA in “Off” mode to get recommendations without automatic changes:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Recommendation only — don't auto-apply yet
  resourcePolicy:
    containerPolicies:
      - containerName: api-server
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4000m
          memory: 8Gi
```
After a week of data:
```shell
kubectl describe vpa api-server-vpa
# Look under Recommendation > Container Recommendations:
#   Lower Bound: the smallest request the VPA considers safe
#   Target:      the recommended request; use this as your new baseline
#   Upper Bound: requests above this are almost certainly wasted
```
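To script against the recommendation instead of eyeballing `describe` output, the Target can be pulled straight from the VPA status (field path per the `autoscaling.k8s.io/v1` API; assumes the VPA has collected enough samples to populate it):

```shell
# Extract just the Target recommendation for the first container
kubectl get vpa api-server-vpa \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
```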
Goldilocks: VPA Dashboard at Scale
Goldilocks runs VPA in recommendation mode across every namespace you label and provides a web dashboard:
```shell
# Add the Fairwinds chart repo once: helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks --create-namespace

# Label the namespaces you want recommendations for
kubectl label namespace production goldilocks.fairwinds.com/enabled=true
kubectl label namespace staging goldilocks.fairwinds.com/enabled=true
```
After enabling, check the dashboard to find workloads with the biggest gap between requested and actual resources.
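The dashboard isn't exposed outside the cluster by default; with the chart's default service name (adjust if you overrode it), a port-forward gets you there:

```shell
# Forward the Goldilocks dashboard locally, then open http://localhost:8080
kubectl port-forward -n goldilocks svc/goldilocks-dashboard 8080:80
```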
Practical Right-Sizing Workflow
```shell
#!/bin/bash
# Top 50 pods by actual CPU usage (requires metrics-server)
kubectl top pods --all-namespaces --no-headers | \
  sort -k3 -rh | \
  head -50

# Compare with what those pods request
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] |
    .metadata.namespace + " " +
    .metadata.name + " " +
    (.spec.containers[0].resources.requests.cpu // "none") + " " +
    (.spec.containers[0].resources.requests.memory // "none")'
```
HPA with Custom Metrics: Scale on What Actually Matters
CPU-based HPA is a blunt instrument. Scale on the metric your users experience:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 50
  metrics:
    # Scale on request queue depth — more meaningful than CPU.
    # Requires an external metrics adapter (e.g. KEDA or prometheus-adapter).
    - type: External
      external:
        metric:
          name: sqs_queue_depth
          selector:
            matchLabels:
              queue: api-requests
        target:
          type: AverageValue
          averageValue: "30"
    # Keep CPU as a secondary guard
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60  # Remove at most 25% of replicas per minute
    scaleUp:
      stabilizationWindowSeconds: 0  # Scale up immediately
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30  # Can double in 30 seconds
```
KEDA: Event-Driven Autoscaling
KEDA enables scaling to zero and from zero — crucial for batch workloads and non-production environments.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: batch-processor
spec:
  scaleTargetRef:
    name: batch-processor-deployment
  minReplicaCount: 0    # Scale to zero when no messages
  maxReplicaCount: 100
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/batch-jobs
        queueLength: "5"  # Target 5 messages per replica
        awsRegion: us-east-1
        scaleOnInFlight: "true"
```
With KEDA + Karpenter, a batch workload that's idle 80% of the time costs roughly 80% less. The nodes spin down when there are no pods, and pods spin down when there are no jobs.
Cluster Schedules: Turn Off Non-Production Clusters
The single highest-ROI action for most teams:
```yaml
# CronJobs scale dev workloads to zero each evening; Karpenter then
# deprovisions the empty nodes automatically.
---
# Scale down every weeknight at 7 PM
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
  namespace: kube-system
spec:
  schedule: "0 19 * * 1-5"  # 7 PM weekdays
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cluster-scaler
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  # Scale all deployments to 0 in the dev namespace
                  kubectl get deployments -n dev -o name | \
                    xargs -I {} kubectl scale {} --replicas=0 -n dev
                  # Wait for pods to terminate (Karpenter will deprovision nodes)
                  sleep 120
                  echo "Dev cluster scaled down"
          restartPolicy: OnFailure
---
# Scale back up at 8 AM
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-dev
  namespace: kube-system
spec:
  schedule: "0 8 * * 1-5"  # 8 AM weekdays
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cluster-scaler
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  # Note: this restores every deployment to 1 replica, not its
                  # original count; annotate replica counts if that matters.
                  kubectl scale deployment --all --replicas=1 -n dev
          restartPolicy: OnFailure
```
This alone cuts dev cluster costs by roughly two-thirds: the cluster is off 13 hours every weeknight plus the entire weekend, about 67% of the week.
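The arithmetic behind that figure is easy to check: off from 7 PM to 8 AM is 13 hours per weeknight, plus 48 weekend hours, out of a 168-hour week:

```shell
# Fraction of the week the dev cluster is powered down
off_hours=$(( 13 * 5 + 48 ))   # 113 hours off per week
total_hours=168                # hours in a week
echo "dev cluster off $(( off_hours * 100 / total_hours ))% of the week"   # prints 67%
```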
Cost Allocation and Showback
You can’t optimize what you can’t measure. Tag everything:
```yaml
# Enforce cost tags via OPA Gatekeeper (requires the K8sRequiredLabels
# ConstraintTemplate from the Gatekeeper policy library to be installed)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-cost-labels
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment", "StatefulSet"]
    namespaces:
      - production
      - staging
  parameters:
    labels:
      - key: billing/team
        allowedRegex: "^[a-z-]+$"
      - key: billing/product
        allowedRegex: "^[a-z-]+$"
      - key: billing/environment
        allowedRegex: "^(production|staging|dev)$"
```
Then use OpenCost for real-time cost allocation:
```shell
# Add the chart repo once: helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace \
  --set opencost.prometheus.internal.serviceName=prometheus \
  --set opencost.prometheus.internal.namespaceName=monitoring

# Query cost by namespace over the last 7 days
kubectl port-forward service/opencost 9003:9003 -n opencost
curl "http://localhost:9003/allocation?window=7d&aggregate=namespace"
```
Real-World Results
Teams implementing this full stack typically see:
| Initiative | Cost Reduction |
|---|---|
| Spot instances (with fallback) | 60-70% on compute |
| Right-sizing requests | 20-40% reduction |
| Scale-to-zero for batch | 70-90% on batch workloads |
| Dev cluster schedules | 65-75% on non-prod |
| Orphaned PVC cleanup | 5-15% on storage |
Combined: 50-65% total Kubernetes spend reduction without compromising production reliability.
Key Takeaways
- Karpenter provides far better bin-packing and faster response than Cluster Autoscaler
- Wide instance family selection is essential for high spot availability
- VPA + Goldilocks reveals your right-sizing opportunities in days
- KEDA enables scale-to-zero for batch and event-driven workloads
- Cluster schedules are the highest-ROI intervention for non-production
- OpenCost provides the visibility to sustain ongoing optimization
Kubernetes cost engineering is a practice, not a one-time project. The teams winning at FinOps review their cost dashboards weekly and treat efficiency as a first-class engineering concern alongside reliability and performance.
If this article was helpful, likes and ad clicks are appreciated :)
