Kubernetes Cost Optimization in 2026: Karpenter, Spot Instances, and Right-Sizing at Scale

Kubernetes unlocks incredible operational power, but it also unlocks incredible waste. Teams that deploy Kubernetes often see their cloud bills double — not because the platform is expensive, but because the default configuration is designed for reliability, not efficiency.

The engineers who master Kubernetes cost optimization in 2026 treat compute as a fluid resource, not a static allocation. They use autoscaling aggressively, mix instance types intelligently, and measure utilization obsessively. Here’s how.



The Hidden Cost Drivers in Kubernetes Clusters

Before optimizing, understand where money actually goes:

Category                    | Typical Waste       | Root Cause
Over-provisioned requests   | 40-60% of compute   | Engineers pad requests “just in case”
Idle node capacity          | 20-30% overhead     | Bin-packing inefficiency
Always-on non-prod clusters | 30% of total spend  | Clusters running nights/weekends
Data transfer               | Often invisible     | Cross-AZ communication, egress
Persistent volumes          | Orphaned PVCs       | No cleanup policy

Most teams are wasting 50-70% of their Kubernetes compute spend. The fix is systematic.
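
The last row of the table is the easiest to act on immediately. Here is a quick read-only sketch for surfacing volumes that are still billing (it only lists; verify before deleting anything):

# PVs whose claim was deleted but which still exist (and still cost money)
kubectl get pv -o json | jq -r '
  .items[]
  | select(.status.phase == "Released")
  | "\(.metadata.name)\t\(.spec.capacity.storage)"'

# PVCs that no pod currently mounts: candidates for cleanup
kubectl get pvc --all-namespaces -o json | \
  jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name)"' | sort -u > /tmp/all-pvcs
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] as $p | $p.spec.volumes[]?
    | select(.persistentVolumeClaim)
    | "\($p.metadata.namespace)/\(.persistentVolumeClaim.claimName)"' | sort -u > /tmp/used-pvcs
comm -23 /tmp/all-pvcs /tmp/used-pvcs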


Karpenter: The Modern Node Autoscaler

Karpenter replaced Cluster Autoscaler as the default choice for most teams by 2025. The architectural difference is fundamental:

  • Cluster Autoscaler: scales node groups (pre-defined instance types, AZs)
  • Karpenter: provisions individual nodes based on exact pod requirements

This means Karpenter can bin-pack pods much more efficiently and react to scheduling demands in seconds rather than minutes.

Basic Karpenter Setup

# NodePool — defines what kind of nodes Karpenter can provision
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: 
            - m5.large
            - m5.xlarge
            - m5a.large
            - m5a.xlarge
            - m6i.large
            - m6i.xlarge
            - m6a.large
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
  limits:
    cpu: 1000
    memory: 4000Gi
---
# EC2NodeClass — AWS-specific configuration
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true

Spot-First Strategy with Fallback

The key to safe spot usage is defining fallback priority:

# High-priority NodePool: Spot first, on-demand fallback
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-optimized
spec:
  template:
    metadata:
      labels:
        billing/class: "spot"
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          # Wide selection = better spot availability and lower interruption rate
          values:
            - m5.2xlarge
            - m5a.2xlarge
            - m5d.2xlarge
            - m6i.2xlarge
            - m6a.2xlarge
            - m6id.2xlarge
            - m7i.2xlarge
            - m7a.2xlarge
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  weight: 100  # Prefer this pool

---
# Fallback NodePool: On-demand for when spot isn't available
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand-fallback
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m6i.2xlarge
            - m6a.2xlarge
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  weight: 1  # Only use when spot pool can't schedule
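
Weighted fallback covers capacity, but replicas still churn as spot nodes are reclaimed and consolidated. Two standard guards, sketched below with illustrative names: a PodDisruptionBudget to bound concurrent drains, and the karpenter.sh/do-not-disrupt annotation, which excludes a pod's node from Karpenter's voluntary consolidation (it cannot stop the spot reclaim itself).

# Bound how many replicas can drain at once while nodes are recycled
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 80%
  selector:
    matchLabels:
      app: api-server
---
# Deployment excerpt: opt pods out of Karpenter's voluntary disruption
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateful-worker  # illustrative name
spec:
  template:
    metadata:
      annotations:
        karpenter.sh/do-not-disrupt: "true"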

Right-Sizing: The Foundation of Cost Efficiency

Karpenter can only provision the right-sized node if your pod requests are accurate. Most production clusters have severely over-provisioned requests.

VPA (Vertical Pod Autoscaler) for Recommendations

Run VPA in “Off” mode to get recommendations without automatic changes:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Recommendation only — don't auto-apply yet
  resourcePolicy:
    containerPolicies:
      - containerName: api-server
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4000m
          memory: 8Gi

After a week of data:

kubectl describe vpa api-server-vpa
# Look for: Recommendation > Container Recommendations
# Lower Bound: the minimum the VPA considers safe
# Target: the recommended request value
# Upper Bound: an upper estimate; requests above it are almost certainly waste

Goldilocks: VPA Dashboard at Scale

Goldilocks runs VPA in recommendation mode across all namespaces and provides a web dashboard:

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks --namespace goldilocks --create-namespace

# Label namespaces to enable
kubectl label namespace production goldilocks.fairwinds.com/enabled=true
kubectl label namespace staging goldilocks.fairwinds.com/enabled=true

After enabling, check the dashboard to find workloads with the biggest gap between requested and actual resources.
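
The dashboard is not exposed outside the cluster by default; assuming the chart's default service name, a port-forward reaches it locally:

# Service name assumes the default Helm release installed above
kubectl port-forward -n goldilocks svc/goldilocks-dashboard 8080:80
# then browse to http://localhost:8080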

Practical Right-Sizing Workflow

#!/bin/bash
# Top 50 pods by actual CPU usage (requires metrics-server)
kubectl top pods --all-namespaces --no-headers | \
  sort -k3 -rh | \
  head -50

# Compare with requests (note: this inspects only each pod's first container):
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] |
    .metadata.namespace + " " +
    .metadata.name + " " +
    (.spec.containers[0].resources.requests.cpu // "none") + " " +
    (.spec.containers[0].resources.requests.memory // "none")'

HPA with Custom Metrics: Scale on What Actually Matters

CPU-based HPA is a blunt instrument. Scale on the metric your users actually experience; note that External metrics require a metrics adapter (for example KEDA or prometheus-adapter) to be serving them:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 50
  metrics:
    # Scale on request queue depth — more meaningful than CPU
    - type: External
      external:
        metric:
          name: sqs_queue_depth
          selector:
            matchLabels:
              queue: api-requests
        target:
          type: AverageValue
          averageValue: "30"
    # Keep CPU as a secondary guard
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60  # Remove at most 25% per minute
    scaleUp:
      stabilizationWindowSeconds: 0  # Scale up immediately
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30  # Can double in 30 seconds

KEDA: Event-Driven Autoscaling

KEDA enables scaling to zero and from zero — crucial for batch workloads and non-production environments.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: batch-processor
spec:
  scaleTargetRef:
    name: batch-processor-deployment
  minReplicaCount: 0  # Scale to zero when no messages
  maxReplicaCount: 100
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/batch-jobs
        queueLength: "5"  # 1 replica per 5 messages
        awsRegion: us-east-1
        scaleOnInFlight: "true"

With KEDA + Karpenter, a batch workload that’s idle 80% of the time costs 80% less. The nodes spin down when there are no pods, and pods spin down when there are no jobs.
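
One thing the ScaledObject above leaves implicit is authentication: the aws-sqs-queue scaler needs AWS credentials to poll the queue. A minimal sketch using pod identity (IRSA), assuming KEDA 2.13+ for the aws provider:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: sqs-auth
spec:
  podIdentity:
    provider: aws  # IRSA-based; older KEDA releases use aws-eks here

The trigger then references it:

  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: sqs-auth
      metadata:
        # ...same metadata as above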


Cluster Schedules: Turn Off Non-Production Clusters

The single highest-ROI action for most teams:

# CronJobs scale dev workloads to zero; Karpenter consolidation then removes the empty nodes
---
# Scale down every weeknight at 7 PM
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
  namespace: kube-system
spec:
  schedule: "0 19 * * 1-5"  # 7 PM weekdays
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cluster-scaler
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  # Scale all deployments to 0 in dev namespace
                  kubectl get deployments -n dev -o name | \
                    xargs -I {} kubectl scale {} --replicas=0 -n dev
                  
                  # Wait for pods to terminate (Karpenter will deprovision nodes)
                  sleep 120
                  echo "Dev cluster scaled down"
          restartPolicy: OnFailure
---
# Scale back up at 8 AM
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-dev
  namespace: kube-system
spec:
  schedule: "0 8 * * 1-5"  # 8 AM weekdays
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cluster-scaler
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  # Caveat: this sets every deployment to 1 replica, not its
                  # original count; persist counts (e.g., in an annotation) if that matters
                  kubectl scale deployment --all --replicas=1 -n dev
          restartPolicy: OnFailure

This alone cuts roughly 70% of dev cluster costs: off 13 hours per weeknight plus all weekend, the cluster runs only 55 of 168 hours a week.
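
Both CronJobs assume a cluster-scaler ServiceAccount that is allowed to scale deployments in the dev namespace. A minimal sketch of the RBAC behind it:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-scaler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-scaler
  namespace: dev
rules:
  - apiGroups: ["apps"]
    resources: ["deployments", "deployments/scale"]
    verbs: ["get", "list", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-scaler
  namespace: dev
subjects:
  - kind: ServiceAccount
    name: cluster-scaler
    namespace: kube-system
roleRef:
  kind: Role
  name: deployment-scaler
  apiGroup: rbac.authorization.k8s.io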


Cost Allocation and Showback

You can’t optimize what you can’t measure. Tag everything (the constraint below assumes the K8sRequiredLabels ConstraintTemplate from the Gatekeeper policy library is installed):

# Enforce cost tags via OPA/Gatekeeper
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-cost-labels
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment", "StatefulSet"]
    namespaces:
      - production
      - staging
  parameters:
    labels:
      - key: billing/team
        allowedRegex: "^[a-z-]+$"
      - key: billing/product
        allowedRegex: "^[a-z-]+$"
      - key: billing/environment
        allowedRegex: "^(production|staging|dev)$"
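
For reference, a Deployment that passes this constraint carries labels like these (metadata excerpt; values are illustrative):

metadata:
  name: api-server
  namespace: production
  labels:
    billing/team: platform-infra
    billing/product: checkout
    billing/environment: production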

Then use OpenCost for real-time cost allocation:

helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace \
  --set opencost.prometheus.internal.serviceName=prometheus \
  --set opencost.prometheus.internal.namespaceName=monitoring

# Query cost by namespace
kubectl port-forward service/opencost 9003:9003 -n opencost
curl "http://localhost:9003/allocation?window=7d&aggregate=namespace"

Real-World Results


Teams implementing this full stack typically see:

Initiative                     | Cost Reduction
Spot instances (with fallback) | 60-70% on compute
Right-sizing requests          | 20-40% reduction
Scale-to-zero for batch        | 70-90% on batch workloads
Dev cluster schedules          | 65-75% on non-prod
Orphaned PVC cleanup           | 5-15% on storage

Combined: 50-65% total Kubernetes spend reduction without compromising production reliability.


Key Takeaways

  • Karpenter provides far better bin-packing and faster response than Cluster Autoscaler
  • Wide instance family selection is essential for high spot availability
  • VPA + Goldilocks reveals your right-sizing opportunities in days
  • KEDA enables scale-to-zero for batch and event-driven workloads
  • Cluster schedules are the highest-ROI intervention for non-production
  • OpenCost provides the visibility to sustain ongoing optimization

Kubernetes cost engineering is a practice, not a one-time project. The teams winning at FinOps review their cost dashboards weekly and treat efficiency as a first-class engineering concern alongside reliability and performance.
