Kubernetes Cost Optimization: A FinOps Engineering Guide



Kubernetes makes scaling easy. Too easy. Teams often overprovision by 3-5x without realizing it. This guide shows you how to find and eliminate waste.


The Typical Kubernetes Waste Pattern

Most clusters look like this:

Requested CPU:  1000 cores
Actually Used:   200 cores  ← 80% waste

Requested RAM:  2000 GB
Actually Used:   600 GB    ← 70% waste

Why? Developers request resources based on fear, not data.

Step 1: Measure Everything

You can’t optimize what you don’t measure. Install a cost-visibility tool first:

Option A: Kubecost (Open Source)

helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace

Option B: OpenCost (CNCF)

helm install opencost opencost \
  --repo https://opencost.github.io/opencost-helm-chart \
  --namespace opencost --create-namespace
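
Both charts ship a dashboard; with the default installs above you can reach it via a port-forward (service names may differ if you changed the release name):

# Kubecost UI
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
# then open http://localhost:9090

# OpenCost UI
kubectl port-forward -n opencost svc/opencost 9090:9090
# then open http://localhost:9090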


Step 2: Right-Size Requests and Limits

Find Over-Provisioned Workloads

# Using kubectl-view-allocations plugin
kubectl view-allocations -u

# Or query Prometheus directly:
# requested CPU / used CPU per namespace (a ratio of 5 means 5x over-provisioned)
sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)
/
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace)
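
The same ratio works for memory; a sketch assuming kube-state-metrics and cAdvisor metrics are scraped into Prometheus:

# Memory: requested vs used (the container!="" filter drops pod-level aggregate series)
sum(kube_pod_container_resource_requests{resource="memory"}) by (namespace)
/
sum(container_memory_working_set_bytes{container!=""}) by (namespace)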

Apply Recommendations

# Before: Developer's guess
resources:
  requests:
    cpu: "2"
    memory: "4Gi"
  limits:
    cpu: "4"
    memory: "8Gi"

# After: Based on P95 usage + 20% buffer
resources:
  requests:
    cpu: "200m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "1Gi"

Automate with VPA

Vertical Pod Autoscaler adjusts resources automatically:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Auto"  # or "Off" for recommendations only
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: "50m"
        memory: "64Mi"
      maxAllowed:
        cpu: "2"
        memory: "4Gi"

Step 3: Use Spot/Preemptible Nodes

Spot instances cost 60-90% less. Use them for:

  • Stateless workloads
  • Batch jobs
  • Dev/staging environments

Node Pool Setup (GKE)

# Create an autoscaling spot node pool (replace my-cluster with your cluster name)
gcloud container node-pools create spot-pool \
  --cluster=my-cluster \
  --spot \
  --enable-autoscaling \
  --min-nodes=0 \
  --max-nodes=10

Workload Tolerations

spec:
  tolerations:
  # Taint and label keys vary by provider; AKS shown here.
  # GKE labels spot nodes cloud.google.com/gke-spot,
  # Karpenter on EKS uses karpenter.sh/capacity-type.
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      # Prefer spot nodes, but fall back to on-demand when none are available
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: "kubernetes.azure.com/scalesetpriority"
            operator: "In"
            values: ["spot"]

Step 4: Scale Down Non-Production

Dev and staging clusters don’t need to run 24/7.

Kube-downscaler

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    # Scale to 0 outside business hours
    downscaler/downtime: "Mon-Fri 20:00-08:00 UTC, Sat-Sun 00:00-24:00 UTC"
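
Workloads that must stay up (shared databases, ingress controllers) can opt out with the exclusion annotation:

metadata:
  annotations:
    downscaler/exclude: "true"  # never downscale this workload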

Cluster Auto-Sleep

# Scale all deployments to 0 replicas
kubectl get deploy -A -o name | xargs -I {} kubectl scale {} --replicas=0

# Or use a CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
  namespace: dev
spec:
  schedule: "0 20 * * 1-5"  # 8 PM weekdays
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler  # needs RBAC to scale deployments (see below)
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl
            command:
            - /bin/sh
            - -c
            - kubectl scale deploy --all --replicas=0 -n dev
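
The CronJob's pod needs permission to scale deployments; a minimal RBAC sketch for the scaler service account assumed above, in the dev namespace:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: scaler
  namespace: dev
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-scaler
  namespace: dev
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources: ["deployments/scale"]
  verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-scaler
  namespace: dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-scaler
subjects:
- kind: ServiceAccount
  name: scaler
  namespace: dev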

Step 5: Optimize Storage

Delete Orphaned PVCs

# Find unbound PVCs
kubectl get pvc -A | grep -v Bound

# Find PVs without claims
kubectl get pv | grep Released
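
After reviewing the list, Released PVs can be cleaned up in one pass; a sketch (double-check the output before piping to delete):

# Print Released PV names, then delete them
kubectl get pv -o jsonpath='{range .items[?(@.status.phase=="Released")]}{.metadata.name}{"\n"}{end}' \
  | xargs -r kubectl delete pv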

Use Appropriate Storage Classes

# Don't use SSD for logs
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: log-storage
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard  # not premium-ssd
  resources:
    requests:
      storage: 100Gi

Allow Volume Expansion

Set allowVolumeExpansion on your StorageClass so volumes can start small and grow on demand, instead of being over-provisioned up front:

allowVolumeExpansion: true  # In StorageClass
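
A minimal StorageClass sketch; the provisioner and parameters are cloud-specific (the GCE PD CSI driver is shown as an example):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: pd.csi.storage.gke.io  # swap in your cloud's CSI driver
parameters:
  type: pd-standard  # HDD-backed, cheaper than pd-ssd
allowVolumeExpansion: true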

Step 6: Network Cost Reduction

Cross-AZ traffic is expensive. Keep pods close:

spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: database
          topologyKey: topology.kubernetes.io/zone
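
For Service traffic, Topology Aware Routing (Kubernetes 1.27+; older clusters use the service.kubernetes.io/topology-aware-hints annotation instead) asks kube-proxy to prefer same-zone endpoints:

apiVersion: v1
kind: Service
metadata:
  name: api
  annotations:
    service.kubernetes.io/topology-mode: Auto  # prefer in-zone endpoints
spec:
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080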

Cost Monitoring Queries

Prometheus/Grafana Alerts

# Alert if namespace cost exceeds budget
# (node_ram_hourly_cost is Kubecost/OpenCost's per-GiB RAM price,
#  so convert working-set bytes to GiB first)
- alert: NamespaceCostOverBudget
  expr: |
    sum(
      container_memory_working_set_bytes{namespace="production"} / (1024^3)
      * on(node) group_left() node_ram_hourly_cost
    ) > 1000  # $1000/hour threshold
  for: 1h
  labels:
    severity: warning

Daily Cost Report

-- Example: Kubecost allocation data exported to a SQL warehouse
-- (Kubecost itself serves allocations over a REST API; see below)
SELECT 
  namespace,
  SUM(cpu_cost + ram_cost + pv_cost + network_cost) as total_cost
FROM allocations
WHERE window_start >= NOW() - INTERVAL '24 hours'
GROUP BY namespace
ORDER BY total_cost DESC
LIMIT 10;
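
The same numbers come straight from Kubecost's Allocation API (service name and port assume the default Helm install from Step 1):

# Port-forward Kubecost, then pull yesterday's cost per namespace
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090 &
curl -s 'http://localhost:9090/model/allocation?window=1d&aggregate=namespace'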

Quick Wins Summary

Action                     Effort   Savings
Right-size requests        Medium   20-40%
Spot nodes for stateless   Low      30-60%
Scale down non-prod        Low      50-70%
Delete orphaned resources  Low      5-10%
Pod topology awareness     Medium   10-20%

Tools Comparison

Tool             Type            Best For
Kubecost         Full platform   Enterprise visibility
OpenCost         Open source     Cost allocation
Goldilocks       VPA helper      Right-sizing
kube-downscaler  Scheduler       Non-prod savings
CAST AI          Automation      Hands-off optimization

Action Plan

  1. Week 1: Install Kubecost/OpenCost, get baseline
  2. Week 2: Apply VPA recommendations to top 10 workloads
  3. Week 3: Add spot nodes, migrate stateless workloads
  4. Week 4: Set up non-prod downscaling
  5. Ongoing: Monthly cost reviews, budget alerts

The goal isn’t minimum cost—it’s efficient cost. Pay for what you use, and use what you pay for.
