Kubernetes Cost Optimization: A FinOps Engineering Guide



Kubernetes makes scaling easy. Too easy. Teams often overprovision by 3-5x without realizing it. This guide shows you how to find and eliminate waste.


The Typical Kubernetes Waste Pattern

Most clusters look like this:

Requested CPU:  1000 cores
Actually Used:   200 cores  ← 80% waste

Requested RAM:  2000 GB
Actually Used:   600 GB    ← 70% waste

Why? Developers request resources based on fear, not data.

Step 1: Measure Everything

You can’t optimize what you don’t measure. Install a cost-visibility tool first:

Option A: Kubecost (Open Source)

helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace

Option B: OpenCost (CNCF)

helm install opencost opencost \
  --repo https://opencost.github.io/opencost-helm-chart \
  --namespace opencost --create-namespace
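
Both charts ship a dashboard; with the default installs above you can reach it via a port-forward (service names may differ if you changed the release name):

# Kubecost UI
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
# then open http://localhost:9090

# OpenCost UI
kubectl port-forward -n opencost svc/opencost 9090:9090
# then open http://localhost:9090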


Step 2: Right-Size Requests and Limits

Find Over-Provisioned Workloads

# Using kubectl-view-allocations plugin
kubectl view-allocations -u

# Or query Prometheus directly:
# requested CPU / used CPU per namespace (a ratio of 5 means 5x over-provisioned)
sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)
/
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace)
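
The same ratio works for memory; a sketch assuming kube-state-metrics and cAdvisor metrics are scraped into Prometheus:

# Memory: requested vs used (the container!="" filter drops pod-level aggregate series)
sum(kube_pod_container_resource_requests{resource="memory"}) by (namespace)
/
sum(container_memory_working_set_bytes{container!=""}) by (namespace)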

Apply Recommendations

# Before: Developer's guess
resources:
  requests:
    cpu: "2"
    memory: "4Gi"
  limits:
    cpu: "4"
    memory: "8Gi"

# After: Based on P95 usage + 20% buffer
resources:
  requests:
    cpu: "200m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "1Gi"

Automate with VPA

Vertical Pod Autoscaler adjusts resources automatically:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Auto"  # or "Off" for recommendations only
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: "50m"
        memory: "64Mi"
      maxAllowed:
        cpu: "2"
        memory: "4Gi"

Step 3: Use Spot/Preemptible Nodes

Spot instances cost 60-90% less. Use them for:

  • Stateless workloads
  • Batch jobs
  • Dev/staging environments

Node Pool Setup (GKE)

# Create an autoscaling spot node pool (replace my-cluster with your cluster name)
gcloud container node-pools create spot-pool \
  --cluster=my-cluster \
  --spot \
  --enable-autoscaling \
  --min-nodes=0 \
  --max-nodes=10

Workload Tolerations

spec:
  tolerations:
  # Taint and label keys vary by provider; AKS shown here.
  # GKE labels spot nodes cloud.google.com/gke-spot,
  # Karpenter on EKS uses karpenter.sh/capacity-type.
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      # Prefer spot nodes, but fall back to on-demand when none are available
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: "kubernetes.azure.com/scalesetpriority"
            operator: "In"
            values: ["spot"]

Step 4: Scale Down Non-Production

Dev and staging clusters don’t need to run 24/7.

Kube-downscaler

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    # Scale to 0 outside business hours
    downscaler/downtime: "Mon-Fri 20:00-08:00 UTC, Sat-Sun 00:00-24:00 UTC"
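
Workloads that must stay up (shared databases, ingress controllers) can opt out with the exclusion annotation:

metadata:
  annotations:
    downscaler/exclude: "true"  # never downscale this workload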

Cluster Auto-Sleep

# Scale all deployments to 0 replicas
kubectl get deploy -A -o name | xargs -I {} kubectl scale {} --replicas=0

# Or use a CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
  namespace: dev
spec:
  schedule: "0 20 * * 1-5"  # 8 PM weekdays
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler  # needs RBAC to scale deployments (see below)
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl
            command:
            - /bin/sh
            - -c
            - kubectl scale deploy --all --replicas=0 -n dev
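
The CronJob's pod needs permission to scale deployments; a minimal RBAC sketch for the scaler service account assumed above, in the dev namespace:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: scaler
  namespace: dev
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-scaler
  namespace: dev
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources: ["deployments/scale"]
  verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-scaler
  namespace: dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-scaler
subjects:
- kind: ServiceAccount
  name: scaler
  namespace: dev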

Step 5: Optimize Storage

Delete Orphaned PVCs

# Find unbound PVCs
kubectl get pvc -A | grep -v Bound

# Find PVs without claims
kubectl get pv | grep Released
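
After reviewing the list, Released PVs can be cleaned up in one pass; a sketch (double-check the output before piping to delete):

# Print Released PV names, then delete them
kubectl get pv -o jsonpath='{range .items[?(@.status.phase=="Released")]}{.metadata.name}{"\n"}{end}' \
  | xargs -r kubectl delete pv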

Use Appropriate Storage Classes

# Don't use SSD for logs
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: log-storage
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard  # not premium-ssd
  resources:
    requests:
      storage: 100Gi

Allow Volume Expansion

Set allowVolumeExpansion on your StorageClass so volumes can start small and grow on demand, instead of being over-provisioned up front:

allowVolumeExpansion: true  # In StorageClass
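
A minimal StorageClass sketch; the provisioner and parameters are cloud-specific (the GCE PD CSI driver is shown as an example):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: pd.csi.storage.gke.io  # swap in your cloud's CSI driver
parameters:
  type: pd-standard  # HDD-backed, cheaper than pd-ssd
allowVolumeExpansion: true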

Step 6: Network Cost Reduction

Cross-AZ traffic is expensive. Keep pods close:

spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: database
          topologyKey: topology.kubernetes.io/zone
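
For Service traffic, Topology Aware Routing (Kubernetes 1.27+; older clusters use the service.kubernetes.io/topology-aware-hints annotation instead) asks kube-proxy to prefer same-zone endpoints:

apiVersion: v1
kind: Service
metadata:
  name: api
  annotations:
    service.kubernetes.io/topology-mode: Auto  # prefer in-zone endpoints
spec:
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080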

Cost Monitoring Queries

Prometheus/Grafana Alerts

# Alert if namespace cost exceeds budget
# (node_ram_hourly_cost is Kubecost/OpenCost's per-GiB RAM price,
#  so convert working-set bytes to GiB first)
- alert: NamespaceCostOverBudget
  expr: |
    sum(
      container_memory_working_set_bytes{namespace="production"} / (1024^3)
      * on(node) group_left() node_ram_hourly_cost
    ) > 1000  # $1000/hour threshold
  for: 1h
  labels:
    severity: warning

Daily Cost Report

-- Example: Kubecost allocation data exported to a SQL warehouse
-- (Kubecost itself serves allocations over a REST API; see below)
SELECT 
  namespace,
  SUM(cpu_cost + ram_cost + pv_cost + network_cost) as total_cost
FROM allocations
WHERE window_start >= NOW() - INTERVAL '24 hours'
GROUP BY namespace
ORDER BY total_cost DESC
LIMIT 10;
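
The same numbers come straight from Kubecost's Allocation API (service name and port assume the default Helm install from Step 1):

# Port-forward Kubecost, then pull yesterday's cost per namespace
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090 &
curl -s 'http://localhost:9090/model/allocation?window=1d&aggregate=namespace'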

Quick Wins Summary

Action                     Effort   Savings
Right-size requests        Medium   20-40%
Spot nodes for stateless   Low      30-60%
Scale down non-prod        Low      50-70%
Delete orphaned resources  Low      5-10%
Pod topology awareness     Medium   10-20%

Tools Comparison

Tool             Type            Best For
Kubecost         Full platform   Enterprise visibility
OpenCost         Open source     Cost allocation
Goldilocks       VPA helper      Right-sizing
kube-downscaler  Scheduler       Non-prod savings
CAST AI          Automation      Hands-off optimization

Action Plan

  1. Week 1: Install Kubecost/OpenCost, get baseline
  2. Week 2: Apply VPA recommendations to top 10 workloads
  3. Week 3: Add spot nodes, migrate stateless workloads
  4. Week 4: Set up non-prod downscaling
  5. Ongoing: Monthly cost reviews, budget alerts

The goal isn’t minimum cost—it’s efficient cost. Pay for what you use, and use what you pay for.
