Kubernetes FinOps: Slash Your Cloud Bill by 40% with Smart Resource Management

Kubernetes makes it easy to deploy applications—perhaps too easy. Without proper governance, your cloud bill can spiral out of control. This guide covers battle-tested strategies for optimizing Kubernetes costs while maintaining the performance and reliability your applications need.

The Cost Visibility Problem

Most teams don’t know what they’re spending on Kubernetes. The first step is gaining visibility.

Implementing Cost Allocation

# Require cost-allocation labels on all deployments
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: cost-labels-validator
webhooks:
  - name: validate-cost-labels.example.com
    rules:
      - operations: ["CREATE", "UPDATE"]
        apiGroups: ["apps"]
        apiVersions: ["v1"]
        resources: ["deployments"]
    clientConfig:
      service:
        name: cost-label-validator
        namespace: kube-system
        path: "/validate"
      # caBundle omitted here; typically injected by cert-manager
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail

Required labels for every workload:

metadata:
  labels:
    cost-center: "engineering"
    team: "platform"
    environment: "production"
    app: "payment-service"

Cost Monitoring Stack

Deploy a cost monitoring solution such as Kubecost:

apiVersion: helm.fluxcd.io/v1  # Flux Helm Operator CRD
kind: HelmRelease
metadata:
  name: kubecost
  namespace: monitoring
spec:
  chart:
    repository: https://kubecost.github.io/cost-analyzer/
    name: cost-analyzer
    version: 1.106.0
  values:
    prometheus:
      enabled: true
    grafana:
      enabled: true
    networkCosts:
      enabled: true
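
Once the pods are up, you can reach the Kubecost UI and its allocation API; the service name and port below assume the chart defaults:

kubectl port-forward -n monitoring svc/kubecost-cost-analyzer 9090:9090 &
# Cost per namespace over the last 7 days
curl -s 'http://localhost:9090/model/allocation?window=7d&aggregate=namespace' | jq .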

Right-Sizing Workloads

In most clusters the biggest source of waste is over-provisioned resources: requests and limits set far above what workloads actually use.

Vertical Pod Autoscaler (VPA)

Let VPA recommend optimal resource settings:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"  # applies recommendations by evicting pods; use "Off" for recommendation-only
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
        controlledResources: ["cpu", "memory"]
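
Even in "Off" mode, VPA keeps publishing recommendations that you can review before applying anything:

kubectl describe vpa api-server-vpa
# or pull just the recommendation object
kubectl get vpa api-server-vpa -o jsonpath='{.status.recommendation.containerRecommendations}'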

Analyzing Resource Waste

Query Prometheus for underutilized pods:

# CPU utilization below 20% of the limit for 24 hours
avg_over_time(
  (
    rate(container_cpu_usage_seconds_total{container!=""}[5m])
    / on(namespace, pod, container)
    (container_spec_cpu_quota{container!=""} / container_spec_cpu_period{container!=""})
  )[24h:5m]
) < 0.2

# Memory utilization below 30% of the limit
avg_over_time(
  (
    container_memory_working_set_bytes{container!=""}
    / on(namespace, pod, container)
    container_spec_memory_limit_bytes{container!=""}
  )[24h:5m]
) < 0.3
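
You can run these ad hoc against the Prometheus HTTP API; the service name below is a placeholder for whatever your Prometheus install exposes:

kubectl port-forward -n monitoring svc/prometheus-server 9090:9090 &
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=avg_over_time((container_memory_working_set_bytes{container!=""} / on(namespace, pod, container) container_spec_memory_limit_bytes{container!=""})[24h:5m]) < 0.3' \
  | jq '.data.result[].metric'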

Data Center Photo by Taylor Vick on Unsplash

Smart Scaling Strategies

Horizontal Pod Autoscaler with Custom Metrics

Scale based on business metrics, not just CPU:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Pods
      pods:
        metric:
          name: requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
    - type: External
      external:
        metric:
          name: queue_messages_ready
          selector:
            matchLabels:
              queue: orders
        target:
          type: AverageValue
          averageValue: "30"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
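
Pods and External metrics are not built in: they require a metrics adapter such as prometheus-adapter (or KEDA) serving the custom metrics APIs. You can check that the APIs are registered with:

kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.resources[].name'
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | jq .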

KEDA for Event-Driven Scaling

Scale to zero for batch workloads:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: batch-processor
spec:
  scaleTargetRef:
    name: batch-processor
  minReplicaCount: 0  # Scale to zero!
  maxReplicaCount: 100
  cooldownPeriod: 300
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/orders
        queueLength: "5"
        awsRegion: us-east-1
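
The SQS scaler also needs AWS credentials, typically supplied via a KEDA TriggerAuthentication or pod identity (omitted here). Under the hood KEDA manages a regular HPA for the ScaledObject, which you can inspect:

kubectl get scaledobject batch-processor
kubectl get hpa keda-hpa-batch-processor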

Node Optimization

Spot/Preemptible Instances

Use spot instances for fault-tolerant workloads. The examples below use Karpenter's v1alpha5 Provisioner API; newer Karpenter releases replace it with NodePool, but the approach is the same:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot-provisioner
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64"]
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["m5.large", "m5.xlarge", "m5a.large", "m5a.xlarge"]
  limits:
    resources:
      cpu: 1000
  providerRef:
    name: default
  ttlSecondsAfterEmpty: 30
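
Once the provisioner is active, you can confirm which nodes actually landed on spot capacity:

kubectl get nodes -L karpenter.sh/capacity-type -L node.kubernetes.io/instance-type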

Configure workloads for spot tolerance:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: karpenter.sh/capacity-type
                    operator: In
                    values: ["spot"]
      tolerations:
        # Assumes spot nodes carry a matching taint (e.g. set via the provisioner's taints field)
        - key: "karpenter.sh/spot"
          operator: "Exists"
      terminationGracePeriodSeconds: 30
      containers:
        - name: worker
          lifecycle:
            preStop:
              exec:
                # Give in-flight work time to drain before the node is reclaimed
                command: ["/bin/sh", "-c", "sleep 20"]

Node Consolidation

Karpenter automatically consolidates underutilized nodes:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  consolidation:
    enabled: true
  ttlSecondsUntilExpired: 604800  # 7 days - force refresh for patches

Namespace Resource Quotas

Prevent runaway costs with quotas:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-budget
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "50"
    requests.memory: 100Gi
    limits.cpu: "100"
    limits.memory: 200Gi
    persistentvolumeclaims: "20"
    services.loadbalancers: "5"
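
Current consumption against the quota is easy to check:

kubectl describe resourcequota team-budget -n team-alpha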

Pair the quota with a LimitRange so individual containers get sane defaults:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      max:
        cpu: "4"
        memory: "8Gi"

Storage Cost Optimization

Tiered Storage Classes

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hot-storage
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cold-storage
provisioner: ebs.csi.aws.com
parameters:
  type: sc1  # cold HDD: the cheapest EBS volume type, for infrequently accessed data
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
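
A PVC's storage class is immutable, so moving data to the cold tier means snapshotting or copying into a new claim. To find candidates, list claims with their class and size:

kubectl get pvc --all-namespaces \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,CLASS:.spec.storageClassName,SIZE:.spec.resources.requests.storage'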

Automated PVC Cleanup

apiVersion: batch/v1
kind: CronJob
metadata:
  name: pvc-cleanup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pvc-cleanup  # needs RBAC to list and delete PVCs; sketch below
          containers:
            - name: cleanup
              image: bitnami/kubectl  # the script also assumes jq is available in the image
              command:
                - /bin/sh
                - -c
                - |
                  # Delete PVCs whose "last-used" epoch annotation (maintained elsewhere) is 7+ days old
                  kubectl get pvc --all-namespaces -o json | \
                  jq -r '.items[]
                         | select(.metadata.annotations["last-used"] != null)
                         | select((.metadata.annotations["last-used"] | tonumber) < (now - 604800))
                         | "\(.metadata.namespace) \(.metadata.name)"' | \
                  while read -r ns name; do
                    kubectl delete pvc -n "$ns" "$name"
                  done
          restartPolicy: OnFailure
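
The job's ServiceAccount needs cluster-wide permission to list and delete PVCs. A minimal sketch, assuming the CronJob runs in the default namespace:

kubectl create serviceaccount pvc-cleanup -n default
kubectl create clusterrole pvc-cleanup --verb=get,list,delete --resource=persistentvolumeclaims
kubectl create clusterrolebinding pvc-cleanup \
  --clusterrole=pvc-cleanup --serviceaccount=default:pvc-cleanup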

Cost Governance Automation

Budget Alerts with Prometheus

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cost-alerts
spec:
  groups:
    - name: cost-alerts
      rules:
        - alert: NamespaceBudgetExceeded
          expr: |
            # node_ram_hourly_cost / node_cpu_hourly_cost are exported by Kubecost
            # (roughly per GiB-hour and per vCPU-hour); threshold is USD per hour, tune to your budgets
            sum by (namespace) (
              (container_memory_working_set_bytes{container!=""} / 1024 / 1024 / 1024)
                * on(node) group_left() node_ram_hourly_cost
              + rate(container_cpu_usage_seconds_total{container!=""}[1h])
                * on(node) group_left() node_cpu_hourly_cost
            ) > 10
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Namespace {{ $labels.namespace }} is exceeding its hourly cost budget"
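
Before relying on the alert, confirm the Kubecost cost metrics are actually being scraped (with the same Prometheus port-forward as earlier; a result count of zero means the alert can never fire):

curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=node_cpu_hourly_cost' | jq '.data.result | length'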

Quick Wins Checklist

Action                               Potential Savings       Effort
-----------------------------------  ----------------------  ------
Enable VPA recommendations           20-30%                  Low
Use spot instances for non-critical  60-70% on those nodes   Medium
Right-size based on actual usage     30-40%                  Medium
Scale to zero for dev/staging        50-70% for those envs   Low
Implement resource quotas            Prevents overruns       Low
Delete unused PVCs                   5-10%                   Low
Use ARM instances where possible     20-30%                  Medium

Conclusion

Kubernetes cost optimization isn’t a one-time effort—it’s an ongoing practice. Start with visibility, implement guardrails, and continuously right-size your workloads. The 40% savings target is achievable for most organizations with consistent effort.


What cost optimization strategies have worked for your team? Share your experiences in the comments.

If this article helped you, a like (and an ad click) would be much appreciated :)