Kubernetes FinOps: Slash Your Cloud Bill by 40% with Smart Resource Management
Kubernetes makes it easy to deploy applications—perhaps too easy. Without proper governance, your cloud bill can spiral out of control. This guide covers battle-tested strategies for optimizing Kubernetes costs while maintaining the performance and reliability your applications need.
The Cost Visibility Problem
Most teams can't break their Kubernetes spend down by team, service, or environment. The first step is gaining visibility.
Implementing Cost Allocation
# Require cost-allocation labels on all Deployments
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: cost-labels-validator
webhooks:
  - name: validate-cost-labels.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    rules:
      - operations: ["CREATE", "UPDATE"]
        apiGroups: ["apps"]
        apiVersions: ["v1"]
        resources: ["deployments"]
    clientConfig:
      service:
        name: cost-label-validator
        namespace: kube-system
        path: "/validate"
Required labels for every workload:
metadata:
  labels:
    cost-center: "engineering"
    team: "platform"
    environment: "production"
    app: "payment-service"
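If you'd rather not maintain a custom admission webhook, a policy engine such as Kyverno can enforce the same labeling rule declaratively. A minimal sketch, assuming a recent Kyverno installation (on older versions the action is spelled "enforce"):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-cost-labels
      match:
        any:
          - resources:
              kinds: ["Deployment"]
      validate:
        message: "cost-center, team, and environment labels are required."
        pattern:
          metadata:
            labels:
              cost-center: "?*"
              team: "?*"
              environment: "?*"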
Cost Monitoring Stack
Deploy a cost monitoring solution such as Kubecost. Here it is declared as a Flux Helm Operator HelmRelease; a plain helm install of the same chart works just as well:
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: kubecost
  namespace: monitoring
spec:
  chart:
    repository: https://kubecost.github.io/cost-analyzer/
    name: cost-analyzer
    version: 1.106.0
  values:
    prometheus:
      enabled: true
    grafana:
      enabled: true
    networkCosts:
      enabled: true
Right-Sizing Workloads
The biggest source of waste is over-provisioned resources: pods that request far more CPU and memory than they ever use.
Vertical Pod Autoscaler (VPA)
Let VPA recommend optimal resource settings:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
        controlledResources: ["cpu", "memory"]
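updateMode: "Auto" lets VPA evict pods and re-create them with new requests, which may be too aggressive at first. A recommendation-only variant sets the update mode to "Off", so the suggested requests appear in the object's status and nothing is restarted (the name below is illustrative):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa-recommend
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"   # compute recommendations only; never evict pods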
Analyzing Resource Waste
Query Prometheus for underutilized pods:
# CPU utilization below 20% of the CPU limit for 24 hours
avg_over_time(
  (
    rate(container_cpu_usage_seconds_total{container!=""}[5m])
    / on(pod, container)
    (container_spec_cpu_quota{container!=""} / 100000)  # quota is in microseconds per 100ms period, so /100000 yields cores
  )[24h:5m]
) < 0.2

# Memory utilization below 30% of the memory limit
avg_over_time(
  (
    container_memory_working_set_bytes{container!=""}
    / on(pod, container)
    container_spec_memory_limit_bytes{container!=""}
  )[24h:5m]
) < 0.3
Smart Scaling Strategies
Horizontal Pod Autoscaler with Custom Metrics
Scale based on business metrics, not just CPU:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Pods
      pods:
        metric:
          name: requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
    - type: External
      external:
        metric:
          name: queue_messages_ready
          selector:
            matchLabels:
              queue: orders
        target:
          type: AverageValue
          averageValue: "30"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
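Neither requests_per_second nor queue_messages_ready exists out of the box; something has to serve them through the custom/external metrics APIs, typically prometheus-adapter. A sketch of an adapter rule for the Pods metric, placed under the rules: key of the adapter's configuration (the source series http_requests_total is an assumption; substitute whatever counter your application exports):

rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "http_requests_total"
      as: "requests_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'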
KEDA for Event-Driven Scaling
Scale to zero for batch workloads:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: batch-processor
spec:
  scaleTargetRef:
    name: batch-processor
  minReplicaCount: 0   # Scale to zero!
  maxReplicaCount: 100
  cooldownPeriod: 300
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/orders
        queueLength: "5"
        awsRegion: us-east-1
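The SQS trigger also needs AWS credentials, which the manifest above leaves implicit. A hedged sketch using KEDA pod identity (assumes IRSA is configured for the workload; reference it from the trigger via authenticationRef: {name: sqs-pod-identity}):

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: sqs-pod-identity
spec:
  podIdentity:
    provider: aws        # "aws-eks" on older KEDA releases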
Node Optimization
Spot/Preemptible Instances
Use spot instances for fault-tolerant workloads:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot-provisioner
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64"]
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["m5.large", "m5.xlarge", "m5a.large", "m5a.xlarge"]
  limits:
    resources:
      cpu: 1000
  providerRef:
    name: default
  ttlSecondsAfterEmpty: 30
Configure workloads for spot tolerance:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: karpenter.sh/capacity-type
                    operator: In
                    values: ["spot"]
      tolerations:
        - key: "karpenter.sh/spot"
          operator: "Exists"
      terminationGracePeriodSeconds: 30
      containers:
        - name: worker
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 20"]
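Spot reclamation and node consolidation both evict pods, so it helps to pair spot-friendly workloads with a PodDisruptionBudget that caps how many replicas can be down at once. A minimal sketch (the app: worker selector is an assumption about the Deployment's pod labels):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: worker-pdb
spec:
  maxUnavailable: 1        # allow only one replica to be evicted at a time
  selector:
    matchLabels:
      app: worker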
Node Consolidation
Karpenter automatically consolidates underutilized nodes:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  consolidation:
    enabled: true
  ttlSecondsUntilExpired: 604800   # 7 days - force node refresh so OS patches get picked up
Namespace Resource Quotas
Prevent runaway costs with quotas:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-budget
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "50"
    requests.memory: 100Gi
    limits.cpu: "100"
    limits.memory: 200Gi
    persistentvolumeclaims: "20"
    services.loadbalancers: "5"
Pair the quota with a LimitRange that applies defaults and caps to individual containers:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      max:
        cpu: "4"
        memory: "8Gi"
Storage Cost Optimization
Tiered Storage Classes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hot-storage
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cold-storage
provisioner: ebs.csi.aws.com
parameters:
  type: sc1
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
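Workloads then opt into a tier through storageClassName on the claim. A sketch for rarely read data such as log archives (names and size are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: log-archive
  namespace: team-alpha
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: cold-storage   # sc1-backed cold HDD, far cheaper per GiB than gp3
  resources:
    requests:
      storage: 500Gi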
Automated PVC Cleanup
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pvc-cleanup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pvc-cleanup   # RBAC for listing/deleting PVCs is sketched below
          containers:
            - name: cleanup
              image: bitnami/kubectl   # image must also provide jq for the script below
              command:
                - /bin/sh
                - -c
                - |
                  # Delete PVCs whose "last-used" annotation (maintained by a separate process)
                  # is older than 7 days (604800 seconds)
                  kubectl get pvc --all-namespaces -o json \
                    | jq -r '.items[]
                        | select((.metadata.annotations["last-used"] // empty | tonumber?) < (now - 604800))
                        | "\(.metadata.namespace) \(.metadata.name)"' \
                    | while read -r ns name; do kubectl delete pvc -n "$ns" "$name"; done
          restartPolicy: OnFailure
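The job's ServiceAccount needs cluster-wide permission to list and delete PVCs. A minimal RBAC sketch for the pvc-cleanup account referenced above (the default namespace is an assumption; match wherever the CronJob runs):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: pvc-cleanup
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pvc-cleanup
rules:
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pvc-cleanup
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pvc-cleanup
subjects:
  - kind: ServiceAccount
    name: pvc-cleanup
    namespace: default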
Cost Governance Automation
Budget Alerts with Prometheus
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cost-alerts
spec:
  groups:
    - name: cost-alerts
      rules:
        - alert: NamespaceBudgetExceeded
          expr: |
            # node_*_hourly_cost metrics are exported by Kubecost
            sum by (namespace) (
              container_memory_working_set_bytes * on(node) group_left() node_ram_hourly_cost
              + rate(container_cpu_usage_seconds_total[1h]) * 3600 * on(node) group_left() node_cpu_hourly_cost
            ) > 1000
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Namespace {{ $labels.namespace }} is exceeding its budget"
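Prometheus only evaluates the rule; getting it in front of a human is Alertmanager's job. A hedged sketch of a route and receiver to merge into alertmanager.yml (the receiver name and Slack channel are illustrative, and a global Slack webhook URL is assumed to be configured):

route:
  routes:
    - matchers:
        - alertname = "NamespaceBudgetExceeded"
      receiver: finops-team
receivers:
  - name: finops-team
    slack_configs:
      - channel: "#finops-alerts"
        send_resolved: true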
Quick Wins Checklist
| Action | Potential Savings | Effort |
|---|---|---|
| Enable VPA recommendations | 20-30% | Low |
| Use spot instances for non-critical workloads | 60-70% on those nodes | Medium |
| Right-size based on actual usage | 30-40% | Medium |
| Scale to zero for dev/staging | 50-70% for those envs | Low |
| Implement resource quotas | Prevents overruns | Low |
| Delete unused PVCs | 5-10% | Low |
| Use ARM instances where possible (see the sketch below) | 20-30% | Medium |
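For the last item in the checklist, ARM (e.g. AWS Graviton) capacity can be added with the same Karpenter pattern shown earlier by allowing the arm64 architecture - a sketch that assumes your container images are published as multi-arch:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: arm64
spec:
  requirements:
    - key: kubernetes.io/arch
      operator: In
      values: ["arm64"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
  providerRef:
    name: default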
Conclusion
Kubernetes cost optimization isn’t a one-time effort—it’s an ongoing practice. Start with visibility, implement guardrails, and continuously right-size your workloads. The 40% savings target is achievable for most organizations with consistent effort.
What cost optimization strategies have worked for your team? Share your experiences in the comments.
