Kubernetes 2026: Why FinOps and Cost Observability Are Now First-Class Concerns
on Kubernetes, Finops, Cloud, Devops, Cost optimization
There’s a dirty secret about Kubernetes that nobody talks about at KubeCon: most clusters are wildly over-provisioned, and most engineering teams have no idea what their workloads actually cost.
This isn’t a new problem. But in 2026, with cloud costs now regularly appearing as a line item in board-level conversations, FinOps has moved from a “nice to have” practice into a first-class engineering concern. And Kubernetes — which abstracts away so much of the infrastructure layer — has become the primary battleground for cloud cost control.
The Problem: Kubernetes Is Too Good at Hiding Costs
Kubernetes was designed to make it easy to deploy workloads without thinking about the underlying machines. Developers specify `resources.requests` and `resources.limits`, the scheduler figures out where to place pods, and nobody thinks much more about it.
This abstraction is wonderful for developer productivity. It’s terrible for cost awareness.
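For reference, these are the knobs in question. A container's resource stanza in a pod spec looks like this (values illustrative):

```yaml
# Pod spec fragment: requests are what the scheduler reserves (and what you
# effectively pay for); limits are the hard ceiling enforced at runtime.
resources:
  requests:
    cpu: 500m        # half a core reserved on the node, used or not
    memory: 256Mi
  limits:
    cpu: "1"
    memory: 512Mi
```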
In practice, what happens is:
- Teams set conservative resource requests (often copy-pasted from Stack Overflow)
- Actual utilization runs at 15-30% of requested resources
- The cluster needs more nodes to fit all those over-provisioned pods
- Costs spiral while dashboards show “healthy” utilization
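The gap between requested and used capacity translates directly into dollars. A back-of-envelope sketch (the cluster size, utilization, and per-core price below are illustrative, not from any provider's price list):

```python
def wasted_spend(requested_cores: float, used_cores: float,
                 price_per_core_hour: float, hours: float = 730) -> float:
    """Monthly spend on CPU that was requested (and therefore paid for) but never used."""
    idle_cores = max(requested_cores - used_cores, 0.0)
    return idle_cores * price_per_core_hour * hours

# A cluster requesting 400 cores but averaging 20% utilization (80 cores used),
# at an illustrative $0.04 per core-hour:
waste = wasted_spend(requested_cores=400, used_cores=80, price_per_core_hour=0.04)
print(f"${waste:,.0f}/month")  # ~ $9,344/month of idle reservation
```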
A 2025 CNCF survey found that 67% of Kubernetes users had no automated mechanism for detecting over-provisioned workloads. The average cluster wastes between 40% and 60% of its allocated compute.
The FinOps Framework Applied to Kubernetes
The FinOps Foundation’s framework — Inform, Optimize, Operate — maps surprisingly well to Kubernetes cost management.
Phase 1: Inform (You Can’t Optimize What You Can’t See)
Before cutting costs, you need visibility. The tooling here has matured significantly:
Kubecost remains the dominant open-source solution. Deploy it in your cluster and within minutes you get cost attribution by namespace, deployment, label, and team. Its integration with cloud provider billing APIs gives you actual dollar amounts, not just CPU/memory percentages.
```bash
# Install Kubecost with Helm
helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace \
  --set kubecostToken="your-token"
```
OpenCost, the CNCF project that Kubecost helped seed, provides a vendor-neutral cost allocation standard. If you’re running multiple clusters or want to avoid vendor lock-in, OpenCost’s specification ensures your cost data is portable.
The critical insight here: tag everything. Cost attribution only works if your resources have consistent labels.
```yaml
# Every namespace should have these labels
metadata:
  labels:
    team: "platform-engineering"
    cost-center: "CC-1042"
    environment: "production"
    product: "checkout"
```
Phase 2: Optimize (Actually Reduce the Waste)
Once you can see costs, optimization becomes tactical. Here are the highest-ROI interventions:
1. Vertical Pod Autoscaler (VPA) for right-sizing
VPA analyzes actual resource usage and recommends (or automatically sets) appropriate requests and limits. In “Recommendation” mode, it’s purely advisory — you review and apply suggestions manually.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommendation only, no automatic changes
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 4
          memory: 4Gi
```
In practice, VPA recommendations reduce request sizes by 50-70% for most workloads. A real-world example: a Node.js API requesting 1000m of CPU that actually needs about 150m under normal load.
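Once VPA has watched the workload for a while, its suggestion appears in the object's status (visible via `kubectl describe vpa my-app-vpa`). An illustrative status excerpt, with numbers that will of course differ for your workload:

```yaml
# Illustrative VPA status excerpt (not real output from any specific cluster)
status:
  recommendation:
    containerRecommendations:
      - containerName: my-app
        lowerBound:        # below this, VPA expects throttling or OOM risk
          cpu: 100m
          memory: 128Mi
        target:            # what VPA would set if updateMode were "Auto"
          cpu: 150m
          memory: 200Mi
        upperBound:        # requests above this are almost certainly waste
          cpu: 500m
          memory: 400Mi
```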
2. Horizontal Pod Autoscaler with KEDA
KEDA (Kubernetes Event-Driven Autoscaling) extends HPA to scale on virtually any metric — queue depth, database connections, custom business metrics. The key win: scale to zero when workloads are idle.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaledobject
spec:
  scaleTargetRef:
    name: worker-deployment
  minReplicaCount: 0    # Scale to zero when idle
  maxReplicaCount: 20
  cooldownPeriod: 300   # 5 minutes before scaling down
  triggers:
    - type: rabbitmq
      metadata:
        queueName: work-queue
        queueLength: "5"
      # Connection details (host) typically come from a TriggerAuthentication
      # referenced via authenticationRef, omitted here for brevity.
```
A batch processing service that previously ran 10 workers 24/7 might only need workers for 4 hours per day. Scale to zero = 83% cost reduction on that workload.
3. Cluster Autoscaler + Spot/Preemptible Instances
Running on-demand instances for every node is expensive. Most production workloads can tolerate spot instance interruptions with proper design:
```yaml
# Node labels applied by the spot node pool
apiVersion: v1
kind: Node
metadata:
  labels:
    cloud.google.com/gke-spot: "true"
    node-type: "spot"
---
# Schedule interruptible workloads on spot nodes (pod spec fragment)
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80
          preference:
            matchExpressions:
              - key: node-type
                operator: In
                values: ["spot"]
  tolerations:
    - key: "cloud.google.com/gke-spot"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
```
Spot instances typically cost 60-80% less than on-demand. Even running 50% of your compute on spot dramatically reduces bills.
Phase 3: Operate (Make Cost-Consciousness Cultural)
The hardest phase. Technical solutions only go so far — you need to change how teams think about compute.
Chargeback and showback models
Showback: Show teams what their workloads cost without actually charging them. This alone often motivates optimization — nobody wants to be the team burning $50k/month on a service with 10 users.
Chargeback: Actually allocate costs to team budgets. More impactful, but requires finance and engineering alignment.
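A showback report doesn't require a platform to get started. A minimal sketch that rolls per-namespace allocation data up into a per-team summary; the record shape and field names below are assumptions, standing in for whatever your cost tool's allocation API or CSV export actually returns:

```python
from collections import defaultdict

# Illustrative allocation records; in practice these would come from an
# OpenCost/Kubecost allocation export (field names here are assumptions).
allocations = [
    {"namespace": "checkout",         "team": "payments",  "monthly_cost": 4200.0},
    {"namespace": "checkout-staging", "team": "payments",  "monthly_cost": 900.0},
    {"namespace": "search",           "team": "discovery", "monthly_cost": 7100.0},
]

def showback_by_team(allocations: list[dict]) -> dict[str, float]:
    """Sum costs per team so each team sees its own bill, without charging anyone."""
    totals: dict[str, float] = defaultdict(float)
    for a in allocations:
        totals[a["team"]] += a["monthly_cost"]
    return dict(totals)

for team, cost in sorted(showback_by_team(allocations).items()):
    print(f"{team}: ${cost:,.0f}/month")
```

Piping a report like this into Slack once a week is often enough to trigger the "who is spending that?" conversations that showback is meant to start.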
Cost budgets with automated alerts
```yaml
# Kubecost budget alert
apiVersion: cloudcost.kubecost.com/v1beta1
kind: Budget
metadata:
  name: team-backend-budget
spec:
  window: month
  amount: 5000  # USD
  aggregation: namespace
  filter:
    namespaces:
      - backend
      - backend-staging
  alerts:
    - threshold: 80   # Alert at 80% of budget
      type: email
      to: backend-team@company.com
    - threshold: 100
      type: slack
      channel: "#backend-oncall"
```
Engineering metrics that include cost
Add cost per request, cost per user, and cost per transaction to your engineering dashboards alongside latency and error rates. When these metrics are visible in sprint reviews, engineers naturally start optimizing.
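One lightweight way to get there is a Prometheus recording rule that joins cost and traffic. The metric names below are assumptions; substitute whatever your cost exporter (e.g. OpenCost's Prometheus integration) and your request counters actually expose:

```yaml
# Prometheus recording rule sketch: dollars per 1k requests, by namespace.
# Metric names (namespace_hourly_cost, http_requests_total) are assumptions.
groups:
  - name: cost-efficiency
    rules:
      - record: namespace:cost_per_1k_requests:ratio
        expr: |
          sum by (namespace) (namespace_hourly_cost)
          /
          (sum by (namespace) (rate(http_requests_total[5m])) * 3600 / 1000)
```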
The New Tooling Landscape in 2026
Several tools have emerged as essential in the Kubernetes FinOps stack:
| Tool | Purpose | License |
|---|---|---|
| OpenCost | Cost allocation standard + UI | Apache 2.0 |
| Kubecost | Full FinOps platform | Commercial/OSS |
| Goldilocks | VPA recommendations UI | Apache 2.0 |
| KEDA | Event-driven autoscaling | Apache 2.0 |
| Karpenter | Node provisioning optimization | Apache 2.0 |
Karpenter deserves special mention. AWS’s node provisioning solution (now also available on other clouds) is dramatically smarter than the traditional Cluster Autoscaler. It provisions exactly the right instance type and size for pending pods, considers spot pricing in real time, and consolidates underutilized nodes aggressively. Teams migrating from Cluster Autoscaler to Karpenter typically see 20-40% cost reductions from node optimization alone.
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:            # required in v1; points at a cloud-specific NodeClass
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]  # ARM is cheaper!
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # the v1 name for WhenUnderutilized
    consolidateAfter: 1m
```
Real World Results
Numbers from teams who’ve implemented a full FinOps practice on Kubernetes:
- E-commerce company (50-node cluster): 42% cost reduction over 6 months, primarily from VPA right-sizing and spot adoption
- SaaS startup (multi-cluster): Identified $180k/year in idle resources through OpenCost attribution; shut down 3 staging environments that nobody was using
- Financial services firm: Chargeback model led to 35% organic cost reduction as teams became budget-conscious
The pattern is consistent: most organizations can cut Kubernetes costs by 30-50% without any degradation in performance or reliability. The savings are just sitting there, obscured by abstraction.
Getting Started Today
If you’re starting from zero, here’s a pragmatic sequence:
- Week 1: Deploy OpenCost or Kubecost. Just look at the data. No changes yet.
- Week 2: Add resource labels to everything. Fix your attribution.
- Week 3: Run VPA in recommendation mode for 2 weeks. Review suggestions.
- Month 2: Implement KEDA for batch/queue workloads. Enable scale-to-zero.
- Month 3: Move non-critical workloads to spot instances.
- Month 4: Set up team budgets with showback reports.
The ROI on this work is almost always immediate. A few weeks of engineering time typically unlocks six-figure annual savings for medium-to-large clusters.
Kubernetes gave us incredible leverage on infrastructure. FinOps gives us the tools to ensure we’re actually using that leverage wisely.
Related reading: OpenCost documentation, Karpenter concepts, FinOps Foundation
