Kubernetes Best Practices 2026: Production-Ready Cluster Management
in Development on Kubernetes, DevOps
Kubernetes Best Practices 2026: Production-Ready Cluster Management
Kubernetes continues to dominate container orchestration in 2026. This guide covers essential best practices for running production-ready Kubernetes clusters, from resource management to security hardening.
Resource Management
Resource Requests and Limits
Always define resource requests and limits for every container:
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-server
spec:
replicas: 3
selector:
matchLabels:
app: api-server
template:
metadata:
labels:
app: api-server
spec:
containers:
- name: api
image: myapp/api:v1.2.3
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
LimitRange for Namespace Defaults
apiVersion: v1
kind: LimitRange
metadata:
name: default-limits
namespace: production
spec:
limits:
- default:
memory: "512Mi"
cpu: "500m"
defaultRequest:
memory: "256Mi"
cpu: "250m"
type: Container
ResourceQuota for Namespace Budgets
apiVersion: v1
kind: ResourceQuota
metadata:
name: production-quota
namespace: production
spec:
hard:
requests.cpu: "20"
requests.memory: "40Gi"
limits.cpu: "40"
limits.memory: "80Gi"
pods: "100"
services: "20"
Pod Design Best Practices
Liveness and Readiness Probes
spec:
containers:
- name: api
image: myapp/api:v1.2.3
ports:
- containerPort: 8080
# Readiness: When to receive traffic
readinessProbe:
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
failureThreshold: 3
# Liveness: When to restart
livenessProbe:
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 15
periodSeconds: 20
failureThreshold: 3
# Startup: For slow-starting containers
startupProbe:
httpGet:
path: /health/ready
port: 8080
failureThreshold: 30
periodSeconds: 10
Pod Disruption Budgets
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: api-pdb
spec:
minAvailable: 2
# OR: maxUnavailable: 1
selector:
matchLabels:
app: api-server
Anti-Affinity for High Availability
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- api-server
topologyKey: kubernetes.io/hostname
Security Best Practices
Pod Security Standards
apiVersion: v1
kind: Namespace
metadata:
name: production
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: restricted
pod-security.kubernetes.io/warn: restricted
Security Context
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
runAsGroup: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
- name: api
image: myapp/api:v1.2.3
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
Network Policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: api-network-policy
namespace: production
spec:
podSelector:
matchLabels:
app: api-server
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx
ports:
- protocol: TCP
port: 8080
egress:
- to:
- namespaceSelector:
matchLabels:
name: production
ports:
- protocol: TCP
port: 5432 # PostgreSQL
Secrets Management
# Use External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: api-secrets
spec:
refreshInterval: 1h
secretStoreRef:
kind: ClusterSecretStore
name: aws-secrets-manager
target:
name: api-secrets
data:
- secretKey: database-url
remoteRef:
key: production/api/database-url
Configuration Management
ConfigMaps for Non-Sensitive Data
apiVersion: v1
kind: ConfigMap
metadata:
name: api-config
data:
LOG_LEVEL: "info"
MAX_CONNECTIONS: "100"
config.yaml: |
server:
port: 8080
timeout: 30s
Environment Variables Best Practices
spec:
containers:
- name: api
env:
# From ConfigMap
- name: LOG_LEVEL
valueFrom:
configMapKeyRef:
name: api-config
key: LOG_LEVEL
# From Secret
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: api-secrets
key: database-url
# Pod metadata
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
# Resource info
- name: MEMORY_LIMIT
valueFrom:
resourceFieldRef:
containerName: api
resource: limits.memory
Observability
Structured Logging
spec:
containers:
- name: api
env:
- name: LOG_FORMAT
value: "json"
Prometheus Metrics
metadata:
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
OpenTelemetry Integration
spec:
containers:
- name: api
env:
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://otel-collector:4317"
- name: OTEL_SERVICE_NAME
value: "api-server"
- name: OTEL_RESOURCE_ATTRIBUTES
value: "deployment.environment=production"
Deployment Strategies
Rolling Update (Default)
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
Blue-Green with Argo Rollouts
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: api-server
spec:
replicas: 5
strategy:
blueGreen:
activeService: api-active
previewService: api-preview
autoPromotionEnabled: false
prePromotionAnalysis:
templates:
- templateName: success-rate
Canary Deployment
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: api-server
spec:
strategy:
canary:
steps:
- setWeight: 5
- pause: {duration: 5m}
- setWeight: 20
- pause: {duration: 10m}
- setWeight: 50
- pause: {duration: 10m}
- setWeight: 100
analysis:
templates:
- templateName: success-rate
startingStep: 1
Autoscaling
Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api-server
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
Vertical Pod Autoscaler
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: api-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: api-server
updatePolicy:
updateMode: Auto
resourcePolicy:
containerPolicies:
- containerName: api
minAllowed:
memory: "256Mi"
cpu: "250m"
maxAllowed:
memory: "4Gi"
cpu: "2"
GitOps with ArgoCD
Application Definition
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: api-server
namespace: argocd
spec:
project: production
source:
repoURL: https://github.com/myorg/k8s-manifests.git
targetRevision: main
path: apps/api-server
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
Conclusion
Following these Kubernetes best practices in 2026 will help you build resilient, secure, and efficient production clusters. Key takeaways:
- Always set resource requests and limits
- Implement proper health checks
- Use Pod Disruption Budgets for availability
- Apply security contexts and network policies
- Enable observability from day one
- Adopt GitOps for deployment consistency
Resources
이 글이 도움이 되셨다면 공감 및 광고 클릭을 부탁드립니다 :)
