Kubernetes Operator Pattern in 2026: Building Production-Grade Controllers
on Kubernetes, DevOps, Cloud Native, Go, Operators
The Kubernetes operator pattern has become the standard way to encode operational knowledge into code. What started as a way to manage stateful applications like databases has expanded to cover everything from certificate management to ML model deployments. In 2026, operators are so pervasive that most platform teams build at least one custom operator. Here’s how to build them well.
What Is a Kubernetes Operator?
An operator is a combination of a Custom Resource Definition (CRD) and a controller:
- CRD: Extends the Kubernetes API with your custom resource types
- Controller: Watches those resources and reconciles actual state with desired state
The reconciliation loop is the heart of any operator:
Observe desired state (from CRD spec)
↓
Observe actual state (from cluster, external APIs)
↓
Compute diff
↓
Apply changes to make actual = desired
↓
Update status subresource
↓
Return (re-queue if needed)
This loop is level-triggered, not edge-triggered. The controller does react to watch events from the API server, but each reconcile works from the full observed state rather than from the triggering event, and the manager periodically resyncs everything. That is what makes operators resilient to missed or coalesced events.
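The loop above can be sketched in plain Go with no Kubernetes dependencies. The point of the sketch is that the reconciler only ever compares desired against actual and applies the diff, so it does not matter which event (or resync) triggered it. All names here are illustrative, not part of any Kubernetes API.

```go
package main

import "fmt"

// State maps a resource name to its configuration.
type State map[string]string

// reconcile mutates actual so it converges to desired, and reports
// what it had to create, update, or delete along the way.
func reconcile(desired, actual State) (created, updated, deleted []string) {
	for name, want := range desired {
		got, exists := actual[name]
		switch {
		case !exists:
			actual[name] = want
			created = append(created, name)
		case got != want:
			actual[name] = want
			updated = append(updated, name)
		}
	}
	for name := range actual {
		if _, ok := desired[name]; !ok {
			delete(actual, name)
			deleted = append(deleted, name)
		}
	}
	return created, updated, deleted
}

func main() {
	desired := State{"db-a": "postgres", "db-b": "mysql"}
	actual := State{"db-b": "redis", "db-c": "postgres"}
	c, u, d := reconcile(desired, actual)
	fmt.Println(c, u, d) // [db-a] [db-b] [db-c]
}
```

Running reconcile a second time on the same inputs changes nothing, which is the idempotency property the rest of this post keeps coming back to.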
Scaffolding with Operator SDK
The fastest way to start is with Operator SDK (backed by Red Hat) or Kubebuilder (backed by the Kubernetes project). Operator SDK is built on top of Kubebuilder, so both generate essentially the same scaffolding:
# Install operator-sdk
brew install operator-sdk

# Create a new operator project
mkdir database-operator && cd database-operator
operator-sdk init --domain mycompany.com --repo github.com/mycompany/database-operator

# Create a new API (CRD + controller)
operator-sdk create api \
  --group database \
  --version v1alpha1 \
  --kind ManagedDatabase \
  --resource \
  --controller
This generates:
database-operator/
├── api/
│   └── v1alpha1/
│       ├── manageddatabase_types.go      # CRD schema
│       └── zz_generated.deepcopy.go      # auto-generated
├── config/
│   ├── crd/                              # CRD YAML
│   ├── rbac/                             # RBAC rules
│   └── manager/                          # Deployment YAML
├── controllers/
│   └── manageddatabase_controller.go     # Your reconciler
└── main.go                               # Entry point
Defining Your CRD Types
// api/v1alpha1/manageddatabase_types.go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ManagedDatabaseSpec defines the desired state
type ManagedDatabaseSpec struct {
	// +kubebuilder:validation:Enum=postgres;mysql;redis
	Engine string `json:"engine"`

	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=100
	StorageGB int `json:"storageGB"`

	// +kubebuilder:validation:Enum=t3.micro;t3.small;t3.medium;t3.large
	InstanceClass string `json:"instanceClass"`

	// +optional
	// +kubebuilder:default=false
	MultiAZ bool `json:"multiAZ,omitempty"`

	// +optional
	BackupRetentionDays int `json:"backupRetentionDays,omitempty"`
}

// ManagedDatabaseStatus defines the observed state
type ManagedDatabaseStatus struct {
	// Conditions represent the latest available observations
	// +optional
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty"`

	// Phase summarizes the overall state
	// +optional
	// +kubebuilder:validation:Enum=Pending;Provisioning;Ready;Failed;Deleting
	Phase string `json:"phase,omitempty"`

	// Endpoint is the connection endpoint
	// +optional
	Endpoint string `json:"endpoint,omitempty"`

	// ProviderID is the cloud provider resource ID (e.g., RDS instance ID)
	// +optional
	ProviderID string `json:"providerID,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Engine",type=string,JSONPath=`.spec.engine`
// +kubebuilder:printcolumn:name="Phase",type=string,JSONPath=`.status.phase`
// +kubebuilder:printcolumn:name="Endpoint",type=string,JSONPath=`.status.endpoint`
// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp`
type ManagedDatabase struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ManagedDatabaseSpec   `json:"spec,omitempty"`
	Status ManagedDatabaseStatus `json:"status,omitempty"`
}
The // +kubebuilder:* comments are markers that controller-gen reads to generate the CRD schema, validation rules, defaults, and printer columns. Run make generate manifests to regenerate after changing them.
Writing the Reconciler
// controllers/manageddatabase_controller.go
package controllers

import (
	"context"
	"fmt"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/log"

	databasev1alpha1 "github.com/mycompany/database-operator/api/v1alpha1"
)
const (
	finalizerName            = "database.mycompany.com/finalizer"
	conditionTypeReady       = "Ready"
	conditionTypeProvisioned = "Provisioned"
)

type ManagedDatabaseReconciler struct {
	client.Client
	CloudProvider CloudProviderInterface
}
func (r *ManagedDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Fetch the resource
	db := &databasev1alpha1.ManagedDatabase{}
	if err := r.Get(ctx, req.NamespacedName, db); err != nil {
		if apierrors.IsNotFound(err) {
			// Object deleted before we could reconcile; ignore
			return ctrl.Result{}, nil
		}
		return ctrl.Result{}, fmt.Errorf("fetching ManagedDatabase: %w", err)
	}

	// Handle deletion
	if db.DeletionTimestamp != nil {
		return r.reconcileDelete(ctx, db)
	}

	// Add finalizer if not present
	if !controllerutil.ContainsFinalizer(db, finalizerName) {
		controllerutil.AddFinalizer(db, finalizerName)
		if err := r.Update(ctx, db); err != nil {
			return ctrl.Result{}, fmt.Errorf("adding finalizer: %w", err)
		}
		return ctrl.Result{Requeue: true}, nil
	}

	// Reconcile the desired state
	return r.reconcileNormal(ctx, db)
}
func (r *ManagedDatabaseReconciler) reconcileNormal(
	ctx context.Context,
	db *databasev1alpha1.ManagedDatabase,
) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// Check if the database already exists in the cloud provider
	existingDB, err := r.CloudProvider.GetDatabase(ctx, db.Status.ProviderID)
	if err != nil && !IsNotFoundError(err) {
		return ctrl.Result{}, fmt.Errorf("checking cloud provider: %w", err)
	}

	if existingDB == nil {
		// Database doesn't exist yet: create it
		logger.Info("Creating database", "engine", db.Spec.Engine)
		r.setPhase(db, "Provisioning")
		if err := r.Status().Update(ctx, db); err != nil {
			return ctrl.Result{}, err
		}

		providerID, err := r.CloudProvider.CreateDatabase(ctx, CreateDatabaseInput{
			Name:          db.Name,
			Engine:        db.Spec.Engine,
			StorageGB:     db.Spec.StorageGB,
			InstanceClass: db.Spec.InstanceClass,
			MultiAZ:       db.Spec.MultiAZ,
		})
		if err != nil {
			r.setCondition(db, conditionTypeProvisioned, metav1.ConditionFalse,
				"CreateFailed", err.Error())
			r.setPhase(db, "Failed")
			_ = r.Status().Update(ctx, db)
			return ctrl.Result{}, fmt.Errorf("creating database: %w", err)
		}

		db.Status.ProviderID = providerID
		// Requeue to check provisioning status
		return ctrl.Result{RequeueAfter: 30 * time.Second}, r.Status().Update(ctx, db)
	}

	// Database exists: check its status
	if !existingDB.IsAvailable() {
		logger.Info("Database still provisioning", "status", existingDB.Status)
		r.setPhase(db, "Provisioning")
		_ = r.Status().Update(ctx, db)
		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
	}

	// Database is ready!
	db.Status.Endpoint = existingDB.Endpoint
	r.setPhase(db, "Ready")
	r.setCondition(db, conditionTypeReady, metav1.ConditionTrue, "DatabaseReady",
		"Database is available and accepting connections")
	return ctrl.Result{RequeueAfter: 5 * time.Minute}, r.Status().Update(ctx, db)
}
func (r *ManagedDatabaseReconciler) reconcileDelete(
	ctx context.Context,
	db *databasev1alpha1.ManagedDatabase,
) (ctrl.Result, error) {
	if !controllerutil.ContainsFinalizer(db, finalizerName) {
		return ctrl.Result{}, nil // already cleaned up
	}

	// Delete from cloud provider
	if db.Status.ProviderID != "" {
		if err := r.CloudProvider.DeleteDatabase(ctx, db.Status.ProviderID); err != nil {
			return ctrl.Result{}, fmt.Errorf("deleting from cloud provider: %w", err)
		}
	}

	// Remove finalizer
	controllerutil.RemoveFinalizer(db, finalizerName)
	return ctrl.Result{}, r.Update(ctx, db)
}
Key Patterns for Production Operators
1. Status Conditions
Always use the standard metav1.Condition type for status conditions. This is the Kubernetes-native way to communicate state:
func (r *ManagedDatabaseReconciler) setCondition(
	obj *databasev1alpha1.ManagedDatabase,
	condType string,
	status metav1.ConditionStatus,
	reason, message string,
) {
	meta.SetStatusCondition(&obj.Status.Conditions, metav1.Condition{
		Type:               condType,
		Status:             status,
		Reason:             reason,
		Message:            message,
		LastTransitionTime: metav1.Now(),
		ObservedGeneration: obj.Generation,
	})
}

// setPhase is the small helper used throughout the reconciler above.
func (r *ManagedDatabaseReconciler) setPhase(obj *databasev1alpha1.ManagedDatabase, phase string) {
	obj.Status.Phase = phase
}
2. Idempotency is Everything
Your reconciler will run multiple times for the same resource. Design accordingly:
// ❌ Wrong: not idempotent; fails on the second reconcile
func createDatabase(ctx context.Context, name string) error {
	return provider.CreateDatabase(ctx, name) // errors with "already exists" on rerun
}

// ✅ Correct: check first, create only if needed
func ensureDatabase(ctx context.Context, name string) error {
	existing, err := provider.GetDatabase(ctx, name)
	if err != nil && !IsNotFound(err) {
		return err
	}
	if existing != nil {
		return nil // already exists, nothing to do
	}
	return provider.CreateDatabase(ctx, name)
}
3. Requeue Deliberately
Returning an error from Reconcile makes controller-runtime requeue with exponential backoff via its rate limiter; RequeueAfter schedules a fixed delay instead. Use each deliberately:
// Transient error: return it, and the workqueue retries with exponential backoff
return ctrl.Result{}, err
// Permanent error (e.g., an invalid spec): record a condition, don't requeue
return ctrl.Result{}, nil
// Still waiting on external state: poll at a fixed interval
return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
// Completed successfully: periodic resync as a health check
return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil
4. Watch Related Resources
func (r *ManagedDatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&databasev1alpha1.ManagedDatabase{}).
		// Watch Secrets that reference this DB
		Watches(
			&corev1.Secret{},
			handler.EnqueueRequestsFromMapFunc(r.secretToDatabase),
		).
		// Watch owned resources (ConfigMaps this operator creates)
		Owns(&corev1.ConfigMap{}).
		// Bound concurrency and rate-limit retries
		WithOptions(controller.Options{
			MaxConcurrentReconciles: 5,
			RateLimiter: workqueue.NewItemExponentialFailureRateLimiter(
				5*time.Millisecond,
				1000*time.Second,
			),
		}).
		Complete(r)
}
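The Watches call above references a mapping function, r.secretToDatabase, whose job is to translate a Secret event into reconcile requests for the databases that use it. The real version has the handler.MapFunc signature from controller-runtime; the dependency-free sketch below shows only the translation logic, using a hypothetical label (database.mycompany.com/name) as the linkage. A real implementation might instead list ManagedDatabases and match on spec fields.

```go
package main

import "fmt"

// Request identifies an object to reconcile, like reconcile.Request.
type Request struct{ Namespace, Name string }

// secretToDatabase maps a Secret (by namespace and labels) to the
// databases that reference it. The label key here is hypothetical.
func secretToDatabase(namespace string, labels map[string]string) []Request {
	dbName, ok := labels["database.mycompany.com/name"]
	if !ok {
		return nil // Secret unrelated to any managed database
	}
	return []Request{{Namespace: namespace, Name: dbName}}
}

func main() {
	reqs := secretToDatabase("default", map[string]string{
		"database.mycompany.com/name": "orders-db",
	})
	fmt.Println(reqs) // [{default orders-db}]
}
```

Returning an empty slice for unrelated Secrets is important: every request you return becomes a reconcile, so an over-eager mapper can flood the workqueue.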
Testing Your Operator
// controllers/manageddatabase_controller_test.go
package controllers_test

import (
	"context"
	"time"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"

	databasev1alpha1 "github.com/mycompany/database-operator/api/v1alpha1"
)

var _ = Describe("ManagedDatabase Controller", func() {
	Context("When creating a ManagedDatabase", func() {
		It("Should provision the database and update status", func() {
			ctx := context.Background()
			db := &databasev1alpha1.ManagedDatabase{
				ObjectMeta: metav1.ObjectMeta{
					Name:      "test-db",
					Namespace: "default",
				},
				Spec: databasev1alpha1.ManagedDatabaseSpec{
					Engine:        "postgres",
					StorageGB:     10,
					InstanceClass: "t3.micro",
				},
			}
			Expect(k8sClient.Create(ctx, db)).To(Succeed())

			// Wait for reconciliation
			Eventually(func() string {
				_ = k8sClient.Get(ctx, client.ObjectKeyFromObject(db), db)
				return db.Status.Phase
			}, time.Second*30, time.Second).Should(Equal("Ready"))

			Expect(db.Status.Endpoint).NotTo(BeEmpty())
		})
	})
})
The envtest package from controller-runtime starts a real kube-apiserver and etcd locally for your tests, so no full cluster is needed.
Conclusion
Kubernetes operators encode operational expertise that would otherwise live in runbooks, Slack messages, and tribal knowledge. The controller pattern — observe, compare, reconcile — is elegant once internalized.
The controller-runtime library has matured to the point where the boilerplate is manageable, and the Operator SDK adds scaffolding that gets you to a working operator in minutes. The hard part isn’t the code; it’s the operational model: what does “desired state” mean for your resource? What compensations are needed on failure? How does deletion work?
Answer those questions clearly, implement them in your reconciler, and you’ll have infrastructure that manages itself.
Related: Service Mesh: Cilium vs Istio in 2026
