GitOps at Scale: Managing 100+ Microservices with Argo CD and Helm
GitOps Sounds Simple Until You Have 100 Services
The core GitOps promise is elegant: Git is the source of truth. Every cluster state change goes through a pull request. Rollbacks are git reverts. Audit trails are git history.
In practice, scaling GitOps from 5 services to 100+ exposes a lot of rough edges. Repository structure, sync strategies, secrets management, drift detection, environment promotion — all of these have multiple valid approaches and none of them are obvious until you’ve made the wrong choice and lived with it.
This post is about what actually works in large-scale GitOps deployments, specifically with Argo CD and Helm.
Repository Structure: The Foundational Decision
The biggest source of GitOps pain at scale is getting repository structure wrong early. Two main patterns:
Option A: Monorepo (App Code + K8s Config)
repo/
├── services/
│   ├── payment-service/
│   │   ├── src/
│   │   ├── Dockerfile
│   │   └── helm/              ← K8s config lives with the code
│   │       ├── Chart.yaml
│   │       ├── values.yaml
│   │       └── values-prod.yaml
│   └── user-service/
│       ├── src/
│       └── helm/
└── .github/workflows/
Pros: Single source of truth, easy to see what changed with a feature. Good for small teams.
Cons: CI/CD pipeline complexity — pushing code and pushing Kubernetes config use different triggers and review processes. Hard to enforce separation of concerns.
Option B: Separate Config Repository (Recommended for Scale)
company-app-code/              ← developers own this
├── payment-service/
│   ├── src/
│   └── Dockerfile
└── user-service/

company-gitops/                ← platform team owns this
├── apps/
│   ├── payment-service/
│   │   ├── Chart.yaml
│   │   ├── values-dev.yaml
│   │   ├── values-staging.yaml
│   │   └── values-prod.yaml
│   └── user-service/
│       └── ...
├── argocd/
│   ├── apps/                  ← Argo CD Application manifests
│   └── projects/              ← Argo CD AppProject manifests
└── clusters/
    ├── dev/
    ├── staging/
    └── prod/
Why this wins at scale:
- Clear ownership boundaries
- Config changes go through a separate review process (often more rigorous for prod)
- Argo CD watches the config repo, not the app repo
- Secrets and infrastructure config are isolated from application code
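A layout like this is easier to keep consistent if CI checks it. The following is a minimal sketch, not part of any Argo CD tooling, that assumes every directory under apps/ should contain a Chart.yaml plus one values file per environment. The function name and the environment list are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: every app directory carries one values file per environment.
ENVIRONMENTS = ("dev", "staging", "prod")


def find_layout_problems(repo_root: Path) -> list[str]:
    """Return one human-readable problem per missing chart or values file."""
    problems = []
    apps_dir = repo_root / "apps"
    for app_dir in sorted(p for p in apps_dir.iterdir() if p.is_dir()):
        if not (app_dir / "Chart.yaml").is_file():
            problems.append(f"{app_dir.name}: missing Chart.yaml")
        for env in ENVIRONMENTS:
            values_file = app_dir / f"values-{env}.yaml"
            if not values_file.is_file():
                problems.append(f"{app_dir.name}: missing {values_file.name}")
    return problems


if __name__ == "__main__":
    # Run from the root of the config repository; a non-empty list fails the build.
    found = find_layout_problems(Path("."))
    for problem in found:
        print(problem)
    raise SystemExit(1 if found else 0)
```

Running a check like this on every pull request to the config repo catches a missing values-prod.yaml before Argo CD reports a failed sync.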
App of Apps Pattern
For 100+ services, the “App of Apps” pattern in Argo CD is essential. Instead of registering each service individually in Argo CD, you have a root Application that manages all other Applications:
# argocd/apps/root.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/company/gitops
    targetRevision: HEAD
    path: argocd/apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

# argocd/apps/payment-service.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-service
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: production
  source:
    repoURL: https://github.com/company/gitops
    targetRevision: HEAD
    path: apps/payment-service
    helm:
      valueFiles:
        - values-prod.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: payment
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
  revisionHistoryLimit: 10
Now adding a new service = adding one YAML file to argocd/apps/. Argo CD picks it up automatically.
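With 100+ services, that one YAML file is usually generated rather than written by hand. Here is a rough sketch of a scaffolding helper that mirrors the payment-service Application above. The function names are hypothetical, and the project, repo URL and sync settings are simply copied from the example, so adjust them to your setup.

```python
import json


def application_manifest(service: str, namespace: str,
                         values_file: str = "values-prod.yaml") -> dict:
    """Build an Argo CD Application shaped like the payment-service example."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {
            "name": service,
            "namespace": "argocd",
            "finalizers": ["resources-finalizer.argocd.argoproj.io"],
        },
        "spec": {
            "project": "production",
            "source": {
                "repoURL": "https://github.com/company/gitops",
                "targetRevision": "HEAD",
                "path": f"apps/{service}",
                "helm": {"valueFiles": [values_file]},
            },
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": namespace,
            },
            "syncPolicy": {
                "automated": {"prune": True, "selfHeal": True},
                "syncOptions": ["CreateNamespace=true", "ServerSideApply=true"],
            },
        },
    }


def render(manifest: dict) -> str:
    # JSON documents are also valid YAML, so the output can be committed
    # without pulling in a YAML library.
    return json.dumps(manifest, indent=2) + "\n"


if __name__ == "__main__":
    print(render(application_manifest("user-service", namespace="user")))
```

Writing the output to argocd/apps/user-service.yaml and opening a pull request is then the whole onboarding step for a new service.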
Image Update Automation
The most common question: “How does a new Docker image get deployed?” You have two main approaches:
Approach 1: CI Pipeline Updates values.yaml
# CI pipeline (GitHub Actions, after building and pushing image)
- name: Check out GitOps repo
  uses: actions/checkout@v4
  with:
    repository: company/gitops
    path: gitops  # the next step runs `cd gitops`
    # Assumed secret name: a token with write access to the GitOps repo
    token: ${{ secrets.GITOPS_TOKEN }}
- name: Update image tag
  env:
    # Assumption: images are tagged with the commit SHA
    IMAGE_TAG: ${{ github.sha }}
  run: |
    cd gitops
    # Update the image tag in the staging values file
    yq e ".image.tag = \"$IMAGE_TAG\"" -i apps/payment-service/values-staging.yaml
    git config user.email "ci@company.com"
    git config user.name "CI Bot"
    git add apps/payment-service/values-staging.yaml
    git commit -m "chore: update payment-service to $IMAGE_TAG"
    git push
Argo CD detects the change and syncs. Simple, auditable, widely used.
Approach 2: Argo CD Image Updater
# Annotation-based image update automation
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-service
  annotations:
    argocd-image-updater.argoproj.io/image-list: payment=company/payment-service
    argocd-image-updater.argoproj.io/payment.update-strategy: semver
    # allow-tags needs the "regexp:" prefix to be treated as a regular expression
    argocd-image-updater.argoproj.io/payment.allow-tags: "regexp:^v[0-9]+\\.[0-9]+\\.[0-9]+$"
    argocd-image-updater.argoproj.io/write-back-method: git
    argocd-image-updater.argoproj.io/git-branch: main
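Image Updater polls the container registry and, when a new matching tag appears, commits the updated tag back to Git. With the git write-back method it can push directly to the tracked branch or to a separate branch that you then turn into a pull request. Great for automated patch updates.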
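To make the semver strategy concrete, here is a simplified sketch of the selection logic: filter tags with the allow-tags pattern, then pick the highest version. This is an illustration only, not Image Updater's actual implementation, and pick_latest_tag is a hypothetical name.

```python
import re
from typing import Optional

# Same shape as the allow-tags pattern: v<major>.<minor>.<patch>, nothing else.
ALLOWED_TAG = re.compile(r"^v([0-9]+)\.([0-9]+)\.([0-9]+)$")


def pick_latest_tag(tags: list[str]) -> Optional[str]:
    """Return the highest allowed semantic-version tag, or None if none match."""
    candidates = []
    for tag in tags:
        match = ALLOWED_TAG.match(tag)
        if match:
            # Compare versions numerically, so that v1.10.0 beats v1.9.3.
            version = tuple(int(part) for part in match.groups())
            candidates.append((version, tag))
    if not candidates:
        return None
    return max(candidates)[1]


if __name__ == "__main__":
    print(pick_latest_tag(["v1.9.3", "v1.10.0", "latest", "v2.0.0-rc1"]))
```

The numeric comparison is the important part: sorting tags as plain strings would rank v1.9.3 above v1.10.0, and the strict pattern keeps tags such as latest or release candidates out of the running.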
Environment Promotion
The promotion flow from dev → staging → prod deserves careful design:
Dev:     auto-sync on every commit to main
  │
  ▼
Staging: auto-sync on schedule (or PR approval)
  │
  ▼
Prod:    manual approval required (PR + CODEOWNERS + required reviews)
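# Using Argo CD ApplicationSets for multi-environment
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: payment-service
  namespace: argocd
spec:
  goTemplate: true # templatePatch below requires Go templating
  generators:
    - list:
        elements:
          - env: dev
            cluster: dev-cluster
            autoSync: "true"
          - env: staging
            cluster: staging-cluster
            autoSync: "true"
          - env: prod
            cluster: prod-cluster
            autoSync: "false" # manual for prod
  template:
    metadata:
      name: "payment-service-{{ .env }}"
    spec:
      project: default # placeholder: use the AppProject that fits each environment
      source:
        repoURL: https://github.com/company/gitops
        path: "apps/payment-service"
        helm:
          valueFiles:
            - "values-{{ .env }}.yaml"
      destination:
        name: "{{ .cluster }}" # cluster name as registered in Argo CD
        namespace: payment
  # An `automated` block enables auto-sync even when prune and selfHeal are
  # false, so it is only added for environments whose autoSync is "true".
  templatePatch: |
    spec:
      syncPolicy:
        syncOptions:
          - CreateNamespace=true
    {{- if eq .autoSync "true" }}
        automated:
          prune: true
          selfHeal: true
    {{- end }}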
Secrets Management: The Hardest Part
Secrets don’t belong in Git. But your Kubernetes workloads need secrets. The two best solutions in 2026:
External Secrets Operator (ESO)
# ExternalSecret syncs from AWS Secrets Manager / Vault / etc.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payment-service-secrets
  namespace: payment
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager
    kind: ClusterSecretStore
  target:
    name: payment-service-secrets
    creationPolicy: Owner
  data:
    - secretKey: DATABASE_PASSWORD
      remoteRef:
        key: production/payment-service
        property: database_password
    - secretKey: STRIPE_SECRET_KEY
      remoteRef:
        key: production/payment-service
        property: stripe_secret_key
This manifest is safe to commit. The actual secret values live in AWS Secrets Manager (or Vault, GCP Secret Manager, etc.) and are synced to Kubernetes Secrets by the ESO controller.
Sealed Secrets (Simpler but Less Flexible)
# Encrypt a secret with the cluster's public key
echo -n "my-secret-value" | \
  kubeseal --raw \
    --from-file=/dev/stdin \
    --namespace payment \
    --name payment-service-secrets

# The SealedSecret is safe to commit
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: payment-service-secrets
  namespace: payment
spec:
  encryptedData:
    DATABASE_PASSWORD: AgB5...encrypted...
Sealed Secrets are simpler, but each SealedSecret is encrypted for one cluster's controller key, so it can only be decrypted on that cluster. ESO is better for multi-cluster and rotation scenarios.
Drift Detection and Alerting
In a production system, you need to know when cluster state diverges from Git state:
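# Custom drift alerting script
import json
import os
import subprocess

import requests

# Configuration is read from the environment (variable names are assumptions)
ARGO_URL = os.environ["ARGO_URL"]
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]
PAGERDUTY_ROUTING_KEY = os.environ["PAGERDUTY_ROUTING_KEY"]


def check_argo_sync_status():
    """Return every app that is not both Synced and Healthy."""
    result = subprocess.run(
        ["argocd", "app", "list", "-o", "json", "--server", ARGO_URL],
        capture_output=True, text=True, check=True,
    )
    apps = json.loads(result.stdout) or []
    drifted = []
    for app in apps:
        app_status = app.get("status", {})
        status = app_status.get("sync", {}).get("status", "Unknown")
        health = app_status.get("health", {}).get("status", "Unknown")
        if status != "Synced" or health != "Healthy":
            drifted.append({
                "name": app["metadata"]["name"],
                "sync_status": status,
                "health": health,
                "message": app_status.get("operationState", {}).get("message", ""),
            })
    return drifted


def slack_notify(text: str):
    """Post a message to a Slack incoming webhook."""
    requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10).raise_for_status()


def pagerduty_alert(app: dict):
    """Trigger an incident through the PagerDuty Events API v2."""
    event = {
        "routing_key": PAGERDUTY_ROUTING_KEY,
        "event_action": "trigger",
        "payload": {
            "summary": f"{app['name']} is {app['health']}: {app['message']}",
            "source": "argocd",
            "severity": "critical",
        },
    }
    requests.post("https://events.pagerduty.com/v2/enqueue", json=event, timeout=10).raise_for_status()


def alert_on_drift(drifted: list):
    for app in drifted:
        if app["health"] == "Degraded":
            # Critical: page on-call
            pagerduty_alert(app)
        else:
            # Non-critical: Slack notification
            slack_notify(f"⚠️ {app['name']} is {app['sync_status']}/{app['health']}")


# Run every 5 minutes via cron
if __name__ == "__main__":
    alert_on_drift(check_argo_sync_status())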
Performance at Scale
With 100+ apps, Argo CD itself can become a bottleneck. Key tuning parameters:
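# argocd-cmd-params-cm ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  # Increase concurrent reconciliations
  controller.operation.processors: "25"
  controller.status.processors: "50"
  # Sharding algorithm, used when clusters are spread across controller replicas
  controller.sharding.algorithm: "round-robin"
  # Cap concurrent manifest generations on the repo server
  reposerver.parallelism.limit: "10"
---
# The reconciliation timeouts are read from argocd-cm, not argocd-cmd-params-cm
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # How often apps are re-compared against Git; raise it to cut polling load
  timeout.reconciliation: "180s"
  # Periodic hard refresh that bypasses the manifest cache ("0s" disables it)
  timeout.hard.reconciliation: "0s"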
For very large deployments (500+ apps), consider:
- Argo CD instance per cluster: each cluster manages itself
- Multi-instance hub-spoke: one control plane, multiple app instances
- Application controller sharding: distribute the managed clusters across multiple controller replicas
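The idea behind the round-robin sharding setting can be shown in a few lines. This is a simplified illustration under the assumption that clusters are ordered by name. It is not Argo CD's code, and the real controller orders clusters by its own criteria.

```python
def assign_shards(clusters: list[str], replicas: int) -> dict[int, list[str]]:
    """Spread clusters across controller shards in round-robin order."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    shards: dict[int, list[str]] = {shard: [] for shard in range(replicas)}
    # Sort first so the assignment is stable from one run to the next.
    for index, cluster in enumerate(sorted(clusters)):
        shards[index % replicas].append(cluster)
    return shards


if __name__ == "__main__":
    print(assign_shards(["prod", "dev", "staging"], replicas=2))
```

Because sharding works at the cluster level, it balances load only when there are many clusters. A single cluster with 500 apps still lands on one controller replica.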
Rollback Strategy
# List revision history
argocd app history payment-service
ID  DATE                     REVISION
5   2026-06-27 12:00:00 UTC  HEAD (abc1234)
4   2026-06-26 18:00:00 UTC  def5678
3   2026-06-26 09:00:00 UTC  ghi9012

# Rollback to revision 4
argocd app rollback payment-service 4

# Or do it via Git (preferred for auditability)
git -C gitops revert HEAD --no-edit
git -C gitops push
# Argo CD detects and syncs automatically
The Git revert approach is preferred because it creates an audit trail and keeps Git as the source of truth. Direct Argo CD rollbacks bypass the GitOps workflow, and Argo CD refuses them while automated sync is enabled, so you have to turn auto-sync off on the Application first. Use them only for emergencies, then immediately create the corresponding Git revert.
Conclusion
GitOps at scale is absolutely achievable, but it requires deliberate architecture decisions early. The patterns that work:
- Separate config repository with clear ownership
- App of Apps for managing 100+ Argo CD Applications
- ApplicationSets for multi-environment/multi-cluster deployments
- External Secrets Operator for secrets management
- CI-driven image updates with PR-based promotion
- Drift detection alerting so problems surface immediately
The teams that struggle with GitOps at scale are usually the ones who started simple (monorepo, no automation) and tried to scale without restructuring. Getting the foundations right pays dividends for years.
What GitOps patterns is your team using at scale? Curious to hear how others handle the environment promotion and secrets management challenges.
