OpenTelemetry in 2026: The Complete Guide to Unified Observability for Distributed Systems
Introduction
OpenTelemetry (OTel) has become the de facto standard for instrumenting distributed systems. In 2026, the project has reached a level of maturity where traces, metrics, and logs are all stable across major languages, and the fourth signal, profiling, is graduating from experimental status. If you're still running one APM agent for metrics and a separate solution for tracing, it's time to consolidate.
This guide covers the full OTel stack: SDK instrumentation, the Collector, backends, and the emerging patterns that are simplifying observability at scale.
The OpenTelemetry Architecture in 2026
┌─────────────────────────────────────────────────────────────┐
│ Your Application │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ OTel SDK (auto + manual instrumentation) │ │
│ │ Traces │ Metrics │ Logs │ Profiles (beta) │ │
│ └──────────────────────┬──────────────────────────────┘ │
└─────────────────────────┼───────────────────────────────────┘
│ OTLP (gRPC or HTTP)
▼
┌─────────────────────────────────────────────────────────────┐
│ OTel Collector (Gateway) │
│ ┌──────────┐ ┌──────────────┐ ┌────────────────────┐ │
│ │Receivers │→ │ Processors │→ │ Exporters │ │
│ │OTLP │ │ batch/filter │ │ Jaeger/Tempo │ │
│ │Prometheus│ │ transform │ │ Prometheus/Mimir │ │
│ │Zipkin │ │ tail-sample │ │ Loki/Elasticsearch │ │
│ └──────────┘ └──────────────┘ │ Pyroscope (profiles)│ │
│ └────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Setting Up Auto-Instrumentation
The easiest way to get started is with zero-code instrumentation. For Node.js:
npm install @opentelemetry/auto-instrumentations-node \
@opentelemetry/sdk-node \
@opentelemetry/exporter-trace-otlp-http \
@opentelemetry/exporter-metrics-otlp-http
// tracing.ts — load before your app
import { NodeSDK } from '@opentelemetry/sdk-node'
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node'
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http'
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http'
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics'
import { Resource } from '@opentelemetry/resources'
import { ATTR_SERVICE_NAME, ATTR_SERVICE_VERSION } from '@opentelemetry/semantic-conventions'

// OTEL_EXPORTER_OTLP_ENDPOINT is a base URL; the signal-specific path is appended
const otlpBase = process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://otel-collector:4318'

const sdk = new NodeSDK({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: process.env.SERVICE_NAME ?? 'unknown-service',
    [ATTR_SERVICE_VERSION]: process.env.SERVICE_VERSION ?? '0.0.0',
    'deployment.environment': process.env.NODE_ENV ?? 'development',
  }),
  traceExporter: new OTLPTraceExporter({ url: `${otlpBase}/v1/traces` }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({ url: `${otlpBase}/v1/metrics` }),
    exportIntervalMillis: 15_000,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-fs': { enabled: false }, // too noisy
      '@opentelemetry/instrumentation-http': {
        requestHook: (span, request) => {
          span.setAttribute('http.request.body.size', Number(request.headers['content-length'] ?? 0))
        },
      },
    }),
  ],
})

sdk.start()
process.on('SIGTERM', () => { sdk.shutdown() })
Compile tracing.ts to tracing.js, then start your app with the instrumentation loaded first:
node --require ./tracing.js app.js
Auto-instrumentation covers: Express/Fastify/Koa, PostgreSQL, MySQL, Redis, MongoDB, gRPC, AWS SDK, fetch/http, and 50+ other libraries.
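If you prefer not to write any bootstrap code at all, the same setup can be driven by the standard OTEL_* environment variables together with the register hook shipped in @opentelemetry/auto-instrumentations-node. A minimal sketch (the endpoint and service name here are placeholders for your environment):

```shell
# Zero-code variant: configure the SDK via standard OTEL_* env vars
export OTEL_SERVICE_NAME="order-service"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4318"
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=development"
export OTEL_NODE_DISABLED_INSTRUMENTATIONS="fs"   # same "too noisy" exclusion as above

node --require @opentelemetry/auto-instrumentations-node/register app.js
```

This is handy for containers, where env vars are easier to vary per deployment than a compiled tracing.js.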
Manual Instrumentation: Adding Business Context
Auto-instrumentation captures infrastructure-level traces. For business-level insights, add manual spans:
import { trace, SpanStatusCode } from '@opentelemetry/api'

const tracer = trace.getTracer('order-service', '1.0.0')

async function processOrder(orderId: string, userId: string) {
  // Create a root span for the business operation
  return tracer.startActiveSpan('order.process', async (span) => {
    // Add business context as attributes
    span.setAttributes({
      'order.id': orderId,
      'user.id': userId,
      'order.source': 'web',
    })
    try {
      // Nested spans for sub-operations; end them in finally so they
      // are not leaked when the wrapped call throws
      const inventory = await tracer.startActiveSpan('inventory.check', async (invSpan) => {
        try {
          const result = await checkInventory(orderId)
          invSpan.setAttribute('inventory.available', result.available)
          return result
        } finally {
          invSpan.end()
        }
      })
      if (!inventory.available) {
        span.setStatus({ code: SpanStatusCode.ERROR, message: 'Out of stock' })
        span.addEvent('order.rejected', { reason: 'out_of_stock' })
        throw new Error(`Order ${orderId} failed: out of stock`)
      }
      const payment = await tracer.startActiveSpan('payment.charge', async (paySpan) => {
        try {
          const result = await chargePayment(userId, inventory.totalAmount)
          paySpan.setAttributes({
            'payment.amount': inventory.totalAmount,
            'payment.currency': 'USD',
            'payment.method': result.method,
          })
          return result
        } finally {
          paySpan.end()
        }
      })
      span.addEvent('order.completed', {
        'order.total': inventory.totalAmount,
        'payment.id': payment.id,
      })
      span.setStatus({ code: SpanStatusCode.OK })
      return { success: true, paymentId: payment.id }
    } catch (error) {
      span.recordException(error as Error)
      span.setStatus({ code: SpanStatusCode.ERROR })
      throw error
    } finally {
      span.end()
    }
  })
}
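The try/catch/finally dance around every span gets repetitive quickly. It can be factored into a small generic helper. Sketched here against minimal tracer-shaped interfaces so the wrapping logic is visible in isolation; in real code you would pass the tracer from @opentelemetry/api and use its SpanStatusCode values instead of the inline constant:

```typescript
// Minimal span/tracer shapes, standing in for the @opentelemetry/api types
interface MiniSpan {
  setStatus(status: { code: number; message?: string }): void
  recordException(err: Error): void
  end(): void
}
interface MiniTracer {
  startActiveSpan<T>(name: string, fn: (span: MiniSpan) => Promise<T>): Promise<T>
}

const ERROR = 2 // mirrors SpanStatusCode.ERROR

// Wrap an async operation in a span: record and re-throw on failure,
// and always end the span exactly once
async function withSpan<T>(tracer: MiniTracer, name: string, fn: () => Promise<T>): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn()
    } catch (err) {
      span.recordException(err as Error)
      span.setStatus({ code: ERROR })
      throw err
    } finally {
      span.end()
    }
  })
}
```

With the real API, each nested block above collapses to something like `await withSpan(tracer, 'inventory.check', () => checkInventory(orderId))`.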
Custom Metrics: The Right Way
import { metrics } from '@opentelemetry/api'

const meter = metrics.getMeter('order-service', '1.0.0')

// Counter: monotonically increasing
const ordersProcessed = meter.createCounter('orders.processed', {
  description: 'Total number of orders processed',
  unit: '{order}', // UCUM annotation units are singular
})

// Histogram: latency / distribution data
const orderProcessingDuration = meter.createHistogram('orders.processing.duration', {
  description: 'Time to process an order',
  unit: 'ms',
  advice: {
    explicitBucketBoundaries: [10, 50, 100, 200, 500, 1000, 2000, 5000],
  },
})

// UpDownCounter: can go up or down (e.g., queue depth)
const activeOrders = meter.createUpDownCounter('orders.active', {
  description: 'Number of orders currently being processed',
})

// Gauge: point-in-time measurement (via observable callback)
const orderQueueDepth = meter.createObservableGauge('orders.queue.depth', {
  description: 'Current depth of the order processing queue',
})
orderQueueDepth.addCallback((result) => {
  result.observe(getQueueDepth(), { queue: 'primary' })
})

// Usage in business logic
async function processOrder(orderId: string) {
  const startTime = Date.now()
  activeOrders.add(1, { status: 'processing' })
  try {
    const result = await doProcessOrder(orderId)
    ordersProcessed.add(1, { status: 'success', region: result.region })
    return result
  } catch (error) {
    ordersProcessed.add(1, { status: 'error', error_type: (error as Error).constructor.name })
    throw error
  } finally {
    orderProcessingDuration.record(Date.now() - startTime)
    activeOrders.add(-1, { status: 'processing' })
  }
}
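To make the explicitBucketBoundaries advice concrete: the SDK counts each recorded value into buckets defined by those boundaries, plus one overflow bucket, and the backend later turns those counts into percentiles. A rough sketch of that bucketing (this is an illustration of the idea, not the SDK's actual aggregation code):

```typescript
// Count values into explicit histogram buckets (one extra bucket for +Inf),
// mimicking what an explicit-bucket histogram aggregation does internally
const boundaries = [10, 50, 100, 200, 500, 1000, 2000, 5000]

function bucketCounts(values: number[], bounds: number[]): number[] {
  const counts = new Array(bounds.length + 1).fill(0)
  for (const v of values) {
    // First boundary >= value determines the bucket; otherwise overflow
    let i = bounds.findIndex((b) => v <= b)
    if (i === -1) i = bounds.length
    counts[i]++
  }
  return counts
}
```

For example, `bucketCounts([7, 42, 180, 9000], boundaries)` puts one measurement each in the ≤10, ≤50, and ≤200 buckets and one in the overflow bucket. Choosing boundaries that straddle your SLO thresholds is what makes the resulting percentiles useful.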
The OTel Collector: Deployment Patterns
Agent Pattern (per-node DaemonSet)
# Kubernetes DaemonSet for node-level collection
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
  namespace: observability
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector   # must match spec.selector.matchLabels
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.98.0
          resources:
            limits:
              memory: 512Mi
              cpu: 500m
            requests:
              memory: 128Mi
              cpu: 100m
          volumeMounts:
            - name: otel-collector-config-vol
              mountPath: /conf
      volumes:
        - name: otel-collector-config-vol
          configMap:
            name: otel-collector-config
Collector Configuration with Tail Sampling
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 400
    spike_limit_mib: 100
  # Tail-based sampling: make sampling decisions after seeing the full trace
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-traces-policy
        type: latency
        latency: {threshold_ms: 1000}
      - name: probabilistic-sample
        type: probabilistic
        probabilistic: {sampling_percentage: 10}

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push
  # The dedicated Loki exporter was removed from collector-contrib;
  # send logs to Loki's native OTLP ingest endpoint instead
  otlphttp/loki:
    endpoint: http://loki:3100/otlp

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/loki]
Note that tail sampling only works when every span of a trace reaches the same collector instance, so run the tail_sampling processor in a gateway tier (optionally fronted by the load-balancing exporter), not in the per-node agents.
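The probabilistic policy hashes the trace ID into a number and compares it against the configured percentage, so the keep/drop decision is deterministic per trace. A simplified sketch of that idea (the real processor uses a seeded hash internally; this FNV-1a version is only for intuition):

```typescript
// Deterministic trace sampling: hash the trace ID into [0, 100) and
// keep the trace when the hash falls under the sampling percentage
function sampleTrace(traceId: string, percentage: number): boolean {
  let hash = 0x811c9dc5 // FNV-1a 32-bit offset basis
  for (let i = 0; i < traceId.length; i++) {
    hash ^= traceId.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0 // FNV prime, kept unsigned
  }
  return (hash % 10_000) / 100 < percentage
}
```

Because the decision depends only on the trace ID, every replica that sees the same trace computes the same answer, so a sampled trace never ends up half-kept.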
Logs: Structured Logging with OTel
In 2026, the OTel Logs SDK is stable across all major languages. The key is correlating logs with traces using trace context:
import { logs, SeverityNumber } from '@opentelemetry/api-logs'
import { trace } from '@opentelemetry/api'

const logger = logs.getLogger('order-service')

function structuredLog(level: SeverityNumber, message: string, attributes: Record<string, unknown> = {}) {
  const span = trace.getActiveSpan()
  const spanContext = span?.spanContext()
  logger.emit({
    severityNumber: level,
    severityText: SeverityNumber[level],
    body: message,
    attributes: {
      ...attributes,
      // Inject trace context for log-trace correlation
      'trace.id': spanContext?.traceId,
      'span.id': spanContext?.spanId,
      'service.name': 'order-service',
    },
    timestamp: new Date(),
  })
}

// Usage
structuredLog(SeverityNumber.INFO, 'Order payment processed', {
  'order.id': orderId,
  'payment.amount': amount,
  'payment.provider': 'stripe',
})
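The trace.id and span.id attributes above are the same values that travel between services in the W3C traceparent header, which is how a log line in one service can be joined to a trace started in another. Formatting and parsing that header is easy to sketch:

```typescript
// W3C traceparent: version "00", 32-hex trace id, 16-hex span id, 2-hex flags
function formatTraceparent(traceId: string, spanId: string, sampled: boolean): string {
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`
}

function parseTraceparent(header: string): { traceId: string; spanId: string; sampled: boolean } | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header)
  if (!m) return null
  // Bit 0 of the flags byte is the "sampled" flag
  return { traceId: m[1], spanId: m[2], sampled: (parseInt(m[3], 16) & 1) === 1 }
}
```

In practice the SDK's propagators handle this for you; the sketch just shows what is on the wire.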
Profiling: The Fourth Pillar (Beta in 2026)
Continuous profiling (capturing CPU, memory, and thread or goroutine profiles in production) is the newest signal in the OTel spec, and Grafana Pyroscope is the leading open-source backend for OTel profiles. The Node.js profiling SDK is still pre-release, so the package names and options below are illustrative rather than final:
// Node.js continuous profiling with OTel (illustrative pre-release API)
import { ProfilingExporter } from '@opentelemetry/exporter-profiles-otlp-http'
import { ContinuousProfiler } from '@opentelemetry/profiling'

const profiler = new ContinuousProfiler({
  exporter: new ProfilingExporter({
    url: 'http://pyroscope:4040/ingest',
  }),
  intervalMs: 10_000, // Profile every 10 seconds
  collectCpuProfile: true,
  collectHeapProfile: true,
})

profiler.start()
The real power is trace-to-profile linking: click on a slow span in Jaeger/Tempo, and jump directly to the CPU profile captured during that trace’s execution.
Summary: The OTel Stack in 2026
| Signal | Status | Recommended Backend |
|---|---|---|
| Traces | Stable | Grafana Tempo / Jaeger |
| Metrics | Stable | Grafana Mimir / Prometheus |
| Logs | Stable | Grafana Loki / Elasticsearch |
| Profiles | Beta | Grafana Pyroscope |
| Baggage | Stable | (propagation only) |
OpenTelemetry has effectively won the fight against observability vendor lock-in. Instrument once with OTel and you keep the freedom to swap backends without re-instrumenting your services. For new projects in 2026, there is little reason to reach for a proprietary APM SDK.
Questions about setting up OTel in your infrastructure? Leave a comment or reach out on GitHub.
