OpenTelemetry in 2026: The Complete Guide to Unified Observability for Distributed Systems



Introduction

OpenTelemetry (OTel) has become the de facto standard for instrumenting distributed systems. In 2026, the project has reached a level of maturity where traces, metrics, and logs are all stable across major languages, and the fourth pillar — profiling — is graduating from experimental status. If you’re still running separate APM agents for metrics and a different solution for tracing, it’s time to consolidate.

This guide covers the full OTel stack: SDK instrumentation, the Collector, backends, and the emerging patterns that are simplifying observability at scale.



The OpenTelemetry Architecture in 2026

┌─────────────────────────────────────────────────────────────┐
│                    Your Application                          │
│  ┌─────────────────────────────────────────────────────┐   │
│  │         OTel SDK (auto + manual instrumentation)     │   │
│  │   Traces │ Metrics │ Logs │ Profiles (beta)          │   │
│  └──────────────────────┬──────────────────────────────┘   │
└─────────────────────────┼───────────────────────────────────┘
                          │ OTLP (gRPC or HTTP)
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                 OTel Collector (Gateway)                     │
│  ┌──────────┐  ┌──────────────┐  ┌────────────────────┐   │
│  │Receivers │→ │  Processors   │→ │     Exporters       │   │
│  │OTLP      │  │ batch/filter │  │ Jaeger/Tempo        │   │
│  │Prometheus│  │ transform    │  │ Prometheus/Mimir    │   │
│  │Zipkin    │  │ tail-sample  │  │ Loki/Elasticsearch  │   │
│  └──────────┘  └──────────────┘  │ Pyroscope (profiles)│   │
│                                   └────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Setting Up Auto-Instrumentation

The easiest way to get started is with zero-code instrumentation. For Node.js:

npm install @opentelemetry/auto-instrumentations-node \
            @opentelemetry/sdk-node \
            @opentelemetry/exporter-trace-otlp-http \
            @opentelemetry/exporter-metrics-otlp-http

// tracing.ts — load before your app
import { NodeSDK } from '@opentelemetry/sdk-node'
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node'
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http'
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http'
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics'
import { Resource } from '@opentelemetry/resources'
import { ATTR_SERVICE_NAME, ATTR_SERVICE_VERSION } from '@opentelemetry/semantic-conventions'

const sdk = new NodeSDK({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: process.env.SERVICE_NAME ?? 'unknown-service',
    [ATTR_SERVICE_VERSION]: process.env.SERVICE_VERSION ?? '0.0.0',
    'deployment.environment': process.env.NODE_ENV ?? 'development',
  }),
  traceExporter: new OTLPTraceExporter({
    // Use the signal-specific env var: the generic OTEL_EXPORTER_OTLP_ENDPOINT
    // is a base URL, and the SDK only appends /v1/traces when configured via env.
    url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT ?? 'http://otel-collector:4318/v1/traces',
  }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: process.env.OTEL_EXPORTER_OTLP_METRICS_ENDPOINT ?? 'http://otel-collector:4318/v1/metrics',
    }),
    exportIntervalMillis: 15_000,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-fs': { enabled: false }, // too noisy
      '@opentelemetry/instrumentation-http': {
        requestHook: (span, request) => {
          // content-length arrives as a string header; coerce to a number
          const len = 'headers' in request ? request.headers['content-length'] : undefined
          span.setAttribute('http.request.body.size', Number(len ?? 0))
        },
      },
    }),
  ],
})

sdk.start()

process.on('SIGTERM', () => {
  // Flush pending telemetry before the process exits
  sdk.shutdown().finally(() => process.exit(0))
})

Start your app with (after compiling tracing.ts):

node --require ./tracing.js app.js

Auto-instrumentation covers: Express/Fastify/Koa, PostgreSQL, MySQL, Redis, MongoDB, gRPC, AWS SDK, fetch/http, and 50+ other libraries.
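
If you don’t need the programmatic hooks above, the same auto-instrumentation can be configured entirely through environment variables using the package’s register entrypoint — a minimal sketch (the collector endpoint and service name are placeholders for your own values):

```shell
# Zero-code setup: no tracing.ts — everything is driven by standard OTEL_* env vars
export OTEL_SERVICE_NAME=order-service
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318   # base URL; /v1/traces etc. appended automatically
export OTEL_TRACES_EXPORTER=otlp
export OTEL_METRICS_EXPORTER=otlp

node --require @opentelemetry/auto-instrumentations-node/register app.js
```

This trades the custom hooks (like the requestHook above) for a setup you can change per environment without a rebuild.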


Manual Instrumentation: Adding Business Context

Auto-instrumentation captures infrastructure-level traces. For business-level insights, add manual spans:

import { trace, context, SpanStatusCode } from '@opentelemetry/api'

const tracer = trace.getTracer('order-service', '1.0.0')

async function processOrder(orderId: string, userId: string) {
  // Create a root span for the business operation
  return tracer.startActiveSpan('order.process', async (span) => {
    // Add business context as attributes
    span.setAttributes({
      'order.id': orderId,
      'user.id': userId,
      'order.source': 'web',
    })
    
    try {
      // Nested spans for sub-operations
      const inventory = await tracer.startActiveSpan('inventory.check', async (invSpan) => {
        try {
          const result = await checkInventory(orderId)
          invSpan.setAttribute('inventory.available', result.available)
          return result
        } finally {
          invSpan.end() // end the span even if checkInventory throws
        }
      })
      
      if (!inventory.available) {
        span.setStatus({ code: SpanStatusCode.ERROR, message: 'Out of stock' })
        span.addEvent('order.rejected', { reason: 'out_of_stock' })
        throw new Error(`Order ${orderId} failed: out of stock`)
      }
      
      const payment = await tracer.startActiveSpan('payment.charge', async (paySpan) => {
        try {
          const result = await chargePayment(userId, inventory.totalAmount)
          paySpan.setAttributes({
            'payment.amount': inventory.totalAmount,
            'payment.currency': 'USD',
            'payment.method': result.method,
          })
          return result
        } finally {
          paySpan.end() // end the span even if chargePayment throws
        }
      })
      
      span.addEvent('order.completed', {
        'order.total': inventory.totalAmount,
        'payment.id': payment.id,
      })
      
      span.setStatus({ code: SpanStatusCode.OK })
      return { success: true, paymentId: payment.id }
      
    } catch (error) {
      span.recordException(error as Error)
      span.setStatus({ code: SpanStatusCode.ERROR })
      throw error
    } finally {
      span.end()
    }
  })
}

Custom Metrics: The Right Way

import { metrics } from '@opentelemetry/api'

const meter = metrics.getMeter('order-service', '1.0.0')

// Counter: monotonically increasing
const ordersProcessed = meter.createCounter('orders.processed', {
  description: 'Total number of orders processed',
  unit: '{orders}',
})

// Histogram: latency / distribution data
const orderProcessingDuration = meter.createHistogram('orders.processing.duration', {
  description: 'Time to process an order',
  unit: 'ms',
  advice: {
    explicitBucketBoundaries: [10, 50, 100, 200, 500, 1000, 2000, 5000],
  },
})

// UpDownCounter: can go up or down (e.g., queue depth)
const activeOrders = meter.createUpDownCounter('orders.active', {
  description: 'Number of orders currently being processed',
})

// Gauge: point-in-time measurement (via observable)
const orderQueueDepth = meter.createObservableGauge('orders.queue.depth', {
  description: 'Current depth of the order processing queue',
})
orderQueueDepth.addCallback((result) => {
  result.observe(getQueueDepth(), { queue: 'primary' })
})

// Usage in business logic
async function processOrder(orderId: string) {
  const startTime = Date.now()
  activeOrders.add(1, { status: 'processing' })
  
  try {
    const result = await doProcessOrder(orderId)
    ordersProcessed.add(1, { status: 'success', region: result.region })
    return result
  } catch (error) {
    ordersProcessed.add(1, { status: 'error', error_type: (error as Error).constructor.name })
    throw error
  } finally {
    orderProcessingDuration.record(Date.now() - startTime)
    activeOrders.add(-1, { status: 'processing' })
  }
}
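
To make the explicitBucketBoundaries advice concrete: each recorded value is counted in the first bucket whose upper bound is greater than or equal to the value, and values above the last boundary fall into an overflow bucket. A self-contained sketch of that bucketing — a conceptual model, not the SDK’s actual aggregation code:

```typescript
// Boundaries match the orders.processing.duration histogram above (ms)
const boundaries = [10, 50, 100, 200, 500, 1000, 2000, 5000]

// Returns the bucket index a value falls into; index === bounds.length is the overflow bucket
function bucketIndex(value: number, bounds: number[]): number {
  for (let i = 0; i < bounds.length; i++) {
    if (value <= bounds[i]) return i
  }
  return bounds.length
}

// Aggregates a batch of latency measurements into per-bucket counts
function aggregate(values: number[], bounds: number[]): number[] {
  const counts = new Array<number>(bounds.length + 1).fill(0)
  for (const v of values) counts[bucketIndex(v, bounds)]++
  return counts
}
```

Choosing boundaries around your SLO thresholds (e.g. 200 ms and 1000 ms above) is what makes the resulting histogram useful for latency percentile queries.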

The OTel Collector: Deployment Patterns

Agent Pattern (per-node DaemonSet)

# Kubernetes DaemonSet for node-level collection
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
  namespace: observability
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:0.98.0
        resources:
          limits:
            memory: 512Mi
            cpu: 500m
          requests:
            memory: 128Mi
            cpu: 100m
        volumeMounts:
        - name: otel-collector-config-vol
          mountPath: /conf
      volumes:
      - name: otel-collector-config-vol
        configMap:
          name: otel-collector-config

Collector Configuration with Tail Sampling

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  
  memory_limiter:
    check_interval: 1s
    limit_mib: 400
    spike_limit_mib: 100
  
  # Tail-based sampling: make sampling decisions after seeing the full trace
  tail_sampling:
    decision_wait: 10s
    policies:
    - name: errors-policy
      type: status_code
      status_code: {status_codes: [ERROR]}
    - name: slow-traces-policy
      type: latency
      latency: {threshold_ms: 1000}
    - name: probabilistic-sample
      type: probabilistic
      probabilistic: {sampling_percentage: 10}

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loki]
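
Conceptually, tail sampling buffers every span of a trace for decision_wait, then evaluates the policies: a trace is kept if any policy matches. A simplified sketch of that decision logic for the three policies above — a mental model, not the collector’s actual implementation:

```typescript
interface SpanSummary {
  durationMs: number
  isError: boolean
}

// Mirrors the config above: errors-policy, slow-traces-policy, then 10% probabilistic.
// `rand` is injectable so the probabilistic branch is deterministic in tests.
function keepTrace(spans: SpanSummary[], rand: () => number = Math.random): boolean {
  if (spans.some((s) => s.isError)) return true            // status_code: [ERROR]
  const traceDurationMs = Math.max(...spans.map((s) => s.durationMs))
  if (traceDurationMs >= 1000) return true                 // latency threshold_ms: 1000
  return rand() < 0.10                                     // sampling_percentage: 10
}
```

The practical upshot: you keep 100% of errors and slow traces while paying storage for only ~10% of healthy traffic — something head-based sampling, which decides before the trace completes, cannot do.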

Logs: Structured Logging with OTel

In 2026, the OTel Logs SDK is stable across all major languages. The key is correlating logs with traces using trace context:

import { logs, SeverityNumber } from '@opentelemetry/api-logs'
import { trace } from '@opentelemetry/api'

const logger = logs.getLogger('order-service')

function structuredLog(level: SeverityNumber, message: string, attributes: Record<string, unknown> = {}) {
  const span = trace.getActiveSpan()
  const spanContext = span?.spanContext()
  
  logger.emit({
    severityNumber: level,
    severityText: SeverityNumber[level],
    body: message,
    attributes: {
      ...attributes,
      // Automatically inject trace context for log-trace correlation
      'trace.id': spanContext?.traceId,
      'span.id': spanContext?.spanId,
      'service.name': 'order-service',
    },
    timestamp: new Date(),
  })
}

// Usage
structuredLog(SeverityNumber.INFO, 'Order payment processed', {
  'order.id': orderId,
  'payment.amount': amount,
  'payment.provider': 'stripe',
})
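
The trace.id/span.id pair injected above is the same identifier that travels between services in the W3C traceparent HTTP header, which is what makes cross-service log-trace correlation possible. A minimal parser for that header (format: version-traceid-spanid-flags), sketched without the OTel propagator for clarity:

```typescript
interface TraceContext {
  traceId: string  // 32 lowercase hex chars
  spanId: string   // 16 lowercase hex chars
  sampled: boolean // lowest bit of the flags byte
}

// Parses a W3C traceparent header such as:
//   00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
function parseTraceparent(header: string): TraceContext | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header)
  if (!m) return null
  const [, , traceId, spanId, flags] = m
  // All-zero trace or span IDs are invalid per the spec
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 }
}
```

In practice the SDK’s W3CTraceContextPropagator handles this for you; the sketch just shows what the correlation IDs in your logs actually are.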

Profiling: The Fourth Pillar (Beta in 2026)

Continuous profiling — capturing CPU, memory, and goroutine/thread profiles in production — is the newest addition to the OTel spec. Grafana Pyroscope is the leading open-source backend for OTel profiles.

// Node.js continuous profiling with OTel
// NOTE: the profiling signal is still beta — the package names and options
// below are illustrative and may change before the stable release.
import { ProfilingExporter } from '@opentelemetry/exporter-profiles-otlp-http'
import { ContinuousProfiler } from '@opentelemetry/profiling'

const profiler = new ContinuousProfiler({
  exporter: new ProfilingExporter({
    url: 'http://pyroscope:4040/ingest',
  }),
  intervalMs: 10_000,       // Profile every 10 seconds
  collectCpuProfile: true,
  collectHeapProfile: true,
})

profiler.start()

The real power is trace-to-profile linking: click on a slow span in Jaeger/Tempo, and jump directly to the CPU profile captured during that trace’s execution.


Summary: The OTel Stack in 2026

Signal      Status    Recommended Backend
Traces      Stable    Grafana Tempo / Jaeger
Metrics     Stable    Grafana Mimir / Prometheus
Logs        Stable    Grafana Loki / Elasticsearch
Profiles    Beta      Grafana Pyroscope
Baggage     Stable    (propagation only)

OpenTelemetry has won the observability vendor lock-in battle. By instrumenting once with OTel, you retain the freedom to swap backends without re-instrumenting your services. In 2026, there’s no good reason to use a proprietary APM SDK for new projects.


Questions about setting up OTel in your infrastructure? Leave a comment or reach out on GitHub.

If this article helped you, a like (and an ad click) would be appreciated :)