OpenTelemetry: Complete Guide to Modern Observability


OpenTelemetry: Complete Guide to Modern Observability

OpenTelemetry (OTel) has become the industry standard for observability, providing vendor-neutral APIs and SDKs for distributed tracing, metrics, and logging. This comprehensive guide will help you implement production-grade observability in your applications.

What is OpenTelemetry?

OpenTelemetry provides three pillars of observability:

┌─────────────────────────────────────────────────────────────┐
│                     OpenTelemetry                           │
├───────────────────┬───────────────────┬─────────────────────┤
│      Traces       │      Metrics      │        Logs         │
├───────────────────┼───────────────────┼─────────────────────┤
│ Request flow      │ Counters/Gauges   │ Structured events   │
│ Latency analysis  │ Histograms        │ Context correlation │
│ Error tracking    │ Resource usage    │ Debug information   │
└───────────────────┴───────────────────┴─────────────────────┘

Setting Up OpenTelemetry

Node.js/TypeScript

npm install @opentelemetry/api \
            @opentelemetry/sdk-node \
            @opentelemetry/auto-instrumentations-node \
            @opentelemetry/exporter-trace-otlp-http \
            @opentelemetry/exporter-metrics-otlp-http

Basic Configuration

// tracing.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import { SEMRESATTRS_SERVICE_NAME, SEMRESATTRS_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';

const sdk = new NodeSDK({
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: 'my-service',
    [SEMRESATTRS_SERVICE_VERSION]: '1.0.0',
  }),
  traceExporter: new OTLPTraceExporter({
    url: 'http://localhost:4318/v1/traces',
  }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: 'http://localhost:4318/v1/metrics',
    }),
    exportIntervalMillis: 10000,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-fs': { enabled: false },
    }),
  ],
});

sdk.start();

process.on('SIGTERM', () => {
  sdk.shutdown().then(() => process.exit(0));
});

export { sdk };

Initialize Before App

// index.ts
import './tracing'; // Must be first!
import express from 'express';

const app = express();
// ... rest of your app

Distributed Tracing

Manual Instrumentation

import { trace, SpanStatusCode, SpanKind } from '@opentelemetry/api';

const tracer = trace.getTracer('my-service');

async function processOrder(orderId: string): Promise<Order> {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      span.setAttribute('order.id', orderId);
      
      // Create child spans for sub-operations
      const order = await tracer.startActiveSpan('fetchOrder', async (fetchSpan) => {
        fetchSpan.setAttribute('db.operation', 'SELECT');
        const result = await db.orders.findById(orderId);
        fetchSpan.end();
        return result;
      });
      
      await tracer.startActiveSpan('validateInventory', async (validateSpan) => {
        validateSpan.setAttribute('items.count', order.items.length);
        await validateInventory(order.items);
        validateSpan.end();
      });
      
      await tracer.startActiveSpan('processPayment', async (paymentSpan) => {
        paymentSpan.setAttribute('payment.amount', order.total);
        paymentSpan.setAttributes({
          'payment.currency': 'USD',
          'payment.method': order.paymentMethod,
        });
        await chargePayment(order);
        paymentSpan.end();
      });
      
      span.setStatus({ code: SpanStatusCode.OK });
      return order;
      
    } catch (error) {
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: error.message,
      });
      span.recordException(error);
      throw error;
      
    } finally {
      span.end();
    }
  });
}

Context Propagation

import { context, propagation, trace } from '@opentelemetry/api';

// Extract context from incoming request
app.use((req, res, next) => {
  const parentContext = propagation.extract(context.active(), req.headers);
  context.with(parentContext, () => next());
});

// Inject context into outgoing request
async function callExternalService(data: any) {
  const headers: Record<string, string> = {};
  propagation.inject(context.active(), headers);
  
  return fetch('http://other-service/api', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      ...headers, // Trace context headers
    },
    body: JSON.stringify(data),
  });
}

Metrics

Creating Metrics

import { metrics, ValueType } from '@opentelemetry/api';

const meter = metrics.getMeter('my-service');

// Counter - for counting events
const requestCounter = meter.createCounter('http_requests_total', {
  description: 'Total number of HTTP requests',
  unit: '1',
});

// Histogram - for measuring distributions
const requestDuration = meter.createHistogram('http_request_duration_ms', {
  description: 'HTTP request duration in milliseconds',
  unit: 'ms',
});

// Gauge - for current values (using observable)
const activeConnections = meter.createObservableGauge('active_connections', {
  description: 'Number of active connections',
});

activeConnections.addCallback((result) => {
  result.observe(connectionPool.activeCount, { pool: 'main' });
});

// UpDownCounter - for values that go up and down
const queueSize = meter.createUpDownCounter('queue_size', {
  description: 'Current queue size',
});

Using Metrics

app.use((req, res, next) => {
  const startTime = Date.now();
  
  res.on('finish', () => {
    const duration = Date.now() - startTime;
    const attributes = {
      'http.method': req.method,
      'http.route': req.route?.path || 'unknown',
      'http.status_code': res.statusCode,
    };
    
    requestCounter.add(1, attributes);
    requestDuration.record(duration, attributes);
  });
  
  next();
});

Custom Business Metrics

const orderMetrics = {
  ordersCreated: meter.createCounter('orders_created_total'),
  orderValue: meter.createHistogram('order_value_usd', {
    description: 'Order value in USD',
    boundaries: [10, 50, 100, 250, 500, 1000],
  }),
  orderProcessingTime: meter.createHistogram('order_processing_seconds'),
};

async function createOrder(order: Order) {
  const startTime = Date.now();
  
  try {
    const result = await db.orders.create(order);
    
    orderMetrics.ordersCreated.add(1, {
      'order.type': order.type,
      'order.status': 'success',
    });
    orderMetrics.orderValue.record(order.total);
    
    return result;
  } finally {
    const duration = (Date.now() - startTime) / 1000;
    orderMetrics.orderProcessingTime.record(duration);
  }
}

Structured Logging

Integrating with OpenTelemetry

import { trace, context } from '@opentelemetry/api';
import pino from 'pino';

const logger = pino({
  mixin() {
    const span = trace.getActiveSpan();
    if (span) {
      const spanContext = span.spanContext();
      return {
        trace_id: spanContext.traceId,
        span_id: spanContext.spanId,
        trace_flags: spanContext.traceFlags,
      };
    }
    return {};
  },
});

// Now logs automatically include trace context
logger.info({ orderId: '123' }, 'Processing order');
// Output: {"level":30,"trace_id":"abc...","span_id":"def...","orderId":"123","msg":"Processing order"}

OpenTelemetry Logs API

import { logs, SeverityNumber } from '@opentelemetry/api-logs';

const loggerProvider = logs.getLoggerProvider();
const otelLogger = loggerProvider.getLogger('my-service');

function logWithContext(message: string, attributes: Record<string, unknown>) {
  const span = trace.getActiveSpan();
  
  otelLogger.emit({
    severityNumber: SeverityNumber.INFO,
    body: message,
    attributes: {
      ...attributes,
      'service.name': 'my-service',
    },
    context: span ? trace.setSpan(context.active(), span) : context.active(),
  });
}

OpenTelemetry Collector

Configuration

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  
  memory_limiter:
    check_interval: 1s
    limit_mib: 1000
  
  attributes:
    actions:
      - key: environment
        value: production
        action: insert

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  
  prometheus:
    endpoint: 0.0.0.0:8889
  
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, attributes]
      exporters: [otlp/jaeger]
    
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
    
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loki]

Docker Compose

version: '3.8'
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8889:8889"   # Prometheus metrics

  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686" # UI
      - "4317"        # OTLP gRPC

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  grafana-data:

Auto-Instrumentation

Supported Libraries

OpenTelemetry auto-instruments these libraries automatically:

  • HTTP: Express, Fastify, Koa, Hapi
  • Databases: PostgreSQL, MySQL, MongoDB, Redis
  • Messaging: Kafka, RabbitMQ, AWS SQS
  • Cloud: AWS SDK, GCP, Azure
  • GraphQL: Apollo, graphql-js

Configuration

import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const instrumentations = getNodeAutoInstrumentations({
  '@opentelemetry/instrumentation-http': {
    ignoreIncomingPaths: ['/health', '/ready'],
    requestHook: (span, request) => {
      span.setAttribute('custom.header', request.headers['x-custom']);
    },
  },
  '@opentelemetry/instrumentation-express': {
    enabled: true,
  },
  '@opentelemetry/instrumentation-pg': {
    enhancedDatabaseReporting: true,
  },
});

Sampling Strategies

Head-Based Sampling

import { TraceIdRatioBasedSampler, ParentBasedSampler } from '@opentelemetry/sdk-trace-base';

// Sample 10% of traces
const sampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(0.1),
});

Custom Sampler

import { Sampler, SamplingDecision, SamplingResult } from '@opentelemetry/sdk-trace-base';

class PrioritySampler implements Sampler {
  shouldSample(context, traceId, spanName, spanKind, attributes): SamplingResult {
    // Always sample errors
    if (attributes['error'] === true) {
      return { decision: SamplingDecision.RECORD_AND_SAMPLED };
    }
    
    // Always sample VIP customers
    if (attributes['customer.tier'] === 'vip') {
      return { decision: SamplingDecision.RECORD_AND_SAMPLED };
    }
    
    // Sample 10% of everything else
    const random = Math.random();
    if (random < 0.1) {
      return { decision: SamplingDecision.RECORD_AND_SAMPLED };
    }
    
    return { decision: SamplingDecision.NOT_RECORD };
  }
  
  toString(): string {
    return 'PrioritySampler';
  }
}

Best Practices

1. Semantic Conventions

Use standard attribute names:

import {
  SEMATTRS_HTTP_METHOD,
  SEMATTRS_HTTP_STATUS_CODE,
  SEMATTRS_HTTP_URL,
  SEMATTRS_DB_SYSTEM,
  SEMATTRS_DB_OPERATION,
} from '@opentelemetry/semantic-conventions';

span.setAttributes({
  [SEMATTRS_HTTP_METHOD]: 'GET',
  [SEMATTRS_HTTP_STATUS_CODE]: 200,
  [SEMATTRS_HTTP_URL]: 'https://api.example.com/users',
});

2. Error Handling

span.setStatus({
  code: SpanStatusCode.ERROR,
  message: error.message,
});

span.recordException(error, {
  'exception.escaped': true,
});

3. Span Events

span.addEvent('cache_miss', {
  'cache.key': 'user:123',
  'cache.type': 'redis',
});

span.addEvent('retry_attempt', {
  'retry.count': 2,
  'retry.delay_ms': 100,
});

4. Resource Detection

import { envDetector, processDetector, hostDetector } from '@opentelemetry/resources';
import { awsEc2Detector } from '@opentelemetry/resource-detector-aws';

const resource = await detectResources({
  detectors: [envDetector, processDetector, hostDetector, awsEc2Detector],
});

Conclusion

OpenTelemetry provides:

  • Vendor neutrality - Switch backends without code changes
  • Unified observability - Traces, metrics, and logs correlated
  • Automatic instrumentation - Minimal code changes
  • Industry standard - Wide adoption and support

Start with auto-instrumentation, then add custom spans and metrics where you need deeper visibility.

Resources


이 글이 도움이 되셨다면 공감 및 광고 클릭을 부탁드립니다 :)