OpenTelemetry: Complete Guide to Modern Observability

OpenTelemetry (OTel) has become the de facto industry standard for observability. It provides vendor-neutral APIs and SDKs for distributed tracing, metrics, and logs. This guide walks through a production-oriented Node.js/TypeScript setup: SDK configuration, manual and automatic instrumentation, the Collector, sampling, and best practices.

What is OpenTelemetry?

OpenTelemetry provides three pillars of observability:

┌─────────────────────────────────────────────────────────────┐
│                     OpenTelemetry                           │
├───────────────────┬───────────────────┬─────────────────────┤
│      Traces       │      Metrics      │        Logs         │
├───────────────────┼───────────────────┼─────────────────────┤
│ Request flow      │ Counters/Gauges   │ Structured events   │
│ Latency analysis  │ Histograms        │ Context correlation │
│ Error tracking    │ Resource usage    │ Debug information   │
└───────────────────┴───────────────────┴─────────────────────┘

Setting Up OpenTelemetry

Node.js/TypeScript

npm install @opentelemetry/api \
            @opentelemetry/sdk-node \
            @opentelemetry/auto-instrumentations-node \
            @opentelemetry/exporter-trace-otlp-http \
            @opentelemetry/exporter-metrics-otlp-http

Basic Configuration

// tracing.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import { SEMRESATTRS_SERVICE_NAME, SEMRESATTRS_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';

const sdk = new NodeSDK({
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: 'my-service',
    [SEMRESATTRS_SERVICE_VERSION]: '1.0.0',
  }),
  traceExporter: new OTLPTraceExporter({
    url: 'http://localhost:4318/v1/traces',
  }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: 'http://localhost:4318/v1/metrics',
    }),
    exportIntervalMillis: 10000,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-fs': { enabled: false },
    }),
  ],
});

sdk.start();

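// Flush pending telemetry on shutdown, and exit even if the flush fails
process.on('SIGTERM', () => {
  sdk
    .shutdown()
    .catch((err) => console.error('Error shutting down OpenTelemetry SDK', err))
    .finally(() => process.exit(0));
});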

export { sdk };

Initialize Before App

// index.ts
import './tracing'; // Must be first!
import express from 'express';

const app = express();
// ... rest of your app

Distributed Tracing

Manual Instrumentation

import { trace, SpanStatusCode, SpanKind } from '@opentelemetry/api';

const tracer = trace.getTracer('my-service');

async function processOrder(orderId: string): Promise<Order> {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      span.setAttribute('order.id', orderId);
      
      // Create child spans for sub-operations; each one is ended in `finally`
      // so it is exported even when the wrapped call throws
      const order = await tracer.startActiveSpan('fetchOrder', async (fetchSpan) => {
        try {
          fetchSpan.setAttribute('db.operation', 'SELECT');
          return await db.orders.findById(orderId);
        } finally {
          fetchSpan.end();
        }
      });

      await tracer.startActiveSpan('validateInventory', async (validateSpan) => {
        try {
          validateSpan.setAttribute('items.count', order.items.length);
          await validateInventory(order.items);
        } finally {
          validateSpan.end();
        }
      });

      await tracer.startActiveSpan('processPayment', async (paymentSpan) => {
        try {
          paymentSpan.setAttributes({
            'payment.amount': order.total,
            'payment.currency': 'USD',
            'payment.method': order.paymentMethod,
          });
          await chargePayment(order);
        } finally {
          paymentSpan.end();
        }
      });
      
      span.setStatus({ code: SpanStatusCode.OK });
      return order;
      
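    } catch (error) {
      // In strict TypeScript the caught value is `unknown`, so normalize it first
      const err = error instanceof Error ? error : new Error(String(error));
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: err.message,
      });
      span.recordException(err);
      throw error;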
      
    } finally {
      span.end();
    }
  });
}
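
Every span in the example above has to be ended on every code path, including the one where the wrapped operation throws. One way to avoid repeating that bookkeeping is a small wrapper. The sketch below is a hypothetical helper, not part of the OpenTelemetry API: `withSpan`, `SpanLike`, and `TracerLike` are names invented for illustration, and the two interfaces mirror only the handful of span and tracer methods used here so the snippet runs without the SDK installed.

```typescript
// Minimal structural stand-ins for the parts of the OpenTelemetry
// Span/Tracer API that this sketch touches (assumed shapes, not the real types).
interface SpanLike {
  setStatus(status: { code: number; message?: string }): unknown;
  recordException(error: Error): void;
  end(): void;
}

interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

// Numeric value of SpanStatusCode.ERROR in @opentelemetry/api
const STATUS_ERROR = 2;

// Hypothetical helper: runs `fn` inside a span, records any failure on the
// span, rethrows it, and always ends the span.
async function withSpan<T>(
  tracer: TracerLike,
  name: string,
  fn: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn(span);
    } catch (error) {
      const err = error instanceof Error ? error : new Error(String(error));
      span.setStatus({ code: STATUS_ERROR, message: err.message });
      span.recordException(err);
      throw error;
    } finally {
      span.end();
    }
  });
}
```

With a helper like this, each sub-operation shrinks to a single call such as `await withSpan(tracer, 'fetchOrder', () => db.orders.findById(orderId))`. In real code you would import `SpanStatusCode` from `@opentelemetry/api` rather than hard-coding the number.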

Context Propagation

import { context, propagation, trace } from '@opentelemetry/api';

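// Extract context from incoming request.
// The HTTP auto-instrumentation already does this for you; manual extraction
// is only needed for transports it does not cover.
app.use((req, res, next) => {
  const parentContext = propagation.extract(context.active(), req.headers);
  context.with(parentContext, () => next());
});

// Inject context into outgoing request (also automatic for instrumented HTTP clients)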
async function callExternalService(data: any) {
  const headers: Record<string, string> = {};
  propagation.inject(context.active(), headers);
  
  return fetch('http://other-service/api', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      ...headers, // Trace context headers
    },
    body: JSON.stringify(data),
  });
}
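
By default, the headers written by `propagation.inject` follow the W3C Trace Context format: a `traceparent` header of the form `version-traceId-spanId-flags`. To make that concrete, here is a simplified sketch of parsing and formatting the header. `TraceParent`, `parseTraceParent`, and `formatTraceParent` are illustrative names, not OpenTelemetry APIs, and the parser only handles the four-field layout of version `00`.

```typescript
// Data carried by a W3C `traceparent` header.
interface TraceParent {
  version: string;
  traceId: string;  // 32 lowercase hex characters
  spanId: string;   // 16 lowercase hex characters
  sampled: boolean; // lowest bit of the trace-flags byte
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceParent(header: string): TraceParent | undefined {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return undefined;
  const [, version, traceId, spanId, flags] = match;
  // Version ff and all-zero IDs are invalid per the specification
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(spanId)) {
    return undefined;
  }
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

function formatTraceParent(tp: TraceParent): string {
  return `${tp.version}-${tp.traceId}-${tp.spanId}-${tp.sampled ? '01' : '00'}`;
}

// Example header taken from the W3C Trace Context specification
const incoming = parseTraceParent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(incoming?.traceId, incoming?.sampled);
```

A downstream service keeps the same `traceId` and substitutes its own span ID when it calls the next hop, which is what lets a backend stitch spans from different services into a single trace.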

Metrics

Creating Metrics

import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('my-service');

// Counter - for counting events
const requestCounter = meter.createCounter('http_requests_total', {
  description: 'Total number of HTTP requests',
  unit: '1',
});

// Histogram - for measuring distributions
const requestDuration = meter.createHistogram('http_request_duration_ms', {
  description: 'HTTP request duration in milliseconds',
  unit: 'ms',
});

// Gauge - for current values (using observable)
const activeConnections = meter.createObservableGauge('active_connections', {
  description: 'Number of active connections',
});

activeConnections.addCallback((result) => {
  result.observe(connectionPool.activeCount, { pool: 'main' });
});

// UpDownCounter - for values that go up and down
const queueSize = meter.createUpDownCounter('queue_size', {
  description: 'Current queue size',
});

Using Metrics

app.use((req, res, next) => {
  const startTime = Date.now();
  
  res.on('finish', () => {
    const duration = Date.now() - startTime;
    const attributes = {
      'http.method': req.method,
      'http.route': req.route?.path || 'unknown',
      'http.status_code': res.statusCode,
    };
    
    requestCounter.add(1, attributes);
    requestDuration.record(duration, attributes);
  });
  
  next();
});

Custom Business Metrics

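const orderMetrics = {
  ordersCreated: meter.createCounter('orders_created_total'),
  orderValue: meter.createHistogram('order_value_usd', {
    description: 'Order value in USD',
    // createHistogram has no `boundaries` option. Recent versions of
    // @opentelemetry/api accept bucket boundaries as advice; a View
    // configured in the SDK can still override them.
    advice: { explicitBucketBoundaries: [10, 50, 100, 250, 500, 1000] },
  }),
  orderProcessingTime: meter.createHistogram('order_processing_seconds', { unit: 's' }),
};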

async function createOrder(order: Order) {
  const startTime = Date.now();
  
  try {
    const result = await db.orders.create(order);
    
    orderMetrics.ordersCreated.add(1, {
      'order.type': order.type,
      'order.status': 'success',
    });
    orderMetrics.orderValue.record(order.total);
    
    return result;
  } finally {
    const duration = (Date.now() - startTime) / 1000;
    orderMetrics.orderProcessingTime.record(duration);
  }
}
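
The six boundaries configured for `order_value_usd` define seven buckets, and each recorded value increments exactly one of them. The sketch below models that assignment in plain TypeScript, assuming the upper-inclusive bucket convention of OpenTelemetry explicit-bucket histograms. `bucketIndex` and `bucketCounts` are illustrative helpers, not SDK functions.

```typescript
const boundaries = [10, 50, 100, 250, 500, 1000];

// Index of the bucket a value falls into. Buckets are upper-inclusive:
// (-inf, 10], (10, 50], (50, 100], (100, 250], (250, 500], (500, 1000], (1000, +inf)
function bucketIndex(value: number, bounds: number[]): number {
  const idx = bounds.findIndex((bound) => value <= bound);
  return idx === -1 ? bounds.length : idx;
}

// Count how many values land in each bucket
function bucketCounts(values: number[], bounds: number[]): number[] {
  const counts = new Array<number>(bounds.length + 1).fill(0);
  for (const v of values) counts[bucketIndex(v, bounds)] += 1;
  return counts;
}

const counts = bucketCounts([5, 10, 42, 99.99, 250, 1200], boundaries);
console.log(counts); // → [2, 1, 1, 1, 0, 0, 1]
```

Choosing boundaries close to the thresholds you care about (for example, a free-shipping limit) keeps percentile estimates useful, because the backend can only interpolate within a bucket.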

Structured Logging

Integrating with OpenTelemetry

import { trace, context } from '@opentelemetry/api';
import pino from 'pino';

const logger = pino({
  mixin() {
    const span = trace.getActiveSpan();
    if (span) {
      const spanContext = span.spanContext();
      return {
        trace_id: spanContext.traceId,
        span_id: spanContext.spanId,
        trace_flags: spanContext.traceFlags,
      };
    }
    return {};
  },
});

// Now logs automatically include trace context
logger.info({ orderId: '123' }, 'Processing order');
// Output (abbreviated; pino also adds time, pid, and hostname):
// {"level":30,"trace_id":"abc...","span_id":"def...","trace_flags":1,"orderId":"123","msg":"Processing order"}

OpenTelemetry Logs API

import { context } from '@opentelemetry/api';
import { logs, SeverityNumber } from '@opentelemetry/api-logs';

// Returns a no-op provider unless the SDK has registered a LoggerProvider
const loggerProvider = logs.getLoggerProvider();
const otelLogger = loggerProvider.getLogger('my-service');

function logWithContext(message: string, attributes: Record<string, string | number | boolean>) {
  otelLogger.emit({
    severityNumber: SeverityNumber.INFO,
    severityText: 'INFO',
    body: message,
    // service.name comes from the SDK resource, so it is not repeated here
    attributes,
    // The active context already carries the current span, which is what
    // correlates this record with the trace
    context: context.active(),
  });
}

OpenTelemetry Collector

Configuration

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  
  memory_limiter:
    check_interval: 1s
    limit_mib: 1000
  
  attributes:
    actions:
      - key: environment
        value: production
        action: insert

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  
  prometheus:
    endpoint: 0.0.0.0:8889
  
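  # Recent collector-contrib releases deprecate the dedicated loki exporter.
  # If it is missing from your image, send logs to Loki's native OTLP endpoint
  # with an otlphttp exporter instead.
  loki:
    endpoint: http://loki:3100/loki/api/v1/push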

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, attributes]
      exporters: [otlp/jaeger]
    
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
    
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loki]

Docker Compose

version: '3.8'
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8889:8889"   # Prometheus metrics

  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686" # UI
      - "4317"        # OTLP gRPC

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  grafana-data:

Auto-Instrumentation

Supported Libraries

OpenTelemetry can instrument these libraries automatically:

HTTP: Express, Fastify, Koa, Hapi
Databases: PostgreSQL, MySQL, MongoDB, Redis
Messaging: Kafka, RabbitMQ, AWS SQS
Cloud: AWS SDK, GCP, Azure
GraphQL: Apollo, graphql-js

Configuration

import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const instrumentations = getNodeAutoInstrumentations({
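  '@opentelemetry/instrumentation-http': {
    // `ignoreIncomingPaths` was removed from current releases; use the hook instead
    ignoreIncomingRequestHook: (request) =>
      ['/health', '/ready'].includes(request.url ?? ''),
    requestHook: (span, request) => {
      // `request` is a ClientRequest (outgoing) or an IncomingMessage (incoming);
      // only the latter exposes `headers`
      if ('headers' in request) {
        const custom = request.headers['x-custom'];
        if (typeof custom === 'string') span.setAttribute('custom.header', custom);
      }
    },
  },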
  '@opentelemetry/instrumentation-express': {
    enabled: true,
  },
  '@opentelemetry/instrumentation-pg': {
    enhancedDatabaseReporting: true,
  },
});

Sampling Strategies

Head-Based Sampling

import { TraceIdRatioBasedSampler, ParentBasedSampler } from '@opentelemetry/sdk-trace-base';

// Sample 10% of traces
const sampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(0.1),
});
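
A ratio sampler bases its decision on the trace ID rather than on a fresh random number, so every service that sees the same trace ID and uses the same ratio reaches the same verdict. The sketch below illustrates the idea only. `sampleByTraceId` is a hypothetical function, and the SDK's `TraceIdRatioBasedSampler` uses its own scheme for turning the ID into a number.

```typescript
// Hypothetical sketch: map the trace ID to a number in [0, 1) and compare it
// with the sampling ratio. Not the SDK's actual algorithm.
function sampleByTraceId(traceId: string, ratio: number): boolean {
  // Reject malformed or all-zero trace IDs
  if (!/^[0-9a-f]{32}$/.test(traceId) || /^0+$/.test(traceId)) return false;
  // Use the low 8 hex digits (32 bits) so the value is an exact JS number
  const low32 = parseInt(traceId.slice(-8), 16);
  return low32 / 2 ** 32 < ratio;
}

const traceId = '4bf92f3577b34da6a3ce929d0e0e4736';
console.log(sampleByTraceId(traceId, 0.1));  // → true  (0x0e0e4736 / 2^32 ≈ 0.055)
console.log(sampleByTraceId(traceId, 0.05)); // → false
```

Because the decision is a pure function of the trace ID, repeating it in another process cannot split a trace into sampled and unsampled halves, which is the risk with an independent `Math.random()` draw in each service.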

Custom Sampler

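import { Attributes, Context, Link, SpanKind } from '@opentelemetry/api';
import { Sampler, SamplingDecision, SamplingResult } from '@opentelemetry/sdk-trace-base';

class PrioritySampler implements Sampler {
  // A head sampler only sees attributes supplied when the span is started,
  // so 'error' and 'customer.tier' must be passed in the span's start options.
  shouldSample(
    context: Context,
    traceId: string,
    spanName: string,
    spanKind: SpanKind,
    attributes: Attributes,
    links: Link[],
  ): SamplingResult {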
    // Always sample errors
    if (attributes['error'] === true) {
      return { decision: SamplingDecision.RECORD_AND_SAMPLED };
    }
    
    // Always sample VIP customers
    if (attributes['customer.tier'] === 'vip') {
      return { decision: SamplingDecision.RECORD_AND_SAMPLED };
    }
    
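    // Sample 10% of everything else. Math.random() is not derived from the
    // trace ID, so wrap this sampler in ParentBasedSampler to keep child
    // spans consistent with the root decision.
    const random = Math.random();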
    if (random < 0.1) {
      return { decision: SamplingDecision.RECORD_AND_SAMPLED };
    }
    
    return { decision: SamplingDecision.NOT_RECORD };
  }
  
  toString(): string {
    return 'PrioritySampler';
  }
}

Best Practices

1. Semantic Conventions

Use standard attribute names:

import {
  SEMATTRS_HTTP_METHOD,
  SEMATTRS_HTTP_STATUS_CODE,
  SEMATTRS_HTTP_URL,
  SEMATTRS_DB_SYSTEM,
  SEMATTRS_DB_OPERATION,
} from '@opentelemetry/semantic-conventions';

span.setAttributes({
  [SEMATTRS_HTTP_METHOD]: 'GET',
  [SEMATTRS_HTTP_STATUS_CODE]: 200,
  [SEMATTRS_HTTP_URL]: 'https://api.example.com/users',
});

2. Error Handling

span.setStatus({
  code: SpanStatusCode.ERROR,
  message: error.message,
});

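// In the JS API the second parameter of recordException is an optional
// timestamp, not an attributes object. The exception type, message, and
// stack trace are captured automatically.
span.recordException(error);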

3. Span Events

span.addEvent('cache_miss', {
  'cache.key': 'user:123',
  'cache.type': 'redis',
});

span.addEvent('retry_attempt', {
  'retry.count': 2,
  'retry.delay_ms': 100,
});

4. Resource Detection

import { detectResources, envDetector, processDetector, hostDetector } from '@opentelemetry/resources';
import { awsEc2Detector } from '@opentelemetry/resource-detector-aws';

const resource = await detectResources({
  detectors: [envDetector, processDetector, hostDetector, awsEc2Detector],
});

Conclusion

OpenTelemetry provides:

Vendor neutrality - Switch backends without code changes
Unified observability - Traces, metrics, and logs correlated
Automatic instrumentation - Minimal code changes
Industry standard - Wide adoption and support

Start with auto-instrumentation, then add custom spans and metrics where you need deeper visibility.
