OpenTelemetry has become the default instrumentation framework for modern infrastructure. The specification is stable for traces, metrics, and logs; the Collector has matured into a reliable production component; and most observability backends accept OTLP natively. The practical question in 2026 is not whether to adopt OpenTelemetry, but how to deploy the Collector effectively — with the right pipeline patterns, sensible defaults, and cost controls that keep your telemetry budget from spiralling.
This guide covers Collector deployment patterns, pipeline configuration for each signal type, and the specific techniques for controlling telemetry volume and cost.
The Collector's role
The OpenTelemetry Collector sits between your instrumented applications and your observability backend. It receives telemetry data (traces, metrics, logs), processes it (filtering, sampling, enriching, batching), and exports it to one or more destinations.
Why not send directly from applications to backends?
- Decoupling: changing your backend doesn't require redeploying applications
- Processing: the Collector handles sampling, filtering, and enrichment centrally
- Reliability: the Collector can buffer and retry during backend outages
- Cost control: filtering and sampling at the Collector reduces what reaches (and is billed by) your backend
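Concretely, the smallest useful Collector configuration wires one receiver, one processor, and one exporter into a pipeline. This is a minimal sketch; the backend endpoint is a placeholder:

```yaml
# Minimal Collector config: receive OTLP, batch, export to one backend
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}

exporters:
  otlp:
    endpoint: "your-backend:4317"  # placeholder

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

Everything later in this guide is a variation on this receivers → processors → exporters shape.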
Deployment patterns
Pattern 1: Agent per host (sidecar or daemonset)
Deploy one Collector instance per host (VM, container host, or Kubernetes node):
# Kubernetes DaemonSet pattern
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector-agent
spec:
  selector:
    matchLabels:
      app: otel-collector-agent
  template:
    metadata:
      labels:
        app: otel-collector-agent
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.96.0
          ports:
            - containerPort: 4317  # OTLP gRPC
            - containerPort: 4318  # OTLP HTTP
Applications on each host send telemetry to their local Collector via localhost:4317. The agent Collector does initial processing and forwards to a central gateway.
Best for: Kubernetes environments, host-level metrics collection, low-latency local processing.
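The agent's own configuration can be sketched like this. It assumes a contrib build (for the resourcedetection processor), and otel-gateway:4317 is a placeholder for your gateway address:

```yaml
# Agent Collector: receive locally, enrich with host metadata, forward to gateway
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  # Attach host/environment resource attributes before forwarding
  resourcedetection:
    detectors: [env, system]
  batch: {}

exporters:
  otlp:
    endpoint: "otel-gateway:4317"  # placeholder gateway address

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection, batch]
      exporters: [otlp]
```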
Pattern 2: Central gateway
A single Collector instance (or a small cluster of instances) receives telemetry from all applications:
Applications → Central Collector Gateway → Backend
Best for: Small deployments, simple architectures, environments where running agents on every host is impractical.
Risk: Single point of failure and potential bottleneck. Use multiple replicas behind a load balancer for production.
Pattern 3: Agent + gateway (recommended for scale)
Two-tier deployment:
Applications → Agent Collectors (per host) → Gateway Collector(s) → Backend(s)
Agent Collectors handle local receiving, basic processing, and batching. Gateway Collectors handle cross-host processing (tail sampling, aggregation) and export to backends.
Best for: Production environments at moderate to large scale.
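One wrinkle at this tier: tail sampling needs every span of a trace to arrive at the same gateway replica. The contrib loadbalancing exporter, configured on the agents, routes by trace ID. A sketch, assuming a headless Kubernetes Service named otel-gateway-headless in an observability namespace:

```yaml
# Agent-side: route spans by trace ID so each trace lands on one gateway replica
exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true  # assumes in-cluster traffic; enable TLS otherwise
    resolver:
      dns:
        hostname: otel-gateway-headless.observability.svc.cluster.local  # placeholder
        port: 4317
```

Without trace-ID-aware routing, spans from one trace scatter across replicas and the tail sampler makes decisions on incomplete traces.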
Pipeline configuration
The Collector pipeline follows a receivers → processors → exporters model. Each signal type (traces, metrics, logs) has its own pipeline.
Traces pipeline
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
  # Tail-based sampling (gateway only)
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: error-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: latency-policy
        type: latency
        latency: {threshold_ms: 1000}
      - name: probabilistic-policy
        type: probabilistic
        probabilistic: {sampling_percentage: 10}

exporters:
  otlp:
    endpoint: "your-backend:4317"
    tls:
      insecure: false

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]  # sample first, then batch
      exporters: [otlp]
Metrics pipeline
processors:
  filter:
    metrics:
      exclude:
        match_type: regexp
        metric_names:
          - ".*_temp_.*"  # Exclude temporary metrics
  # Reduce cardinality
  metricstransform:
    transforms:
      - include: http_server_duration
        action: update
        operations:
          - action: aggregate_labels
            aggregation_type: sum
            label_set: [method, status_code]  # Keep only these labels

service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      processors: [filter, metricstransform, batch]
      exporters: [otlp]
Logs pipeline
receivers:
  filelog:
    include: [/var/log/app/*.log]
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.timestamp
          layout: "%Y-%m-%dT%H:%M:%S.%LZ"

processors:
  # Drop TRACE and DEBUG logs in production: keep only INFO (9) and above
  filter:
    logs:
      include:
        severity_number:
          min: "INFO"
          match_undefined: true  # also keep records with no severity set

service:
  pipelines:
    logs:
      receivers: [otlp, filelog]
      processors: [filter, batch]
      exporters: [otlp]
Cost control techniques
Telemetry costs are driven by volume — specifically, the number of spans, metric data points, and log entries ingested by your backend. The Collector is your primary tool for controlling volume before it reaches the billing meter.
1. Sampling traces
Most backends charge per span. Reduce span volume with:
- Head sampling (at the application/SDK level): decide whether to sample before the trace starts. Simple and efficient: sampling_percentage: 10 keeps 10% of traces.
- Tail sampling (at the gateway Collector): decide after the trace completes. Keeps 100% of error and high-latency traces, samples the rest. More useful, but requires the gateway to buffer complete traces.
Recommendation: Use tail sampling at the gateway. Keep all errors and slow traces; sample routine traces at 5–10%.
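For the head-sampling side, SDKs honor the standard sampler environment variables from the OpenTelemetry specification; a parent-based 10% trace-ID-ratio setup looks like:

```shell
# Sample 10% of new traces at the SDK; respect the parent span's decision
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1
```

The parent-based variant matters: it keeps sampling decisions consistent across all services participating in a trace.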
2. Filtering metrics
High-cardinality metrics (labels with many unique values) are the primary cost driver for metrics backends.
- Drop unused metrics: filter out metrics no application consumes
- Reduce label cardinality: aggregate or drop high-cardinality labels (user IDs, request IDs, full URLs)
- Adjust collection interval: not everything needs 10-second resolution. Use 60-second intervals for non-critical metrics
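As a sketch of the interval technique, the Prometheus receiver accepts standard scrape configuration; the job name and target below are placeholders:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "app-metrics"      # placeholder
          scrape_interval: 60s         # 60s resolution for non-critical metrics
          static_configs:
            - targets: ["app:9090"]    # placeholder
```

Moving a scrape from 10s to 60s cuts that job's data-point volume by roughly 6x.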
3. Filtering logs
Logs are often the largest volume signal. Control costs by:
- Dropping debug/trace severity levels in production
- Sampling repetitive log lines: if the same error occurs 10,000 times per minute, you need a count, not 10,000 log entries
- Parsing and extracting: convert unstructured logs into structured metrics where possible (e.g., error counts rather than individual error log lines)
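The last technique can be done inside the Collector with the contrib count connector, which turns matching log records into a metric. A sketch, where the metric name is an assumption:

```yaml
connectors:
  count:
    logs:
      app.errors.count:  # metric name is illustrative
        description: Number of error-level log records
        conditions:
          - severity_number >= SEVERITY_NUMBER_ERROR

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlp, count]  # still ship logs, and also count them
    metrics:
      receivers: [count]        # the connector feeds the metrics pipeline
      exporters: [otlp]
```

A connector acts as an exporter in one pipeline and a receiver in another, which is how a log stream becomes a metric stream here.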
4. Batching and compression
Always enable batching and compression in exporters:
exporters:
  otlp:
    endpoint: "backend:4317"
    compression: gzip
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
    retry_on_failure:
      enabled: true
This reduces network overhead and improves throughput.
Common mistakes
Running the Collector with default configuration. The default accepts everything and exports everything. In production, you will pay for telemetry you never look at. Configure filtering and sampling from day one.
Tail sampling without enough memory. Tail sampling buffers complete traces in memory. If your decision_wait is too long or trace throughput is too high, the Collector runs out of memory. Monitor Collector memory usage and tune decision_wait.
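A common guard is the memory_limiter processor, placed first in the pipeline so it can refuse data before buffers grow. The limits below are illustrative, not recommendations:

```yaml
processors:
  # Put memory_limiter first in every pipeline so it can apply backpressure
  memory_limiter:
    check_interval: 1s
    limit_mib: 1500        # hard ceiling; size to your container limit
    spike_limit_mib: 300   # headroom for short bursts
```

When the limit is hit, the Collector starts rejecting incoming data rather than being OOM-killed, which gives upstream retries a chance to succeed later.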
Not monitoring the Collector itself. The Collector is infrastructure. Monitor its CPU, memory, queue depth, and dropped telemetry. The Collector exposes Prometheus metrics at /metrics by default.
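The Collector's self-telemetry is configured under service.telemetry; a sketch that keeps the default 8888 port:

```yaml
service:
  telemetry:
    metrics:
      level: detailed        # more granular otelcol_* metrics
      address: 0.0.0.0:8888  # Prometheus scrape endpoint
```

Key series to alert on include otelcol_exporter_queue_size and the various otelcol_*_refused and otelcol_*_dropped counters.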
Using head sampling when you need tail sampling. Head sampling decides before the trace starts, so it randomly drops error traces. If keeping all errors is important (it usually is), use tail sampling at the gateway.
Sending telemetry to multiple backends without filtering. If you export to both a metrics backend and a logging backend, ensure each pipeline only sends the relevant signal type. Cross-signal export wastes money.
Verification
- Deploy the Collector and verify it starts: check logs for "Everything is ready"
- Send test telemetry: otel-cli span --name "test" or equivalent
- Verify telemetry appears in your backend
- Check Collector metrics: curl http://localhost:8888/metrics | grep otelcol_receiver_accepted_spans
- Test sampling: send 1000 traces and verify only the expected percentage reaches the backend
- Test failure handling: stop the backend and verify the Collector queues and retries
Related reading on wplus.net
- Operations hub — monitoring and troubleshooting fundamentals
- Infrastructure hub — hosting and serving configuration
- Security hub — headers and hardening