
Vector OpenTelemetry sink fails with 400 bad request #22054

Open
navodveduth opened this issue Dec 18, 2024 · 2 comments
Labels
type: bug A code related bug.

@navodveduth

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

When using the opentelemetry sink in Vector to send metrics derived from logs to an OpenTelemetry Collector, Vector repeatedly fails with 400 Bad Request. These errors appear in the Vector agent logs, but the OTel Collector shows no related error logs and no indication that it received malformed payloads. As a result, metrics are not processed by the OTel Collector as expected.

Configuration

apiVersion: observability.kaasops.io/v1alpha1
kind: ClusterVectorPipeline
metadata:
  name: log-level-metrics-pipeline
spec:
  sources:
    kubernetes_logs:
      type: kubernetes_logs
      pod_annotation_fields:
        container_image: container_image
        container_name: container_name
        pod_name: pod_name
        pod_namespace: pod_namespace
      fingerprint_lines: 1
      ignore_older_secs: 600

  transforms:
    log_level_tagger:
      type: remap
      inputs:
        - kubernetes_logs
      source: |
        if exists(.message) {
          log_message = string!(.message)
          log_level = "INFO"

          if contains(upcase(log_message), "ERROR") {
            log_level = "ERROR"
          } else if contains(upcase(log_message), "WARN") {
            log_level = "WARN"
          } else if contains(upcase(log_message), "DEBUG") {
            log_level = "DEBUG"
          }

          .log_level = log_level

          .attributes = {
            "log_level": log_level
          }

          if exists(.pod_name) {
            .attributes.pod_name = string!(.pod_name)
          } else {
            .attributes.pod_name = "unknown_pod"
          }

          if exists(.pod_namespace) {
            .attributes.pod_namespace = string!(.pod_namespace)
          } else {
            .attributes.pod_namespace = "unknown_namespace"
          }

          .timestamp = now()
        } else {
          .log_level = "UNKNOWN"
          .attributes = {
            "log_level": "UNKNOWN",
            "pod_name": "unknown_pod",
            "pod_namespace": "unknown_namespace"
          }
        }

    log_to_metric:
      type: log_to_metric
      inputs:
        - log_level_tagger
      metrics:
        - type: counter
          name: log_level_count
          field: log_level
          tags:
            log_level: "{{attributes.log_level}}"
            pod_name: "{{attributes.pod_name}}"
            pod_namespace: "{{attributes.pod_namespace}}"

  sinks:
    otel_collector_sink:
      type: opentelemetry
      inputs:
        - log_to_metric
      protocol:
        type: http
        uri: "http://otel-collector.otel:4318/v1/logs"
        method: post
        encoding:
          codec: json
          framing:
            method: newline_delimited
      batch:
        max_events: 100
        max_bytes: 1048576
        timeout_secs: 10
      retry:
        initial_interval_secs: 1
        max_interval_secs: 30
        max_retries: 5
      healthcheck:
        enabled: true
        interval_secs: 60

Version

0.43.0

Debug Output

2024-12-18T14:40:25.520217Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=456}: vector::sinks::util::retries: Not retriable; dropping the request. reason="Http status: 400 Bad Request" internal_log_rate_limit=true
2024-12-18T14:40:25.520229Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=456}: vector_common::internal_event::service: Internal log [Service call failed. No retries or retries exhausted.] has been suppressed 4 times.
2024-12-18T14:40:25.520231Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=456}: vector_common::internal_event::service: Service call failed. No retries or retries exhausted. error=None request_id=456 error_type="request_failed" stage="sending" internal_log_rate_limit=true
2024-12-18T14:40:25.520266Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=456}: vector_common::internal_event::component_events_dropped: Internal log [Events dropped] has been suppressed 4 times.
2024-12-18T14:40:25.520268Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=456}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=2 reason="Service call failed. No retries or retries exhausted." internal_log_rate_limit=true
2024-12-18T14:40:26.554810Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=457}: vector::sinks::util::retries: Internal log [Not retriable; dropping the request.] is being suppressed to avoid flooding.
2024-12-18T14:40:26.554830Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=457}: vector_common::internal_event::service: Internal log [Service call failed. No retries or retries exhausted.] is being suppressed to avoid flooding.
2024-12-18T14:40:26.554840Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=457}: vector_common::internal_event::component_events_dropped: Internal log [Events dropped] is being suppressed to avoid flooding.
2024-12-18T14:40:43.994791Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=459}: vector::sinks::util::retries: Internal log [Not retriable; dropping the request.] has been suppressed 2 times.
2024-12-18T14:40:43.994821Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=459}: vector::sinks::util::retries: Not retriable; dropping the request. reason="Http status: 400 Bad Request" internal_log_rate_limit=true
2024-12-18T14:40:43.994858Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=459}: vector_common::internal_event::service: Internal log [Service call failed. No retries or retries exhausted.] has been suppressed 2 times.

Example Data

No response

Additional Context

Both Vector and the OTel Collector are running in a cluster. Even with debug logging enabled on the OTel Collector, there are no logs showing that it received the payload or encountered any issues. However, when the same payload is sent to the OTel Collector with a curl request, it is logged and processed correctly.
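
For illustration, a direct request of the kind described above might look roughly like the sketch below. It targets the same in-cluster logs endpoint as the sink configuration; the body is a generic OTLP/HTTP JSON logs payload with placeholder values, not the reporter's actual payload.

# Illustrative only: send a minimal OTLP/HTTP JSON logs payload directly
# to the collector's logs endpoint (same URI as in the Vector sink config).
curl -v http://otel-collector.otel:4318/v1/logs \
  -H "Content-Type: application/json" \
  -d '{
    "resourceLogs": [{
      "resource": { "attributes": [] },
      "scopeLogs": [{
        "logRecords": [{
          "severityText": "INFO",
          "body": { "stringValue": "example log line" },
          "attributes": [
            { "key": "log_level",     "value": { "stringValue": "INFO" } },
            { "key": "pod_name",      "value": { "stringValue": "example-pod" } },
            { "key": "pod_namespace", "value": { "stringValue": "default" } }
          ]
        }]
      }]
    }]
  }'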

OpenTelemetry collector config:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch: {}
  memory_limiter:
    limit_mib: 1000
    spike_limit_mib: 512
    check_interval: 5s
extensions:
  zpages: {}
exporters:
  logging:
    loglevel: debug
    sampling_initial: 5
    sampling_thereafter: 200
  prometheus:
    endpoint: 0.0.0.0:8889
    metric_expiration: 1m
service:
  extensions: [zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging, prometheus]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging, file]

References

No response

@navodveduth added the type: bug label on Dec 18, 2024
@jszwedko (Member)

Hi @navodveduth,

The opentelemetry sink currently only supports logs as input and not metrics. Metrics support is being tracked by #17310. I'll close this out, but let me know if I'm misunderstanding this issue. Feel free to follow the other issue for metrics support.

@jszwedko closed this as not planned on Dec 18, 2024
@pront (Member) commented Jan 4, 2025

Hmm, there is a bug here. This should delegate to the underlying protocol, essentially this.

In this case, with this encoding:

encoding:
  codec: json
  framing:
    method: newline_delimited

the underlying HTTP sink should be able to handle it. So I suspect this has to do with the event format produced by the log_to_metric transform.

I would recommend using vector tap to inspect the event format and then manually sending that event to your OTel endpoint to observe whether it is ingested. You could use curl or a Python script.
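
A rough sketch of that workflow, assuming the transform's component ID follows the same prefixing as the sink's component_id in the debug output (log-level-metrics-pipeline-*), that Vector's API is enabled (vector tap needs it), and that event.json is just a placeholder file holding one tapped event:

# Inspect the events the log_to_metric transform emits, i.e. what the
# opentelemetry sink will encode and send.
vector tap 'log-level-metrics-pipeline-log_to_metric'

# Save one tapped event to event.json, then replay it against the
# collector's logs endpoint to see whether that shape is accepted.
curl -v http://otel-collector.otel:4318/v1/logs \
  -H "Content-Type: application/json" \
  --data @event.json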
