
Vector OpenTelemetry sink fails with 400 bad request #22054

Open
navodveduth opened this issue Dec 18, 2024 · 2 comments
Labels
type: bug A code related bug.

@navodveduth

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

When using the opentelemetry sink in Vector to send metrics derived from logs to an OpenTelemetry Collector, Vector repeatedly fails with 400 Bad Request. These errors appear in the Vector agent logs, but the OTel Collector shows no related error logs and no indication that it received malformed payloads. As a result, metrics are not processed by the OTel Collector as expected.

Configuration

apiVersion: observability.kaasops.io/v1alpha1
kind: ClusterVectorPipeline
metadata:
  name: log-level-metrics-pipeline
spec:
  sources:
    kubernetes_logs:
      type: kubernetes_logs
      pod_annotation_fields:
        container_image: container_image
        container_name: container_name
        pod_name: pod_name
        pod_namespace: pod_namespace
      fingerprint_lines: 1
      ignore_older_secs: 600

  transforms:
    log_level_tagger:
      type: remap
      inputs:
        - kubernetes_logs
      source: |
        if exists(.message) {
          log_message = string!(.message)
          log_level = "INFO"

          if contains(upcase(log_message), "ERROR") {
            log_level = "ERROR"
          } else if contains(upcase(log_message), "WARN") {
            log_level = "WARN"
          } else if contains(upcase(log_message), "DEBUG") {
            log_level = "DEBUG"
          }

          .log_level = log_level

          .attributes = {
            "log_level": log_level
          }

          if exists(.pod_name) {
            .attributes.pod_name = string!(.pod_name)
          } else {
            .attributes.pod_name = "unknown_pod"
          }

          if exists(.pod_namespace) {
            .attributes.pod_namespace = string!(.pod_namespace)
          } else {
            .attributes.pod_namespace = "unknown_namespace"
          }

          .timestamp = now()
        } else {
          .log_level = "UNKNOWN"
          .attributes = {
            "log_level": "UNKNOWN",
            "pod_name": "unknown_pod",
            "pod_namespace": "unknown_namespace"
          }
        }

    log_to_metric:
      type: log_to_metric
      inputs:
        - log_level_tagger
      metrics:
        - type: counter
          name: log_level_count
          field: log_level
          tags:
            log_level: "{{attributes.log_level}}"
            pod_name: "{{attributes.pod_name}}"
            pod_namespace: "{{attributes.pod_namespace}}"

  sinks:
    otel_collector_sink:
      type: opentelemetry
      inputs:
        - log_to_metric
      protocol:
        type: http
        uri: "http://otel-collector.otel:4318/v1/logs"
        method: post
        encoding:
          codec: json
          framing:
            method: newline_delimited
      batch:
        max_events: 100
        max_bytes: 1048576
        timeout_secs: 10
      retry:
        initial_interval_secs: 1
        max_interval_secs: 30
        max_retries: 5
      healthcheck:
        enabled: true
        interval_secs: 60

Version

0.43.0

Debug Output

2024-12-18T14:40:25.520217Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=456}: vector::sinks::util::retries: Not retriable; dropping the request. reason="Http status: 400 Bad Request" internal_log_rate_limit=true
2024-12-18T14:40:25.520229Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=456}: vector_common::internal_event::service: Internal log [Service call failed. No retries or retries exhausted.] has been suppressed 4 times.
2024-12-18T14:40:25.520231Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=456}: vector_common::internal_event::service: Service call failed. No retries or retries exhausted. error=None request_id=456 error_type="request_failed" stage="sending" internal_log_rate_limit=true
2024-12-18T14:40:25.520266Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=456}: vector_common::internal_event::component_events_dropped: Internal log [Events dropped] has been suppressed 4 times.
2024-12-18T14:40:25.520268Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=456}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=2 reason="Service call failed. No retries or retries exhausted." internal_log_rate_limit=true
2024-12-18T14:40:26.554810Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=457}: vector::sinks::util::retries: Internal log [Not retriable; dropping the request.] is being suppressed to avoid flooding.
2024-12-18T14:40:26.554830Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=457}: vector_common::internal_event::service: Internal log [Service call failed. No retries or retries exhausted.] is being suppressed to avoid flooding.
2024-12-18T14:40:26.554840Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=457}: vector_common::internal_event::component_events_dropped: Internal log [Events dropped] is being suppressed to avoid flooding.
2024-12-18T14:40:43.994791Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=459}: vector::sinks::util::retries: Internal log [Not retriable; dropping the request.] has been suppressed 2 times.
2024-12-18T14:40:43.994821Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=459}: vector::sinks::util::retries: Not retriable; dropping the request. reason="Http status: 400 Bad Request" internal_log_rate_limit=true
2024-12-18T14:40:43.994858Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=459}: vector_common::internal_event::service: Internal log [Service call failed. No retries or retries exhausted.] has been suppressed 2 times.

Example Data

No response

Additional Context

Both Vector and the OTel Collector are running in a cluster. Even with debug logging enabled on the OTel Collector, there are no logs showing that it received the payload or encountered any issues. However, when the same payload is sent to the OTel Collector with a curl request, it is logged and processed correctly.
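
For illustration, a direct request of the kind described above might look roughly like the sketch below. It targets the same in-cluster logs endpoint as the sink configuration; the body is a generic OTLP/HTTP JSON logs payload with placeholder values, not the reporter's actual payload.

# Illustrative only: send a minimal OTLP/HTTP JSON logs payload directly
# to the collector's logs endpoint (same URI as in the Vector sink config).
curl -v http://otel-collector.otel:4318/v1/logs \
  -H "Content-Type: application/json" \
  -d '{
    "resourceLogs": [{
      "resource": { "attributes": [] },
      "scopeLogs": [{
        "logRecords": [{
          "severityText": "INFO",
          "body": { "stringValue": "example log line" },
          "attributes": [
            { "key": "log_level",     "value": { "stringValue": "INFO" } },
            { "key": "pod_name",      "value": { "stringValue": "example-pod" } },
            { "key": "pod_namespace", "value": { "stringValue": "default" } }
          ]
        }]
      }]
    }]
  }'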

OpenTelemetry collector config:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch: {}
  memory_limiter:
    limit_mib: 1000
    spike_limit_mib: 512
    check_interval: 5s
extensions:
  zpages: {}
exporters:
  logging:
    loglevel: debug
    sampling_initial: 5
    sampling_thereafter: 200
  prometheus:
    endpoint: 0.0.0.0:8889
    metric_expiration: 1m
service:
  extensions: [zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging, prometheus]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging, file]

References

No response

@navodveduth added the type: bug label on Dec 18, 2024
@jszwedko (Member)

Hi @navodveduth,

The opentelemetry sink currently only supports logs as input and not metrics. Metrics support is being tracked by #17310. I'll close this out, but let me know if I'm misunderstanding this issue. Feel free to follow the other issue for metrics support.

@jszwedko closed this as not planned on Dec 18, 2024
@pront (Member) commented Jan 4, 2025

Hmm, there is a bug here. This should delegate to the underlying protocol, essentially this.

In this case, with this encoding:

encoding:
  codec: json
  framing:
    method: newline_delimited

the underlying HTTP sink should be able to handle it. So I suspect this has to do with the event format produced by the log_to_metric transform.

I would recommend using vector tap to inspect the event format and then manually sending that event to your OTel endpoint to observe whether it is ingested. You could use curl or a Python script.
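
A rough sketch of that workflow, assuming the transform's component ID follows the same prefixing as the sink's component_id in the debug output (log-level-metrics-pipeline-*), that Vector's API is enabled (vector tap needs it), and that event.json is just a placeholder file holding one tapped event:

# Inspect the events the log_to_metric transform emits, i.e. what the
# opentelemetry sink will encode and send.
vector tap 'log-level-metrics-pipeline-log_to_metric'

# Save one tapped event to event.json, then replay it against the
# collector's logs endpoint to see whether that shape is accepted.
curl -v http://otel-collector.otel:4318/v1/logs \
  -H "Content-Type: application/json" \
  --data @event.json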
