This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Clickhouse Sink "upstream connect error or disconnect/reset before headers" after Clickhouse update #22116

Closed
seilerre opened this issue Jan 3, 2025 · 2 comments
Labels
sink: clickhouse Anything `clickhouse` sink related type: bug A code related bug.

Comments


seilerre commented Jan 3, 2025


Problem

We are running ClickHouse and Vector in Kubernetes. Vector receives events from Kafka and sinks them to ClickHouse. After a ClickHouse update, which rolls out new pods, data flows again without any loss, but Vector keeps emitting WARN messages that never stop on their own. The 503s make it look like Vector is still trying to reach the old ClickHouse instances; could that be possible? Failed requests are retried and eventually written successfully. Requests appear to time out after 10 or 30 s, as the istio-proxy logs below show. Restarting Vector helps, but that can't be the right solution.

2025-01-03T07:48:00.264670Z  WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=15212892}: vector::sinks::util::retries: Internal log [Retrying after response.] has been suppressed 22 times.
2025-01-03T07:48:00.264683Z  WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=15212892}: vector::sinks::util::retries: Retrying after response. reason=503 Service Unavailable: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection timeout internal_log_rate_limit=true
2025-01-03T07:48:00.264715Z  WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=15212891}: vector::sinks::util::retries: Internal log [Retrying after response.] is being suppressed to avoid flooding.
2025-01-03T07:48:11.196311Z  WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=15213093}: vector::sinks::util::retries: Internal log [Retrying after response.] has been suppressed 24 times.
2025-01-03T07:48:11.196327Z  WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=15213093}: vector::sinks::util::retries: Retrying after response. reason=503 Service Unavailable: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection timeout internal_log_rate_limit=true
2025-01-03T07:48:11.196372Z  WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=15213091}: vector::sinks::util::retries: Internal log [Retrying after response.] is being suppressed to avoid flooding.
2025-01-03T07:48:22.164840Z  WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=15213341}: vector::sinks::util::retries: Internal log [Retrying after response.] has been suppressed 29 times.

Configuration

```yaml
clickhouse_sink:
        type: clickhouse
        compression: zstd
        healthcheck:
          enabled: true
        acknowledgements:
          enabled: true
        inputs:
          - clickhouse_tables
        endpoint: http://service-clickhouse-prod.clickhouse-prod.svc.cluster.local:8123
        format: json_each_row
        database: ep_sdc_edge
        table: '{{ "{{kafkaSrcTopicToDbName}}" }}'
        skip_unknown_fields: true
        auth:
          strategy: basic
          user: ${DEST_CLICKHOUSE_USERNAME}
          password: ${DEST_CLICKHOUSE_PASSWORD}
        date_time_best_effort: true
        encoding:
          timestamp_format: rfc3339
        batch:
          max_events: 2500
          timeout_secs: 2
```

Version

0.42.0-distroless-libc

Debug Output

Debug logs of one request that fails after rolling the ClickHouse pods:

k logs vector pod -f | grep "34123"
2025-01-03T09:53:57.063453Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: vector::internal_events::http_client: Sending HTTP request. uri=http://clickhouse-clickhouse.clickhouse.svc.cluster.local:8123/?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22ep_sdc_edge%22.%22some_table%22+FORMAT+JSONEachRow method=POST version=HTTP/1.1 headers={"content-type": "application/x-ndjson", "content-length": "804", "content-encoding": "zstd", "authorization": Sensitive, "accept-encoding": "zstd,gzip,deflate,br", "user-agent": "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)"} body=[804 bytes]
2025-01-03T09:53:57.063469Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: hyper::client::pool: reuse idle connection for ("http", clickhouse-clickhouse.clickhouse.svc.cluster.local:8123)
2025-01-03T09:54:14.385982Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: hyper::client::pool: pooling idle connection for ("http", clickhouse-clickhouse.clickhouse.svc.cluster.local:8123)
2025-01-03T09:54:14.386006Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: vector::internal_events::http_client: HTTP response. status=503 Service Unavailable version=HTTP/1.1 headers={"content-length": "114", "content-type": "text/plain", "date": "Fri, 03 Jan 2025 09:54:13 GMT", "server": "envoy"} body=[114 bytes]
2025-01-03T09:54:14.386056Z  WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}: vector::sinks::util::retries: Internal log [Retrying after response.] has been suppressed 44 times.
2025-01-03T09:54:14.386062Z  WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}: vector::sinks::util::retries: Retrying after response. reason=503 Service Unavailable: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection timeout internal_log_rate_limit=true
2025-01-03T09:54:14.386067Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}: vector::sinks::util::retries: Retrying request. delay_ms=720
2025-01-03T09:54:15.107150Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: vector::internal_events::http_client: Sending HTTP request. uri=http://clickhouse-clickhouse.clickhouse.svc.cluster.local:8123/?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22ep_sdc_edge%22.%22some_table%22+FORMAT+JSONEachRow method=POST version=HTTP/1.1 headers={"content-type": "application/x-ndjson", "content-length": "804", "content-encoding": "zstd", "authorization": Sensitive, "accept-encoding": "zstd,gzip,deflate,br", "user-agent": "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)"} body=[804 bytes]
2025-01-03T09:54:15.107195Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: hyper::client::pool: reuse idle connection for ("http", clickhouse-clickhouse.clickhouse.svc.cluster.local:8123)
2025-01-03T09:54:15.123728Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: hyper::client::pool: pooling idle connection for ("http", clickhouse-clickhouse.clickhouse.svc.cluster.local:8123)
2025-01-03T09:54:15.123748Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: vector::internal_events::http_client: HTTP response. status=200 OK version=HTTP/1.1 headers={"date": "Fri, 03 Jan 2025 09:54:15 GMT", "content-type": "text/plain; charset=UTF-8", "x-clickhouse-server-display-name": "chi-clickhouse-clickhouse-pv-0-2-0.chi-clickhouse-clickhouse-pv-0-2.clickhouse.svc.cluster.local", "x-clickhouse-query-id": "c143d4bf-f960-4197-aab8-5a78971f614d", "x-clickhouse-timezone": "UTC", "x-clickhouse-summary": "{\"read_rows\":\"2\",\"read_bytes\":\"537\",\"written_rows\":\"2\",\"written_bytes\":\"537\",\"total_rows_to_read\":\"0\",\"result_rows\":\"2\",\"result_bytes\":\"537\",\"elapsed_ns\":\"14808528\"}", "x-envoy-upstream-service-time": "15", "server": "envoy", "transfer-encoding": "chunked"} body=[unknown]

istio-proxy logs for outbound requests from the Vector pod:

[2025-01-03T12:05:15.541Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 892 114 10004 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "3a1172a2-b1ea-43c2-a80f-110995853121" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:52044 - default
[2025-01-03T12:04:59.054Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 857 114 30075 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "b0fa3e20-29b9-4f6d-a684-66f14c2a67b0" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.102.120.112:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.102.120.112:8123 100.96.105.200:33804 - default
[2025-01-03T12:05:23.567Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 7895 114 10465 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "4ce19de7-c1b0-49d0-adfe-6989b348582a" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:38118 - default
[2025-01-03T12:05:24.031Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 816 114 10001 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "08e6db41-1ff7-410e-bd84-afd074aea060" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:39238 - default
[2025-01-03T12:05:26.056Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 853 114 10000 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "90a611f4-26f3-404b-a36e-9a6b49841b5b" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:52044 - default
[2025-01-03T12:05:34.045Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 875 114 10031 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "24d96855-d8fa-4b2e-9aa9-594e8e304da3" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:38118 - default
[2025-01-03T12:05:34.045Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 872 114 10031 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "b268b8b4-418b-4635-b4c7-e83d7dfa0021" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:39238 - default
[2025-01-03T12:05:36.062Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 825 114 10006 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "0f92425f-2d25-4bd5-9519-6f798acea71e" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:52044 - default
[2025-01-03T12:05:22.482Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 809 114 30083 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "8ef3b254-e2b9-43c8-9b0b-9dbeae769cce" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.105.236.38:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.105.236.38:8123 100.96.105.200:60324 - default
[2025-01-03T12:05:44.486Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 866 114 10002 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "5b4a6614-7298-43b3-aa82-b04110734ee6" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:38118 - default
[2025-01-03T12:05:44.370Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 8295 114 10119 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "7bd0441c-6898-4d25-8788-e0765589b115" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:39238 - default

This doesn't seem to be a DNS issue; the IPs are correct:

;clickhouse-clickhouse.clickhouse.svc.cluster.local. IN A

;; ANSWER SECTION:
clickhouse-clickhouse.clickhouse.svc.cluster.local. 30 IN A 100.102.120.105
clickhouse-clickhouse.clickhouse.svc.cluster.local. 30 IN A 100.105.236.61
clickhouse-clickhouse.clickhouse.svc.cluster.local. 30 IN A 100.111.67.13
 k get pod -o wide
NAME                                        READY   STATUS    RESTARTS   AGE    IP                NODE                  NOMINATED NODE   READINESS GATES
chi-clickhouse-clickhouse-pv-0-0-0          2/2     Running   0          154m   100.102.120.105   i-094734b915a844584   <none>           <none>
chi-clickhouse-clickhouse-pv-0-1-0          2/2     Running   0          152m   100.111.67.13     i-08e176d047c1fefdc   <none>           <none>
chi-clickhouse-clickhouse-pv-0-2-0          2/2     Running   0          150m   100.105.236.61    i-02cbff00ea6dc3a56   <none>           <none>

Example Data

No response

Additional Context

No response

References

No response

@seilerre seilerre added the type: bug A code related bug. label Jan 3, 2025

seilerre commented Jan 3, 2025

Alright, after debugging and updating this post, I realise this might be an Istio problem, as istio-proxy still holds the old endpoints. I'm not going to delete this post, as it might help somebody else.

@seilerre seilerre closed this as completed Jan 3, 2025

seilerre commented Jan 7, 2025

Okay, this seems to be because Hyper in the ClickHouse sink does connection pooling, and Envoy in Istio does as well. Is there any way to deactivate connection pooling in the ClickHouse sink and let Istio handle it? See the discussion here: istio/istio#54539
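
A common Istio-side mitigation for stale pooled connections is a DestinationRule that limits how long and how much Envoy reuses upstream HTTP connections, so connections to terminated pods get dropped. A minimal sketch (not confirmed as the fix in this thread; the resource name and field values are assumptions for illustration):

```yaml
# Hypothetical DestinationRule; name and tuning values are assumptions.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: clickhouse-pool
  namespace: clickhouse
spec:
  host: clickhouse-clickhouse.clickhouse.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 30s               # close idle upstream HTTP connections
        maxRequestsPerConnection: 100  # bound reuse of any single connection
```

With `idleTimeout` set, Envoy evicts idle connections from its pool instead of keeping them alive indefinitely, which limits how long requests can hit endpoints that no longer exist after a rollout.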

@seilerre seilerre reopened this Jan 7, 2025
@pront pront added the sink: clickhouse Anything `clickhouse` sink related label Jan 7, 2025
@vectordotdev vectordotdev locked and limited conversation to collaborators Jan 7, 2025
@pront pront converted this issue into discussion #22137 Jan 7, 2025
