High cpu usage #22081

Closed

xufeixianggithub opened this issue Dec 24, 2024 · 2 comments
Labels
type: bug A code related bug.

Comments

@xufeixianggithub

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I don't think this parse_regex is complicated; it follows the demo from the official website. When I read a 100 MB log file, parse it, and write the output to another file, CPU usage reaches 100% or even 200%, even when I set the maximum number of threads to 1. With 4 or 8 threads, CPU reaches 400%, and about 5.5K logs are processed per second. Is this a problem with the expression, or is there a parameter that can be set to slow down the parsing rate?

Configuration

sources:
  es_data_logs_src:
    type: "file"
    include:
      - "/home/enplus/alidata/vector/testScript/es_data_charge_log_test.log"
    ignore_older_secs: 86400     # 1 day
    ignore_checkpoints: false
    line_delimiter: "-[END]\n"
transforms:
  es_data_logs_parse:
    type: "remap"
    inputs: ["es_data_logs_src"]  # replace with your source name
    source: |
      . |= parse_regex!(.message, r'^\[(?P<biz_type>[^\]]+)\] \[(?P<app_name>[^\]]+)\] \[(?P<trace_id>[^\]]*)\] \[(?P<ip_address>[^\]]+)\] \[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3})\] \[(?P<level>[^\]]+)\] \[(?P<thread_id>[^\]]+)\] - (?P<message>[\s\S]*)$')
      del(.host)
      del(.source_type)
      del(.file)
sinks:
  web_log_test_sink:
    inputs:
      - es_data_logs_parse
    type: file
    path: "/home/enplus/alidata/vector/testScript/output.log"
    encoding:
      codec: json
      json:
        pretty: true
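
Regarding the question above about slowing down the parsing rate: one option worth trying (a sketch only, not tested against this setup) is Vector's throttle transform, placed between the file source and the remap. Note that throttle drops events above the threshold rather than applying backpressure, so it caps CPU at the cost of discarding data. The transform name es_data_logs_throttle below is made up for illustration.

transforms:
  es_data_logs_throttle:               # hypothetical name; sits between the file source and the remap
    type: "throttle"
    inputs: ["es_data_logs_src"]
    threshold: 5000                    # events allowed per window; anything beyond this is dropped
    window_secs: 1
  es_data_logs_parse:
    type: "remap"
    inputs: ["es_data_logs_throttle"]  # now reads from the throttle instead of the source
    source: |
      ...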

Version

vector 0.42.0 (x86_64-unknown-linux-gnu 3d16e34 2024-10-21 14:10:14.375255220)

Debug Output

No response

Example Data

The problem is as described above. I have already tried VECTOR_THREADS=1 and VECTOR_INTERNAL_LOG_RATE_LIMIT=100, as well as parameters such as max_read_bytes.

Additional Context

No response

References

No response

xufeixianggithub added the type: bug label on Dec 24, 2024
@xufeixianggithub
Author

[COMM-DATA] [Charge-Web] [5ed93fc5-5de2-4605-9d3f-683c5f2ec128] [127.0.0.1] [2024-12-19 16:45:07:013] [INFO] [294|http-nio-9090-exec-1] - {"time":1734597907010,"dataType":"RECEIVE","enGateSerialNum":"SN90052308090965","chargerSerialNum":null,"protocolVersion":5,"actionCode1":1504,"actionCode2":1504,"param":1,"dataFrame":"aa f5 17 00 03 d1 e0 05 00 00 1b 00 4e 2b 00 00 00 00 00 00 00 00 79","keyData":"测试","isValid":false} -[END]
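
For reference, applying the parse_regex pattern from the configuration above to this sample line should yield roughly the following captures (hand-derived, not actual Vector output; the trailing -[END] is the configured line_delimiter and is stripped by the file source before the remap runs):

biz_type:   "COMM-DATA"
app_name:   "Charge-Web"
trace_id:   "5ed93fc5-5de2-4605-9d3f-683c5f2ec128"
ip_address: "127.0.0.1"
timestamp:  "2024-12-19 16:45:07:013"
level:      "INFO"
thread_id:  "294|http-nio-9090-exec-1"
message:    '{"time":1734597907010,"dataType":"RECEIVE",...}'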

@xufeixianggithub
Author

I have solved this problem; it is not a bug, and I am very sorry for raising an incorrect issue. I would also recommend adding a parameter that specifies the interval for reading logs from a file, to better control the consumption rate.
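
Until such a read-interval option exists, one related knob (an untested sketch; the value 2048 below is illustrative) is the file source's max_read_bytes, which the reporter mentioned trying. It bounds how many bytes Vector reads from each file per read cycle, trading throughput for smoother CPU:

sources:
  es_data_logs_src:
    type: "file"
    include:
      - "/home/enplus/alidata/vector/testScript/es_data_charge_log_test.log"
    line_delimiter: "-[END]\n"
    max_read_bytes: 2048   # bytes read from a file per cycle; smaller values reduce burstiness at the cost of throughput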
