High cpu usage #22081

Closed

xufeixianggithub opened this issue Dec 24, 2024 · 2 comments
Labels
type: bug A code related bug.

Comments

@xufeixianggithub

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I don't think this parse_regex is complicated; it follows the demo from the official website. When I read a 100 MB log file, parse it, and write the output to another file, CPU usage reaches 100% or even 200%, even when I set the maximum number of threads to 1. With 4 or 8 threads, CPU reaches 400%, and about 5.5K logs are processed per second. Is this a problem with the expression, or is there a parameter that can be set to slow down the parsing rate?

Configuration

sources:
  es_data_logs_src:
    type: "file"
    include:
      - "/home/enplus/alidata/vector/testScript/es_data_charge_log_test.log"
    ignore_older_secs: 86400     # 1 day
    ignore_checkpoints: false
    line_delimiter: "-[END]\n"
transforms:
  es_data_logs_parse:
    type: "remap"
    inputs: ["es_data_logs_src"]  # replace with your source name
    source: |
      . |= parse_regex!(.message, r'^\[(?P<biz_type>[^\]]+)\] \[(?P<app_name>[^\]]+)\] \[(?P<trace_id>[^\]]*)\] \[(?P<ip_address>[^\]]+)\] \[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3})\] \[(?P<level>[^\]]+)\] \[(?P<thread_id>[^\]]+)\] - (?P<message>[\s\S]*)$')
      del(.host)
      del(.source_type)
      del(.file)
sinks:
  web_log_test_sink:
    inputs:
      - es_data_logs_parse
    type: file
    path: "/home/enplus/alidata/vector/testScript/output.log"
    encoding:
      codec: json
      json:
        pretty: true
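
Regarding the question above about slowing down the parsing rate: one option worth trying (a sketch only, not tested against this setup) is Vector's throttle transform, placed between the file source and the remap. Note that throttle drops events above the threshold rather than applying backpressure, so it caps CPU at the cost of discarding data. The transform name es_data_logs_throttle below is made up for illustration.

transforms:
  es_data_logs_throttle:               # hypothetical name; sits between the file source and the remap
    type: "throttle"
    inputs: ["es_data_logs_src"]
    threshold: 5000                    # events allowed per window; anything beyond this is dropped
    window_secs: 1
  es_data_logs_parse:
    type: "remap"
    inputs: ["es_data_logs_throttle"]  # now reads from the throttle instead of the source
    source: |
      ...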

Version

vector 0.42.0 (x86_64-unknown-linux-gnu 3d16e34 2024-10-21 14:10:14.375255220)

Debug Output

No response

Example Data

The problem is as described above. I have already tried VECTOR_THREADS=1 and VECTOR_INTERNAL_LOG_RATE_LIMIT=100, as well as parameters such as max_read_bytes.

Additional Context

No response

References

No response

xufeixianggithub added the type: bug label on Dec 24, 2024
@xufeixianggithub
Author

[COMM-DATA] [Charge-Web] [5ed93fc5-5de2-4605-9d3f-683c5f2ec128] [127.0.0.1] [2024-12-19 16:45:07:013] [INFO] [294|http-nio-9090-exec-1] - {"time":1734597907010,"dataType":"RECEIVE","enGateSerialNum":"SN90052308090965","chargerSerialNum":null,"protocolVersion":5,"actionCode1":1504,"actionCode2":1504,"param":1,"dataFrame":"aa f5 17 00 03 d1 e0 05 00 00 1b 00 4e 2b 00 00 00 00 00 00 00 00 79","keyData":"测试","isValid":false} -[END]
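
For reference, applying the parse_regex pattern from the configuration above to this sample line should yield roughly the following captures (hand-derived, not actual Vector output; the trailing -[END] is the configured line_delimiter and is stripped by the file source before the remap runs):

biz_type:   "COMM-DATA"
app_name:   "Charge-Web"
trace_id:   "5ed93fc5-5de2-4605-9d3f-683c5f2ec128"
ip_address: "127.0.0.1"
timestamp:  "2024-12-19 16:45:07:013"
level:      "INFO"
thread_id:  "294|http-nio-9090-exec-1"
message:    '{"time":1734597907010,"dataType":"RECEIVE",...}'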

@xufeixianggithub
Author

I have solved this problem; it is not a bug, and I am very sorry for raising an incorrect issue. I would also recommend adding a parameter that specifies the interval for reading logs from a file, to better control the consumption rate.
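
Until such a read-interval option exists, one related knob (an untested sketch; the value 2048 below is illustrative) is the file source's max_read_bytes, which the reporter mentioned trying. It bounds how many bytes Vector reads from each file per read cycle, trading throughput for smoother CPU:

sources:
  es_data_logs_src:
    type: "file"
    include:
      - "/home/enplus/alidata/vector/testScript/es_data_charge_log_test.log"
    line_delimiter: "-[END]\n"
    max_read_bytes: 2048   # bytes read from a file per cycle; smaller values reduce burstiness at the cost of throughput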
