This issue was moved to a discussion.
File read log interval #22115
I will describe my usage scenario in detail. The source is configured to read files and the sink is configured to write to Elasticsearch. I configured buffering and backpressure using a disk buffer in block mode. According to my test results, buffer files are generated and written to, but the downstream Elasticsearch CPU usage is still high, and the upstream source still uses about 40% of the total CPU on a 4-core, 8 GB server, because I configured it to run with a single thread. I used `vector top` to check, and throughput for the source and transforms reached tens of thousands of events per second. Downstream Elasticsearch CPU usage is up to 90 percent. If I configure throttle, I have to drop logs. If I could slow the pipeline down without introducing new middleware such as Kafka, that would be ideal.
`sources:
app_logs_parse: |
I provided a piece of code that does rate limiting in Lua. In my tests it controls the speed of all three components (source, transforms, and sink): it reads only 5K logs at a time, the number of logs output matches the number of logs in the source file, and CPU usage is only a few percent. If anyone has seen or tried this solution, please point out whether it has hidden dangers.
I see the issue you are describing, but I don't think we'd solve it by adding configuration to the `file` source. Instead, I think a more holistic way to solve this problem is for the `throttle` transform to support applying back-pressure. This is being tracked by #13651.

You could also consider configuring the sink to apply back-pressure by limiting the concurrency or batch sizes.
I'll close this issue, but let me know if you disagree with my assessment!
Originally posted by @jszwedko in #22095 (comment)
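
The sink-side suggestion above (limiting concurrency and batch sizes so the sink applies back-pressure) could look roughly like the following. This is only a sketch, not configuration taken from the issue: the sink name `es_out`, the endpoint URL, and every numeric value are assumptions chosen for illustration, and `app_logs_parse` is reused from the original post as the input. With these numbers the sink would send at most about 10 bulk requests of up to 500 events each per second, and a full buffer would block upstream components instead of dropping events.

```yaml
sinks:
  es_out:                                  # hypothetical sink name
    type: elasticsearch
    inputs: [app_logs_parse]               # name from the issue; use your last transform if you have one
    endpoints: ["http://localhost:9200"]   # placeholder endpoint
    batch:
      max_events: 500                      # illustrative: keep bulk requests small
      timeout_secs: 5                      # flush a partial batch after 5 seconds
    request:
      concurrency: 1                       # one in-flight request at a time
      rate_limit_num: 10                   # at most 10 requests ...
      rate_limit_duration_secs: 1          # ... per 1-second window
    buffer:
      type: disk
      max_size: 1073741824                 # 1 GiB, illustrative
      when_full: block                     # apply back-pressure rather than drop events
```

Whether this lowers CPU usage enough on the Elasticsearch side depends on the workload, so the values would need tuning against `vector top` output.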