File source doesn't read the log file until the end of file (only first batch) in single run #22108

Closed
DmitryLukyanov opened this issue Jan 2, 2025 · 5 comments
Labels
  • meta: awaiting author
  • source: file
  • type: bug

DmitryLukyanov commented Jan 2, 2025

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

There is an open related discussion: #22103

The file source doesn't read the full log file. After reading the first batch, the script call completes. To proceed with the second batch, the vector .. command has to be launched again. Please see the output.

I use 2 sinks: http (an Azure Function) and console. The issue is visible in the console too.

Original running command (Launch.ps1):

vector `
	-c .\config\vector.yaml `
	--require-healthy true `
	--color always `
	--no-graceful-shutdown-limit `
	--watch-config `
	--verbose `

My folder structure:

$ ls -R
.:
Launch.ps1  config  data  full_output.txt  logs
./config:
vector.yaml
./data:
./logs:
raw.log

After the first launch, checkpoints.json looks like this:

{"version":"1","checkpoints":[{"fingerprint":{"first_lines_checksum":18390092115065914886},"position":2106,"modified":"2025-01-02T17:37:33.304125700Z"}]}
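
One way to tell whether the file was read to the end is to compare the checkpoint position with the size of the log file. The following is a minimal sketch, assuming that `position` is a byte offset into the watched file; the helper names `checkpoint_positions` and `unread_bytes` are hypothetical and not part of Vector.

```python
import json

# The checkpoint document from the first launch, copied from above.
SAMPLE = (
    '{"version":"1","checkpoints":[{"fingerprint":'
    '{"first_lines_checksum":18390092115065914886},'
    '"position":2106,"modified":"2025-01-02T17:37:33.304125700Z"}]}'
)


def checkpoint_positions(checkpoints_json: str) -> list[int]:
    """Return the saved read position of every checkpoint entry."""
    doc = json.loads(checkpoints_json)
    return [entry["position"] for entry in doc.get("checkpoints", [])]


def unread_bytes(position: int, file_size: int) -> int:
    """Bytes left after the checkpoint (assumes position is a byte offset)."""
    return max(file_size - position, 0)


if __name__ == "__main__":
    for position in checkpoint_positions(SAMPLE):
        print(f"checkpoint position: {position}")
```

Passing the result together with `os.path.getsize("./logs/raw.log")` to `unread_bytes` would show how much of the file is still pending; a non-zero value after a run would match the "only first batch" symptom.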

Configuration

api:
  enabled: true

acknowledgements:
  enabled: true

data_dir: ./data

sources:
  fileIn:
    type: "file"
    glob_minimum_cooldown_ms: 5000
    include:
      - ./logs/*.log
    internal_metrics:
      include_file_tag: true

transforms:
  msg_parser:
    type: remap
    inputs: [fileIn]
    source: |
      # send everything to sinks
      .=.

sinks:

  console_print:
    type: console
    inputs:
      - "msg_parser"
    encoding:
      codec: "json"

  http:
    type: "http"
    method: post
    inputs:
      - "msg_parser"
    uri: http://localhost:7245/api/LogsTransmitter
    tls:
      verify_certificate: false
    encoding:
      codec: "json"

    request:
      retry_attempts: 3

Version

vector 0.43.1 (x86_64-pc-windows-msvc e30bf1f 2024-12-10 16:14:47.175528383)

Debug Output

Please see the comment below: #22108 (comment)

Example Data

Please see the comment below: #22108 (comment)

Additional Context

OS: Windows 11

References

#22103

DmitryLukyanov added the type: bug label Jan 2, 2025
Author

DmitryLukyanov commented Jan 2, 2025

Sorry, I can't add the full output and the example file to the description (due to the size limit), so I'm adding them here:
full_output.txt

Input file (raw.log):
raw.log

jszwedko added the source: file label Jan 2, 2025
Member

jszwedko commented Jan 2, 2025

Sorry, I can't add the full output and the example file to the description (due to the size limit), so I'm adding them here: full_output.txt

Input file (raw.log): raw.log

Thanks for sharing the logs, @DmitryLukyanov. I think I see the issue. Vector is hitting an error:

2025-01-02T17:14:07.127914Z ERROR vector::internal_events::socket: Error binding socket. error=error creating server listener: Only one usage of each socket address (protocol/network address/port) is normally permitted. (os error 10048) error_code="socket_bind" error_type="io_failed" stage="receiving" mode=tcp internal_log_rate_limit=true
2025-01-02T17:14:07.128552Z ERROR vector::app: An error occurred that Vector couldn't handle: error creating server listener: Only one usage of each socket address (protocol/network address/port) is normally permitted. (os error 10048).

This causes Vector to shut down, which is why it is only able to send one batch. The only thing I see in your config that should be binding to a socket is api: enabled: true. Is it possible that something is already listening on the default API port (8686)?
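
A quick way to check this before starting Vector is to try binding the port yourself. This is a rough sketch only: `port_is_free` is a hypothetical helper, and the address 127.0.0.1:8686 is taken from the default API port mentioned above.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Another listener already holds the address (os error 10048 on Windows).
            return False
    return True


if __name__ == "__main__":
    port = 8686
    state = "free" if port_is_free(port) else "already in use"
    print(f"127.0.0.1:{port} is {state}")
```

If the port is reported as in use, the bind error in the log is the expected result, and whatever holds the port needs to be stopped before Vector can start its API listener.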

jszwedko added the meta: awaiting author label Jan 2, 2025
@DmitryLukyanov
Author

Thanks for your response, @jszwedko.

I saw this error before, but I wasn't sure which port was affected (I suspected my Azure Function, but that didn't seem to be the case). Since the error appears in the log long before I call the function, it didn't look related.

Yes, something is listening on port 8686:

PS C:\test> netstat -ano|findstr 8686
TCP 127.0.0.1:8686 0.0.0.0:0 LISTENING 3364

It looks like it's a Vector process that somehow wasn't closed during my tests (even though the PowerShell process had already exited).

If I manually kill this process, it looks like the issue is gone.
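
To find the process to stop, the PID is the last column of the `netstat -ano` output shown above. A minimal sketch of extracting it, assuming the five-column TCP layout from that output; `listening_pids` is a hypothetical helper name.

```python
def listening_pids(netstat_output: str, port: int) -> set[int]:
    """Collect PIDs of TCP listeners on the given local port."""
    pids = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: protocol, local address, foreign address, state, PID.
        if len(fields) != 5 or fields[0] != "TCP" or fields[3] != "LISTENING":
            continue
        # The port is whatever follows the last colon of the local address.
        if fields[1].rsplit(":", 1)[-1] == str(port):
            pids.add(int(fields[4]))
    return pids


if __name__ == "__main__":
    sample = "TCP 127.0.0.1:8686 0.0.0.0:0 LISTENING 3364"
    print(listening_pids(sample, 8686))
```

On Windows, the resulting PID can then be stopped from Task Manager or with `taskkill /PID <pid> /F`.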

Author

DmitryLukyanov commented Jan 2, 2025

I'm not sure why the Vector process keeps running in the task list even though the parent PowerShell process has exited, but at least my original question looks resolved. Thanks!

Member

jszwedko commented Jan 2, 2025

Aha, gotcha, thanks for confirming! I'll close this out, but let us know if you have any other issues.

jszwedko closed this as completed Jan 2, 2025