pipe: add flb_pipe_error #10017

Open: wants to merge 5 commits into master


@braydonk (Contributor) commented Feb 26, 2025

On Windows, the flb_pipe_r and flb_pipe_w macros do not set errno on failure, so calling flb_errno in their error paths reports the wrong error. This PR adds a new macro, flb_pipe_error, that checks the correct place (WSAGetLastError) and prints a similarly formatted message. On Linux it still expands to flb_errno, so messages look the same as they always did, but on Windows we now get actual error messages.

Fixes #3146
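A minimal sketch of the dispatch described above, under stated assumptions: `pipe_error`, `errno_print` and `wsa_error_print` are hypothetical stand-ins for the PR's macro and print functions, messages go to `stderr` rather than through `flb_error`, and `__FILE__` replaces `__FLB_FILENAME__`. On Windows the macro reads `WSAGetLastError()` and resolves the text with `FormatMessageA`; elsewhere it reads `errno` and uses `strerror`. Each print function returns the code it reported.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <winsock2.h>
#include <windows.h>
#endif

/* Hypothetical: print a POSIX errno value with the caller's location, return it. */
static int errno_print(int errnum, const char *file, int line)
{
    fprintf(stderr, "[%s:%i errno=%i] %s\n", file, line, errnum, strerror(errnum));
    return errnum;
}

#ifdef _WIN32
/* Hypothetical, Windows-only: the whole function is conditional, not just its body. */
static int wsa_error_print(int errnum, const char *file, int line)
{
    char buf[256];
    DWORD n;

    n = FormatMessageA(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
                       NULL, (DWORD) errnum,
                       MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
                       buf, sizeof(buf), NULL);
    /* System messages end with "\r\n"; trim it so the log stays on one line. */
    while (n > 0 && (buf[n - 1] == '\r' || buf[n - 1] == '\n')) {
        buf[--n] = '\0';
    }
    if (n == 0) {
        /* FormatMessageA failed (or returned nothing); buf is not usable. */
        snprintf(buf, sizeof(buf), "unknown error");
    }
    fprintf(stderr, "[%s:%i WSAGetLastError=%i] %s\n", file, line, errnum, buf);
    return errnum;
}
/* Pipes are assumed to be socket-backed on Windows, so read the Winsock error. */
#define pipe_error() wsa_error_print(WSAGetLastError(), __FILE__, __LINE__)
#else
#define pipe_error() errno_print(errno, __FILE__, __LINE__)
#endif
```

For example, on Linux `errno = EADDRINUSE; pipe_error();` prints a line shaped like the Linux log in the testing notes and evaluates to `EADDRINUSE`. Keeping the Windows function entirely inside `#ifdef _WIN32` means non-Windows builds never compile a symbol they cannot use.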


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change

I had trouble finding a scenario in which a genuine Winsock failure is propagated. In #3146 the reporter mentioned sending to an Elasticsearch host that is unavailable, but the HTTP client's handling of that error appears to have changed since then, so a direct pipe read error is no longer propagated.

To test the log format, I made this local change on Windows:

--- a/src/flb_utils.c
+++ b/src/flb_utils.c
@@ -494,7 +494,9 @@ int flb_utils_timer_consume(flb_pipefd_t fd)
     int ret;
     uint64_t val;

-    ret = flb_pipe_r(fd, &val, sizeof(val));
+    // ret = flb_pipe_r(fd, &val, sizeof(val));
+    ret = -1;
+    WSASetLastError(WSAEADDRINUSE);
     if (ret == -1) {
         flb_pipe_error();
         return -1;

Resulting log:

[2025/02/27 18:01:58] [error] [C:\Users\braydonk\Git\fluent-bit\src\flb_utils.c:501 WSAGetLastError=10048] Only one usage of each socket address (protocol/network address/port) is normally permitted.

On Linux I made the following similar local change:

--- a/src/flb_utils.c
+++ b/src/flb_utils.c
@@ -494,7 +494,9 @@ int flb_utils_timer_consume(flb_pipefd_t fd)
     int ret;
     uint64_t val;
 
-    ret = flb_pipe_r(fd, &val, sizeof(val));
+    // ret = flb_pipe_r(fd, &val, sizeof(val));
+    ret = -1;
+    errno = EADDRINUSE;
     if (ret == -1) {
         flb_pipe_error();
         return -1;

And this is the resulting error message:

[2025/02/28 13:57:17] [error] [/usr/local/google/home/braydonk/Git/fluent-bit/src/flb_utils.c:501 errno=98] Address already in use
  • Attached Valgrind output that shows no leaks or memory corruption was found
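Both log lines above share one shape, `[file:line LABEL=code] message`, with only the label and the message source differing by platform. To check that shape without forcing a real pipe failure, the string can be built in a caller-supplied buffer. The sketch below does this with a hypothetical `format_error_line` helper (not part of Fluent Bit) that takes the message text as a parameter, so the same helper could serve `strerror` output on POSIX and `FormatMessage` output on Windows.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: render "[file:line label=code] message" into buf.
 * label is "errno" on POSIX or "WSAGetLastError" on Windows.
 * Returns the snprintf result (the length the full line needs, excluding NUL).
 */
static int format_error_line(char *buf, size_t len, const char *file, int line,
                             const char *label, int code, const char *message)
{
    return snprintf(buf, len, "[%s:%i %s=%i] %s", file, line, label, code, message);
}
```

With glibc on Linux, passing `"src/flb_utils.c"`, `501`, `"errno"`, `EADDRINUSE` and `strerror(EADDRINUSE)` yields `[src/flb_utils.c:501 errno=98] Address already in use`; other C libraries may word the message differently.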

Documentation

Docs PR should not be required.

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

On Windows, the `flb_pipe_r` and `flb_pipe_w` macros do not set errno on
failure, meaning calling `flb_errno` in error scenarios is insufficient.
This PR adds a new macro that will check the correct place,
`WSAGetLastError`, and output a similar error message. On Linux this
will still be `flb_errno`, meaning messages should work the same as they
always did, but now on Windows we will get actual error messages.

Signed-off-by: braydonk <braydonk@google.com>
@braydonk (Contributor Author):

That s390x failure doesn't look related to this PR.

@edsiper added this to the Fluent Bit v4.0.0 milestone Feb 28, 2025
@@ -232,11 +232,18 @@ static inline int flb_log_suppress_check(int log_suppress_interval, const char *
int flb_log_worker_init(struct flb_worker *worker);
int flb_log_worker_destroy(struct flb_worker *worker);
int flb_errno_print(int errnum, const char *file, int line);
int flb_WSAGetLastError_print(int errnum, const char *file, int line);
Collaborator:

Could you please change these names to follow the coding style?


#ifdef __FLB_FILENAME__
#define flb_errno() flb_errno_print(errno, __FLB_FILENAME__, __LINE__)
#ifdef WIN32
#define flb_WSAGetLastError() flb_WSAGetLastError_print(WSAGetLastError(), __FLB_FILENAME__, __LINE__)
Collaborator:

Could you please change these names to follow the coding style?

Collaborator:

I think this function should be renamed; otherwise the name suggests it returns the error code rather than printing it.

Contributor Author:

This just follows the existing flb_errno pattern. How should the name change? And should flb_errno be renamed too, given that it has the same potential for confusion?

Collaborator:

I'd appreciate it if you named it flb_wsa_get_last_error.

I agree that flb_errno's name has the same issue, but as you might imagine, renaming it would involve a boatload of changes that, even if each is minimal, would be very noisy, so now might not be the right time for that.

What do you think of having both flb_errno and flb_wsa_get_last_error_print return the error code in addition to printing it, just as a first step?

I understand that this would basically swallow any error code the function itself could generate (if its own error conditions were handled), which could make this a step in the wrong direction, so let me know what you think.

The "critical" thing to me is changing those names from camel case to snake case.

Contributor Author:

Although no callers currently use that return value, I don't see any harm in simply returning errnum. Did that along with the rename in 429bbe1.
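With the print function returning `errnum`, a caller can log the failure and branch on the same code in one step. A POSIX sketch, assuming a hypothetical `report_errno` helper shaped like the print functions discussed here (it prints, then returns the code it was given) and plain `read` standing in for `flb_pipe_r`. As noted in the thread, a helper built this way has no channel left to signal a failure of its own.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the errno print function: log the code, hand it back. */
static int report_errno(int errnum, const char *file, int line)
{
    fprintf(stderr, "[%s:%i errno=%i] %s\n", file, line, errnum, strerror(errnum));
    return errnum;
}

#define report_last_errno() report_errno(errno, __FILE__, __LINE__)

/*
 * Hypothetical timer-consume routine: read one uint64_t from fd.
 * Returns 0 on success, or the errno value that was logged so the caller
 * can decide what to do next.
 */
static int consume_tick(int fd, uint64_t *val)
{
    ssize_t ret = read(fd, val, sizeof(*val));

    if (ret == -1) {
        return report_last_errno();
    }
    if (ret != (ssize_t) sizeof(*val)) {
        /* Short read or EOF: reported as EIO without logging in this sketch. */
        return EIO;
    }
    return 0;
}
```

A caller could then write `int err = consume_tick(fd, &val); if (err == EINTR) { /* retry */ }` without re-reading `errno`, which another call may already have overwritten.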

src/flb_log.c (outdated)
@@ -121,7 +126,7 @@ static inline int log_read(flb_pipefd_t fd, struct flb_log *log)
     bytes = flb_pipe_read_all(fd, &msg, sizeof(struct log_message));
 
     if (bytes <= 0) {
-        flb_errno();
+        flb_pipe_error();
Collaborator:

In this case there is an flb_error call in the second error exit path inside flb_pipe_read_all, which could cause confusion. Could you remove it?

Contributor Author:

Good catch, fixed in 7a1429c
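The change keeps error reporting in one place: a read-all helper leaves `errno` untouched and lets its caller log the failure. Below is a POSIX sketch of such a loop. It is a hypothetical `read_all` written for illustration, not Fluent Bit's `flb_pipe_read_all`, and it assumes a blocking descriptor and retries when a signal interrupts `read`.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read exactly count bytes unless EOF or an error intervenes.
 * Returns the number of bytes read (less than count only on EOF),
 * or -1 with errno preserved for the caller to report. Nothing is logged here.
 */
static ssize_t read_all(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);

        if (n == -1) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal: try again */
            }
            return -1; /* errno still describes the failure */
        }
        if (n == 0) {
            break; /* EOF before count bytes arrived */
        }
        total += (size_t) n;
    }
    return (ssize_t) total;
}
```

A caller such as `log_read` would then check for `-1` (or a short count) and invoke the error macro itself, so each failure is logged exactly once.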

buf, sizeof(buf), NULL);
flb_error("[%s:%i WSAGetLastError=%i] %s", file, line, errnum, buf);
#endif
}
Collaborator:

Please add a return value or change the return type to void

Contributor Author:

Added a return value to keep it consistent with flb_errno in 7a1429c.

@@ -745,6 +750,17 @@ int flb_errno_print(int errnum, const char *file, int line)
return 0;
}

int flb_WSAGetLastError_print(int errnum, const char *file, int line)
{
#ifdef WIN32
Collaborator:

Would it be better to make the whole function conditional rather than just its body?
This function is only referenced by flb_WSAGetLastError when the target is Windows, so I don't think making the whole thing conditional would be a problem.

Contributor Author:

Yep, fair assessment. Done in 7a1429c.

Signed-off-by: braydonk <braydonk@google.com>

Successfully merging this pull request may close these issues.

flb_errno printing wrong error codes on windows
3 participants