Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a basic illumos test job #6769

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from
Draft

add a basic illumos test job #6769

wants to merge 7 commits into from

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Aug 12, 2024

This commit adds a simple test job for illumos using Buildomat. I'd like to make some more improvements and add docs before merging this, but I want to see if it even works first.

This commit adds a simple test job for illumos using Buildomat. I'd like
to make some more improvements and add docs before merging this, but I
want to see if it even works first.
@hawkw
Copy link
Member Author

hawkw commented Aug 12, 2024

@jclulow, whenever you have the time, I'd love to get your eyes on my buildomat script --- I'm sure it could be made nicer!

@hawkw
Copy link
Member Author

hawkw commented Aug 12, 2024

Welp, it looks like some of the tests fail on illumos, which I actually think is good news --- that's why we're adding this, after all!

@jclulow
Copy link

jclulow commented Aug 12, 2024

I looked at the anon_pipe_spawn_echo test, FWIW, and I think it's just leaning on unspecified behaviour of echo(1). You could switch echo out for printf there, and I think it would still work on every other system.

(For reference: echo in xpg8 says that Implementations shall not support any options.)

@hawkw
Copy link
Member Author

hawkw commented Aug 12, 2024

I looked at the anon_pipe_spawn_echo test, FWIW, and I think it's just leaning on unspecified behaviour of echo(1). You could switch echo out for printf there, and I think it would still work on every other system.

Yeah, I kinda figured it was something like that --- makes sense to me!

The illumos version of `echo(1)` doesn't support `-n`, so it interprets
it literally as part of the string to echo, making the test fail. I've
changed it to use `printf(1)` with no flags args to get the same
behavior.

I also changed the name of the test, since it doesn't actually spawn
`echo(1)` anymore :)
@Darksonn Darksonn added the A-ci Area: The continuous integration setup label Aug 12, 2024
@hawkw
Copy link
Member Author

hawkw commented Aug 12, 2024

Some more test failures, including two for UDS:

I'll add that we appear to currently be on mio 1.0.2 on CI: https://buildomat.eng.oxide.computer/wg/0/details/01J53XZE95B7J0BRPT2QN3FR1B/3o4DSQRTlGfizqZJ0Q8Qzv5Lp8Axg7hUe5MPyG5OUVR87hae/01J53Y0AQTN8H3RDD6CK3QJ6FG#S336 It would be interesting to see whether the test results are different on earlier mio revisions.

@hawkw
Copy link
Member Author

hawkw commented Aug 12, 2024

well, that's cool: if i add a single dbg! macro to the AsyncFd::read call in the drain function in io_async_fd.rs, like this:

diff --git a/tokio/tests/io_async_fd.rs b/tokio/tests/io_async_fd.rs
index 3ab1cebd..c5d5cb61 100644
--- a/tokio/tests/io_async_fd.rs
+++ b/tokio/tests/io_async_fd.rs
@@ -139,7 +139,7 @@ fn drain(mut fd: &FileDescriptor) {
     let mut buf = [0u8; 512];
     #[allow(clippy::unused_io_amount)]
     loop {
-        match fd.read(&mut buf[..]) {
+        match dbg!(fd.read(&mut buf[..])) {
             Err(e) if e.kind() == ErrorKind::WouldBlock => break,
             Ok(0) => panic!("unexpected EOF"),
             Err(e) => panic!("unexpected error: {:?}", e),

the test now passes for me on illumos. 🤡

@hawkw
Copy link
Member Author

hawkw commented Aug 12, 2024

stack traces from the reset_writable test when it hangs:

eliza@atrium ~ $ pargs 9451
9451:   /home/eliza/tokio/target/debug/deps/io_async_fd-c80359565a3a95c8 reset_writable
argv[0]: /home/eliza/tokio/target/debug/deps/io_async_fd-c80359565a3a95c8
argv[1]: reset_writable
argv[2]: --nocapture
eliza@atrium ~ $ pstack 9451 | demangle
9451:   /home/eliza/tokio/target/debug/deps/io_async_fd-c80359565a3a95c8 reset
--------------------- thread# 1 / lwp# 1 ---------------------
 fffff9ffeef44697 lwp_park (0, 0, 0)
 fffff9ffeef3db05 cond_wait_queue (e8ef88, e8efa0, 0) + 55
 fffff9ffeef3e16a __cond_wait (e8ef88, e8efa0) + ba
 fffff9ffeef3e1ae cond_wait (e8ef88, e8efa0) + 2e
 fffff9ffeef3e1f5 pthread_cond_wait (e8ef88, e8efa0) + 15
 0000000000a0814b std::thread::park::hb49d084e872705a5 () + 6b
 000000000079b87e std::sync::mpmc::list::Channel<T>::recv::{{closure}}::hf62e1229f5251f1e () + 8e
 000000000079b382 std::sync::mpmc::list::Channel<T>::recv::h0b15a8b7d971ec93 () + 272
 00000000007b05c0 test::console::run_tests_console::hbc01c9336ae30ea6 () + 1c00
 00000000007cea23 test::test_main::h821d08de9f858998 () + 153
 00000000007cf817 test::test_main_static::hd54d88e3cf121884 () + 57
 0000000000797e55 io_async_fd::main::hda858a55aa0b0e04 () + 15
 00000000007652ae core::ops::function::FnOnce::call_once::h537173c55fa15c15 () + e
 000000000078ed51 std::sys_common::backtrace::__rust_begin_short_backtrace::h5368d7f525ae1a75 () + 11
 000000000073b5e4 std::rt::lang_start::{{closure}}::hed5ee56bf0bdaa4e () + 14
 0000000000a077c3 std::rt::lang_start_internal::hdc1ac086c909d074 () + 2d3
 000000000073b5b7 std::rt::lang_start::h20aead5d623943e6 () + 37
 0000000000797e81 main () + 21
 000000000072ec37 _start_crt () + 87
 000000000072eb98 _start () + 18
------------ thread# 2 / lwp# 2 [reset_writable] -------------
 fffff9ffeef4af4a ioctl    (3, d001, fffff9ffeb5fddc0)
 000000000092ab5f mio::sys::unix::selector::Selector::select::heba8144752651684 () + 8f
 000000000092e3a9 mio::poll::Poll::poll::ha7f583c7aba8c8df () + 49
 00000000007f99d3 tokio::runtime::io::driver::Driver::turn::hdbc877f2c5a33d2a () + 73
 00000000007f96d9 tokio::runtime::io::driver::Driver::park::h9aa1ba931c7ade7a () + 49
 000000000081e1e9 tokio::runtime::signal::Driver::park::hc0c910bc43cc300c () + 19
 00000000007f3d89 tokio::runtime::process::Driver::park::hc7bde4d7ee237dd9 () + 19
 000000000086a6b7 tokio::runtime::driver::IoStack::park::hc2b6b08bb1a7b51d () + 47
 00000000008b1658 tokio::runtime::time::Driver::park_internal::h012bb0183fb6e65b () + 2e8
 00000000008b12a3 tokio::runtime::time::Driver::park::h540665ec9ecaed80 () + 23
 000000000086ac33 tokio::runtime::driver::TimeDriver::park::h092df9172d1963aa () + 33
 0000000000869f15 tokio::runtime::driver::Driver::park::h82219332c43dd971 () + 15
 0000000000802e52 tokio::runtime::scheduler::current_thread::Context::park::{{closure}}::h591fef1df8327f80 () + 22
 0000000000803507 tokio::runtime::scheduler::current_thread::Context::enter::hd4aadae4f3c48ecb () + 147
 0000000000814590 tokio::runtime::scheduler::current_thread::Context::park::h5710d1b3bfecfdae () + 220
 0000000000755e9c tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::hcd0aab2bdfea8763 () + 61c
 0000000000754e0b tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}::ha9e5ceabc5a33a08 () + 2b
 0000000000790256 tokio::runtime::context::scoped::Scoped<T>::set::hba54509bcb7d1159 () + 96
 0000000000792ad7 tokio::runtime::context::set_scheduler::{{closure}}::h0d748329321238e3 () + 37
 000000000073fb2e std::thread::local::LocalKey<T>::try_with::ha15aa11176beca59 () + fe
 000000000073ddf1 std::thread::local::LocalKey<T>::with::h3c6198f3a4fe4c05 () + 11
 00000000007929af tokio::runtime::context::set_scheduler::h4d7018abd6870714 () + 2f
 0000000000754529 tokio::runtime::scheduler::current_thread::CoreGuard::enter::h3d583cd250e1a64c () + 179
 0000000000754ebd tokio::runtime::scheduler::current_thread::CoreGuard::block_on::h4b3b82ad10cb0d96 () + 1d
 000000000075069f tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}}::h785603f5fbb451e3 () + 1ff
 000000000078b9c2 tokio::runtime::context::runtime::enter_runtime::h7ace6ddcba5e5651 () + f2
 0000000000750067 tokio::runtime::scheduler::current_thread::CurrentThread::block_on::hbfbcbe776dcb53ff () + 97
 000000000073802f tokio::runtime::runtime::Runtime::block_on_inner::h1afb29bef3233be6 () + 8f
 000000000073849a tokio::runtime::runtime::Runtime::block_on::hac4960ca03c4ed0a () + 5a
 0000000000794dd3 io_async_fd::reset_writable::h070bf37d42223f46 () + 123
 00000000007730a9 io_async_fd::reset_writable::{{closure}}::hb0f0c038f01573a4 () + 19
 0000000000765a98 core::ops::function::FnOnce::call_once::hc1404f0ca2253420 () + 18
 00000000007d5462 test::__rust_begin_short_backtrace::h1d84a84fc5339953 () + 12
 00000000007d4330 test::run_test::{{closure}}::hc7f6d1d4a8ac2dde () + 5b0
 0000000000798a22 std::sys_common::backtrace::__rust_begin_short_backtrace::hbc4ec67e58aa30af () + c2
 000000000079dbc7 core::ops::function::FnOnce::call_once{{vtable.shim}}::hb8da26b9e73c729a () + 77
 0000000000a2cec9 std::sys::pal::unix::thread::Thread::new::thread_start::h9ad099a424212e51 () + 29
 fffff9ffeef44307 _thrp_setup (fffff9ffec6a0240) + 77
 fffff9ffeef44650 _lwp_start ()
eliza@atrium ~ $

unfortunately, these stacks are not very interesting: they're just telling us that we are waiting on a notification (in mio::sys::unix::selector::Selector::select that we never get. which, given the failure mode, i feel like i could've told you...

hawkw added a commit that referenced this pull request Aug 14, 2024
## Motivation

The `io_async_fd.rs` tests contain a `drain()` function, which
currently performs synchronous reads from a UDS socket until it returns
`io::ErrorKind::WouldBlock` (i.e., errno `EWOULDBLOCK`/`EAGAIN`). The
*intent* behind this function is to ensure that all data has been
drained from the UDS socket's buffer...which is what it appears to
do...on Linux. On other systems, it appears that an `EWOULDBLOCK` or
`EAGAIN` may be returned before enough data has been read from the UDS
socket to result in the other end being notified that the socket is now
writable. In particular, this appears to be the case on illumos, where
the tests using this function hang forever (see [this comment][1] on PR
#6769).

To my knowledge, this behavior is still POSIX-compliant --- the
reader will still be notified that the socket is readable, and if it
were actually doing non-blocking IO, it would continue reading upon
receipt of that notification. So, relying on `EWOULDBLOCK` to indicate
that the socket has been sufficiently drained appears to rely on
Linux/FreeBSD behavior that isn't necessarily portable to other Unices.

## Solution

This commit changes the `drain()` function to take an argument for the
number of bytes *written* to the socket previously, and continue looping
until it has read that many bytes, regardless of whether `EWOULDBLOCK`
is returned. This should ensure that the socket is drained on all
POSIX-compliant systems, and indeed, the `io_async_fd::reset_writable`
and `io_async_fd::poll_fns` tests no longer hang forever on illumos.

I think making this change is an appropriate solution to the
test failure here, as the `drain()` function is part of the test, rather
than the code in Tokio *being* tested, and (as I mentioned above) the
use of blocking reads on a non-blocking socket without a mechanism to
continue reading when the socket becomes readable again is not really
something a real life program seems likely to do. Ensuring that all the
written bytes have been read by passing in a byte count seems more
faithful to what the test is actually *trying* to do here, anyway.

Thanks to @jclulow for debugging what was going on here!

This change was cherry-picked from commit
f18d6ed from PR #6769, so that the fix
can be merged separately.

[1]: #6769 (comment)
@hawkw
Copy link
Member Author

hawkw commented Aug 14, 2024

#6776 should fix the io_async_fd tests hanging.

@hawkw
Copy link
Member Author

hawkw commented Aug 14, 2024

## Motivation

In `tokio::task::local`, there's a
`LocalState::assert_called_from_owner_thread` function, which checks
that the caller's thread ID matches that of the thread on which the
`LocalSet` was created. This assertion is disabled on FreeBSD and
OpenBSD, with a comment indicating that this is because those systems
have "some weirdness around thread-local destruction". If memory serves,
I believe the "weirdness" was related to the order in which
thread-locals are destroyed relative to each other, or something along
those lines.

Upon running the tests on illumos, there appears to be [a similar
panic][1] in the `task_local_set_in_thread_local` due to this assertion
assertion while dropping a `LocalSet` which lives in a thread-local.
This leads me to believe that illumos, and presumably Solaris, also
exhibit the same "weirdness". This wouldn't be too surprising, given the
shared heritage of those systems.

## Solution

This commit adds `target_os="illumos"` and `target_os="solaris"` to the
`cfg` attribute that disables the assertion on systems determined to
have weirdness. This fixes the test panicking on illumos.

In the future, it might be worthwhile to look into changign the
assertion to only be disabled on weirdness-having systems _while
dropping the `LocalSet`_, rather than all the time. That way, we can
still check other accesses. But, I wanted to make the minimum change
necessary to fix it here before messing around with that.

[1]: https://buildomat.eng.oxide.computer/wg/0/details/01J592RB0JR203RGGN0RYHJHMY/CHieo1Uee7qzRVyp811YBl0MvXGO3i0QA9ezPaFWj6hf6P3P/01J592RSWCNX1MCFYGW74AHVH6#S1954
hawkw added a commit that referenced this pull request Aug 15, 2024
* tests: handle spurious EWOULDBLOCK in io_async_fd

## Motivation

The `io_async_fd.rs` tests contain a `drain()` function, which
currently performs synchronous reads from a UDS socket until it returns
`io::ErrorKind::WouldBlock` (i.e., errno `EWOULDBLOCK`/`EAGAIN`). The
*intent* behind this function is to ensure that all data has been
drained from the UDS socket's buffer...which is what it appears to
do...on Linux. On other systems, it appears that an `EWOULDBLOCK` or
`EAGAIN` may be returned before enough data has been read from the UDS
socket to result in the other end being notified that the socket is now
writable. In particular, this appears to be the case on illumos, where
the tests using this function hang forever (see [this comment][1] on PR
#6769).

To my knowledge, this behavior is still POSIX-compliant --- the
reader will still be notified that the socket is readable, and if it
were actually doing non-blocking IO, it would continue reading upon
receipt of that notification. So, relying on `EWOULDBLOCK` to indicate
that the socket has been sufficiently drained appears to rely on
Linux/FreeBSD behavior that isn't necessarily portable to other Unices.

## Solution

This commit changes the `drain()` function to take an argument for the
number of bytes *written* to the socket previously, and continue looping
until it has read that many bytes, regardless of whether `EWOULDBLOCK`
is returned. This should ensure that the socket is drained on all
POSIX-compliant systems, and indeed, the `io_async_fd::reset_writable`
and `io_async_fd::poll_fns` tests no longer hang forever on illumos.

I think making this change is an appropriate solution to the
test failure here, as the `drain()` function is part of the test, rather
than the code in Tokio *being* tested, and (as I mentioned above) the
use of blocking reads on a non-blocking socket without a mechanism to
continue reading when the socket becomes readable again is not really
something a real life program seems likely to do. Ensuring that all the
written bytes have been read by passing in a byte count seems more
faithful to what the test is actually *trying* to do here, anyway.

Thanks to @jclulow for debugging what was going on here!

This change was cherry-picked from commit
f18d6ed from PR #6769, so that the fix
can be merged separately.

[1]: #6769 (comment)

Signed-off-by: Eliza Weisman <eliza@elizas.website>
hawkw added a commit that referenced this pull request Aug 15, 2024
## Motivation

Currently, the test `uds_stream::epollhup` expects that a
`UdsStream::connect` future to a Unix socket which is closed by the
accept side to always fail with `io::ErrorKind::ConnectionReset`. On
illumos, and potentially other systems, it instead fails with
`io::ErrorKind::ConnectionRefused`.

This was discovered whilst adding an illumos CI job in PR #6769. See:
#6769 (comment)

## Solution

This commit changes the test to accept either `ConenctionReset` or
`ConnectionRefused`. This way, we are more tolerant of different
operating systems which may decide to return slightly different errnos
here. Both ECONNREFUSED and ECONNRESET seem reasonable to expect in this
situation, although arguably, ECONNREFUSED is actually more correct: the
acceptor did not accept the connection at all, which seems like
"refusing" it to me...

This commit was cherry-picked from PR #6769.
Darksonn pushed a commit that referenced this pull request Aug 16, 2024
## Motivation

Currently, the test `uds_stream::epollhup` expects that a
`UdsStream::connect` future to a Unix socket which is closed by the
accept side to always fail with `io::ErrorKind::ConnectionReset`. On
illumos, and potentially other systems, it instead fails with
`io::ErrorKind::ConnectionRefused`.

This was discovered whilst adding an illumos CI job in PR #6769. See:
#6769 (comment)

## Solution

This commit changes the test to accept either `ConenctionReset` or
`ConnectionRefused`. This way, we are more tolerant of different
operating systems which may decide to return slightly different errnos
here. Both ECONNREFUSED and ECONNRESET seem reasonable to expect in this
situation, although arguably, ECONNREFUSED is actually more correct: the
acceptor did not accept the connection at all, which seems like
"refusing" it to me...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ci Area: The continuous integration setup
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants