-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add a basic illumos test job #6769
base: master
Are you sure you want to change the base?
Conversation
This commit adds a simple test job for illumos using Buildomat. I'd like to make some more improvements and add docs before merging this, but I want to see if it even works first.
@jclulow, whenever you have the time, I'd love to get your eyes on my buildomat script --- I'm sure it could be made nicer! |
Welp, it looks like some of the tests fail on illumos, which I actually think is good news --- that's why we're adding this, after all! |
I looked at the anon_pipe_spawn_echo test, FWIW, and I think it's just leaning on unspecified behaviour of echo(1). You could switch echo out for printf there, and I think it would still work on every other system. (For reference: echo in xpg8 says that Implementations shall not support any options.) |
Yeah, I kinda figured it was something like that --- makes sense to me! |
The illumos version of `echo(1)` doesn't support `-n`, so it interprets it literally as part of the string to echo, making the test fail. I've changed it to use `printf(1)` with no flags args to get the same behavior. I also changed the name of the test, since it doesn't actually spawn `echo(1)` anymore :)
Some more test failures, including two for UDS:
I'll add that we appear to currently be on mio 1.0.2 on CI: https://buildomat.eng.oxide.computer/wg/0/details/01J53XZE95B7J0BRPT2QN3FR1B/3o4DSQRTlGfizqZJ0Q8Qzv5Lp8Axg7hUe5MPyG5OUVR87hae/01J53Y0AQTN8H3RDD6CK3QJ6FG#S336 It would be interesting to see whether the test results are different on earlier mio revisions. |
well, that's cool: if i add a single diff --git a/tokio/tests/io_async_fd.rs b/tokio/tests/io_async_fd.rs
index 3ab1cebd..c5d5cb61 100644
--- a/tokio/tests/io_async_fd.rs
+++ b/tokio/tests/io_async_fd.rs
@@ -139,7 +139,7 @@ fn drain(mut fd: &FileDescriptor) {
let mut buf = [0u8; 512];
#[allow(clippy::unused_io_amount)]
loop {
- match fd.read(&mut buf[..]) {
+ match dbg!(fd.read(&mut buf[..])) {
Err(e) if e.kind() == ErrorKind::WouldBlock => break,
Ok(0) => panic!("unexpected EOF"),
Err(e) => panic!("unexpected error: {:?}", e), the test now passes for me on illumos. 🤡 |
stack traces from the eliza@atrium ~ $ pargs 9451
9451: /home/eliza/tokio/target/debug/deps/io_async_fd-c80359565a3a95c8 reset_writable
argv[0]: /home/eliza/tokio/target/debug/deps/io_async_fd-c80359565a3a95c8
argv[1]: reset_writable
argv[2]: --nocapture
eliza@atrium ~ $ pstack 9451 | demangle
9451: /home/eliza/tokio/target/debug/deps/io_async_fd-c80359565a3a95c8 reset
--------------------- thread# 1 / lwp# 1 ---------------------
fffff9ffeef44697 lwp_park (0, 0, 0)
fffff9ffeef3db05 cond_wait_queue (e8ef88, e8efa0, 0) + 55
fffff9ffeef3e16a __cond_wait (e8ef88, e8efa0) + ba
fffff9ffeef3e1ae cond_wait (e8ef88, e8efa0) + 2e
fffff9ffeef3e1f5 pthread_cond_wait (e8ef88, e8efa0) + 15
0000000000a0814b std::thread::park::hb49d084e872705a5 () + 6b
000000000079b87e std::sync::mpmc::list::Channel<T>::recv::{{closure}}::hf62e1229f5251f1e () + 8e
000000000079b382 std::sync::mpmc::list::Channel<T>::recv::h0b15a8b7d971ec93 () + 272
00000000007b05c0 test::console::run_tests_console::hbc01c9336ae30ea6 () + 1c00
00000000007cea23 test::test_main::h821d08de9f858998 () + 153
00000000007cf817 test::test_main_static::hd54d88e3cf121884 () + 57
0000000000797e55 io_async_fd::main::hda858a55aa0b0e04 () + 15
00000000007652ae core::ops::function::FnOnce::call_once::h537173c55fa15c15 () + e
000000000078ed51 std::sys_common::backtrace::__rust_begin_short_backtrace::h5368d7f525ae1a75 () + 11
000000000073b5e4 std::rt::lang_start::{{closure}}::hed5ee56bf0bdaa4e () + 14
0000000000a077c3 std::rt::lang_start_internal::hdc1ac086c909d074 () + 2d3
000000000073b5b7 std::rt::lang_start::h20aead5d623943e6 () + 37
0000000000797e81 main () + 21
000000000072ec37 _start_crt () + 87
000000000072eb98 _start () + 18
------------ thread# 2 / lwp# 2 [reset_writable] -------------
fffff9ffeef4af4a ioctl (3, d001, fffff9ffeb5fddc0)
000000000092ab5f mio::sys::unix::selector::Selector::select::heba8144752651684 () + 8f
000000000092e3a9 mio::poll::Poll::poll::ha7f583c7aba8c8df () + 49
00000000007f99d3 tokio::runtime::io::driver::Driver::turn::hdbc877f2c5a33d2a () + 73
00000000007f96d9 tokio::runtime::io::driver::Driver::park::h9aa1ba931c7ade7a () + 49
000000000081e1e9 tokio::runtime::signal::Driver::park::hc0c910bc43cc300c () + 19
00000000007f3d89 tokio::runtime::process::Driver::park::hc7bde4d7ee237dd9 () + 19
000000000086a6b7 tokio::runtime::driver::IoStack::park::hc2b6b08bb1a7b51d () + 47
00000000008b1658 tokio::runtime::time::Driver::park_internal::h012bb0183fb6e65b () + 2e8
00000000008b12a3 tokio::runtime::time::Driver::park::h540665ec9ecaed80 () + 23
000000000086ac33 tokio::runtime::driver::TimeDriver::park::h092df9172d1963aa () + 33
0000000000869f15 tokio::runtime::driver::Driver::park::h82219332c43dd971 () + 15
0000000000802e52 tokio::runtime::scheduler::current_thread::Context::park::{{closure}}::h591fef1df8327f80 () + 22
0000000000803507 tokio::runtime::scheduler::current_thread::Context::enter::hd4aadae4f3c48ecb () + 147
0000000000814590 tokio::runtime::scheduler::current_thread::Context::park::h5710d1b3bfecfdae () + 220
0000000000755e9c tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::hcd0aab2bdfea8763 () + 61c
0000000000754e0b tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}::ha9e5ceabc5a33a08 () + 2b
0000000000790256 tokio::runtime::context::scoped::Scoped<T>::set::hba54509bcb7d1159 () + 96
0000000000792ad7 tokio::runtime::context::set_scheduler::{{closure}}::h0d748329321238e3 () + 37
000000000073fb2e std::thread::local::LocalKey<T>::try_with::ha15aa11176beca59 () + fe
000000000073ddf1 std::thread::local::LocalKey<T>::with::h3c6198f3a4fe4c05 () + 11
00000000007929af tokio::runtime::context::set_scheduler::h4d7018abd6870714 () + 2f
0000000000754529 tokio::runtime::scheduler::current_thread::CoreGuard::enter::h3d583cd250e1a64c () + 179
0000000000754ebd tokio::runtime::scheduler::current_thread::CoreGuard::block_on::h4b3b82ad10cb0d96 () + 1d
000000000075069f tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}}::h785603f5fbb451e3 () + 1ff
000000000078b9c2 tokio::runtime::context::runtime::enter_runtime::h7ace6ddcba5e5651 () + f2
0000000000750067 tokio::runtime::scheduler::current_thread::CurrentThread::block_on::hbfbcbe776dcb53ff () + 97
000000000073802f tokio::runtime::runtime::Runtime::block_on_inner::h1afb29bef3233be6 () + 8f
000000000073849a tokio::runtime::runtime::Runtime::block_on::hac4960ca03c4ed0a () + 5a
0000000000794dd3 io_async_fd::reset_writable::h070bf37d42223f46 () + 123
00000000007730a9 io_async_fd::reset_writable::{{closure}}::hb0f0c038f01573a4 () + 19
0000000000765a98 core::ops::function::FnOnce::call_once::hc1404f0ca2253420 () + 18
00000000007d5462 test::__rust_begin_short_backtrace::h1d84a84fc5339953 () + 12
00000000007d4330 test::run_test::{{closure}}::hc7f6d1d4a8ac2dde () + 5b0
0000000000798a22 std::sys_common::backtrace::__rust_begin_short_backtrace::hbc4ec67e58aa30af () + c2
000000000079dbc7 core::ops::function::FnOnce::call_once{{vtable.shim}}::hb8da26b9e73c729a () + 77
0000000000a2cec9 std::sys::pal::unix::thread::Thread::new::thread_start::h9ad099a424212e51 () + 29
fffff9ffeef44307 _thrp_setup (fffff9ffec6a0240) + 77
fffff9ffeef44650 _lwp_start ()
eliza@atrium ~ $ unfortunately, these stacks are not very interesting: they're just telling us that we are waiting on a notification (in |
## Motivation The `io_async_fd.rs` tests contain a `drain()` function, which currently performs synchronous reads from a UDS socket until it returns `io::ErrorKind::WouldBlock` (i.e., errno `EWOULDBLOCK`/`EAGAIN`). The *intent* behind this function is to ensure that all data has been drained from the UDS socket's buffer...which is what it appears to do...on Linux. On other systems, it appears that an `EWOULDBLOCK` or `EAGAIN` may be returned before enough data has been read from the UDS socket to result in the other end being notified that the socket is now writable. In particular, this appears to be the case on illumos, where the tests using this function hang forever (see [this comment][1] on PR #6769). To my knowledge, this behavior is still POSIX-compliant --- the reader will still be notified that the socket is readable, and if it were actually doing non-blocking IO, it would continue reading upon receipt of that notification. So, relying on `EWOULDBLOCK` to indicate that the socket has been sufficiently drained appears to rely on Linux/FreeBSD behavior that isn't necessarily portable to other Unices. ## Solution This commit changes the `drain()` function to take an argument for the number of bytes *written* to the socket previously, and continue looping until it has read that many bytes, regardless of whether `EWOULDBLOCK` is returned. This should ensure that the socket is drained on all POSIX-compliant systems, and indeed, the `io_async_fd::reset_writable` and `io_async_fd::poll_fns` tests no longer hang forever on illumos. I think making this change is an appropriate solution to the test failure here, as the `drain()` function is part of the test, rather than the code in Tokio *being* tested, and (as I mentioned above) the use of blocking reads on a non-blocking socket without a mechanism to continue reading when the socket becomes readable again is not really something a real life program seems likely to do. Ensuring that all the written bytes have been read by passing in a byte count seems more faithful to what the test is actually *trying* to do here, anyway. Thanks to @jclulow for debugging what was going on here! This change was cherry-picked from commit f18d6ed from PR #6769, so that the fix can be merged separately. [1]: #6769 (comment)
#6776 should fix the |
https://buildomat.eng.oxide.computer/wg/0/details/01J592RB0JR203RGGN0RYHJHMY/CHieo1Uee7qzRVyp811YBl0MvXGO3i0QA9ezPaFWj6hf6P3P/01J592RSWCNX1MCFYGW74AHVH6#S1953 seems not great --- it appears that on illumos, the thread local...isn't, somehow? |
## Motivation In `tokio::task::local`, there's a `LocalState::assert_called_from_owner_thread` function, which checks that the caller's thread ID matches that of the thread on which the `LocalSet` was created. This assertion is disabled on FreeBSD and OpenBSD, with a comment indicating that this is because those systems have "some weirdness around thread-local destruction". If memory serves, I believe the "weirdness" was related to the order in which thread-locals are destroyed relative to each other, or something along those lines. Upon running the tests on illumos, there appears to be [a similar panic][1] in the `task_local_set_in_thread_local` due to this assertion assertion while dropping a `LocalSet` which lives in a thread-local. This leads me to believe that illumos, and presumably Solaris, also exhibit the same "weirdness". This wouldn't be too surprising, given the shared heritage of those systems. ## Solution This commit adds `target_os="illumos"` and `target_os="solaris"` to the `cfg` attribute that disables the assertion on systems determined to have weirdness. This fixes the test panicking on illumos. In the future, it might be worthwhile to look into changign the assertion to only be disabled on weirdness-having systems _while dropping the `LocalSet`_, rather than all the time. That way, we can still check other accesses. But, I wanted to make the minimum change necessary to fix it here before messing around with that. [1]: https://buildomat.eng.oxide.computer/wg/0/details/01J592RB0JR203RGGN0RYHJHMY/CHieo1Uee7qzRVyp811YBl0MvXGO3i0QA9ezPaFWj6hf6P3P/01J592RSWCNX1MCFYGW74AHVH6#S1954
* tests: handle spurious EWOULDBLOCK in io_async_fd ## Motivation The `io_async_fd.rs` tests contain a `drain()` function, which currently performs synchronous reads from a UDS socket until it returns `io::ErrorKind::WouldBlock` (i.e., errno `EWOULDBLOCK`/`EAGAIN`). The *intent* behind this function is to ensure that all data has been drained from the UDS socket's buffer...which is what it appears to do...on Linux. On other systems, it appears that an `EWOULDBLOCK` or `EAGAIN` may be returned before enough data has been read from the UDS socket to result in the other end being notified that the socket is now writable. In particular, this appears to be the case on illumos, where the tests using this function hang forever (see [this comment][1] on PR #6769). To my knowledge, this behavior is still POSIX-compliant --- the reader will still be notified that the socket is readable, and if it were actually doing non-blocking IO, it would continue reading upon receipt of that notification. So, relying on `EWOULDBLOCK` to indicate that the socket has been sufficiently drained appears to rely on Linux/FreeBSD behavior that isn't necessarily portable to other Unices. ## Solution This commit changes the `drain()` function to take an argument for the number of bytes *written* to the socket previously, and continue looping until it has read that many bytes, regardless of whether `EWOULDBLOCK` is returned. This should ensure that the socket is drained on all POSIX-compliant systems, and indeed, the `io_async_fd::reset_writable` and `io_async_fd::poll_fns` tests no longer hang forever on illumos. I think making this change is an appropriate solution to the test failure here, as the `drain()` function is part of the test, rather than the code in Tokio *being* tested, and (as I mentioned above) the use of blocking reads on a non-blocking socket without a mechanism to continue reading when the socket becomes readable again is not really something a real life program seems likely to do. Ensuring that all the written bytes have been read by passing in a byte count seems more faithful to what the test is actually *trying* to do here, anyway. Thanks to @jclulow for debugging what was going on here! This change was cherry-picked from commit f18d6ed from PR #6769, so that the fix can be merged separately. [1]: #6769 (comment) Signed-off-by: Eliza Weisman <eliza@elizas.website>
## Motivation Currently, the test `uds_stream::epollhup` expects that a `UdsStream::connect` future to a Unix socket which is closed by the accept side to always fail with `io::ErrorKind::ConnectionReset`. On illumos, and potentially other systems, it instead fails with `io::ErrorKind::ConnectionRefused`. This was discovered whilst adding an illumos CI job in PR #6769. See: #6769 (comment) ## Solution This commit changes the test to accept either `ConenctionReset` or `ConnectionRefused`. This way, we are more tolerant of different operating systems which may decide to return slightly different errnos here. Both ECONNREFUSED and ECONNRESET seem reasonable to expect in this situation, although arguably, ECONNREFUSED is actually more correct: the acceptor did not accept the connection at all, which seems like "refusing" it to me... This commit was cherry-picked from PR #6769.
## Motivation Currently, the test `uds_stream::epollhup` expects that a `UdsStream::connect` future to a Unix socket which is closed by the accept side to always fail with `io::ErrorKind::ConnectionReset`. On illumos, and potentially other systems, it instead fails with `io::ErrorKind::ConnectionRefused`. This was discovered whilst adding an illumos CI job in PR #6769. See: #6769 (comment) ## Solution This commit changes the test to accept either `ConenctionReset` or `ConnectionRefused`. This way, we are more tolerant of different operating systems which may decide to return slightly different errnos here. Both ECONNREFUSED and ECONNRESET seem reasonable to expect in this situation, although arguably, ECONNREFUSED is actually more correct: the acceptor did not accept the connection at all, which seems like "refusing" it to me...
This commit adds a simple test job for illumos using Buildomat. I'd like to make some more improvements and add docs before merging this, but I want to see if it even works first.