Create processes and waitpid with Miou #49

Open
mbarbin opened this issue Jan 3, 2025 · 6 comments

@mbarbin
Contributor

mbarbin commented Jan 3, 2025

I would like to do some experimentation and learning with Miou. In particular, I'd like to run external processes and wait for them cooperatively. I've looked at the Picos implementation of waitpid as a source of inspiration and I wonder if it is possible to implement something equivalent in Miou (e.g. could the Miou.syscall mechanism be used as an equivalent to Picos' Select.return_on_sigchld?).

I am very new to Miou, and still have a lot of reading and exploration to do. Perhaps it doesn't make sense, or would be very unusual, to want to run external processes in a Miou app. Let me know! On the other hand, perhaps this ability will be inherited in the future by the work to make Miou compatible with Picos_io? I don't know much at this time, and would be interested to read your thoughts on this, before I embark on a perilous journey. Thank you for Miou!

@dinosaure
Contributor

In particular, I'd like to run external processes and wait for them cooperatively. I've looked at the Picos implementation of waitpid as source of inspiration and I wonder if it is possible to implement something equivalent in Miou

Miou is currently based on certain Picos principles such as Triggers and Computations. In addition, Miou can manage signals via Miou.sys_signal. It might be conceivable to combine the two in order to continue a task if a process receives such a signal from the system.

let () = Miou.run ~domains:0 @@ fun () ->
  let c = Miou.Computation.create () in
  let fn _sigchld = assert (Miou.Computation.try_return c ()) in
  let _ = Miou.sys_signal Sys.sigchld (Sys.Signal_handle fn) in
  Miou.Computation.await_exn c

$ ocamlfind opt -linkpkg -package miou main.ml
$ ./a.out &
[1] xxxxx
$ kill -SIGCHLD $!
[1]  + xxxxx done       ./a.out

Note that dom0 tries to observe any changes in the system and that's all it does - it loops until something happens.
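
As a rough sketch of how the signal-based idea above might become a cooperative wait for a child process: install the SIGCHLD handler, spawn the child, suspend on the computation, then reap the child with a non-blocking waitpid. It reuses Miou.sys_signal and Miou.Computation in the same way as the example above. The name run_and_wait is hypothetical, and the sketch assumes the unix library is linked and that only one child is waited on at a time (a second call would replace the handler).

```ocaml
(* Sketch only. Build with something like:
   ocamlfind opt -linkpkg -package unix,miou main.ml *)

(* [run_and_wait prog args] (hypothetical helper) spawns a child and suspends
   the calling Miou task until SIGCHLD arrives, instead of blocking in waitpid.
   Assumption: a single child at a time, called from dom0. *)
let run_and_wait prog args =
  let c = Miou.Computation.create () in
  (* Install the handler before spawning so an early exit is not missed.
     [try_return] returns false if the computation is already resolved. *)
  let on_sigchld _ = ignore (Miou.Computation.try_return c ()) in
  let _ = Miou.sys_signal Sys.sigchld (Sys.Signal_handle on_sigchld) in
  let pid = Unix.create_process prog args Unix.stdin Unix.stdout Unix.stderr in
  (* The task is suspended here; other Miou tasks could run in the meantime. *)
  Miou.Computation.await_exn c;
  (* WNOHANG never blocks: it returns pid 0 if the child has not exited yet. *)
  match Unix.waitpid [ Unix.WNOHANG ] pid with
  | 0, _ -> failwith "SIGCHLD received but the child is still running"
  | _, status -> status

let () =
  Miou.run ~domains:0 @@ fun () ->
  match run_and_wait "sleep" [| "sleep"; "1" |] with
  | Unix.WEXITED 0 -> print_endline "child exited normally"
  | _ -> prerr_endline "child failed"
```

A real implementation would need to track several children and loop over waitpid, since one SIGCHLD can stand for more than one state change.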

(e.g. could the Miou.syscall mechanism be used as an equivalent to Picos' Select.return_on_sigchld?)

Miou.syscall is higher level and concerns interactions with a function such as epoll or Unix.select (or io_uring). For example, Miou_unix is simply an extension to Miou with Miou.syscalls and a Unix.select function. A short tutorial is available here which extends Miou with sleepers.

I am very new to Miou, and still have a lot of reading and exploration to do.

I really recommend reading this short book, which explains the basics of asynchronous programming and Miou's objectives. This book will be extended with concrete examples, but I'm currently working on other writing projects.

Perhaps it doesn't make sense, or would be very unusual, to want to run external processes in a Miou app.

From what I can tell and as far as the Miou_unix module is concerned, it is indeed legitimate to want to spawn a process and obtain the result. From my perspective, which is fairly conciliatory with what already exists, the use of Miou.call and Bos.OS.Cmd could be a possibility:

exception Timeout

open Bos

let () = Miou_unix.run ~domains:1 @@ fun () ->
  let spawn seconds () =
    let cmd = Cmd.(v "sleep" % (Fmt.str "%d" seconds)) in
    let res = OS.Cmd.run_out cmd in
    match OS.Cmd.to_null res with
    | Ok () -> ()
    | Error (`Msg msg) -> failwith msg in
  let with_timeout fn seconds =
    let prm0 = Miou.async @@ fun () -> Miou_unix.sleep seconds; raise Timeout in
    let prm1 = Miou.call fn in
    Miou.await_first [ prm0; prm1 ] in
  match with_timeout (spawn 6) 5. with
  | Ok () -> ()
  | Error exn -> raise exn

In this example, the command runs on a domain that Miou has already launched (via Miou.call), while a concurrent task (on dom0, via Miou.async) waits 5 seconds. If the command takes longer than 5 seconds, the Timeout exception is raised, which is what happens here since the command sleeps for 6 seconds. Otherwise, we return the result of our command.

$ ocamlfind opt -linkpkg -package bos,miou.unix test.ml
$ ./a.out
Fatal error: exception Test.Timeout
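
For comparison, here is a hedged variant of the same pattern in which the command finishes before the deadline and the timeout is reported instead of re-raised. It uses only the Miou calls from the example above, plus Sys.command from the standard library in place of Bos. The labelled ~seconds argument is an illustrative choice, not part of Miou.

```ocaml
(* Sketch only. Build with something like:
   ocamlfind opt -linkpkg -package miou.unix main.ml *)

exception Timeout

(* Same shape as the example above: a sleeper on dom0 races against [fn],
   which runs on another domain. [Miou.await_first] returns the first result
   and cancels the other promise. *)
let with_timeout ~seconds fn =
  let prm0 = Miou.async @@ fun () -> Miou_unix.sleep seconds; raise Timeout in
  let prm1 = Miou.call fn in
  Miou.await_first [ prm0; prm1 ]

let () =
  Miou_unix.run ~domains:1 @@ fun () ->
  (* The command sleeps 1 second, well under the 5-second deadline. *)
  match with_timeout ~seconds:5. (fun () -> Sys.command "sleep 1") with
  | Ok code -> Printf.printf "command finished with exit code %d\n" code
  | Error Timeout -> print_endline "command timed out"
  | Error exn -> raise exn
```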

The disadvantage of this proposal is that the program must run with at least 2 cores. Otherwise, Miou.call will fail. Another single-core proposal may exist and may be worth proposing in the Miou_unix module. We can work on this specific question and propose a function Miou_unix.spawn 👍.

More generally, Miou is more rudimentary than other schedulers. That doesn't mean it can't be used to create an http server, an email client or even a unikernel! The idea is to take an interest in the basics, so that you can work with other projects in a home-made way.

On the other hand, perhaps this ability will be inherited in the future by the work to make Miou compatible with Picos_io?

Unfortunately, decisions (which are not mine) have been taken on this subject which exclude Miou from any joint efforts to ensure compatibility with Picos. There is, however, a PR (#26) that begins a form of compatibility, feel free to complete it and I can help in the best way I can.

Thank you for Miou!

Thank you for taking the time to test Miou!

@mbarbin
Contributor Author

mbarbin commented Jan 5, 2025

It didn't occur to me that blocking calls run in another domain would not block the whole process. Now that you are showing these examples, this makes sense! I suppose my brain is still not quite on OCaml 5 yet!
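
To make that observation concrete, here is a small illustration that needs only the OCaml 5 standard library, with no Miou involved. A blocking Sys.command runs in a second domain while the main domain keeps counting. The name run_in_other_domain is hypothetical, and the busy loop is there only to show progress; it is not something to use in real code.

```ocaml
(* Stdlib-only sketch (OCaml 5): a blocking call in one domain does not stop
   another domain from making progress. *)

(* [run_in_other_domain cmd] (hypothetical helper) returns the exit code of
   [cmd] and the number of iterations the main domain completed meanwhile. *)
let run_in_other_domain cmd =
  let finished = Atomic.make false in
  let worker =
    Domain.spawn (fun () ->
      (* [Sys.command] blocks this domain until the child process exits.
         [finished] is set even if it raises, so the loop below terminates. *)
      Fun.protect
        ~finally:(fun () -> Atomic.set finished true)
        (fun () -> Sys.command cmd))
  in
  let ticks = ref 0 in
  (* Busy loop for demonstration only: the main domain is not blocked. *)
  while not (Atomic.get finished) do
    incr ticks;
    Domain.cpu_relax ()
  done;
  (Domain.join worker, !ticks)

let () =
  let code, ticks = run_in_other_domain "sleep 1" in
  Printf.printf "exit code %d, main domain looped %d times\n" code ticks
```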

This actually gives me pause, because I suspect that there is a class of executables that can probably go a long way simply by using Domainslib rather than a concurrency library. I do have code like this in my projects, especially in client commands, that simply does a few RPCs and some process spawning, and doesn't really use the full power of the combinators in a lib like Miou / Eio / Lwt, etc.

Back to the subject at hand, for now I would be OK assuming the presence of at least one extra core, and I explored a first Miou backend for vcs based on running blocking calls in a Miou.call in this PR: mbarbin/vcs#47

I looked a bit, in tests, at the conditions that result in Miou.No_domain_available. You see, taking the vcs integration as a driving example, it's not so much that I don't want to run blocking calls in another domain; rather, I fear it would be hard to predict whether some client code, not realizing what is happening, will starve the domain pool by spawning too many processes in Miou.async or Miou.parallel calls. This would definitely be solved by a design with a cooperative waitpid, so that you don't need to hold an entire domain just waiting on an external process, and need not fear the Miou.No_domain_available exception.

I am likely to have follow-up questions about Miou.call. But first, I'll do the recommended readings and get back to you. Thanks for the lovely response!

@dinosaure
Contributor

As far as Miou.No_domain_available is concerned, the exception is only raised in 2 situations:

  • the first is of course if you try to use Miou.call/Miou.parallel when you only have one core (which is the case for unikernels)
  • the second, a little more subtle, is when you try to launch a task in parallel from a task which is already in parallel with dom0
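
One possible way to cope with the first situation is to catch the exception and run the function on the current domain instead. This is only a sketch: call_or_inline is a hypothetical helper, it assumes Miou.No_domain_available is raised at the Miou.call site, and the fallback blocks the current domain for as long as the function runs.

```ocaml
(* Sketch only. Build with something like:
   ocamlfind opt -linkpkg -package miou main.ml *)

(* [call_or_inline fn] (hypothetical helper) tries to run [fn] on another
   domain and falls back to the current domain when Miou has none to offer.
   Assumption: [Miou.call] raises [Miou.No_domain_available] synchronously. *)
let call_or_inline fn =
  match Miou.call fn with
  | prm -> Miou.await_exn prm
  | exception Miou.No_domain_available -> fn ()

let () =
  (* ~domains:0 reproduces the single-core situation from the first bullet. *)
  let v = Miou.run ~domains:0 @@ fun () -> call_or_inline (fun () -> 1 + 1) in
  Printf.printf "result: %d\n" v
```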

There is an invariant in Miou where Miou.call/Miou.parallel does not transmit the task to dom0 regardless of where it was executed (from dom0 or another domain).

let () = Miou.run @@ fun () ->
  let fn () =
    assert ((Stdlib.Domain.self () :> int) <> 0);
    let prm = Miou.call @@ fun () ->
      assert ((Stdlib.Domain.self () :> int) <> 0) in
    Miou.await_exn prm in
  let domains = Miou.Domain.available () in
  Miou.parallel fn (List.init domains (Fun.const ()))
  |> List.iter (function Ok () -> () | Error exn -> raise exn)

Another invariant is that the domain (randomly) chosen to execute the task by Miou.call is always different from the domain in which the task was created.

let () = Miou.run @@ fun () ->
    let p =
      Miou.call @@ fun () ->
      let v = Stdlib.Domain.self () in
      let q = Miou.call @@ fun () -> Stdlib.Domain.self () in
      (v, Miou.await_exn q)
    in
    let v, u = Miou.await_exn p in
    assert (v <> u)

In the example above, p could run on dom1 and q would have to run on dom2 (impossible to transfer the task to dom0!). This shows that, no matter how many tasks you perform (with Miou.async or Miou.call or Miou.parallel) or applications you try to run with Miou, No_domain_available will never be raised if you have at least 3 cores (dom0, dom1 and dom2).

It is important to understand that Miou.call/Miou.parallel does not create a new domain. The domains are already allocated by Miou, from the beginning, and are available to execute tasks thanks to Miou.call. Once Miou is able to allocate 2 other domains (+ dom0), you have enough to run all the parallel tasks you want in any configuration.

So, 2 tasks prm0 and prm1 which are both transferred to dom1, for example, will run in parallel with respect to dom0 but cooperatively with respect to each other on dom1.

let dom1 = Miou.Domain.Uid.of_int 1

let () = Miou.run @@ fun () ->
  let prm0 = Miou.call ~pin:dom1 @@ fun () ->
    assert ((Stdlib.Domain.self () :> int) = 1) in
  let prm1 = Miou.call ~pin:dom1 @@ fun () ->
    assert ((Stdlib.Domain.self () :> int) = 1) in
  Miou.await_exn prm0;
  Miou.await_exn prm1

NOTE: It's advisable not to use pin, as this is a fairly experimental aspect of the API and I'm not yet fully aware of its implications.

@mbarbin
Contributor Author

mbarbin commented Jan 5, 2025

Very interesting about the use of 3 domains for Miou.call - it sounds like you read my mind about what was going to be my next question :-)

By the way, while reading more about domains and cores, I had a little exchange with a robot. Here is an extract from it:

mbarbin:

What prevents using several domains on a machine that has only one core? Would that still allow threads running on different domains not to be blocked by each other waiting on blocking unix operations? As in, I am to understand that a blocking call in a thread blocks all the threads that are running in the same domain, but if the domains are distinct, then the threads of another domain are not blocked. Is that still true when the domains run on the same CPU core?

robot:

Yes, you can use several domains on a machine that has only one core, and it would still allow threads running on different domains not to be blocked by each other waiting on blocking Unix operations. Here's how it works:

Blocking Calls in a Domain:

In OCaml, if a thread within a domain makes a blocking call (e.g., I/O operation), it can block all other threads within the same domain because they share the same runtime and event loop.

Multiple Domains:

When you use multiple domains, each domain has its own runtime and event loop. Therefore, a blocking call in one domain does not block the threads in another domain. This allows for better concurrency and parallelism, even if the domains are running on the same CPU core.

Single Core Execution:

On a single-core machine, the operating system's scheduler will time-slice the execution of domains, switching between them to give the appearance of parallelism. While only one domain can execute at any given moment, the OS can switch between domains, allowing them to make progress independently.

In summary, using multiple domains can help avoid blocking issues caused by blocking calls, even on a single-core machine. The OS scheduler ensures that domains can run independently, preventing a blocking call in one domain from affecting the execution of threads in another domain.

Do you agree with the robot, or is it hallucinating? I understand you lose true parallelism doing this, but if you focus on cooperation, can allocating 3 domains on a single core, with the Miou.call strategy as discussed, be considered a cheap solution to avoid blocking calls?

@dinosaure
Contributor

can allocating 3 domains on a single core, with the Miou.call strategy as discussed, be considered a cheap solution to avoid blocking calls?

To cut a long story short, it is indeed possible to allocate 3 domains even if you only have one core, but you will run into a contention problem in the kernel, which will try to make you believe that these domains are running at the same time when in fact it will decide to pause them frequently and run them one by one on the only core it has available.
This pausing by the kernel is what we're trying to avoid. It is extremely costly. In this case, you could start from this model and enforce the use of 3 domains with Miou.run ~domains:2 @@ fun () -> ... but your program will be extremely slow.

It's in this case that an asynchronous waitpid via Miou is more interesting because you'd avoid the pause required by the kernel since Miou would take care of pausing your function to do something else.
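
A program could make this choice at startup by asking the runtime how many domains it recommends. The sketch below uses only the standard library. The helper name and the threshold of 3 (dom0 plus two worker domains, as discussed earlier in this thread) are illustrative assumptions.

```ocaml
(* Stdlib-only sketch: decide at startup whether offloading blocking calls to
   other domains is reasonable on this machine. *)

(* Hypothetical helper. [Domain.recommended_domain_count] is at least 1 and
   usually reflects the number of cores available to the program. *)
let enough_cores_for_offloading () =
  Domain.recommended_domain_count () >= 3

let () =
  if enough_cores_for_offloading () then
    print_endline "enough cores: offloading blocking calls with Miou.call is reasonable"
  else
    print_endline "few cores: forcing extra domains would mostly add kernel context switches"
```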

@mbarbin
Contributor Author

mbarbin commented Jan 9, 2025

In this case, you could start from this model

Atm it looks like I'll be doing some exploration of Miou with enough cores for this simpler approach to work OK, so for now it looks as if the async waitpid isn't going to be in my critical path. Still interested to discuss more perhaps a little later down the line. I'm happy to close this issue for now, or keep it around at your convenience. Thanks a lot!
