Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop bytecode VM and compile to LLVM IR #574

Open
24 of 37 tasks
adazem009 opened this issue Sep 8, 2024 · 7 comments
Open
24 of 37 tasks

Drop bytecode VM and compile to LLVM IR #574

adazem009 opened this issue Sep 8, 2024 · 7 comments
Labels
P0 Priority: Critical task

Comments

@adazem009
Copy link
Contributor

adazem009 commented Sep 8, 2024

This player is going to be feature-complete soon, so I should start thinking about what to do with the current bytecode based virtual machine. It wasn't the best idea because it isn't fast. Scratch code can be compiled to LLVM IR instead which is then compiled to the architecture specific machine code by LLVM.

If anyone has experience with this, feel free to comment with any ideas :)

Notes:

  • I don't want to write a JIT. I'd prefer ahead-of-time compilation when projects are loaded (just like the current compiler does it).
  • This will require rewriting/refactoring all Scratch blocks because they rely on the bytecode VM.
  • Since it'll take a long time to implement, I'll most likely keep the current implementation and make the LLVM-based compiler optional while it's being developed.

How will it work?

  • Every Scratch script will have LLVM IR which will be executed using LLVM's ExecutionEngine with JIT compilation. It should be fast enough. If not, it should be possible to implement hacks like compiling to native code and running it from memory, but that's not portable.
  • Since Scratch scripts must be able to yield at random points (for example at the end of a loop, if not running without screen refresh), coroutines will be used.
  • A coroutine will execute the sequence for the appropriate part of the script (block functions in the sequence will be declared with extern "C").
  • If the code has to yield, the coroutine will suspend (e. g. in a loop). The execution context will store the coroutine handle so it can resume later.
  • If the script has to restart (current VM has a function that makes it jump to the beginning), it'll just destroy the coroutine handle and call the coroutine.
  • If the code can run without screen refresh, there's no need to use coroutines which should be faster.

Progress:

  • API preparation: Refactor Value class
  • API preparation: Refactor List class
  • API preparation: Add Thread class
  • API preparation: Refactor extension API #393
  • Add Compiler class (in dev folder to keep the old implementation)
  • Add code builder class (abstract)
  • Add LLVM code builder class
  • Add a class for executing compiled code
  • Add API for Scratch blocks
  • If statements
  • Loops
  • Yielding
  • Operators
  • Math instructions
  • Variables
  • Lists
  • Promises
  • API for testing Scratch blocks
  • Motion blocks
  • Looks blocks
  • Sound blocks
  • Event blocks
  • Control blocks
  • Sensing blocks
  • Operator blocks
  • Custom blocks
  • Variable blocks
  • List blocks
  • Edge-activated hats
  • Monitors
  • Implement garbage collection (to avoid memory leaks in suspended scripts that are never resumed).
  • Implement type prediction within scripts
  • Implement type prediction within sprites (local variables/lists)
  • Implement type prediction within project (global variables/lists)
  • Cache converted custom block arguments
  • Implement custom block inlining
  • Remove bytecode VM
@adazem009
Copy link
Contributor Author

@Arcnor Just letting you know I've opened the issue to work on this.

@Arcnor
Copy link
Contributor

Arcnor commented Sep 8, 2024

@Arcnor Just letting you know I've opened the issue to work on this.

Hey @adazem009 , that's fantastic, and thanks for keeping me posted :). It makes sense not wanting to write a JIT, it's not an simple task and very easy to fall down the rabbit hole :D. But there are some implementations you can use IIRC. There are also other targets that are not LLVM that might help you as well, there are things like LibJIT that provide you a big base to construct upon. Finally LLVM itself has ORC, and the tutorial covers this, although I've never used it (nor the predecesor, can't remember its name) and it's not an actual JIT implementation but "hooks" to write your own.

Anyway, I wish you luck!

@adazem009
Copy link
Contributor Author

adazem009 commented Sep 8, 2024

@Arcnor Just letting you know I've opened the issue to work on this.

Hey @adazem009 , that's fantastic, and thanks for keeping me posted :). It makes sense not wanting to write a JIT, it's not an simple task and very easy to fall down the rabbit hole :D. But there are some implementations you can use IIRC. There are also other targets that are not LLVM that might help you as well, there are things like LibJIT that provide you a big base to construct upon. Finally LLVM itself has ORC, and the tutorial covers this, although I've never used it (nor the predecesor, can't remember its name) and it's not an actual JIT implementation but "hooks" to write your own.

Anyway, I wish you luck!

Thanks. I mostly think it doesn't make sense to write a JIT because (probably) all Scratch projects compile very quickly with the current compiler, only JSON parsing (to generate AST) can take a long time. I just hope compiling with LLVM won't slow it down.

I'll most likely use LLVM as long as I can call C++ functions from the IR and write unit tests for blocks easily (easier than it is now, with the complicated bytecode VM).

@davidtheplatform
Copy link

(reply to a discord message, but I'm putting it here since its relevant)
The way I set up threespeed is theres a python script that converts a project.json file to c++. This conversion is basically 1:1, ie. each scratch block becomes a single function call to the corresponding C++ implementation.
This makes writing the compiler pretty easy.
I also decided that each block stack would get its own native thread. This was probably a bad idea since most performance-intensive projects are mostly single threaded anyways and it comes with all the normal multithreading problems.

If you decide to compile the code instead of making a bytecode interpreter, the main problem you'll run into is how scratch handles reentrancy. If you have a long-running event handler and it gets called again, the first instance immediately stops. With a bytecode interpreter this is easy to handle, for native code it isn't since this isn't how most languages handle reentrancy.
You might be able to solve this in an elegant way if you don't just generate function calls, although it sounds like you're going down a similar path as what I did.

As for compile time, the sb3 -> c++ compiler written in python is much faster than gcc. The project.json is already an AST and you have to parse at most 5mb of data for most projects.

@adazem009
Copy link
Contributor Author

adazem009 commented Sep 8, 2024

(reply to a discord message, but I'm putting it here since its relevant) The way I set up threespeed is theres a python script that converts a project.json file to c++. This conversion is basically 1:1, ie. each scratch block becomes a single function call to the corresponding C++ implementation. This makes writing the compiler pretty easy. I also decided that each block stack would get its own native thread. This was probably a bad idea since most performance-intensive projects are mostly single threaded anyways and it comes with all the normal multithreading problems.

If you decide to compile the code instead of making a bytecode interpreter, the main problem you'll run into is how scratch handles reentrancy. If you have a long-running event handler and it gets called again, the first instance immediately stops. With a bytecode interpreter this is easy to handle, for native code it isn't since this isn't how most languages handle reentrancy. You might be able to solve this in an elegant way if you don't just generate function calls, although it sounds like you're going down a similar path as what I did.

As for compile time, the sb3 -> c++ compiler written in python is much faster than gcc. The project.json is already an AST and you have to parse at most 5mb of data for most projects.

These things (thread/script management) are currently handled by the Engine class (https://github.com/scratchcpp/libscratchcpp/blob/537ff52a7500d17d07df7ea31c7023358824e344/src/engine/internal/engine.cpp). I just need to make the native code be able to stop, so it can resume later. That's probably the trickiest part. I'm not sure how that's going to work yet.

@adazem009
Copy link
Contributor Author

I found this, but I'm not sure it's related: https://llvm.org/docs/Coroutines.html
It seems I can make a coroutine suspend and then resume, but I don't know if I can do that with the main function as well.

@adazem009 adazem009 added P1 Priority: High and removed P4 Priority: Trivial labels Sep 9, 2024
@adazem009
Copy link
Contributor Author

adazem009 commented Sep 10, 2024

@Arcnor After some experiments with LLVM, I think I've figured out how to be able to suspend and resume compiled code (read @davidtheplatform's comment). See the "How will it work?" section of the issue description.

@adazem009 adazem009 added P0 Priority: Critical and removed P1 Priority: High labels Sep 16, 2024
@adazem009 adazem009 added P3 Priority: Low and removed P0 Priority: Critical labels Nov 3, 2024
@adazem009 adazem009 added P0 Priority: Critical and removed P3 Priority: Low labels Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P0 Priority: Critical task
Projects
None yet
Development

No branches or pull requests

3 participants