Description
Background
One of the primary goals of WASM is to run on the web, and the web is asynchronous by nature. On the other hand, there is always a need for "synchronous-style" system calls that might block. The most notable examples include file system operations, networking, and GPU and accelerator synchronization in machine learning.
Choices
There are several ways to deal with these system calls:
- C0: Always turn synchronous calls into asynchronous versions (e.g. callbacks). This puts the burden on the compiler and language to implement pause/resume inside WASM, which brings additional overhead.
- C1: Allow synchronous calls in the native environment (e.g. a WASM VM), where blocking is fine. This causes a mismatch between web and native execution.
- C2: Asyncify the system calls via the compiler (save the WASM stack into linear memory every time a system call happens), or support it via coroutines.
- C3: Standardize a mechanism to asyncify synchronous system calls.
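To make the burden in C0 concrete, here is a minimal sketch in Python (chosen only for brevity; it is not WASM). It contrasts a synchronous-style routine with the callback form it would have to be rewritten into. The names `read_sync`, `read_async`, `pending`, and `run_event_loop` are hypothetical stand-ins for a host API and its event loop, not part of any real interface.

```python
from collections import deque

# Toy event loop queue standing in for the host (hypothetical).
pending = deque()

def read_sync(store, key):
    """Blocking-style system call: returns the data directly."""
    return store[key]

def read_async(store, key, callback):
    """Asynchronous version: the result is delivered later via callback."""
    pending.append(lambda: callback(store[key]))

def run_event_loop():
    """Drain the queue, firing callbacks in the order they were scheduled."""
    while pending:
        pending.popleft()()

def total_len_sync(store):
    # Synchronous style: straight-line code, locals live on the stack.
    a = read_sync(store, "a")
    b = read_sync(store, "b")
    return len(a) + len(b)

def total_len_callbacks(store, done):
    # C0 style: every call that may block splits the function, and the
    # locals that must survive (here `a`) move into closures.
    def on_a(a):
        def on_b(b):
            done(len(a) + len(b))
        read_async(store, "b", on_b)
    read_async(store, "a", on_a)
```

Both versions compute the same value, but the second one only produces it after the event loop has run, and the compiler or language runtime is responsible for the split into continuations.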
Lightweight Asyncification
C2 is certainly a viable option, but it might add execution-time overhead. I will elaborate on choice C3 and discuss why it might need standardization (either in WASI or in another part of the WebAssembly spec).
Because WebAssembly is executed by a VM, all of its stack state is already stored somewhere (e.g. in a stack data structure inside the WASM VM). When a WASM program calls into a system interface that might block, the execution environment simply "freezes" the state of the WASM VM (storing any live register values into a context) and calls an asynchronous version of the system call. The execution environment can then resume execution once the callback fires.
This approach certainly has its limitations: we can only resume at the call site, and we cannot call into the WASM VM again before the pending system call has resumed.
However, by doing so we get asyncification "for free", because there is no need to save the stack (it is part of the linear memory that gets frozen).
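The freeze-and-resume idea, together with the re-entrancy restriction, can be sketched with a toy stack machine in Python. This is only an illustration under stated assumptions: `ToyVM`, `ToyHost`, the three opcodes, and `start_async` are all hypothetical and do not correspond to any real WASM VM or proposed API.

```python
from collections import deque

class ToyHost:
    """Hypothetical execution environment with an async 'file size' call."""

    def __init__(self, files):
        self.files = files
        self.pending = deque()  # toy event loop queue

    def start_async(self, name, callback):
        # Asynchronous version of the system call: completes later.
        self.pending.append(lambda: callback(len(self.files[name])))

    def run_event_loop(self):
        while self.pending:
            self.pending.popleft()()

class ToyVM:
    """Tiny stack machine standing in for a WASM VM (illustrative only)."""

    def __init__(self, program):
        self.program = program  # list of (opcode, operand) tuples
        self.pc = 0             # program counter
        self.stack = []         # operand stack, owned by the VM
        self.suspended = False  # True while a system call is in flight
        self.result = None

    def run(self, host):
        if self.suspended:
            # Limitation: no re-entry before the pending call resumes.
            raise RuntimeError("VM is frozen until the system call resumes")
        while self.pc < len(self.program):
            op, arg = self.program[self.pc]
            self.pc += 1
            if op == "push":
                self.stack.append(arg)
            elif op == "add":
                b, a = self.stack.pop(), self.stack.pop()
                self.stack.append(a + b)
            elif op == "syscall":
                # Freeze: pc and stack already live in the VM object,
                # so nothing is copied or unwound here.
                self.suspended = True
                host.start_async(arg, lambda v: self._resume(v, host))
                return
        self.result = self.stack.pop()

    def _resume(self, value, host):
        # Resume exactly at the call site with the system call's result.
        self.suspended = False
        self.stack.append(value)
        self.run(host)
```

A possible usage: the program `[("push", 1), ("syscall", "data.bin"), ("add", None)]` runs until the system call, returns control to the host with the VM frozen, and finishes only after `host.run_event_loop()` delivers the file size. The guest program itself stays in synchronous style; only the host side is asynchronous.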
We believe this is an important design decision for system libraries, one that will affect the future of machine learning and other applications in WASM, so I hope to use this thread to gather discussion.
Given that C3 requires standardization of the execution environment, I hope this thread can seed discussion of the topic. It would certainly become part of the WASM JS API, but the implications also go beyond JS, as it could impact WASM VMs such as wasmtime and wasmer.