Description
One important property of WebGPU is that it is asynchronous, which means synchronization can only be done through callbacks.
This API design forces async semantics on the application, which makes sense in a host execution environment such as JavaScript. For example, in our recent work to support machine learning, we simply wrap the async interface as a JS future and use JavaScript's await for synchronization.
In a native-only WASM application, though, the async nature of the API puts quite a burden on the application itself. For example, it is extremely painful to write a C/C++ application directly against the webgpu-native C API, because there is no built-in async/await mechanism in the language.
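
To make the pain point concrete, here is a minimal sketch of what callback-style code looks like in plain C. Everything in it (`MockQueue`, `mock_map_async`, `mock_process_events`, `SumState`) is a hypothetical stand-in invented for illustration, not the webgpu-native API. The point is only that the logic after the async call has to move into a separate callback, and any local state has to move into a struct that outlives the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type; NOT the real webgpu-native signature. */
typedef void (*MapCallback)(int status, const uint8_t *data, size_t size,
                            void *userdata);

/* Hypothetical queue holding at most one pending request. */
typedef struct {
    MapCallback callback;
    void *userdata;
    int pending;
} MockQueue;

static const uint8_t kMockData[4] = {1, 2, 3, 4};

/* Registers the callback and returns immediately; no data is delivered yet. */
static void mock_map_async(MockQueue *q, MapCallback cb, void *userdata) {
    q->callback = cb;
    q->userdata = userdata;
    q->pending = 1;
}

/* One tick of a hypothetical event loop: delivers the pending result, if any. */
static void mock_process_events(MockQueue *q) {
    if (q->pending) {
        q->pending = 0;
        q->callback(0, kMockData, sizeof kMockData, q->userdata);
    }
}

/* State that would normally be stack locals must live in a struct,
   because the function that started the request has already returned. */
typedef struct {
    int done;
    uint32_t sum;
} SumState;

/* The "rest of the function" after the async call ends up here. */
static void on_mapped(int status, const uint8_t *data, size_t size,
                      void *userdata) {
    assert(userdata != NULL);
    SumState *s = userdata;
    if (status == 0) {
        for (size_t i = 0; i < size; ++i) s->sum += data[i];
    }
    s->done = 1;
}

/* Starts the request; the caller must keep *s alive and keep pumping events. */
static void start_sum(MockQueue *q, SumState *s) {
    s->done = 0;
    s->sum = 0;
    mock_map_async(q, on_mapped, s);
}
```

A caller would invoke `start_sum`, then keep calling `mock_process_events` until `done` is set. With async/await the same logic would be a single straight-line function.
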
I want to use this thread to gather thoughts in this direction, as there are quite a few design choices. These design choices relate to both the WASM execution environment and the header definition. I list them below.
- C0: Keep only the async C API, and rely on async/await support in the languages that compile to WASM
  - Explanation: it is certainly easier to target the current C API using Rust, because the language has native async/await support.
- C1: Introduce a sync API, with sync support on native
  - Most of the downstream APIs (Metal, Vulkan) do have a synchronization primitive, and we could just expose it as an API.
- C2: Introduce a sync API, and think about asynchronization
  - Same as C1, but we acknowledge that async is the nature of WebGPU. Because the synchronization (blocking) happens at the WASM-to-system boundary, there are certainly techniques (with limitations) to turn the synchronous call into an async version. However, such a feature depends on either the compiler or the WASM VM (runtime). As a simple example, suppose we restrict the async system call so that it can only resume at the call site. Then we could simply "freeze" the state of the WASM VM, do other jobs, and then re-enter without any backup of the stack (because the stack is already in the linear memory of the WASM instance). This removes the overhead of a pause/resume, but requires support from the WASM runtime.
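
As a rough illustration of what C1/C2 would mean for application code, the sketch below layers a blocking call on top of a callback-based one. All names (`MockDevice`, `mock_submit_async`, `mock_poll`, `mock_submit_sync`) are hypothetical and exist only for this example. The polling loop stands in for whatever the implementation would really do at that point: block on the driver's synchronization primitive on native (C1), or have the WASM runtime suspend the instance at the call site and resume it later (C2).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical completion callback; not taken from any real header. */
typedef void (*ReadyCallback)(int status, void *userdata);

/* Hypothetical device that completes one request after a number of polls. */
typedef struct {
    ReadyCallback callback;
    void *userdata;
    int ticks_left;
} MockDevice;

/* Async entry point: records the request and returns immediately. */
static void mock_submit_async(MockDevice *d, ReadyCallback cb, void *userdata,
                              int ticks) {
    d->callback = cb;
    d->userdata = userdata;
    d->ticks_left = ticks;
}

/* Advances the mock device by one tick. Returns 1 while work is pending. */
static int mock_poll(MockDevice *d) {
    if (d->callback == NULL) return 0;
    if (--d->ticks_left > 0) return 1;
    ReadyCallback cb = d->callback;
    d->callback = NULL;
    cb(0, d->userdata);
    return 0;
}

typedef struct {
    int done;
    int status;
} WaitState;

static void on_ready(int status, void *userdata) {
    assert(userdata != NULL);
    WaitState *w = userdata;
    w->status = status;
    w->done = 1;
}

/* Hypothetical sync entry point: the application sees a plain blocking call.
   The loop is a placeholder for a native wait or a runtime-level suspend. */
static int mock_submit_sync(MockDevice *d, int ticks) {
    WaitState w = {0, -1};
    mock_submit_async(d, on_ready, &w, ticks);
    while (!w.done) {
        mock_poll(d);
    }
    return w.status;
}
```

With such an entry point, the caller's state can stay on the stack (here `w`), which is exactly what the callback-only design rules out.
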
My current take is that C2 is the ideal option, as it lets applications be written as native code while still deploying to most platforms. However, there are certain gaps in the runtime (related to standardization) that need to be closed to make that happen.