Here are my two cents, based on my experience.
In practice there should be very little gain from splitting rendering across threads and then compositing the results. I worked with distributed rendering and there were significant challenges: it was only worth composing the image from multiple renders of separate objects if the scene was too large to fit in memory or the render process was very slow, as with ray tracing. Downloading a texture to the CPU and sending it back again is very slow compared to accessing textures that stay on the GPU. I also don’t know whether you can share contexts between web workers, or between a web worker and the main thread (a quick search returned nothing, and since offscreen rendering is new I would guess probably not).
I have used overlapping transparent canvases without problems before, but there was no z-buffer involved. Browsers didn’t seem to have a performance problem with that, but with BJS you’d have to sort objects and split them yourself, which is quite a problem. It could be useful for a HUD, but then it’s probably better to build the HUD in pure HTML or have a web worker handle it in another canvas.
Applications using C/C++/native engines usually have very fast frame render times. They have to be quick, since they often run several passes (check this article to see what Doom does), and when the hardware is too slow to keep up with the game they reduce the effects. Adding more render threads wouldn’t help, because you’re already GPU bound. And if you’re CPU bound, rendering things faster on the GPU won’t help either.
I haven’t kept up to date on what hardware can do to improve GPU render times, but I know there are a few extensions like https://www.khronos.org/registry/OpenGL/extensions/OVR/OVR_multiview.txt, or the techniques in the “WebGL Deferred Shading” post on Mozilla Hacks. Also, simulation is not my domain, but I know about PhysX/Bullet and other attempts at performing physics on the GPU. Apparently this was already thought of, though I’m not sure whether it was actually implemented in BJS: “Running physics in a webworker”.
Anything that is CPU bound could potentially perform better in a web worker: particle animations, collision detection, things like that. But web workers are still a pain to work with, and passing data to them comes with a lot of restrictions.
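To make the data-passing restriction concrete, here is a minimal sketch of the two ways a buffer can cross a worker boundary: copied or transferred. It assumes a runtime with the global `structuredClone` (modern browsers, Node 17+), which follows the same cloning and transfer rules as `postMessage`. The particle layout and the names are made up for illustration.

```typescript
// Hypothetical particle data: x, y, z per particle in one flat buffer.
const PARTICLE_COUNT = 4;
const buffer = new ArrayBuffer(
  PARTICLE_COUNT * 3 * Float32Array.BYTES_PER_ELEMENT
);
const positions = new Float32Array(buffer);
positions.fill(1.5);

// 1) Copy (the default for postMessage): each side ends up with its own
//    memory, so large buffers cost a full copy on every message.
const copied: ArrayBuffer = structuredClone(buffer);
const senderBytesAfterCopy = buffer.byteLength; // sender's buffer is untouched

// 2) Transfer: ownership moves to the receiver without a copy, but the
//    sender's buffer is detached and can no longer be used.
const transferred: ArrayBuffer = structuredClone(buffer, {
  transfer: [buffer],
});
const senderBytesAfterTransfer = buffer.byteLength; // detached, so 0
const receivedPositions = new Float32Array(transferred);
```

With a real worker the second case would look like `worker.postMessage(buffer, [buffer])`. The catch is that the sender loses access until the worker sends the buffer back, so the two sides have to take turns.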
SharedArrayBuffer was disabled by browsers (as a Spectre mitigation) and has only been partially re-enabled. Sending data by reference is somewhat muddy: it is possible, but it could bring significant synchronization challenges and require a double or triple buffer. @aFalcon could perhaps link to the discussion he mentioned.
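As a rough idea of what the double buffer could look like, here is a minimal sketch over a `SharedArrayBuffer` using `Atomics`. The class name and memory layout are hypothetical, and it assumes a runtime where `SharedArrayBuffer` is available. Both views are created on one thread here only to keep the example self-contained; in a real setup the reader would live in another thread.

```typescript
// Hypothetical layout inside the shared memory:
//   [ Int32: index of the readable slot | slot 0 floats | slot 1 floats ]
class SharedDoubleBuffer {
  private readonly readable: Int32Array;
  private readonly slot0: Float32Array;
  private readonly slot1: Float32Array;

  constructor(sab: SharedArrayBuffer, length: number) {
    this.readable = new Int32Array(sab, 0, 1);
    this.slot0 = new Float32Array(sab, 4, length);
    this.slot1 = new Float32Array(sab, 4 + length * 4, length);
  }

  // Bytes needed for the index plus two slots of `length` floats.
  static bytesNeeded(length: number): number {
    return 4 + 2 * length * 4;
  }

  private slot(index: number): Float32Array {
    return index === 0 ? this.slot0 : this.slot1;
  }

  // Writer side: fill the slot the reader is NOT using, then publish it.
  write(values: ArrayLike<number>): void {
    const back = 1 - Atomics.load(this.readable, 0);
    this.slot(back).set(values);
    Atomics.store(this.readable, 0, back);
  }

  // Reader side: copy out the slot that was published last.
  read(): Float32Array {
    return this.slot(Atomics.load(this.readable, 0)).slice();
  }
}

const length = 3;
const sab = new SharedArrayBuffer(SharedDoubleBuffer.bytesNeeded(length));
const writer = new SharedDoubleBuffer(sab, length);
// A second view over the same memory, as the other thread would create.
const reader = new SharedDoubleBuffer(sab, length);

writer.write([1, 2, 3]);
const first = reader.read();
writer.write([4, 5, 6]);
const second = reader.read();
```

Two slots are not fully safe: if the writer publishes twice while the reader is still copying, the reader can see a half-updated slot. That is the kind of synchronization problem that pushes people toward a triple buffer or explicit locking.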
From my tests the bottleneck in BJS seems to be animating some objects (which surprised me, since I imagined that would always be done on the GPU these days, but apparently not in some cases) and _evaluateActiveMeshes, which I guess is slow because of sending data to the GPU rather than actual CPU calculation (could any BJS gods confirm this?). I suppose animation could benefit from moving to a web worker, but it’s probably not trivial, since BJS seems to need it done before the render phase. To me the biggest advantage of offscreen rendering is freeing the main thread, so even if rendering is slow your browser doesn’t freeze.
I think it boils down to: what are you trying to achieve and what is your bottleneck?