Why is the WebGPU backend slower?

I tried playing around with the WebGPU renderer and found that on average it produces lower FPS than WebGL, while also having longer initialization times. The test https://playground.babylonjs.com/#60A8NI#41 renders 22,500 non-instanced cubes (but WebGPU is also slower with instancing).

This is unexpected, because this is exactly the situation where WebGPU should shine: a lot of pre-initialized pipelines with little overhead when switching between them, and no ANGLE layer between the browser and the graphics API.

Could you explain why this happens and what to expect in the future regarding WebGPU performance?

Tried in Chrome Version 96.0.4652.0 (Official Build) canary (arm64)


See the current status of the WebGPU implementation and notes about performance here: WebGPU Status | Babylon.js Documentation

The problem is that there are still API calls to the browser, and those take time. In WebGL2, with a VAO, you have a single call to set up the vertex and index buffers. In WebGPU you have one call to set the index buffer and one or more calls to set up the vertex buffers (on average 3 calls if you have positions, normals and uvs). In addition to these, you need to call setPipeline, setBindGroup (possibly several times) and draw. So, for each mesh, that's between 6 and 10 API calls you need to issue to draw it.

Another problem is that the philosophy/design of the new API is completely different from that of WebGL. As we need to be backward compatible and want everything currently working in WebGL to also work in WebGPU, we have a number of constraints to deal with that we would not have if we started a new engine from scratch.
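To make the per-mesh call count concrete, here is a rough sketch. This is not Babylon.js source, just hypothetical helpers written against the standard WebGL2 and WebGPU browser APIs, with all buffers, pipelines and bind groups assumed to be pre-created:

```ts
// Types for the GPU* interfaces come from @webgpu/types.

// WebGL2: a VAO restores all vertex/index buffer state in a single call.
function drawMeshWebGL2(
  gl: WebGL2RenderingContext,
  vao: WebGLVertexArrayObject,
  indexCount: number
): void {
  gl.bindVertexArray(vao);                                        // 1 call
  gl.drawElements(gl.TRIANGLES, indexCount, gl.UNSIGNED_INT, 0);  // + 1 draw call
}

// WebGPU: every piece of state is set explicitly on the render pass encoder.
function drawMeshWebGPU(
  pass: GPURenderPassEncoder,
  pipeline: GPURenderPipeline,
  bindGroups: GPUBindGroup[],      // e.g. scene/material/mesh groups
  vertexBuffers: GPUBuffer[],      // e.g. positions, normals, uvs
  indexBuffer: GPUBuffer,
  indexCount: number
): void {
  pass.setPipeline(pipeline);                                     // 1 call
  bindGroups.forEach((bg, i) => pass.setBindGroup(i, bg));        // 1..n calls
  vertexBuffers.forEach((vb, i) => pass.setVertexBuffer(i, vb));  // ~3 calls
  pass.setIndexBuffer(indexBuffer, "uint32");                     // 1 call
  pass.drawIndexed(indexCount);                                   // + 1 draw => roughly 6-10 calls per mesh
}
```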

Note that your test scene is not really suitable for perf testing because there are too many individual objects: you kill the fps mostly because the JavaScript side must handle all these objects (independently of WebGL or WebGPU): it must collect them, compute the world matrices, recompute the bounding boxes and display them. It also ends up issuing (tens of) thousands of draw calls per frame, which is not really sustainable. There's nothing related to the gfx API among the top slowest functions (the "(anonymous)" line is the frame function of the PG):
[profiler screenshot]

There are ways to improve performance, and the main one is to use bundles to wrap all the API calls needed to draw a mesh. We have added a new snapshot rendering mode (only available in WebGPU, see this doc page) that uses bundles. Depending on your scene, it can help a lot with performance, especially when using the fast mode. From the doc (SR means "snapshot rendering"):

This is with a real scene, namely moving around in a power plant and using shadow mapping.
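For reference, enabling it from user code looks roughly like this. This is a minimal sketch based on the doc page linked above, written Playground-style with the global BABYLON namespace and assuming `engine` is an already-created WebGPU engine with the scene fully loaded:

```ts
// Snapshot rendering is only available on the WebGPU engine.
engine.snapshotRenderingMode = BABYLON.Constants.SNAPSHOTRENDERING_FAST; // or SNAPSHOTRENDERING_STANDARD
engine.snapshotRendering = true; // record the bundles and replay them on subsequent frames

// If the scene changes in a way snapshot rendering cannot track, take a new snapshot:
engine.snapshotRenderingReset();
```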

Snapshot rendering has some drawbacks, however: it only applies in specific cases (see the doc). That's why we are currently working on another mode, controlled by engine.compatibilityMode: when it is set to false, we switch to a mode where we record a bundle for each mesh we draw and reuse this cached bundle in subsequent frames.

We are trying to make this mode work as broadly as possible (so with as few constraints as possible for the user). The difficulty is updating the cached bundle when necessary, while doing it as infrequently as possible (because rebuilding the bundle and then drawing it is slower than simply drawing the mesh). It's a work in progress: don't try to enable it in production, you won't see any performance difference yet and you may instead get rendering artifacts (depending on your scene).
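Once it is ready, toggling it should be a single flag. A sketch of the expected usage (again, not something to enable in production today):

```ts
// Assumes `engine` is a WebGPU engine; compatibilityMode defaults to true.
// Setting it to false makes the engine record a render bundle per mesh and
// reuse the cached bundle on later frames (work in progress).
engine.compatibilityMode = false;
```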

If we are successful with this compatibilityMode (which we hope to be!), we expect performance on par with or better than WebGL in real-world scenarios.


Great explanation! Thank you!

An anecdote regarding perf using the latest Gfx APIs:

I have a friend who works for a game company. They had to port their engine to DX12 because one of their targets is the Xbox Series X. So a team of 2-3 people worked for several months on the port. In the end, when they checked performance, it was lower than with their old engine… That's because the philosophy of the new APIs (like DX12) is quite different from the old ones: you need to restructure the engine, not simply replace one call with another.

That's the same story with WebGPU because, as you may know, the WebGPU spec is modeled after the current gfx APIs like DX12, Vulkan and Metal to get the most out of them. Regarding Babylon.js, we can't rewrite a new engine specific to WebGPU, so we have to fit the new API into the current design of the engine, using everything we can to get the best performance possible (that's why, for example, we use bundles as a means to encapsulate several API calls, even if they were not originally meant for that).
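To illustrate what "encapsulating several API calls in a bundle" means at the WebGPU level, here is a small standalone sketch (placeholder names, not the actual Babylon.js implementation): the per-mesh call sequence is recorded once into a GPURenderBundle and then replayed each frame with a single executeBundles call:

```ts
// Record the per-mesh calls once into a reusable bundle (types from @webgpu/types).
function recordMeshBundle(
  device: GPUDevice,
  colorFormat: GPUTextureFormat,
  depthFormat: GPUTextureFormat,
  pipeline: GPURenderPipeline,
  bindGroup: GPUBindGroup,
  vertexBuffer: GPUBuffer,
  indexBuffer: GPUBuffer,
  indexCount: number
): GPURenderBundle {
  const encoder = device.createRenderBundleEncoder({
    colorFormats: [colorFormat],
    depthStencilFormat: depthFormat,
  });
  // Same calls as a direct draw, but recorded only once:
  encoder.setPipeline(pipeline);
  encoder.setBindGroup(0, bindGroup);
  encoder.setVertexBuffer(0, vertexBuffer);
  encoder.setIndexBuffer(indexBuffer, "uint32");
  encoder.drawIndexed(indexCount);
  return encoder.finish();
}

// Each frame, the pre-recorded bundles are replayed with a single call on the render pass:
// pass.executeBundles(meshBundles);
```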
