CustomProceduralTexture + GPUParticleSystem: unstable performance

Have you been able to test the same PG in WebGL2 instead of WebGPU?

It would be interesting to get the figures for WebGL2: from your questions, I understand you assume it's a WebGPU-only problem.

What I can tell you is that on my system (Windows 11, i7-12700KF, RTX 3080 Ti), using your PG above at the starting camera position (display area is 1278x1200) and using PIX for precise measurements:


  • a frame takes 2.935ms and generating the skydome texture takes 1.5ms (about half of the full frame). That's a theoretical fps of 60*16.666/2.935 ≈ 340
  • GPU usage is ~12% when the tab with the PG is not displayed (Chrome displays an empty tab instead) and ~19% when the tab is active. In fullscreen (2560x1440), GPU usage is ~23%


  • a frame takes 2.154ms and generating the skydome texture takes 1.44ms (~66% of the full frame). That's a theoretical fps of 60*16.666/2.154 ≈ 464
  • GPU usage is ~12% when the tab with the PG is not displayed and ~15% when the tab is active. In fullscreen, GPU usage is ~21%
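The fps figures above are just 1000 ms divided by the GPU frame time (60 × 16.666 ≈ 1000). A minimal sketch of that arithmetic, assuming the GPU frame time is the only bottleneck (no CPU, vsync or compositor limits); `frameStats` and `FrameStats` are hypothetical names used for illustration only:

```typescript
// Theoretical frame rate and per-pass share of the frame, derived from PIX timings.
// Assumption: the GPU frame time is the only limiting factor.
interface FrameStats {
  theoreticalFps: number;
  passShare: number; // fraction of the frame spent in one pass, 0..1
}

function frameStats(frameMs: number, passMs: number): FrameStats {
  if (frameMs <= 0) throw new RangeError("frameMs must be positive");
  return {
    theoreticalFps: 1000 / frameMs, // same as 60 * 16.666 / frameMs, up to rounding
    passShare: passMs / frameMs,    // e.g. skydome CustomProceduralTexture time / frame time
  };
}

// The two measurements quoted above (frame time, skydome texture time), in ms.
for (const [frame, pass] of [[2.935, 1.5], [2.154, 1.44]] as const) {
  const s = frameStats(frame, pass);
  console.log(`${frame} ms: ~${Math.round(s.theoreticalFps)} fps, skydome = ${(s.passShare * 100).toFixed(0)}% of the frame`);
}
```

This only restates the measurements; it says nothing about where the time goes inside a frame.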

So, regarding the 90% GPU usage on some of your test computers, that's something I don't really understand…

Testing in WebGL2 will tell us whether it's WebGPU-related or not (maybe there's a problem in Dawn's Metal implementation?).

[…] I'm not very familiar with Macs, but I know they use very high resolutions: could it be that you are running fullscreen at 5K or 8K? That may start to tax performance, because you are using several particle systems with alpha blending, so there are multiple fullscreen blending passes each frame…

[…] I tested on my computer with the nebula fully visible in fullscreen (2560x1440), and it obviously takes more time: 5.107ms in WebGPU. That's still only about 1/3 of the 16.67ms budget you have for 60fps rendering. Also, my GPU usage is only 4-5% higher than at the starting camera position (so, ~25%). That's far from the 90% usage you see on your M1.

However, to understand the impact of the nebula, here’s an excerpt of the PIX report:

These DrawInstanced lines are the particle systems that draw the nebula. As you can see, it starts to add up: each one takes 0.3-0.4ms, and the biggest one takes 1.3ms! In total, they take 3.26ms of the 5.1ms frame.

You can see why it takes so much time when you look at the "PS Invocations" counter, which is the number of times the pixel shader runs (that is, the number of pixels written to):

  • I was running fullscreen at 2560x1440, which amounts to 3.686.400 pixels (I put a separator between groups of digits to ease readability)
  • drawing all the particle systems writes 335.221.505 pixels! That is 335.221.505 / 3.686.400 ≈ 90 fullscreen renderings each frame (with blending, which taxes performance more than opaque rendering)
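The overdraw figure is simply the PS invocation count divided by the number of screen pixels. A small sketch of that division using the numbers from the PIX capture; `overdrawFactor` is a hypothetical helper name:

```typescript
// Average overdraw = pixel-shader invocations / pixels on screen.
// A value of N means every screen pixel was shaded N times on average.
function overdrawFactor(psInvocations: number, width: number, height: number): number {
  return psInvocations / (width * height);
}

// Figures from the PIX capture at 2560x1440 (3.686.400 pixels).
const factor = overdrawFactor(335_221_505, 2560, 1440);
console.log(`~${factor.toFixed(1)} fullscreen passes per frame`);
```

Being an average, this does not tell which pixels are hit the most; it only gives the overall fill-rate load.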

So, the problem is probably the large amount of overdraw incurred by the particle systems. At higher resolutions, you are going to be fill-rate limited.
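To make the fill-rate point concrete, here is a rough estimate of how the pixel-shader workload grows with resolution. It assumes the ~90.9 overdraw factor measured at 2560x1440 stays the same at other resolutions (the particles cover the same fraction of the screen), which is an approximation and not a measurement; 5K and 8K are taken as 5120x2880 and 7680x4320, and `estimatedInvocations` is a hypothetical helper:

```typescript
interface Resolution {
  name: string;
  width: number;
  height: number;
}

const BASE_PIXELS = 2560 * 1440;
// Assumption: the overdraw measured at 2560x1440 is resolution-independent.
const OVERDRAW = 335_221_505 / BASE_PIXELS;

// Estimated pixel-shader invocations per frame for the particle systems.
function estimatedInvocations(r: Resolution, overdraw: number): number {
  return Math.round(r.width * r.height * overdraw);
}

const resolutions: Resolution[] = [
  { name: "1440p", width: 2560, height: 1440 },
  { name: "5K", width: 5120, height: 2880 },
  { name: "8K", width: 7680, height: 4320 },
];

for (const r of resolutions) {
  const n = estimatedInvocations(r, OVERDRAW);
  const scale = (r.width * r.height) / BASE_PIXELS;
  console.log(`${r.name}: ~${(n / 1e6).toFixed(0)}M PS invocations (x${scale.toFixed(1)} vs 1440p)`);
}
```

Fill rate scales with pixel count, so 5K means roughly 4x and 8K roughly 9x the blending work of 2560x1440, before taking into account that the GPU itself may be slower.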

You can improve things by using BLENDMODE_ADD instead of BLENDMODE_MULTIPLYADD as the blend mode: the latter renders each particle system twice (once to multiply and once to add), as you can see in the report above. You could also try to use fewer particles.
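In a Babylon.js scene the change is a single property assignment per system, e.g. `ps.blendMode = BABYLON.ParticleSystem.BLENDMODE_ADD;` (this works for both `ParticleSystem` and `GPUParticleSystem`). The sketch below models only the `blendMode` field so it runs without the library. The helpers `countParticleDraws` and `switchToAdditive` are hypothetical, the "2 draws for MULTIPLYADD, 1 otherwise" rule is an assumption based on the PIX capture described above, and the numeric mode values are placeholders (use the real `ParticleSystem` constants):

```typescript
// Only the field we need is modeled, so this compiles without Babylon.js.
interface HasBlendMode {
  blendMode: number;
}

// Assumption from the PIX capture: MULTIPLYADD costs two draws per system
// (multiply pass + add pass); every other blend mode costs one.
function countParticleDraws(systems: readonly HasBlendMode[], multiplyAddMode: number): number {
  return systems.reduce((draws, s) => draws + (s.blendMode === multiplyAddMode ? 2 : 1), 0);
}

// Switches every MULTIPLYADD system to the additive mode; returns how many were changed.
function switchToAdditive(systems: HasBlendMode[], multiplyAddMode: number, addMode: number): number {
  let changed = 0;
  for (const s of systems) {
    if (s.blendMode === multiplyAddMode) {
      s.blendMode = addMode;
      changed++;
    }
  }
  return changed;
}

// Placeholder values for illustration only; in a real scene pass
// ParticleSystem.BLENDMODE_MULTIPLYADD and ParticleSystem.BLENDMODE_ADD.
const MULTIPLY_ADD = 4;
const ADD = 2;
const nebula: HasBlendMode[] = [{ blendMode: MULTIPLY_ADD }, { blendMode: MULTIPLY_ADD }, { blendMode: ADD }];

console.log(`particle draws before: ${countParticleDraws(nebula, MULTIPLY_ADD)}`);
switchToAdditive(nebula, MULTIPLY_ADD, ADD);
console.log(`particle draws after: ${countParticleDraws(nebula, MULTIPLY_ADD)}`);
```

Note that additive blending looks different from MULTIPLYADD (there is no darkening multiply pass), so the nebula colors may need retuning. For the "fewer particles" option, lowering the capacity, or `activeParticleCount` on a `GPUParticleSystem`, reduces the overdraw directly.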