CustomProceduralTexture + GPUParticleSystem: unstable performance

I was working on a prototype of a space scene with a CustomProceduralTexture for a skydome, plus some GPUParticleSystem instances for fog/nebulae.

The texture is a shader I adjusted using some layers of 3D dot noise, and when used alone, no performance problem was seen. For the GPUParticleSystem instances I used the standard “smoke_15.png” texture, just adjusting attributes and colors to simulate different nebula densities, ending up with fewer than 2000 active particles in the scene but still a pretty decent visual result.
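For context, the setup looks roughly like this (a sketch with illustrative names, sizes, and paths, not the exact PG code):

```javascript
// Skydome driven by a custom procedural texture (shader path is a placeholder)
const skyTexture = new BABYLON.CustomProceduralTexture(
    "skyTexture", "./shaders/nebulaSky", { width: 8192, height: 4096 }, scene
);
const skydome = BABYLON.MeshBuilder.CreateSphere(
    "skydome", { diameter: 1000, sideOrientation: BABYLON.Mesh.BACKSIDE }, scene
);
const skyMaterial = new BABYLON.StandardMaterial("skyMaterial", scene);
skyMaterial.emissiveTexture = skyTexture;
skyMaterial.disableLighting = true;
skydome.material = skyMaterial;

// One of the nebula systems; all of them together stay under ~2000 particles
const nebula = new BABYLON.GPUParticleSystem("nebula", { capacity: 500 }, scene);
nebula.particleTexture = new BABYLON.Texture("textures/smoke_15.png", scene);
nebula.blendMode = BABYLON.ParticleSystem.BLENDMODE_MULTIPLYADD;
nebula.start();
```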

Here’s a PG showing it all together: https://playground.babylonjs.com/#IA9CYL#15

On my main system with a GTX 1660 Super, it’s fluid at 60fps with WebGPU rendering, and even with WebGL2. However, on some other setups friends tested, performance varies a lot, dropping below 15fps during quick camera movement in some cases.

So, if using WebGPU and GPUParticleSystem isn’t enough to deliver solid, constant rendering performance across devices for a relatively simple scene, what else can I do to get better performance here? Or should I drop one of the techniques I used altogether because there’s a better approach?

Welcome aboard!

We’ve already experienced some weird things with fast mouse movements: is there a drop in fps even when using the keyboard instead of the mouse?

Looking at the PG, I think performance should be good on a wide range of devices. However, without more information (devices, screenshots from the Inspector stats panel, …), it will be difficult to help further.
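(If it helps for gathering those stats: the Inspector with its Statistics tab can be opened straight from the PG code; a minimal sketch.)

```javascript
// Opens the Inspector; the fps and draw-call counters are in the Statistics tab
scene.debugLayer.show({ embedMode: true });
```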

Thanks for your reply @Evgeni_Popov, I’ll still try to get more detailed information from my friends.

Anyway, when it comes to the “weird things with fast mouse movements”… Is it something with the engine itself? Is there some documentation or code examples on how to deal with it?

No, it was not related to Babylon.js; I think it was more a problem with either the OS or the mouse (driver, …).

@Evgeni_Popov, sorry to bother you by citing you directly…
But I got some detailed test results that may be interesting to you:

Case 1

OS: Windows 11
Hardware: i5-10400F, GTX 1660 Super, 60 Hz screen
Idle: Runs at 60 FPS, no drops, around 35% GPU usage
Frenetic mouse/keyboard use: Keeps 60 FPS, no drops, around 45% GPU usage


Case 2

OS: Windows 11
Hardware: Ryzen 9 5900X, RTX 3080, 144 Hz screen
Idle: Runs at 144 FPS, no drops, around 30% GPU usage
Frenetic mouse/keyboard use: Keeps 144 FPS, no drops, around 35% GPU usage


Case 3

OS: macOS Ventura
Hardware: M1 Pro 8-core, 120 Hz screen, running on mains power
Idle: Runs at 120 FPS, no drops, around 90% GPU usage
Idle, looking straight at the nebula: Drops to ~115 FPS, around 95% GPU usage
Frenetic mouse/keyboard use: Keeps ~115 FPS, no further drops, around 95% GPU usage


Case 4

OS: macOS Ventura
Hardware: M1 Pro 8-core, 120 Hz screen, running on battery
Idle: Runs at 120 FPS, no drops, around 90% GPU usage
Idle, looking straight at the nebula: Drops to ~110 FPS, around 95% GPU usage
Frenetic mouse/keyboard use: Keeps ~110 FPS, no further drops, around 95% GPU usage


While these are not terrible results, what scares us is the excessive GPU usage on the M1 Pro hardware, and also the drops, while small, seem unjustifiable for a simple scene like this. We would really like to better understand the possible causes.

Our assumptions for now are:

  1. Is it something very low-level, related to divergences in how WebGPU sits on top of Direct3D 12 on Windows vs. Metal on macOS?
  2. Is it at the engine level, for example in how the custom shader is translated to WGSL?
  3. Is it within our own choices, for example, is the shader that makes the skydome stars blink too heavy?

Have you been able to test the same PG in WebGL2 instead of WebGPU?

It would be interesting to get the figures for WebGL2 as well, since from your questions I understand you assume it’s a WebGPU-only problem.
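In the Playground it’s just the WebGPU/WebGL2 switch in the toolbar; outside the Playground, the comparison comes down to which engine you create (a minimal sketch, variable names assumed):

```javascript
// WebGL2 path
const engine = new BABYLON.Engine(canvas, true);

// WebGPU path (creation is asynchronous)
const webgpuEngine = new BABYLON.WebGPUEngine(canvas);
await webgpuEngine.initAsync();
```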

What I can tell is that on my system (Windows 11, i7-12700KF, RTX 3080 Ti), using your PG above at the starting camera position (display area is 1278x1200) and using PIX for precise measurements:

WebGL2:

  • a frame takes 2.935ms and the generation of the skydome texture takes 1.5ms (so, half the time of the full frame). That’s a theoretical fps of 60*16.666/2.935 ≈ 340
  • the GPU is at ~12% when the tab with the PG is not displayed (Chrome displays an empty tab instead) and ~19% when the tab is active. Going fullscreen (2560x1440), the GPU is at ~23%

WebGPU:

  • a frame takes 2.154ms and the generation of the skydome texture takes 1.44ms (~66% of the full frame). That’s a theoretical fps of 60*16.666/2.154 ≈ 464
  • the GPU is at ~12% when the tab with the PG is not displayed and ~15% when the tab is active. Going fullscreen, the GPU is at ~21%

So, regarding the 90% GPU usage for some of your test computers, that’s something I don’t really understand…

Testing in WebGL2 will tell us whether it’s WebGPU-related or not (maybe there’s a problem in the Metal implementation of Dawn?).

[…] I’m not very familiar with Macs, but I know they use very high resolutions: could it be that you are running fullscreen at 5k or 8k? That may start to tax performance, because you are using several particle systems with alpha blending, so there is multiple fullscreen blending going on each frame…

[…] I tested on my computer with the nebula fully visible in fullscreen (2560x1440), and it obviously takes more time: 5.107ms in WebGPU. That’s still only 1/3 of the budget you have for 60fps rendering. Also, my GPU usage is only 4-5% higher than at the starting camera position (so, ~25%). That’s far from the 90% usage you experience on your M1.

However, to understand the impact of the nebula, here’s an excerpt of the PIX report:

These DrawInstanced lines are the particle systems that draw the nebula. As you can see, it starts to add up, because each one takes 0.3-0.4ms, and the biggest one takes 1.3ms! Globally, they take 3.26ms out of the 5.1ms of the frame.

You understand why it takes so much time when you look at the “PS Invocations” counter, which is the number of times the pixel shader is run (that is, the number of pixels written to):

  • I was running fullscreen at 2560x1440, which amounts to 3.686.400 pixels (I put a separator between groups of digits for readability)
  • drawing all the particle systems writes to 335.221.505 pixels! That is 335.221.505 / 3.686.400 ≈ 90 fullscreen renders each frame (with blending, which taxes performance more than opaque rendering)

So, the problem is probably the large amount of overdraw incurred by the particle systems. At bigger resolutions, you are going to be fillrate-limited.

You can improve things by using BLENDMODE_ADD instead of BLENDMODE_MULTIPLYADD for the blend mode: the latter renders each particle system twice (once to multiply and once to add), as you can see in the report above. You could also try to use fewer particles.
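Something like this, assuming nebulaSystems is an array holding your GPUParticleSystem instances:

```javascript
// BLENDMODE_MULTIPLYADD draws each system twice (a multiply pass plus an add pass);
// BLENDMODE_ADD renders it only once
for (const ps of nebulaSystems) { // hypothetical array of the nebula systems
    ps.blendMode = BABYLON.ParticleSystem.BLENDMODE_ADD;
}
```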

Think it could help to adjust anisotropicFilteringLevel on the textures too? That’s a lot of read amplification lol

Yes, disabling anisotropic filtering would probably help too!
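For example (hypothetical variable names for the nebula and skydome textures):

```javascript
// Level 1 effectively disables anisotropic filtering (the default is 4),
// cutting the extra texture reads per pixel
nebula.particleTexture.anisotropicFilteringLevel = 1;
skyTexture.anisotropicFilteringLevel = 1;
```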

Another thing worth noting is that mipmap generation is enabled for the custom procedural texture. As it’s an 8192x4096 texture, this results in rather large mipmaps. Even if it’s not the most time-consuming thing, it’s always worth turning it off (6th parameter of the custom procedural texture constructor), as we (I) can’t really see the difference with or without mipmaps.
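A sketch of that constructor call (name and shader path are placeholders):

```javascript
const skyTexture = new BABYLON.CustomProceduralTexture(
    "skyTexture",
    "./shaders/nebulaSky",          // placeholder shader path
    { width: 8192, height: 4096 },
    scene,
    null,                           // fallbackTexture
    false                           // generateMipMaps: off
);
```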
