GPU SphericalHarmonics Creation

Precision: 4.xxe-6
Engine: WebGL2 only
Playground:
https://playground.babylonjs.com/#2FDQT5#3056
Performance:
The first run is very slow, but it gets faster after warmup.


Use this to benchmark:
https://playground.babylonjs.com/#2FDQT5#3058
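
For anyone who wants to time this outside the playground, here is a minimal sketch of a warmup-aware timing helper. It is not the code from the playground above; `benchmarkWithWarmup`, its parameters and the injectable `now` clock are hypothetical names made up for illustration. It reports the first (cold) run separately from the median of the later runs, since those two numbers differ a lot here.

```typescript
interface BenchResult {
  firstRunMs: number; // cold run, includes shader compilation and other one-time costs
  medianMs: number;   // median of the timed runs after warmup
}

// Hypothetical helper, not a Babylon.js API.
// NOTE: GPU work is asynchronous, so `fn` only measures GPU time if it ends with
// something that forces a sync (for example a readback). Otherwise it measures
// only the CPU cost of submitting the commands.
function benchmarkWithWarmup(
  fn: () => void,
  warmupRuns: number,
  timedRuns: number,
  now: () => number = () => Date.now(), // millisecond clock; swap in a finer one if available
): BenchResult {
  if (timedRuns < 1) {
    throw new Error("timedRuns must be at least 1");
  }
  const timeOnce = (): number => {
    const start = now();
    fn();
    return now() - start;
  };

  // The very first call counts as the first warmup run and is timed on its own.
  const firstRunMs = timeOnce();
  for (let i = 1; i < warmupRuns; i++) {
    fn();
  }

  const samples: number[] = [];
  for (let i = 0; i < timedRuns; i++) {
    samples.push(timeOnce());
  }
  samples.sort((a, b) => a - b);

  const mid = samples.length >> 1;
  const medianMs =
    samples.length % 2 === 1 ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2;

  return { firstRunMs, medianMs };
}
```

The median is used instead of the mean so that a single dropped frame does not dominate the result.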

Reference a topic here:


That is something that @sebavan will appreciate for sure

This is great!!! Did you compare the result with the CPU version?

Would be amazing to have a PR for this :slight_smile:

Yes, see the devtools console of https://playground.babylonjs.com/#2FDQT5#3056: it logs the max diff, the raw GPU result vs. the CPU result, and the first-run time.
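
The max diff is just the largest absolute difference between the two coefficient arrays. A minimal sketch of that comparison follows; `maxAbsDiff` is a hypothetical helper name, and the playground may compute it differently.

```typescript
// Hypothetical helper: largest absolute element-wise difference between the
// GPU and CPU results (for example the flattened spherical harmonics coefficients).
function maxAbsDiff(gpu: ArrayLike<number>, cpu: ArrayLike<number>): number {
  if (gpu.length !== cpu.length) {
    throw new Error(`length mismatch: ${gpu.length} vs ${cpu.length}`);
  }
  let max = 0;
  for (let i = 0; i < gpu.length; i++) {
    // Math.max returns NaN if either argument is NaN, so a NaN in either
    // input shows up in the result instead of being silently ignored.
    max = Math.max(max, Math.abs(gpu[i] - cpu[i]));
  }
  return max;
}
```

It accepts plain arrays as well as `Float32Array`, so it can be fed the readback buffer directly.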


The first run on the GPU is slower than on the CPU, mainly due to readPixels (texture data readback and a forced GPU sync).
To work around it, there are two options:

  1. Use a PBO-buffered, async readback and delay it for a few frames until it's done. (Edit: the async version https://playground.babylonjs.com/#2FDQT5#3059 does not seem to help much.)
  2. Do not read back until serialization (to keep the serialization structure stable), and pass the result as a texture to downstream shaders.
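
For option 1, the "delay it for a few frames" part can be sketched as a non-blocking fence poll. This is only an illustration under assumptions, not the code from the async playground: `FenceGL`, `waitForFence`, `schedule` and `onDone` are hypothetical names. `FenceGL` lists the few `WebGL2RenderingContext` members the sketch touches, so a real WebGL2 context should be usable with `S` being `WebGLSync`.

```typescript
// Subset of WebGL2RenderingContext used below; S stands in for WebGLSync.
interface FenceGL<S> {
  readonly ALREADY_SIGNALED: number;
  readonly CONDITION_SATISFIED: number;
  readonly WAIT_FAILED: number;
  clientWaitSync(sync: S, flags: number, timeout: number): number;
  deleteSync(sync: S | null): void;
}

type FenceResult = "signaled" | "failed";

// Polls the fence once per scheduled tick with a zero timeout, so the CPU
// never blocks waiting for the GPU. Calls onDone exactly once.
function waitForFence<S>(
  gl: FenceGL<S>,
  sync: S,
  schedule: (cb: () => void) => void,
  onDone: (result: FenceResult, polls: number) => void,
): void {
  let polls = 0;
  const check = (): void => {
    polls++;
    const status = gl.clientWaitSync(sync, 0, 0);
    if (status === gl.ALREADY_SIGNALED || status === gl.CONDITION_SATISFIED) {
      gl.deleteSync(sync);
      onDone("signaled", polls);
    } else if (status === gl.WAIT_FAILED) {
      gl.deleteSync(sync);
      onDone("failed", polls);
    } else {
      schedule(check); // TIMEOUT_EXPIRED: GPU not done yet, try again next tick
    }
  };
  check();
}

// Intended use in a browser (assumed flow, left as comments so the block has no DOM dependency):
//   gl.bindBuffer(gl.PIXEL_PACK_BUFFER, pbo);
//   gl.readPixels(x, y, w, h, format, type, 0);   // queues a copy into the PBO
//   const sync = gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0);
//   gl.flush();
//   waitForFence(gl, sync, cb => requestAnimationFrame(() => cb()), result => {
//     if (result === "signaled") gl.getBufferSubData(gl.PIXEL_PACK_BUFFER, 0, dst);
//   });
```

The callback style keeps the poll loop tied to whatever scheduler the engine already uses, rather than adding a separate timer.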

Also, instead of all these complex shaders and draws, WebGPU with a compute shader, a storage buffer, and atomic add can hopefully do the whole thing in a single pass, and its readback is async. That might be a more future-proof option, although WebGPU is strictly limited to secure contexts.

Maybe we could use it as a texture?

Last I tried, I had the same readback issues.

But that could be a breaking change for downstream shader devs, and it could also slow down the next readPixels.

At least this could be used for batched/massive high-res hdr to env conversion


Oh, wait, something strange happened: when the CPU part is moved ahead of the GPU part, the CPU part becomes slower and the GPU part becomes faster.
https://playground.babylonjs.com/#2FDQT5#3060

Edit:
It might be something else that requires a GPU sync: using the no-readPixels ConvertCubeMapToSphericalPolynomial directly shows a very different speed.
https://playground.babylonjs.com/#2FDQT5#3061

Edit2:
A dummy readPixels costs more than 1 s, so there must be something wrong with it.
https://playground.babylonjs.com/#2FDQT5#3062

It might wait for the GPU to flush the previous work before reading back?

Yeah, I think so. Using gl.fenceSync to wait for the GPU sync gives a similar result in this playground: https://playground.babylonjs.com/#2FDQT5#3064


And the profiler shows dropped frames without any CPU or GPU activity during fenceSync.