Having X thin instances is faster than having X meshes (or even X instanced meshes), but it’s not necessarily faster than integrating all the data directly into vertex buffers. The GPU has extra work to do when you use instancing compared with not using it. It’s probably only a small amount of work per instance, but in the end it can add up (you have over 2.5 million instances!). What’s more, the basic mesh is just 2 triangles. Perhaps with a larger geometry, the difference in performance would be smaller.
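For reference, "integrating all the data directly into vertex buffers" could look roughly like the sketch below. It is plain TypeScript with no engine dependency; `buildMergedQuads`, its arguments, and the layout (xyz positions, 32-bit indices, quads facing +Z) are assumptions made for illustration, not something taken from the playground.

```typescript
// Hypothetical helper: bakes one quad per center into a single position
// buffer and a single index buffer, so no per-instance data is needed.
function buildMergedQuads(
    centers: Float32Array, // xyz per quad
    halfSize: number
): { positions: Float32Array; indices: Uint32Array } {
    const count = centers.length / 3;
    const positions = new Float32Array(count * 4 * 3); // 4 vertices per quad, xyz each
    const indices = new Uint32Array(count * 6);        // 2 triangles per quad
    // Corner offsets in the XY plane, counter-clockwise.
    const corners = [-1, -1, 1, -1, 1, 1, -1, 1];

    for (let q = 0; q < count; q++) {
        const cx = centers[q * 3];
        const cy = centers[q * 3 + 1];
        const cz = centers[q * 3 + 2];
        for (let v = 0; v < 4; v++) {
            const p = (q * 4 + v) * 3;
            positions[p] = cx + corners[v * 2] * halfSize;
            positions[p + 1] = cy + corners[v * 2 + 1] * halfSize;
            positions[p + 2] = cz;
        }
        // Two triangles sharing the quad's diagonal.
        const base = q * 4;
        const i = q * 6;
        indices[i] = base;
        indices[i + 1] = base + 1;
        indices[i + 2] = base + 2;
        indices[i + 3] = base;
        indices[i + 4] = base + 2;
        indices[i + 5] = base + 3;
    }
    return { positions, indices };
}
```

The resulting arrays could then be handed to a mesh, for example through `BABYLON.VertexData` (`positions`, `indices`, then `applyToMesh`). With 2.5 million quads that is 10 million vertices, so 32-bit indices are required.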
Thank you @Evgeni_Popov! If this behaviour is expected, and a mesh with thin instances renders much slower than one without them, I will have to try some other approaches to render a large number of splats (quads).
Instances and merges are both ways to optimize rendering performance.
The advantage of instances is that they have a smaller memory footprint and are more flexible to work with.
In theory, merged nodes should render faster, but they take up more memory.
For example: Babylon.js Playground (babylonjs.com)
In this example, once 10 or more meshes need to be rendered, instances perform better than node merging.
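To put rough numbers on the memory trade-off, here is a back-of-the-envelope estimate. It is only a sketch: `estimateFootprint` is a hypothetical helper, and it assumes positions-only vertices (12 bytes each), 32-bit indices, and one 4x4 float matrix (64 bytes) per instance.

```typescript
interface Footprint {
    mergedBytes: number;
    instancedBytes: number;
}

// Hypothetical helper: compares the buffer size of `copies` merged copies
// of a mesh against one shared mesh plus one matrix per instance.
function estimateFootprint(
    vertexCount: number,
    indexCount: number,
    copies: number,
    bytesPerVertex: number = 12, // assumption: xyz floats only
    bytesPerIndex: number = 4    // assumption: 32-bit indices
): Footprint {
    const meshBytes = vertexCount * bytesPerVertex + indexCount * bytesPerIndex;
    const matrixBytes = 16 * 4; // one 4x4 float matrix per instance
    return {
        mergedBytes: meshBytes * copies,
        instancedBytes: meshBytes + matrixBytes * copies,
    };
}

// A mesh with 1000 vertices and 3000 indices, copied 10 times.
const largeMesh = estimateFootprint(1000, 3000, 10);
// A single quad (4 vertices, 6 indices), copied 1000 times.
const quad = estimateFootprint(4, 6, 1000);
```

Under these assumptions the 1000-vertex mesh costs 240,000 bytes merged versus 24,640 bytes instanced, while the quad costs 72,000 bytes merged versus 64,072 bytes instanced. So the memory advantage of instancing mostly shows up with larger base geometry; for a bare quad the two are close.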
Thank you for the nice explanation @xiehangyun. I get it: the advantage of instancing is fewer draw calls.
I was trying to render a large number of quads (to simulate Gaussian splatting) within a single batch, so I need something like geometry merging or instancing.
It is strange that a Three.js instanced mesh (which I thought was the equivalent of Babylon instancing) with the same number of quads (2.5 million) can still run at a full 60 fps. Example here
Babylon instancing, as mentioned above, runs at 15 fps.
This seems to be a problem with ANGLE and DirectX…
If you change ANGLE’s backend to OpenGL (chrome://flags/ => angle), you’ll see that you get 60fps. Similarly, using WebGPU as an engine makes the PG run at 60fps.
The problem is that ANGLE reorganizes our buffers and de-interleaves them, causing cache misses to skyrocket! We’ll have to find out why, and probably open a ticket on the Chromium tracker…
Using OpenGL as ANGLE backend or using WebGPU did fix this problem! Thank you Evgeni!
It would be great if Babylon instancing could have performance similar to Three.js on the default D3D11 backend.
I found out that the performance gap is due to the STATIC_DRAW / DYNAMIC_DRAW usage flag passed when the buffer data is uploaded. Three.js uses static by default while Babylon uses dynamic.
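On the Babylon side, the flag is controlled by the fourth argument of `thinInstanceSetBuffer` (`staticBuffer`). Below is a minimal sketch of passing it explicitly. The `ThinInstanceTarget` interface is a stand-in declared here so the snippet compiles without the Babylon.js package (a real `BABYLON.Mesh` should fit it, but treat that as an assumption), and `uploadQuadMatrices` is a hypothetical helper.

```typescript
// Stand-in for the part of a Babylon.js mesh used below (assumption).
interface ThinInstanceTarget {
    thinInstanceSetBuffer(
        kind: string,
        buffer: Float32Array | null,
        stride?: number,
        staticBuffer?: boolean
    ): void;
}

// Hypothetical helper: builds one translation matrix per center and
// uploads them as a static thin-instance buffer.
function uploadQuadMatrices(mesh: ThinInstanceTarget, centers: Float32Array): Float32Array {
    const count = centers.length / 3;
    const matrices = new Float32Array(count * 16);
    for (let i = 0; i < count; i++) {
        const m = i * 16;
        // Identity rotation/scale.
        matrices[m] = 1;
        matrices[m + 5] = 1;
        matrices[m + 10] = 1;
        matrices[m + 15] = 1;
        // Translation lives in elements 12..14 of each matrix.
        matrices[m + 12] = centers[i * 3];
        matrices[m + 13] = centers[i * 3 + 1];
        matrices[m + 14] = centers[i * 3 + 2];
    }
    // Pass staticBuffer = true explicitly instead of relying on the default,
    // so the data is uploaded with the static usage hint.
    mesh.thinInstanceSetBuffer("matrix", matrices, 16, true);
    return matrices;
}
```

A static buffer is meant for data that will not be changed after it is set, so this only fits cases like the splat quads here, where the instance matrices are uploaded once.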
Maybe we can default to dynamic=false and mark it as a breaking change? It would be a small one, and an easy fix for people who don’t pass a value and expect dynamic=true, but at least the fastest mode would be enabled by default?
The level of impact would depend on the size of the data and on how ANGLE reorganizes the buffers. The problem in the PG is that the data is quite big, and because of ANGLE’s reorganization, we experience cache misses for each instance. With smaller data, the effect may be less dramatic.
I can’t reproduce this on my iPhone SE / Samsung Galaxy A23: the 3 variations (Babylon dynamic/static and the Three.js fiddle) have the same performance.