Mesh with thin instance slower than with full vertices?

xiasun · December 27, 2023, 12:16pm

Hello all,

I am trying to improve the speed of Gaussian Splatting, and I have found that using thin instances is a bottleneck.

To simplify the problem, I tried rendering a large number of quads with and without thin instances.

If I put the geometry of these quads into a single mesh, the fps is 60.
Draw 2 million quads with single mesh

With thin instances, the fps is much lower, around 15.
Draw 2 million quads with thin instance

Is there a problem with how I'm using them? I thought thin instances would perform better. Many thanks!
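To make the two setups concrete, here is a minimal sketch of the data each one uploads for `count` quads laid out on a grid. It is not the code from the Playgrounds above; `buildMergedQuads` and `buildThinInstanceMatrices` are hypothetical helpers written for illustration. The merged version bakes every vertex into one buffer, while the thin-instance version keeps a single shared quad and stores one 4×4 matrix per quad.

```typescript
// Hypothetical helpers (not from the original Playgrounds) that build the two
// buffer layouts being compared, for `count` unit quads on a square grid.

interface MergedQuads {
  positions: Float32Array; // 4 vertices × xyz per quad
  indices: Uint32Array;    // 2 triangles × 3 indices per quad
}

// Merged approach: every quad owns its own 4 vertices and 6 indices.
function buildMergedQuads(count: number, spacing = 1.5): MergedQuads {
  const positions = new Float32Array(count * 4 * 3);
  const indices = new Uint32Array(count * 6);
  const side = Math.ceil(Math.sqrt(count));
  // Corners (x, y) of a unit quad centred on the origin, in the XY plane.
  const corners = [-0.5, -0.5, 0.5, -0.5, 0.5, 0.5, -0.5, 0.5];
  for (let q = 0; q < count; q++) {
    const ox = (q % side) * spacing;
    const oy = Math.floor(q / side) * spacing;
    for (let v = 0; v < 4; v++) {
      const p = (q * 4 + v) * 3;
      positions[p] = ox + corners[v * 2];
      positions[p + 1] = oy + corners[v * 2 + 1];
      positions[p + 2] = 0;
    }
    const base = q * 4;
    indices.set([base, base + 1, base + 2, base, base + 2, base + 3], q * 6);
  }
  return { positions, indices };
}

// Thin-instance approach: one shared quad, plus one 16-float matrix per quad.
// Translation is written to elements 12, 13 and 14, the slots Babylon's
// Matrix uses for translation.
function buildThinInstanceMatrices(count: number, spacing = 1.5): Float32Array {
  const matrices = new Float32Array(count * 16);
  const side = Math.ceil(Math.sqrt(count));
  for (let q = 0; q < count; q++) {
    const o = q * 16;
    matrices[o] = matrices[o + 5] = matrices[o + 10] = matrices[o + 15] = 1; // identity
    matrices[o + 12] = (q % side) * spacing;
    matrices[o + 13] = Math.floor(q / side) * spacing;
  }
  return matrices;
}
```

In Babylon.js, the first result could be applied to a mesh through a `VertexData` (setting `positions` and `indices`, then calling `applyToMesh`), and the second passed to `quad.thinInstanceSetBuffer("matrix", matrices, 16)` on a single-quad mesh.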

Evgeni_Popov · December 27, 2023, 6:39pm

Welcome aboard!

Having X thin instances is faster than having X meshes (or even X instanced meshes), but it’s not necessarily faster than integrating all the data directly into vertex buffers. The GPU has extra work to do when you use instancing compared with not using it. It’s probably only a small amount of work per instance, but in the end it can add up (you have over 2.5 million instances!). What’s more, the base mesh is just 2 triangles. With a larger geometry, the difference in performance would perhaps be smaller.

xiasun · December 29, 2023, 1:44am

Thank you @Evgeni_Popov~ If this behaviour is expected (a mesh with thin instances rendering much slower than one without), I will have to try some other approaches to render a large number of splats (quads).

xiehangyun · December 29, 2023, 2:58am

Instancing and merging are both ways to optimize rendering performance.
The advantage of instanced objects is that they have a smaller memory footprint and are more flexible to work with.
Merged meshes should, in theory, render faster, but they take up more memory.

For example:
Babylon.js Playground (babylonjs.com)
In this example, if 10+ meshes need to be rendered, instancing will perform better than merging the nodes.

Millions of quads within one single mesh | Babylon.js Playground (babylonjs.com)
See, in this case instancing is more practical than merging.
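To put rough numbers on the memory-footprint point, here is a back-of-the-envelope sketch. It assumes float32 attributes, uint32 indices, and that each quad carries only a position and an RGBA colour (real meshes may also carry normals or UVs); the function names are hypothetical.

```typescript
// Rough memory estimate (assumptions: float32 attributes, uint32 indices,
// position + RGBA colour per quad, nothing else).
const FLOAT_BYTES = 4;
const UINT32_BYTES = 4;

// Merged: every quad owns 4 vertices (xyz + rgba) and 6 indices.
function mergedBytes(quads: number): number {
  const perVertex = (3 + 4) * FLOAT_BYTES;
  return quads * (4 * perVertex + 6 * UINT32_BYTES);
}

// Thin instances: one shared quad (4 xyz vertices + 6 indices),
// plus a 4×4 matrix and an RGBA colour per instance.
function thinInstanceBytes(quads: number): number {
  const baseQuad = 4 * 3 * FLOAT_BYTES + 6 * UINT32_BYTES;
  const perInstance = (16 + 4) * FLOAT_BYTES;
  return baseQuad + quads * perInstance;
}

const splats = 2500000;
console.log(`merged: ${(mergedBytes(splats) / 2 ** 20).toFixed(1)} MiB`);
console.log(`thin instances: ${(thinInstanceBytes(splats) / 2 ** 20).toFixed(1)} MiB`);
```

For 2.5 million quads this gives about 324 MiB merged versus about 191 MiB with thin instances. The margin is small when the base mesh is a single quad (with positions only, 72 bytes per merged quad versus 64 bytes per instance matrix) and grows with the base mesh's vertex count, since merged data scales with it and per-instance data does not.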

xiasun · December 29, 2023, 3:44am

Thank you for the nice explanation @xiehangyun, I see that the advantage of instancing is fewer draw calls.
I was trying to render a large number of quads (to simulate Gaussian splatting) in a single batch, so I need something like geometry merging or instancing.

xiasun · December 29, 2023, 3:54am

It is strange that a Three.js instanced mesh (which I thought was the equivalent of Babylon instancing) with the same number of quads (2.5 million) still runs at a full 60 fps.
Example here

Babylon instancing, as mentioned above, runs at 15 fps.

@xiehangyun @Evgeni_Popov

Evgeni_Popov · December 29, 2023, 10:59am

This seems to be a problem with Angle and DirectX…

If you change Angle’s backend to OpenGL (chrome://flags/ => angle), you’ll see that you get 60fps. Similarly, using WebGPU as an engine makes the PG run at 60fps.

The problem is that Angle reorganizes our buffers and de-interleaves them, causing cache misses to skyrocket! We’ll have to find out why, and probably open a ticket on the chromium tracker…
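To illustrate what de-interleaving means for the per-instance data, here is a sketch of the two memory layouts. It is not ANGLE's implementation; it assumes that the thin-instance matrix buffer holds 16 floats per instance and is read as four vec4 attributes at float offsets 0, 4, 8 and 12, and `deinterleave` is a hypothetical helper.

```typescript
// Illustrative sketch only; this is not ANGLE's code.
// Assumption: the per-instance matrix buffer stores 16 floats per instance,
// read as four vec4 attributes at float offsets 0, 4, 8 and 12 (stride 16).

// Splits an interleaved buffer into one tightly packed array per attribute.
// `sizes` lists how many floats each attribute occupies in one record.
function deinterleave(data: Float32Array, sizes: number[]): Float32Array[] {
  const stride = sizes.reduce((sum, s) => sum + s, 0);
  const count = data.length / stride;
  const outs = sizes.map((s) => new Float32Array(count * s));
  for (let i = 0; i < count; i++) {
    let offset = i * stride;
    for (let k = 0; k < sizes.length; k++) {
      const s = sizes[k];
      outs[k].set(data.subarray(offset, offset + s), i * s);
      offset += s;
    }
  }
  return outs;
}

// Interleaved: instance i's matrix is one contiguous 64-byte record.
// De-interleaved: the same matrix is spread across four separate arrays,
// so fetching one instance touches four distant memory locations.
const instances = 2;
const interleaved = new Float32Array(instances * 16).map((_, i) => i);
const [world0, world1, world2, world3] = deinterleave(interleaved, [4, 4, 4, 4]);
```

With millions of instances, the four arrays are tens of megabytes apart, which is the kind of access pattern that can turn one cache-friendly read per instance into several scattered ones.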

xiasun · December 31, 2023, 7:02am

Using OpenGL as ANGLE backend or using WebGPU did fix this problem! Thank you Evgeni!
It would be great if Babylon instancing could have similar performance with three.js on the default D3D11 backend.

sebavan · January 2, 2024, 7:25pm

@Evgeni_Popov is there any workaround possible on our side to match the performance while we wait for the fix?

xiasun · January 3, 2024, 2:34am

I found that the performance gap is due to the STATIC_DRAW / DYNAMIC_DRAW usage flag passed when uploading the buffer data. Three.js uses static by default, while Babylon uses dynamic.

/// thinEngine.ts
public createVertexBuffer(data: DataArray, _updatable?: boolean, _label?: string): DataBuffer {
    return this._createVertexBuffer(data, this._gl.STATIC_DRAW);
}
public createDynamicVertexBuffer(data: DataArray, _label?: string): DataBuffer {
    return this._createVertexBuffer(data, this._gl.DYNAMIC_DRAW);
}
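At the WebGL level, the difference between the two Babylon methods quoted above comes down to the usage argument of `bufferData`. A minimal sketch, assuming a WebGL-like context; `GLLike` and this `createVertexBuffer` are hypothetical stand-ins, not Babylon's code. The constant values are the ones defined by the WebGL specification.

```typescript
// WebGL buffer usage hints (values from the WebGL specification).
const STATIC_DRAW = 0x88e4;
const DYNAMIC_DRAW = 0x88e8;

// Hypothetical minimal structural type so the sketch compiles without DOM typings.
interface GLLike {
  readonly ARRAY_BUFFER: number;
  createBuffer(): unknown;
  bindBuffer(target: number, buffer: unknown): void;
  bufferData(target: number, data: ArrayBufferView, usage: number): void;
}

// Uploads vertex data once; only the usage hint depends on `updatable`.
function createVertexBuffer(gl: GLLike, data: Float32Array, updatable: boolean): unknown {
  const buffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  // The hint does not change the data; it only tells the driver (here, ANGLE)
  // how the buffer is expected to be used, which can affect where and how it is stored.
  gl.bufferData(gl.ARRAY_BUFFER, data, updatable ? DYNAMIC_DRAW : STATIC_DRAW);
  return buffer;
}
```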

After passing true as the staticBuffer parameter to thinInstanceSetBuffer(), the fps rose to 60. See PG:

quad.thinInstanceSetBuffer("matrix", matricesData, 16, true);
quad.thinInstanceSetBuffer("color", colorData, 4, true);
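The same fix can be wrapped in a small helper that uploads splat data as static buffers by default. This is a sketch: `ThinInstanceHost` is a hypothetical structural type covering only the `thinInstanceSetBuffer(kind, buffer, stride, staticBuffer)` call used above (a Babylon.js mesh should satisfy it), and `uploadSplatInstances` is a hypothetical name.

```typescript
// Hypothetical structural type: only the Babylon.js Mesh method used here,
// so the sketch is self-contained.
interface ThinInstanceHost {
  thinInstanceSetBuffer(
    kind: string,
    buffer: Float32Array | null,
    stride?: number,
    staticBuffer?: boolean,
  ): void;
}

// Hypothetical helper: uploads per-splat matrices and colours once.
// Static by default; pass updatable = true only for data rewritten often.
function uploadSplatInstances(
  mesh: ThinInstanceHost,
  matrices: Float32Array,
  colors: Float32Array,
  updatable = false,
): void {
  if (matrices.length / 16 !== colors.length / 4) {
    throw new Error("matrices and colors must describe the same number of instances");
  }
  const staticBuffer = !updatable;
  mesh.thinInstanceSetBuffer("matrix", matrices, 16, staticBuffer);
  mesh.thinInstanceSetBuffer("color", colors, 4, staticBuffer);
}
```

For data that is rewritten every frame, keeping the dynamic hint is still the reasonable choice; the static default only suits buffers uploaded once, as with the splats in this thread.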

sebavan · January 3, 2024, 4:46pm

@Evgeni_Popov I wonder if this could have an impact in a lot of other places?

Evgeni_Popov · January 3, 2024, 7:42pm

Maybe we can default to dynamic=false and mark it as a breaking change? It would be a small one, and an easy fix for people who don’t pass a value and expect dynamic=true, but at least the fastest mode would be enabled by default.

sebavan · January 3, 2024, 8:02pm

I am just so surprised by this level of impact, given that it is only a hint.

Is there a way to validate/repro on all platforms?

And I agree: if it is as described, we should default to the fastest option.

Evgeni_Popov · January 3, 2024, 8:45pm

The level of impact would depend on the size of the data and on how Angle reorganizes the buffers. The problem in the PG is that the data is quite big, and because of Angle reorganization, we experience cache misses for each instance. With smaller data, the effect may be less dramatic.

I can’t reproduce this on my iPhone SE / Samsung Galaxy A23: the 3 variations (Babylon dynamic/static and the Three.js fiddle) have the same performance.

Pryme8 · January 3, 2024, 11:02pm

This is an awesome thread. I’m glad someone is deep diving into this.
