Questions about Thin Instances vs. regular Instances

As I understand it, compared to regular instanced meshes, thin instances store all instance data in a few Float32Array buffers, which reduces the overhead on the JavaScript side and the cost of looping over every mesh object. Is this correct?

However, the official documentation mentions two drawbacks of Thin Instances:

  1. The “all or nothing” rendering mechanism: either all are shown or all are hidden.

  2. High cost for adding/removing: adding or removing a thin instance is much more expensive than with InstancedMesh.

The “all or nothing” rendering mechanism raises a concern for me: do thin instances get frustum culled? If so, how? From the source code of _evaluateActiveMeshes, it seems that if they skip that function there should be no frustum culling. There is also a forum post mentioning “You should trade more work on the GPU side (because some meshes may be sent to the GPU that would have been culled earlier) than on the CPU in that case”. But I also found an answer from two years later saying “By default, thin instances are frustum culled”.
Does this mean that the “all or nothing” mechanism refers to all thin instances being submitted to the GPU (logically all visible), with frustum culling then performed on the GPU side?

The other drawback is the high cost of adding/removing thin instances. I understand the cost of removal: since the buffer is contiguous, removing a thin instance in the middle requires moving all subsequent data, so we may need to rebuild the buffers each time. But for adding, can’t we just append to the end of the Float32Array instead of rebuilding? We can even pre-allocate a buffer larger than needed, as the documentation suggests.
Or does the “high cost” refer not to the JavaScript side, but rather to the internal operations of thinInstanceSetBuffer (maybe a recreate and copy?) and the cost of uploading to the GPU?
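
To make the "append into a pre-allocated buffer" idea concrete, here is a minimal sketch in plain TypeScript. It only manages the JavaScript-side array; the class name `MatrixPool` and its `append` method are hypothetical and not part of Babylon.js, and the 16-floats-per-instance layout assumes one 4x4 matrix per thin instance.

```typescript
// Hypothetical JS-side pool of 4x4 matrices (16 floats each); not a Babylon.js API.
class MatrixPool {
  data: Float32Array;
  count = 0;

  constructor(capacity: number) {
    // Pre-allocate room for `capacity` instances up front.
    this.data = new Float32Array(capacity * 16);
  }

  // Appends one matrix. Returns true when the backing array had to be
  // reallocated (the expensive path), false when the spare capacity was used.
  append(matrix: ArrayLike<number>): boolean {
    if (matrix.length !== 16) {
      throw new RangeError("expected 16 floats for a 4x4 matrix");
    }
    let grew = false;
    if ((this.count + 1) * 16 > this.data.length) {
      // Out of room: allocate a buffer twice as large and copy the old data.
      const bigger = new Float32Array(Math.max(16, this.data.length * 2));
      bigger.set(this.data);
      this.data = bigger;
      grew = true;
    }
    this.data.set(matrix, this.count * 16);
    this.count++;
    return grew;
  }
}
```

When `append` returns false, only the newly written slot has changed on the JavaScript side. When it returns true, the array object itself is new, so it would presumably have to be handed to the engine again (for example through thinInstanceSetBuffer). Whether that second case is what the documentation means by "high cost" is exactly the open question here.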

If anyone can clarify these points (an in-depth explanation from the engine’s perspective would be even better!), it would be greatly appreciated!

The trick I use to delete a thin instance is to copy the last matrix over the deleted matrix’s position and reduce the thinInstanceCount by 1. This is very quick.
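
For reference, the swap-with-last trick can be sketched on a raw Float32Array like this. The helper name `swapRemoveInstance` is hypothetical, and the layout of 16 floats per instance is an assumption (one 4x4 matrix each).

```typescript
// Assumed layout: each thin instance occupies 16 floats (one 4x4 matrix).
const FLOATS_PER_MATRIX = 16;

// Hypothetical helper: deletes instance `index` by overwriting it with the
// last live instance. Returns the new instance count.
function swapRemoveInstance(
  matrices: Float32Array,
  count: number,
  index: number
): number {
  if (index < 0 || index >= count) {
    throw new RangeError(`index ${index} out of range for ${count} instances`);
  }
  const last = count - 1;
  if (index !== last) {
    const src = last * FLOATS_PER_MATRIX;
    // copyWithin(target, start, end) moves the last matrix into the freed slot.
    matrices.copyWithin(index * FLOATS_PER_MATRIX, src, src + FLOATS_PER_MATRIX);
  }
  return last;
}
```

The returned value is what you would assign to thinInstanceCount. Note that this does not preserve instance order, so any index you keep for the moved instance has to be updated as well.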

You may be able to use thinInstancePartialBufferUpdate on the newly moved matrix position so that only a small amount of data is transferred from CPU to GPU. Many of the non-full-update methods typically copy from the beginning to the end of the source array (or to the end of the destination buffer). In those cases, make sure your source is a TypedArray and define a typed-array view over it that covers only the source offset and length of the data you need to move. The “offset” parameter of thinInstancePartialBufferUpdate refers to the destination offset only.
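
A minimal sketch of the "view" idea, in plain TypeScript: `subarray` gives a window onto the same memory without copying, so only that window is passed as the source. The helper name `matrixSlice` is hypothetical, and expressing the destination offset in floats rather than bytes is an assumption worth checking against the API docs.

```typescript
// Hypothetical helper: returns a no-copy view over one 4x4 matrix together
// with the destination offset (assumed here to be counted in floats).
function matrixSlice(
  matrices: Float32Array,
  index: number
): { data: Float32Array; offset: number } {
  const offset = index * 16;
  if (index < 0 || offset + 16 > matrices.length) {
    throw new RangeError(`instance ${index} is outside the buffer`);
  }
  // subarray() shares the underlying ArrayBuffer; nothing is copied.
  return { data: matrices.subarray(offset, offset + 16), offset };
}
```

The result would then be passed along the lines of `mesh.thinInstancePartialBufferUpdate("matrix", data, offset)`, so that only 16 floats are sent instead of the whole buffer.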

Thank you for your solution! So does this mean the “high cost” does come from the CPU-to-GPU transfer, rather than from JavaScript array management?

And what about the first question: does frustum culling exist for thin instances?

Two thin instances are treated as one mesh when passing through _evaluateActiveMeshes (both thin instances contribute to the computation of the mesh’s world matrix), while two regular instances are treated as two meshes. That’s why it is “all or nothing”.
When the mesh that holds the thin instances is not culled, all of its thin instances are sent to the GPU, even if some of them should have been culled.
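
A toy illustration of that difference, not engine code: it assumes the whole batch is tested with one bounding volume that encloses every instance, and it uses a simple axis-aligned "view box" as a stand-in for the real frustum. All names here (`Box`, `drawnPerInstance`, `drawnAsBatch`) are hypothetical.

```typescript
type Vec3 = [number, number, number];
interface Box { min: Vec3; max: Vec3; }

// Axis-aligned overlap test, used here as a stand-in for a frustum test.
function intersects(a: Box, b: Box): boolean {
  for (let i = 0; i < 3; i++) {
    if (a.max[i] < b.min[i] || a.min[i] > b.max[i]) return false;
  }
  return true;
}

// Smallest box that encloses every instance box.
function unionBox(boxes: Box[]): Box {
  const min: Vec3 = [Infinity, Infinity, Infinity];
  const max: Vec3 = [-Infinity, -Infinity, -Infinity];
  for (const b of boxes) {
    for (let i = 0; i < 3; i++) {
      min[i] = Math.min(min[i], b.min[i]);
      max[i] = Math.max(max[i], b.max[i]);
    }
  }
  return { min, max };
}

// Regular instances: each one is tested on its own.
function drawnPerInstance(instances: Box[], view: Box): number {
  return instances.filter((b) => intersects(b, view)).length;
}

// Thin instances (as described above): one test decides for the whole batch.
function drawnAsBatch(instances: Box[], view: Box): number {
  if (instances.length === 0) return 0;
  return intersects(unionBox(instances), view) ? instances.length : 0;
}
```

With one instance near x = 0 and another near x = 100, a view that only covers the first one draws a single regular instance but both thin instances. A view that sits between the two draws no regular instance at all, yet still submits both thin instances.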