How to optimize a scene with 10k mesh instances?

Hey everybody.
This thread is going to be a mix of me sharing what I managed to figure out so far and asking for advice to push the optimization even further.

Here’s a playground of a scene similar to what I have locally; it implements some of the optimization methods discussed below.

In our project we have a field of up to 10k instances of a very simple, low-poly mesh, created with createInstance(). It’s basically a box with a small bevel.
The blocks act as a playing field and allow characters to move block by block (like a chessboard).
The blocks all share a texture, but each can have a different base color that is mixed with the texture as a tint. The tint can change at runtime, which is achieved with instance buffers.

Thanks to using instances we only have 1 draw call for the entire field, but the frame rate was still really bad in the beginning.
On a beefy PC we barely got ~60 fps when the entire field was visible and the camera was still. When moving the camera it dropped to 40~50 fps.
On an iPhone 6 it was ~19 fps when not moving the camera and below 10 fps while the camera was moving. Ideally we’d like the game to run at a stable 60 fps on PC and more recent smartphones. The field seems to be the biggest hurdle, but some calculations, particle effects and such will also eat into performance a bit later on.

When looking at the Chrome profiler I could see that the functions in the render loop that take too long and cause the frame rate to drop are evaluateActiveMeshes and renderForCamera, as you can see in the image.

At first I tried using an octree; however, I found the documentation for this quite confusing and am not even sure my implementation is correct.

  var octree = this._scene.createOrUpdateSelectionOctree();
  meshes.forEach(mesh => {
    // ...
  });

After implementing this the performance actually got worse and I lost about 20 fps.
1. How do you properly implement an octree for static meshes, and does it even make sense in a case like this where it’s one open field?

After abandoning octrees I tried many of the commonly recommended optimization methods; however, most of them had no effect whatsoever in my case. The only thing that seemed to lead to a slight improvement was mesh.freezeWorldMatrix(); which saved about 2 milliseconds.
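To put a saving like 2 milliseconds in context, it helps to convert frame rates into per-frame time budgets. The helpers below are only an illustrative sketch; the function names are made up for this example and are not part of Babylon.js.

```javascript
// Hypothetical helpers (not Babylon.js API): convert between fps and frame time.

// Time available per frame, in milliseconds, at a given frame rate.
function frameBudgetMs(fps) {
  return 1000 / fps;
}

// Fraction of the per-frame budget that a saving of `savedMs` represents.
function shareOfBudget(savedMs, fps) {
  return savedMs / frameBudgetMs(fps);
}
```

At 60 fps the budget is about 16.7 ms per frame, so saving 2 ms is roughly 12% of a frame. At 19 fps each frame takes about 52.6 ms.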

2. Should these functions be run on the mesh after you load it in or on each instance of it after you created it? e.g. mesh.material.freeze(); mesh.freezeWorldMatrix();

I then stumbled upon this amazing thread which describes a situation which is pretty close to mine.
Using the following allowed me to halve the time everything takes and have smooth 60 fps on pc even when moving the camera!

  meshes.forEach(mesh => {
    mesh.alwaysSelectAsActiveMesh = true;
  });

However, on a smartphone it crashes when creating the mesh instances.

I made a Google Doc with a before-and-after comparison for each optimization method, comparing the performance in the profiler: Google Doc

Some additional questions:
3. When I create instances of a mesh, does the performance decrease exponentially the more instances I have? (For example, would a scene with 10k plain cubes run smoother than a scene with 10k cubes that have a bevel and a few more vertices?)

4. Do you guys think it’s even possible to get a scene with this many mesh instances to run at ~60 fps on mobile, or is that a pipe dream?

5. One thing I haven’t tried yet is SceneOptimizer. Would it make sense in a case like mine?

If your instances don’t move, you can freeze all their world matrices and set them all as always active.
If they move but you don’t care about frustum culling, you can set them as always active and compute their world matrices yourself, as explained here: Use Instances - Babylon.js Documentation
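As a minimal sketch of the first suggestion, assuming `instances` is an array of objects that expose the Babylon.js-style `freezeWorldMatrix()` method and `alwaysSelectAsActiveMesh` property mentioned in this thread (the helper name `freezeStaticInstances` is made up for illustration):

```javascript
// Sketch only: `instances` is assumed to hold mesh instances exposing
// freezeWorldMatrix() and alwaysSelectAsActiveMesh, as discussed in the thread.
function freezeStaticInstances(instances) {
  let frozen = 0;
  for (const instance of instances) {
    // Freeze the world matrix of this instance so it is not recomputed.
    instance.freezeWorldMatrix();
    // Always treat this instance as active, skipping the frustum check.
    instance.alwaysSelectAsActiveMesh = true;
    frozen++;
  }
  // Return how many instances were processed, handy for logging.
  return frozen;
}
```

This would be run once after all instances have been created, not every frame.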

Also try not to loop over all 10K instances each frame if possible; just update the ones that need it.
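One way to avoid touching every instance each frame is to remember which blocks changed and process only those. The class below is a hypothetical sketch, not something from Babylon.js; the name `DirtyBlocks` and the idea of addressing blocks by index are assumptions made for this example.

```javascript
// Hypothetical helper: remembers which block indices changed since the last flush.
class DirtyBlocks {
  constructor() {
    this.pending = new Set();
  }

  // Record that the block at `index` needs an update (duplicates are ignored).
  mark(index) {
    this.pending.add(index);
  }

  // Call update(index) once per changed block and return how many were processed.
  // Blocks marked while flushing are kept for the next flush.
  flush(update) {
    const batch = [...this.pending];
    this.pending.clear();
    for (const index of batch) {
      update(index);
    }
    return batch.length;
  }
}
```

In a render loop, something like `dirty.flush(i => applyTint(blocks[i]))` would then cost time proportional to the number of changed blocks rather than to the size of the field (`applyTint` and `blocks` are placeholders here).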

Also, for low-poly models the SPS (Solid Particle System) can sometimes really compete with instances. Worth a check.


First, thanks a lot for sharing!!!

Let me start from the end :)

Regarding the SceneOptimizer: I am pretty sure it will do nothing in your case. The scene optimizer is great for auto-merging meshes and for reducing shadow quality and material properties, but in your case the low FPS comes from the number of meshes (which cannot be merged, I assume, as you need them all to move freely).

The results of your tests are very much what I expected. The only things that really help in this case (as you don’t have any fancy features) are freezing the world matrix (which prevents 10K calls to computeWorldMatrix) and making sure all meshes are selected as active meshes (which removes the need to analyze so many meshes and evaluate whether or not they are in view).

There are a few downsides to both methods, though, and I think managing those two will actually make everything work much better:

  1. Freeze world matrix

When you freeze the world matrix, you are saying that this mesh will never move during the course of the game, which is obviously not the case here. You could unfreeze the matrix of the meshes under the mouse pointer (for example) or of a selected mesh (on mobile). This way you can move some meshes, while the others still save you those important 10 FPS that you are getting.

  2. Active meshes

The reason (I guess) your phone crashed is the amount of memory needed to constantly send all meshes’ (meta)data to the GPU (only an uneducated guess :) ). The evaluate active meshes function is the one responsible for removing unneeded meshes from the rendering process, so that only visible meshes are rendered. The problem is that it takes way too long to generically analyze 10K meshes. What you can do, however, is come up with a mathematical way of analyzing this. You know the positions of the meshes, the camera’s FOV, and the fixed size of the meshes. This way you could calculate which meshes are active and which are not, and set the former to true (and the others to false). My guess is that this will probably allow this optimization to run on mobile.
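A rough sketch of such a mathematical check, under assumptions that are not from this thread: the field is a regular grid in the XZ plane starting at the origin, and the ground area the camera can see has already been approximated as an axis-aligned rectangle (how to derive that rectangle from the camera is left open). The function names are made up for illustration.

```javascript
// Assumptions (not from the thread): the field is a cols x rows grid in the
// XZ plane starting at the origin, each cell is cellSize wide, and `view` is
// an axis-aligned rectangle {minX, maxX, minZ, maxZ} covering the ground
// area the camera can currently see.
function visibleCellRange(view, cellSize, cols, rows) {
  const width = cols * cellSize;
  const depth = rows * cellSize;
  // No overlap between the view rectangle and the field: nothing is visible.
  if (view.maxX < 0 || view.minX >= width || view.maxZ < 0 || view.minZ >= depth) {
    return null;
  }
  const clamp = (value, lo, hi) => Math.min(Math.max(value, lo), hi);
  return {
    minCol: clamp(Math.floor(view.minX / cellSize), 0, cols - 1),
    maxCol: clamp(Math.floor(view.maxX / cellSize), 0, cols - 1),
    minRow: clamp(Math.floor(view.minZ / cellSize), 0, rows - 1),
    maxRow: clamp(Math.floor(view.maxZ / cellSize), 0, rows - 1),
  };
}

// True if the cell at (col, row) lies inside the visible range.
function isCellVisible(range, col, row) {
  return range !== null &&
    col >= range.minCol && col <= range.maxCol &&
    row >= range.minRow && row <= range.maxRow;
}
```

Computing the range is constant-time regardless of field size, so only the blocks whose visibility actually changed between frames would need to be touched.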


An octree is useful only if your geometry is at least moderately complex and spans a large area in your scene. If all or most of your meshes are always visible, it is of no use, and the additional time to traverse the tree will actually make your fps worse, as you could see.

You can call mesh.material.freeze() right away, even if the mesh is not loaded yet. The system will actually freeze the material only after it has been fully generated (meaning when the underlying effect has been correctly created).

If you freeze the active meshes, call the function by passing it true:

  scene.freezeActiveMeshes(true);

That way, computeWorldMatrix won’t be called on any mesh without you having to call mesh.freezeWorldMatrix on all meshes beforehand (also, small performance gain as you will avoid a loop on all meshes inside scene._evaluateActiveMeshes).

By doing so, if you want to update a mesh afterward, just update the position/rotation/scale properties and call mesh.computeWorldMatrix, no need to call unfreezeWorldMatrix / freezeWorldMatrix before/after.
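A minimal sketch of that update step, assuming `mesh` exposes a Babylon.js-style `position` with x/y/z fields and a `computeWorldMatrix` method. The helper name `moveBlock` is made up, and passing `true` to force the recomputation is an assumption of this sketch rather than something stated above.

```javascript
// Sketch only: `mesh` is assumed to expose `position` (with x, y, z) and
// computeWorldMatrix(force), as Babylon.js meshes do.
function moveBlock(mesh, x, y, z) {
  mesh.position.x = x;
  mesh.position.y = y;
  mesh.position.z = z;
  // The per-frame evaluation is skipped while active meshes are frozen,
  // so refresh this one mesh's world matrix by hand.
  // Passing `true` (force) is an assumption made for this sketch.
  return mesh.computeWorldMatrix(true);
}
```

Only the meshes that actually changed need this call; all others keep their existing matrices.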

I don’t think performance decreases exponentially, as it is still a single draw call to the GPU. So it all depends on the GPU: you could double your instances for free if the GPU is fast enough to process the rendering in parallel with the JavaScript code needed to handle a frame. You will have to test to know for sure, however.

As for mobile, no idea, as I don’t know mobile dev. If you freeze all meshes and all materials, the JavaScript load is reduced to a minimum, so most of the time will be spent on the GPU rendering side.

Note that _evaluateActiveMeshes is called in the course of renderForCamera, so the timings of both functions are linked.


Definitely the key here!


Sounds great (but you know I am now going to try to break it). What about when a mesh has children? I am pretty sure computeWorldMatrix on a child takes the parent’s matrix into account when recomputing, but what about when the parent moves / rotates / scales?

Also, what happens if both the parent & child move in the same frame?


mesh.computeWorldMatrix() does not recompute children matrices, but there’s no parenting involved in @Stephen’s PG.

If there is parenting, I guess one will have to update manually the matrices of all meshes involved, parents + children.


@Deltakosh, I am noticing that _isWorldMatrixFrozen is protected inside of Node. I was wondering if a getter for this could be added, even though the code base is frozen?

The reason is that I could freeze all or some matrices. If, in my animation system, rotation / position is changed AND the matrix is frozen, it could just calculate a new one. This might be useful for others as well, so that not everything has to be hardwired at the application level and it can be done at a lower level when it can be detected that it needs to be done.

edit: This would be my backup if too late for 4.1:

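// Backup: if this node is a mesh with a frozen world matrix, recompute it by hand.
if (this._isMesh && this._node["_isWorldMatrixFrozen"]) {
    (<BABYLON.AbstractMesh> this._node).computeWorldMatrix(true);
    // Frozen children do not follow the parent automatically, so refresh them too.
    for (const kid of this._node.getChildMeshes()) {
        if (kid["_isWorldMatrixFrozen"]) kid.computeWorldMatrix(true);
    }
}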

There is no _isWorldMatrixFrozen on Node, only on TransformNode, and on TransformNode you can use the isWorldMatrixFrozen getter ;)
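With that getter, a refresh helper no longer needs to reach into the protected field. The following is only a sketch: `refreshFrozenMatrices` is a made-up name, `node` is assumed to expose `isWorldMatrixFrozen`, `computeWorldMatrix` and `getChildMeshes` as Babylon.js meshes do, and whether `computeWorldMatrix(true)` really recomputes a frozen matrix may depend on the Babylon.js version, so it would need testing.

```javascript
// Sketch only: refresh the world matrix of a node and of its child meshes,
// but only where the matrix is frozen (checked via the public getter).
// Whether computeWorldMatrix(true) recomputes a frozen matrix is an
// assumption here and should be verified against the Babylon.js version used.
function refreshFrozenMatrices(node) {
  let refreshed = 0;
  if (node.isWorldMatrixFrozen) {
    node.computeWorldMatrix(true);
    refreshed++;
  }
  for (const kid of node.getChildMeshes()) {
    if (kid.isWorldMatrixFrozen) {
      kid.computeWorldMatrix(true);
      refreshed++;
    }
  }
  // Number of matrices that were asked to recompute.
  return refreshed;
}
```

Unlike the backup snippet earlier in the thread, this variant also refreshes frozen children when the parent itself is not frozen.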


Thanks again for the awesome replies!

I first tried implementing these methods in a level editor I built, and I guess something else in that project also slowed down the phone and made it crash.
When I made a completely new scene that doesn’t have anything other than the blocks, it actually managed to run at 60 fps on more recent phones.
I still believe that it might be hard to keep that going once particle effects and other features are implemented, so we might need to reduce the size of the field.
But even then, using these optimizations will free up some resources to spend on other things!
