Here is a little background on our application and what we’re trying to improve. We have a scene made up of thousands and thousands of meshes. These objects are compressed into fewer meshes via the well-known Mesh.MergeMeshes call. On top of this, we’ve taken the lovely Solid Particle System (SPS) and used it to manage the per-particle operations we need, such as changing materials. We’re also careful not to load all the scene content at once, prioritizing content closer to the camera. This implementation works very well for us.
We’d like to improve the performance of our implementation, and we’ve pinpointed three areas where gains are possible:
- The time taken to produce the meshes. Note that we create the meshes on the fly in separate web workers and then push the merged mesh data back to the main thread.
- FPS performance.
- Memory consumption of the scene content.
Since our scene is typically made up of many duplicate shapes (for example, let’s just say we have many simple cylinders), we’ve been investigating the use of instanced meshes. First and foremost, we know the restrictions of instanced meshes, and I suggest anyone learning about them start here: Use Instances - Babylon.js Documentation
Our hope is that a scene with 10,000 or even 50,000 cylinders, which is typical for us, can benefit greatly from instances. Note that our solid particle system implementation already uses many tricks to reduce the actual burden of the scene content. In addition, our current implementation caps the number of vertices allowed to be present in the scene.
The hope is that instances will improve all three areas of performance noted above AND allow us to load even more of the actual scene. Below, I’d like to detail a few of my observations and ask a few questions.
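The post doesn’t spell out how the vertex cap and the nearest-first loading interact. As a hypothetical illustration only (the SceneItem type, its fields, and selectWithinVertexBudget are invented names, not part of our code or of Babylon.js), a greedy nearest-first admission under a vertex cap could look like this:

```typescript
// Hypothetical description of one loadable piece of scene content.
interface SceneItem {
  id: string;
  vertexCount: number;
  distance: number; // distance from the camera, assumed to be computed elsewhere (e.g. in a worker)
}

// Greedy admission: walk items nearest-first and keep each one that still fits
// under the vertex cap; items that would overflow the cap are skipped.
function selectWithinVertexBudget(items: SceneItem[], vertexCap: number): SceneItem[] {
  const sorted = [...items].sort((a, b) => a.distance - b.distance);
  const selected: SceneItem[] = [];
  let used = 0;
  for (const item of sorted) {
    if (used + item.vertexCount > vertexCap) continue;
    selected.push(item);
    used += item.vertexCount;
  }
  return selected;
}

const picked = selectWithinVertexBudget(
  [
    { id: "near", vertexCount: 100, distance: 1 },
    { id: "mid", vertexCount: 300, distance: 2 },
    { id: "far", vertexCount: 50, distance: 3 },
  ],
  200,
);
console.log(picked.map((i) => i.id).join(", ")); // near, far
```

Skipping (rather than stopping at) an item that doesn’t fit lets smaller, farther items use up the remaining budget; stopping at the first overflow would be the stricter nearest-first alternative.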
Time taken to produce the meshes
Creating instances is certainly super fast, AND we obviously don’t merge any geometry; instance creation is much faster than even cloning meshes. Check on number one.
FPS performance
I’ve noticed that FPS with many instances (50k) is actually very low. I believe this happens for two reasons:
The first is the frustum check. Under the hood, the engine checks which meshes (instances included) are in the view frustum. This is not very time-consuming for a few hundred or even a thousand meshes, but the check does seem to get bogged down with thousands and thousands of meshes, even instances. Sure, we can disable the frustum check or reduce its precision. Since our application does its own frustum logic in a separate web worker to determine what content gets loaded into the scene, this is not a big deal for us. Let’s proceed with the assumption that frustum inclusion/exclusion can be disabled in our solution to get past this bottleneck.
The second FPS hit is the computation of the world matrix for each instance. Anything beyond a few tens of thousands of instanced meshes (hardware dependent, of course) starts to seriously bog down. Add the fact that this computation occurs every frame, and it becomes apparent that large numbers of instances are not practical. Note that this per-frame calculation can be disabled; see freezeWorldMatrix().
This led to the idea that we could load the 50k instances very quickly, freeze their world matrices, and do some magic while the camera is moving. For example, while the camera moves, we could keep only a small number of cylinders “active”, and only when the camera is idle start calculating the world matrices of dirty cylinders. This might be feasible, but I’m noticing that even loading 50k instances into the scene and computing each world matrix once is enough to make it choke.
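To show why this check scales with instance count, here is a generic bounding-sphere-versus-frustum-planes test. It is a textbook sketch, not Babylon.js’s actual culling code, and the Plane type and countVisible function are hypothetical. Each frame it touches every instance and up to six planes, so 50k instances mean up to 300k plane tests per frame before anything is drawn. (Babylon.js does expose per-mesh switches such as alwaysSelectAsActiveMesh for skipping this work; see its documentation for the exact behavior.)

```typescript
// A plane n·p + d = 0 whose normal points into the frustum.
interface Plane {
  nx: number;
  ny: number;
  nz: number;
  d: number;
}

// Bounding spheres packed as [x, y, z, radius] per instance, so the loop runs
// over one flat typed array instead of one object per instance.
function countVisible(spheres: Float32Array, planes: Plane[]): number {
  let visible = 0;
  const count = spheres.length / 4;
  for (let i = 0; i < count; i++) {
    const x = spheres[4 * i];
    const y = spheres[4 * i + 1];
    const z = spheres[4 * i + 2];
    const r = spheres[4 * i + 3];
    let inside = true;
    for (const p of planes) {
      // Signed distance from the sphere center to the plane; fully behind means culled.
      if (p.nx * x + p.ny * y + p.nz * z + p.d < -r) {
        inside = false;
        break;
      }
    }
    if (inside) visible++;
  }
  return visible;
}

// Six planes of the axis-aligned box [-10, 10]^3, standing in for a real frustum.
const boxPlanes: Plane[] = [
  { nx: 1, ny: 0, nz: 0, d: 10 },
  { nx: -1, ny: 0, nz: 0, d: 10 },
  { nx: 0, ny: 1, nz: 0, d: 10 },
  { nx: 0, ny: -1, nz: 0, d: 10 },
  { nx: 0, ny: 0, nz: 1, d: 10 },
  { nx: 0, ny: 0, nz: -1, d: 10 },
];
console.log(countVisible(new Float32Array([0, 0, 0, 1, 20, 0, 0, 1]), boxPlanes)); // 1
```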
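A hypothetical sketch of that per-frame work, independent of Babylon.js internals: each non-frozen instance gets its 4x4 world matrix recomposed every frame, while a frozen one simply keeps the matrix already stored in a shared buffer. The InstanceTransform type, the Y-only rotation, and the column-major layout are assumptions made to keep the example short.

```typescript
// Hypothetical per-instance transform: position, uniform scale, rotation about Y.
interface InstanceTransform {
  x: number;
  y: number;
  z: number;
  scale: number;
  rotY: number; // radians
  frozen: boolean;
}

// Write a column-major 4x4 world matrix (scale, then rotate about Y, then translate)
// into `out` starting at `offset`.
function composeWorldMatrix(t: InstanceTransform, out: Float32Array, offset: number): void {
  const c = Math.cos(t.rotY) * t.scale;
  const s = Math.sin(t.rotY) * t.scale;
  out[offset + 0] = c;   out[offset + 1] = 0;       out[offset + 2] = -s;  out[offset + 3] = 0;
  out[offset + 4] = 0;   out[offset + 5] = t.scale; out[offset + 6] = 0;   out[offset + 7] = 0;
  out[offset + 8] = s;   out[offset + 9] = 0;       out[offset + 10] = c;  out[offset + 11] = 0;
  out[offset + 12] = t.x; out[offset + 13] = t.y;   out[offset + 14] = t.z; out[offset + 15] = 1;
}

// Per-frame pass: recompute matrices for non-frozen instances only.
// Returns how many matrices were recomputed this frame.
function updateWorldMatrices(instances: InstanceTransform[], matrices: Float32Array): number {
  let recomputed = 0;
  for (let i = 0; i < instances.length; i++) {
    if (instances[i].frozen) continue; // frozen: reuse the matrix already in the buffer
    composeWorldMatrix(instances[i], matrices, i * 16);
    recomputed++;
  }
  return recomputed;
}

const transforms: InstanceTransform[] = [
  { x: 1, y: 2, z: 3, scale: 2, rotY: 0, frozen: false },
  { x: 5, y: 0, z: 0, scale: 1, rotY: Math.PI / 2, frozen: true },
];
const matrices = new Float32Array(transforms.length * 16);
console.log(updateWorldMatrices(transforms, matrices)); // 1
```

The loop cost grows linearly with the number of non-frozen instances, which is why freezing most of them removes most of the per-frame work.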
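The camera-idle idea could be organized roughly as below. This is a hypothetical sketch (DirtyMatrixScheduler and its methods are invented names); the recompute callback is where the application would refresh one instance’s world matrix, and how that maps onto Babylon.js calls such as freezeWorldMatrix() is left open. Spreading the work over frames bounds the per-frame cost at batchSize matrix updates, at the price of dirty instances taking several frames to catch up. It does not reduce the total work, so by itself it would not fix the one-time cost of loading 50k instances.

```typescript
// Hypothetical scheduler: while the camera moves, dirty instances are only queued;
// once it is idle, at most batchSize of them are recomputed per frame.
class DirtyMatrixScheduler {
  private readonly queue: number[] = [];
  private readonly queued = new Set<number>();
  private head = 0;
  private readonly recompute: (index: number) => void;
  private readonly batchSize: number;

  constructor(recompute: (index: number) => void, batchSize: number) {
    this.recompute = recompute;
    this.batchSize = batchSize;
  }

  // Queue an instance index; duplicates are ignored while it is still pending.
  markDirty(index: number): void {
    if (this.queued.has(index)) return;
    this.queued.add(index);
    this.queue.push(index);
  }

  // Call once per frame; returns how many matrices were recomputed this frame.
  tick(cameraMoving: boolean): number {
    if (cameraMoving) return 0;
    let done = 0;
    while (done < this.batchSize && this.head < this.queue.length) {
      const index = this.queue[this.head++];
      this.queued.delete(index);
      this.recompute(index);
      done++;
    }
    // Reset the queue storage once everything has been drained.
    if (this.head === this.queue.length) {
      this.queue.length = 0;
      this.head = 0;
    }
    return done;
  }

  get pending(): number {
    return this.queue.length - this.head;
  }
}

const recomputed: number[] = [];
const scheduler = new DirtyMatrixScheduler((i) => recomputed.push(i), 2);
[0, 1, 2, 1].forEach((i) => scheduler.markDirty(i));
scheduler.tick(true);  // camera moving: nothing recomputed
scheduler.tick(false); // recomputes 0 and 1
scheduler.tick(false); // recomputes 2
console.log(recomputed.join(",")); // 0,1,2
```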
Memory consumption
This is where instances really win. Since we would no longer be cloning geometry thousands of times, instances appear to be far superior to solid particle systems in this regard. Sure, every instance carries some extra baggage in the form of its matrices, a burden that merged meshes do not have, but this seems to pale in comparison to the duplicated geometry that comes with cloning meshes.
So, here’s the question: is there a clever way to overcome the FPS bottlenecks that come with loading many instances into the scene? Instances seem to be a bit of a catch-22: you want them for many identical (or scalably identical) meshes, but you can’t have too much of a good thing, or you’ll bog down the engine anyway. If I’ve misunderstood any of my findings, please let me know. The above are my observations from performance profiling tools and test code.
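A back-of-the-envelope comparison makes the trade-off concrete. The vertex layout (position, normal and UV as float32), 32-bit indices, and the 100-vertex/300-index cylinder below are assumptions for illustration, not measurements from our scene. The per-instance figure counts only the 64-byte world matrix, not the JavaScript object overhead of each instance.

```typescript
// Assumed vertex layout: position (3) + normal (3) + uv (2) float32 values.
const BYTES_PER_VERTEX = (3 + 3 + 2) * 4;
// Assumed 32-bit indices.
const BYTES_PER_INDEX = 4;
// One 4x4 float32 world matrix per instance.
const BYTES_PER_MATRIX = 16 * 4;

// Vertex plus index data for a single copy of a shape.
function shapeBytes(vertices: number, indices: number): number {
  return vertices * BYTES_PER_VERTEX + indices * BYTES_PER_INDEX;
}

// Merged or cloned geometry: every copy carries its own vertex and index data.
function mergedBytes(copies: number, vertices: number, indices: number): number {
  return copies * shapeBytes(vertices, indices);
}

// Instances: one shared copy of the geometry plus one world matrix per instance.
function instancedBytes(copies: number, vertices: number, indices: number): number {
  return shapeBytes(vertices, indices) + copies * BYTES_PER_MATRIX;
}

// Hypothetical cylinder with 100 vertices and 300 indices, 50,000 copies.
console.log(mergedBytes(50_000, 100, 300));    // 220000000 (about 220 MB)
console.log(instancedBytes(50_000, 100, 300)); // 3204400 (about 3.2 MB)
```

With these assumptions, instancing only loses for a single copy (the matrix is pure overhead); past that, the shared geometry dominates the savings.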