Different meshes that share the same material, batching drawcalls?

Two things:

  1. Can I have different meshes that share the same material, batched into a single draw call?
    • Premise: more draw calls seem to be very bad for mobile CPUs, even with scene.freezeActiveMeshes(); just a couple dozen of them cut absolute FPS to nearly a third.
  2. How can I clean up my imports so I don’t have multiple materials, shaders, and textures floating around the scene in addition to multiple __root__ nodes?

Playground: https://playground.babylonjs.com/#STN960#1

Edit: Updated playground showing an example of 100 separate meshes that are thin instanced (I didn’t use 100 different shapes for this, but you should get the point): https://playground.babylonjs.com/#STN960#2


Hi douglasg14b,

It sounds like merging meshes might be the right approach for problem 1; there are some caveats and things to consider, but if you have lots of different meshes that share the same material it can be a good trick to try.

For problem 2, as this pertains to scene structure I don’t think there’s a one-size-fits-all solution: depending on what you want from your scene (what can move, when should and shouldn’t there be culling, etc.), the structure you need may vary. However, to simplify the process in your case, one thing to try might be having all your meshes that share the same material in one GLB instead of six. That way you’ll only import a GLB once (bringing everything in by default with a single root and a cohesive set of materials), and to get the different hexes and arrange them, you just reach into that GLB’s hierarchy, moving and cloning and deleting pieces as you desire. Hope this helps, and best of luck!


Hi!

For #1: In my active scene I’m using thin instances for each of the hex tiles. These thin instances are regularly removed, added, or moved based on user interaction, often individually (i.e., the user clicks on an empty tile to add a road). Won’t merging them all essentially prevent me from doing this entirely?

For #2, I’m aiming to completely dispose of the import, since I’m merging each import’s submeshes and creating a new mesh. I keep track of the meshes I’m using in my own fashion, better suited to the scene’s needs, and I don’t use the import after I have merged it. Is that possible?

In the meantime I’ll look into what I need to do to export them all as a single GLB.

Yes, if you need them to be dynamic then merging them isn’t really an option, and even thin instancing might be more costly than you want because adding and removing thin instances can be expensive. Regular instancing might provide a good balance, though it won’t give you quite as much speed in non-dynamic moments as thin instances will. Pretty much any way you slice it, rendering a bunch of meshes that all need to behave independently is going to have performance challenges, especially on mobile.

For your particular scenario, if you want to be able to render as many of these hex tiles as possible, I can think of two more things that might be worth trying, both of which have to do with the assets themselves.

  1. Split the hexes into pieces so that you don’t have geometry for a bunch of hex sides that can’t be seen because they’re right up against each other. Managing the logistics for this may not be simple, but it’ll save you a decent chunk of geometry, especially as your scene gets very large.
  2. Consider doing the tops all as one type of mesh controlled by thin instances and custom attributes. This would be very tricky and would involve the use of a custom vertex shader, but the idea would be to have every tile top be the same geometry, so that every tile is created at the beginning and they’re all one giant collection of thin instances. Making tiles appear, disappear, or change type would then be a matter of changing the attribute, which would tell the vertex shader to hide or show the appropriate geometry. For example, if you provided a custom attribute that was simply a number indicating which top was active for a particular hex (0 for none, 1 for pure grass, etc.), then the vertex shader might know to scale all vertex positions to 0 unless they belong to the desired top (designated by UV position, perhaps). Again, this would be pretty tricky and not very general, but it could in theory allow you to use a single collection of thin instances for all the hex tops.
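To make suggestion 2 more concrete, here is a plain JavaScript emulation of the per-vertex decision such a custom vertex shader would make. It is only a sketch under assumptions: the names are hypothetical, and instead of designating each top by UV range it assumes an explicit per-vertex top id. In Babylon.js the per-instance number could be registered with `thinInstanceRegisterAttribute` and set with `thinInstanceSetAttributeAt`, and the same test would be written in GLSL.

```javascript
// CPU-side emulation of the vertex shader logic described above (hypothetical
// names; in practice this would run in GLSL on the GPU).
// - positions: flat [x, y, z, ...] array of the shared "all tops" geometry
// - topIdPerVertex: which top (1 = pure grass, 2 = ..., etc.) each vertex belongs to
// - tileType: the per-instance attribute value (0 = no top shown)
function applyTileType(positions, topIdPerVertex, tileType) {
  const out = new Float32Array(positions.length);
  for (let v = 0; v < topIdPerVertex.length; v++) {
    // Collapse every vertex that does not belong to the active top to the
    // origin (scale 0). Triangles whose vertices all collapse have zero
    // area, so they are not rasterized.
    const scale = tileType !== 0 && topIdPerVertex[v] === tileType ? 1 : 0;
    out[v * 3] = positions[v * 3] * scale;
    out[v * 3 + 1] = positions[v * 3 + 1] * scale;
    out[v * 3 + 2] = positions[v * 3 + 2] * scale;
  }
  return out;
}

// Two vertices of top 1 followed by one vertex of top 2.
const positions = [1, 0, 0, 0, 1, 0, 0, 0, 1];
const topIds = [1, 1, 2];
console.log(Array.from(applyTileType(positions, topIds, 1))); // → [1, 0, 0, 0, 1, 0, 0, 0, 0]
```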

There are even crazier directions you could go using custom shaders to move more and more of this work to the GPU; the point at which the speed that might give becomes not worth the complexity depends on your scenario. Regarding the extraneous root nodes, I didn’t realize they were empty before, but you should just be able to dispose() them if you no longer need them. Getting a huge amount of apparently dynamic stuff to render on mobile is going to be a challenge, but it’s a super cool challenge—especially since the experience you’re making looks pretty cool too!
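Disposing the leftover roots could look like the following sketch. It assumes `scene` is a Babylon.js Scene and relies only on `scene.meshes`, `name`, `getChildren()` and `dispose()`; the glTF loader names its root node `__root__`. The child check matters because `dispose()` recurses into children by default, so a root that still holds meshes you want to keep is left alone.

```javascript
// Dispose the empty "__root__" nodes the glTF loader leaves behind once their
// children have been merged into other meshes and disposed.
// Assumes `scene` is a Babylon.js Scene.
function disposeEmptyGltfRoots(scene) {
  // filter() returns a new array, so disposing (which removes meshes from
  // scene.meshes) while iterating over it is safe.
  const roots = scene.meshes.filter(
    (m) => m.name === "__root__" && m.getChildren().length === 0
  );
  for (const root of roots) {
    root.dispose();
  }
  return roots.length; // number of roots disposed
}
```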


Thank you for the reply!

Let me provide you with more context:

First perf thread: [Performance] What is eating up frametime in this scene with only simple 3d hexagon meshes?

Thin instance animation: Can thin instances be animated? - #11 by douglasg14b

  • I’m using thin instances because normal instances are unworkable on mobile (even with the usual perf optimizations). The performance degradation is crippling with just 500+ instances of a simple mesh sitting around (500 tiles, for example). We’re talking a couple of FPS on older phones, almost entirely CPU-bound.
    • Even after perf optimizations they are still pretty borked; thin instances are the only thing that has brought big perf jumps, aside from freezing the scene.
  • I’m not animating thin instances directly. I actually remove the thin instance (by updating the buffer), add a normal instance in its place, animate that, and then remove it and replace it with a thin instance when the animation ends. The performance characteristics are almost entirely bound to the CPU cost of having normal instances in the scene; the actual buffer updates are negligible and run in constant time (O(1 + 16), i.e. a single 16-float matrix write).
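Removing a single thin instance by updating the buffer, as described above, can stay constant-time with a swap-with-last scheme. A hypothetical sketch: it assumes all matrices live in one Float32Array with 16 floats per instance (the layout of a thin instance "matrix" buffer) and that the caller keeps its own id-to-slot map.

```javascript
// Fixed-capacity pool of 4x4 matrices (16 floats each). Removal moves the
// last matrix into the freed slot, so it costs one 16-float copy no matter
// how many instances exist.
class MatrixPool {
  constructor(capacity) {
    this.data = new Float32Array(capacity * 16);
    this.count = 0;
  }

  // Append a matrix (any array-like of 16 numbers); returns its slot index.
  add(matrix) {
    if ((this.count + 1) * 16 > this.data.length) throw new Error("MatrixPool is full");
    this.data.set(matrix, this.count * 16);
    return this.count++;
  }

  // Remove the matrix at `slot`. Returns the former slot of the matrix that
  // was moved into `slot` (so the caller can update its id -> slot map),
  // or -1 if nothing had to move.
  remove(slot) {
    if (slot < 0 || slot >= this.count) throw new RangeError("invalid slot");
    const last = this.count - 1;
    let movedFrom = -1;
    if (slot !== last) {
      this.data.copyWithin(slot * 16, last * 16, last * 16 + 16);
      movedFrom = last;
    }
    this.count--;
    return movedFrom;
  }
}
```

In Babylon.js the pool's array would be handed over once with `mesh.thinInstanceSetBuffer("matrix", pool.data, 16)`, and later changes published by setting `mesh.thinInstanceCount = pool.count` and calling `mesh.thinInstanceBufferUpdated("matrix")`.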

Alright, that context out of the way, let me respond to your comment:

  1. This is a performance optimization I can make at a later date. I don’t have problems with tri/vertex counts right now, but cutting away sides that are unseen and keeping only the tops is already on the books as a battery-usage optimization.
  2. This is a good suggestion, but given the complexity and level of jank I would inherit (plus I have zero experience with shaders, so this would be quite difficult for me), I want to keep it on the table as a last resort.
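For the first item (cutting away sides that can’t be seen), the bookkeeping can start from a neighbor test on the hex grid. This is a hypothetical sketch assuming axial (q, r) coordinates and tiles of equal height; with varying heights, a side next to a lower neighbor would still be visible. Mapping each side index to actual side geometry is left out.

```javascript
// Axial-coordinate neighbor offsets for a hex grid, one per side.
const HEX_DIRECTIONS = [
  [1, 0], [1, -1], [0, -1], [-1, 0], [-1, 1], [0, 1],
];

// Key used to store occupied cells in a Set.
const key = (q, r) => `${q},${r}`;

// For one tile, returns six booleans: true if that side faces an empty cell
// and therefore needs side geometry, false if a neighbor hides it.
// `occupied` is a Set of "q,r" keys.
function exposedSides(q, r, occupied) {
  return HEX_DIRECTIONS.map(([dq, dr]) => !occupied.has(key(q + dq, r + dr)));
}
```

Only sides flagged `true` would need geometry; on a densely filled map most interior tiles would end up with none.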

Thankfully the stuff only needs to be dynamic when interacted with. If it were possible to batch thin instances somehow (so that all thin instances from different parents that share a material share a single buffer, or something like that), that would be workable as well. It would just take an extra layer of management on my end, which isn’t that hard.

I actually made this sort of thing in Unity first before ditching it for HTML/CSS/JS (my game is very UI-heavy; Babylon is just one particular part of the game), and had pretty good perf, though I do understand that JS is going to perform much worse than C# on that front.

Are there any built-in features, or things I can change or set up, that might enable this? Something similar to Unity - Manual: Draw call batching (specifically static batching, which seems similar to thin instances, except that it can combine all geometry as long as it shares a material)?


Other than the things you’ve already tried, I don’t know of more built-in features to look at the moment. I do believe a proposal kind of like what you’re describing came up recently in a discussion about Babylon React Native optimizations, but I don’t remember whether part of it was applicable to Web. @sebavan, do you remember that proposal about reusing some part of the thin instance logic for disparate geometries?


Yup I do remember the idea but I am not sure it would be applicable there.

@douglasg14b I really like the batching idea, and I wonder if @Evgeni_Popov has a crazy trick up his sleeve to have a separate hierarchy targeting a kind of virtual ThinInstance system, so that we would copy in all the required info (and by “copy” I really mean share)?

I don’t really see how to do that…

To me, Unity static batching is a lot like MergeMeshes, except that Unity fills the vertex buffers each frame to account for culling, whereas with MergeMeshes you use the same (full) buffers every time (but don’t incur the culling + vertex-filling perf penalty).
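The core of merging (what MergeMeshes does once, and what static batching redoes each frame for the visible set) is concatenating vertex data and re-basing indices. A simplified sketch, not Babylon’s implementation: it assumes positions are already in world space and handles positions and indices only; normals and UVs would be appended the same way, without any offset.

```javascript
// Concatenate several meshes' vertex data into one set of arrays so they can
// be drawn with a single draw call (same material). Each input is
// { positions: number[], indices: number[] } in a shared (world) space.
function mergeGeometry(meshes) {
  const positions = [];
  const indices = [];
  for (const mesh of meshes) {
    // This mesh's indices must be shifted by the number of vertices already
    // written, so they still point at this mesh's own vertices.
    const vertexOffset = positions.length / 3;
    for (const p of mesh.positions) positions.push(p);
    for (const i of mesh.indices) indices.push(i + vertexOffset);
  }
  return { positions, indices };
}
```

In Babylon.js itself you would call `BABYLON.Mesh.MergeMeshes(meshes, true, true)` (dispose the sources, allow 32-bit indices) rather than writing this by hand.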

I think you should try using MergeMeshes because even if you send more data to the GPU it could still be a win, depending on your scene.

[EDIT] Ah, forget it, you are using thin instances…

What I don’t understand is this: if you are using thin instances, why do you have multiple meshes with the same material? If each of them uses thin instances, can’t you use a single mesh and put all the thin instances on that one?


I’m using multiple different meshes, as in different assets with different polygons (the playground in the OP shows an example of these meshes), but they all share a material. Maybe I’m doing something wrong, and I can actually use thin instances with multiple different types of meshes under a single parent?

Imagine you have a sphere, a cube, and a pyramid (and like 100 more shapes). They all share the same material. You make n thin instances of each to fill in your shape garden. They must individually be positionable, rotatable, interactable, and addressable. Can they all share the same parent for thin instances, and/or can they all have their meshes merged? How would you get all the shapes to require only 1 draw call instead of 100+?

Edit: Updated playground showing an example of 100 separate meshes that are thin instanced (I didn’t use 100 different shapes for this, but you should get the point). Each column is a separate thin instance under the premise that it’s a different asset: https://playground.babylonjs.com/#STN960#2

They share a material like this one, which uses a texture with the color palette on it.


Depending on your actual scenario, you may try the following optimizations:

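scene.blockMaterialDirtyMechanism = true; // block the material "dirty" flags (scene must not change materials)
scene.cleanCachedTextureBuffer(); // release each texture's cached source buffer
scene.blockfreeActiveMeshesAndRenderingGroups = true; // block calls to freeActiveMeshes()/freeRenderingGroups()
scene.skipFrustumClipping = true; // skip frustum culling: every active mesh is treated as visible
scene.clearCachedVertexData(); // drop the CPU-side copies of geometry data to save memory
scene.freezeMaterials(); // freeze all materials: faster to render, but no longer updatable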

In https://playground.babylonjs.com/#STN960#4 the heap, FPS, and some other metrics seem better than in the original example.


Here’s a CompositeMesh class which is doing something along the lines of the “static batching” of Unity:

https://playground.babylonjs.com/#46PNW4#6

It will create some big vertex buffers for position/normal/uv that hold all the meshes (and their thin instances) you pass when calling addMesh. If you update the position/rotation/scaling of a mesh, or the matrices of its thin instances (or thinInstanceCount), you need to call update() to refresh the buffers, which can be slow if there is a lot of data to regenerate.

Of course, if you use a lot of meshes and/or thin instances, the vertex buffers can get very big. It’s a trade-off between using more memory/having better perf.
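To get a feel for that trade-off, the buffer sizes can be estimated up front, in the spirit of what rebuildGeometry does when it sizes the arrays. This is a hypothetical estimate, not the CompositeMesh code: it assumes position (3 floats), normal (3) and uv (2) per vertex, 4-byte floats, and 32-bit indices.

```javascript
// Rough size of the composite buffers: every mesh contributes one copy of
// its vertices/indices per thin instance. A missing or zero
// thinInstanceCount is treated as "no thin instances", i.e. one copy.
function compositeBufferSizes(meshes) {
  let vertices = 0;
  let indices = 0;
  for (const { vertexCount, indexCount, thinInstanceCount } of meshes) {
    const copies = thinInstanceCount > 0 ? thinInstanceCount : 1;
    vertices += vertexCount * copies;
    indices += indexCount * copies;
  }
  const floatsPerVertex = 3 + 3 + 2; // position + normal + uv
  return {
    vertices,
    indices,
    bytes: vertices * floatsPerVertex * 4 + indices * 4,
  };
}
```

For example, 500 thin instances of a tile with 100 vertices and 300 indices (made-up numbers) give 50,000 vertices, 150,000 indices, and 2,200,000 bytes, roughly 2.2 MB.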

Using this class in your PG:

https://playground.babylonjs.com/#STN960#9

If you want the thin instance picking to work with the composite mesh, you must:

  • enable thin instance picking on the meshes that have thin instances
  • override the scene.pointerDownPredicate property so that it does not check that the mesh is enabled: when using the CompositeMesh class, the meshes you pass to the addMesh method are disabled so that they are not displayed by Babylon

See:

https://playground.babylonjs.com/#STN960#7
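A predicate along the lines of the second bullet could look like this. It is a sketch: it assumes `scene` is a Babylon.js Scene and keeps only the pickable/visible checks, dropping the enabled check that fails for the disabled source meshes. Compare it with the version in the playground above before relying on it.

```javascript
// Pointer-down predicate that does not require the mesh to be enabled, so
// the (disabled) source meshes handed to CompositeMesh.addMesh can still be
// picked. `scene` is assumed to be a Babylon.js Scene.
function allowPickingDisabledMeshes(scene) {
  scene.pointerDownPredicate = (mesh) => mesh.isPickable && mesh.isVisible;
}
```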

Regarding the update() method being slow, you could improve it by only updating the data that you know have changed.


Oh 😍

I haven’t had my morning coffee yet, but a browse through the code shows that it’s rebuilding the separate meshes that are added in as a single geometry, right?

I’m absolutely ecstatic that you came up with this. I’m going to tinker around after some coffee.


Any chance you could add some comments explaining what each major step is doing and why (assuming the reader knows very little)? Specifically the update() and rebuildGeometry() functions.

If there is room in this to skip update() and modify the specific data, that’s something I can work on abstracting and building into this class. Perf optimizations are a pleasure of mine, and I’d love to iterate on this. However, I need to understand how/why this is working first before I can get rolling on that.


Hm, I noticed that when doing this with my meshes, the top surfaces are black. What would you speculate is the reason?

Edit: It looks like the normals are a bit borked. Are they flipped? The underside and inside get illuminated. 🤔

Example: https://playground.babylonjs.com/#STN960#12

Edit2:

If I swap the x and y of the normals assignment, my meshes are no longer borked, but the generated meshes are.

                for (let i = 0; i < normals.length; i += 3) {
                    v3.copyFromFloats(normals[i + 0], normals[i + 1], normals[i + 2]);
                    BABYLON.Vector3.TransformNormalToRef(v3, mat, v3);
                    this._normals[normalIdx++] = v3.y; // Used to be v3.x
                    this._normals[normalIdx++] = v3.x; // Used to be v3.y
                    this._normals[normalIdx++] = v3.z;
                }

Maybe there is something wrong when exporting my models (or a step I need to perform on import?). The normals look fine in Blender.

The problem is that the hex tiles are loaded from a .glb file, meaning the geometry is in a right-handed system: the indices have to be inverted (the winding order of each triangle reversed) to convert to a left-handed system.

I have updated the addMesh method to take a flag that indicates a mesh has its geometry in a right-handed system:

https://playground.babylonjs.com/#STN960#14
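The index inversion mentioned above amounts to reversing each triangle’s winding order. A minimal sketch of that step, assuming a flat triangle-list index buffer; it is not the actual addMesh code, which may do this differently.

```javascript
// Reverse the winding order of every triangle in a triangle-list index
// buffer by swapping the 2nd and 3rd index of each triple, turning front
// faces into back faces and vice versa. Works on plain arrays and typed
// arrays; the input is left untouched.
function flipWinding(indices) {
  const out = indices.slice(); // copy, preserving the array type
  for (let i = 0; i + 2 < out.length; i += 3) {
    const tmp = out[i + 1];
    out[i + 1] = out[i + 2];
    out[i + 2] = tmp;
  }
  return out;
}
```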

Note that meshes without thin instances are also supported in CompositeMesh, so you don’t need to do:

hexTilePrefab.thinInstanceEnablePicking = true;
hexTilePrefab.thinInstanceAddSelf();

I think the code is quite straightforward once you know the layout of the arrays used to build a mesh from scratch. This doc should help you:

The code is simply creating the arrays that hold the data (positions, normals, uvs) of all the meshes/thin instances you pass in:

  • rebuildGeometry computes the size of the arrays and creates them (empty)
  • update sets the data in the arrays, basically by copying the data from the original mesh arrays. The positions/normals are transformed by the mesh world matrix, and if the mesh also has thin instances we must factor in the thin instance matrix (meaning we simply multiply the mesh world matrix with the thin instance matrix)
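The transform step of update() can be sketched as follows. This is an illustration, not the CompositeMesh code: it assumes Babylon.js’s matrix layout (flat 16-element arrays, row vectors, translation in elements 12 to 14) and affine matrices, and it handles positions only. Normals would go through the same combined matrix, using only its upper 3x3 part.

```javascript
// Multiply two 4x4 matrices in Babylon.js-style layout (row vectors,
// translation in elements 12..14). With this layout, a vertex transformed
// by multiply(a, b) is transformed by `a` first and then by `b`.
function multiply(a, b) {
  const out = new Float32Array(16);
  for (let row = 0; row < 4; row++) {
    for (let col = 0; col < 4; col++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) sum += a[row * 4 + k] * b[k * 4 + col];
      out[row * 4 + col] = sum;
    }
  }
  return out;
}

// Write the world-space positions of every thin instance of one mesh into
// `target`, starting at float offset `offset`; returns the next free offset.
// `instanceMatrices` holds 16 floats per thin instance. Assumes affine
// matrices, so the w component is ignored.
function bakeInstancePositions(positions, meshWorld, instanceMatrices, target, offset) {
  for (let i = 0; i < instanceMatrices.length; i += 16) {
    // Thin instance transform first (it is relative to the mesh), then the
    // mesh world matrix.
    const m = multiply(instanceMatrices.slice(i, i + 16), meshWorld);
    for (let v = 0; v < positions.length; v += 3) {
      const x = positions[v], y = positions[v + 1], z = positions[v + 2];
      target[offset++] = x * m[0] + y * m[4] + z * m[8] + m[12];
      target[offset++] = x * m[1] + y * m[5] + z * m[9] + m[13];
      target[offset++] = x * m[2] + y * m[6] + z * m[10] + m[14];
    }
  }
  return offset;
}
```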

I’ve been mulling over this a bit re: update() performance.

If I understand correctly, the order of indices doesn’t matter, and by extension the position of an “instance” in the buffer won’t matter either?

I could key ranges for each of the meshes and their instances, and then, when an instance is removed, splice that range out of the vertex buffer and add the new positions to the end. This would save quite a bit of processing, but it’s still an O(n) operation, which sets a floor on performance (assuming normals are updated the same way).

If the mesh has been moved/updated, I can just update that range directly, which is extremely cheap.

  1. Does it make sense to slice out ranges for instances that have been removed, and add new instances onto the end? Will this work as expected, or does the order matter?
  2. Similarly, will updating the positions of an instance’s mesh directly in the buffer work as expected?
  3. Any ideas on how removing an instance could be < O(n)?
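The range-keyed scheme from these questions could look roughly like this. It is a hypothetical sketch covering a single float attribute (positions, say): `update` overwrites one range in place, `remove` shifts the tail left with `copyWithin` (a native, memmove-like copy) and then fixes up the start of later ranges, and `append` writes at the end. The index buffer is not handled: when vertices shift, every index pointing at them must be reduced by the number of removed vertices as well.

```javascript
// Vertex data for many "instances" packed into one Float32Array, with a
// per-instance range so single instances can be updated, removed, or
// appended without rebuilding everything.
class RangeBuffer {
  constructor(capacity) {
    this.data = new Float32Array(capacity);
    this.length = 0; // floats currently in use
    this.ranges = new Map(); // id -> { start, length }
  }

  // Write a new instance's data at the end of the used region.
  append(id, values) {
    if (this.length + values.length > this.data.length) throw new Error("buffer full");
    this.data.set(values, this.length);
    this.ranges.set(id, { start: this.length, length: values.length });
    this.length += values.length;
  }

  // In-place update (e.g. the instance moved): touches only its own range.
  update(id, values) {
    const range = this.ranges.get(id);
    if (!range || values.length !== range.length) throw new Error("bad update");
    this.data.set(values, range.start);
  }

  // Remove a range by shifting everything after it to the left, then fix up
  // the start of every range that sat after the removed one.
  remove(id) {
    const range = this.ranges.get(id);
    if (!range) return;
    this.data.copyWithin(range.start, range.start + range.length, this.length);
    this.length -= range.length;
    this.ranges.delete(id);
    for (const r of this.ranges.values()) {
      if (r.start > range.start) r.start -= range.length;
    }
  }
}
```

When all ranges have the same size (for example many copies of one mesh), a swap-with-last removal avoids moving the tail at all, at the cost of changing the order of the data.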

Yes, I don’t really see another way of doing it. Order does not matter, unless for some reason you want mesh X to be displayed before mesh Y (in which case the data for X must come before the data for Y in the positions/normals/uvs/indices arrays). But normally the order is irrelevant.

Yes, for updates you can simply update the positions/normals in the arrays (the normals only if you change the rotation part of the mesh matrix). You won’t have to update the indices, nor the UVs if you don’t modify them.

What makes you think slicing and adding the new data at the end is O(n)? I don’t think it is, because under the hood it should only be two buffer copies (something like memcpy), which should be fast. The best way to know is to test 🙂
