Perf comparison of different strategies for creating mesh UIs

Testing PG - https://playground.babylonjs.com/#GJ69CX#60

No specific question; posting this to get any feedback from the team and for the community’s general information in case it helps. I wanted to look at the performance of various strategies for creating a UI attached to a mesh. My simple test case has two disc icons attached to a simple disc mesh. For my use case, I needed to be able to independently show or hide each of the icons. The strategies I looked at were as follows:

  • Using meshes - simply create a mesh for each icon
  • Using GUI - use the BJS GUI to create both icons
  • Using Dynamic Texture - draw the icons onto a dynamic texture on each mesh
  • Using a static texture - draw the icons onto a canvas, use getImageData to extract the raw image bytes, and create a raw texture to apply to the mesh. Because there are only a few combinations of icons, cache each created texture and reuse the existing instance whenever possible instead of creating a new one.
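The caching idea in the static texture strategy can be sketched without any engine code. This is a minimal illustration, not the code from the playground: `IconState`, `iconKey`, and `TextureCache` are hypothetical names, and the `create` callback stands in for the expensive step (draw to canvas, getImageData, build a raw texture).

```typescript
// Hypothetical helper types; none of these names come from Babylon.js.
type IconState = boolean[]; // one flag per icon, true = visible

// Encode the set of visible icons as a bitmask, e.g. [true, false] -> 1.
function iconKey(icons: IconState): number {
    return icons.reduce((mask, on, i) => (on ? mask | (1 << i) : mask), 0);
}

class TextureCache<T> {
    private readonly cache = new Map<number, T>();
    private readonly create: (icons: IconState) => T;
    public created = 0; // how many times the expensive factory ran

    constructor(create: (icons: IconState) => T) {
        this.create = create;
    }

    // Return the cached texture for this icon combination, creating it once.
    get(icons: IconState): T {
        const key = iconKey(icons);
        const existing = this.cache.get(key);
        if (existing !== undefined) {
            return existing;
        }
        const texture = this.create(icons);
        this.created++;
        this.cache.set(key, texture);
        return texture;
    }
}

// Usage with a stand-in "texture" object instead of a real raw texture.
const cache = new TextureCache((icons) => ({
    label: icons.map((on) => (on ? "1" : "0")).join(""),
}));
const a = cache.get([true, false]);
const b = cache.get([true, false]); // same combination, same instance
const c = cache.get([false, true]); // different combination, new instance
```

With two icons there are at most four combinations, so no matter how many UI meshes exist, the factory in this sketch runs at most four times.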

This is the test setup:

For each of these, I tested with and without cloning of the meshes (using instantiateHierarchy). For each test, I reloaded the PG to get a clean instance, captured perf for 30 seconds, exported to CSV, and then computed the average frame time and GPU time. Results are below:
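The averaging step can be sketched as below. The column names (`frameTimeMs`, `gpuTimeMs`) and the sample rows are assumptions made for illustration; the actual headers in the exported CSV may differ, so adjust the names to match the file.

```typescript
// Average one numeric column of a CSV string. Assumes a header row and
// plain comma-separated values with no quoted fields.
function averageColumn(csv: string, column: string): number {
    const [headerLine = "", ...rows] = csv.trim().split(/\r?\n/);
    const index = headerLine.split(",").map((h) => h.trim()).indexOf(column);
    if (index < 0) {
        throw new Error(`Column not found: ${column}`);
    }
    const values = rows
        .map((row) => (row.split(",")[index] ?? "").trim())
        .filter((cell) => cell !== "") // skip empty cells
        .map((cell) => Number(cell))
        .filter((value) => Number.isFinite(value)); // skip non-numeric cells
    if (values.length === 0) {
        throw new Error(`No numeric data in column: ${column}`);
    }
    return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Made-up sample data, only to show the call shape.
const sampleCsv = [
    "timestamp,frameTimeMs,gpuTimeMs",
    "0,2.0,3.0",
    "16,2.5,3.5",
    "33,1.5,2.5",
].join("\n");

const avgFrame = averageColumn(sampleCsv, "frameTimeMs");
const avgGpu = averageColumn(sampleCsv, "gpuTimeMs");
```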

| Strategy | Frame Time (ms) | GPU Time (ms) |
| --- | --- | --- |
| Meshes (no clone) | 2.57 | 3.87 |
| Meshes (clone) | 2.39 | 3.87 |
| GUI (no clone) | 2.1 | 2.81 |
| GUI (clone) | 3.2 | 2.81 |
| Dyn Texture (no clone) | 2.04 | 5.63 |
| Dyn Texture (clone) | 1.8 | 2.75 |
| Static Texture (no clone) | 2.12 | 2.49 |
| Static Texture (clone) | 1.81 | 2.48 |
| Static Texture (clone, freeze materials) | 1.47 | 2.46 |
| Dyn Texture (clone, freeze materials) | 1.37 | 2.76 |

Some takeaways:

  • I was surprised to see that the cloned GUI was worse than the non-cloned GUI and the worst overall on frame time. Initially I had found some issues with GUI cloning, but @carolhmj fixed those, and though it is much better, there is still a significant perf hit to cloning ADTs currently, unless I’m doing something wrong.
  • Very little difference between the dynamic texture and static texture approaches. I assume that under the hood, BJS Dyn Texture follows a similar approach to what I did in the static case, i.e. it gets the raw image data whenever one of the context methods is called, so in steady state it is no different.
  • Freezing materials makes a big difference, at least with this many instances of a mesh. Previously I had experimented with freezing materials and found little difference, but this was a much more rigorous test.

So the winner for me is Dynamic Texture. It is the same performance as static and easier to implement. I believe I can apply the caching approach I did with static textures for dynamic textures and possibly get some additional improvement.

Very interesting. Thanks a lot for sharing. :hugs:
I’ll bookmark it and wait for the feedback. Especially for the case of cloning the GUI.
One thing I notice is that the GUI without clones does not perform badly at all (better than I thought it would).
If cloning the GUI had the same sort of impact as cloning with dynamic or static textures, it could become an alternative in most cases. Of course, a hundred is already a lot.
Well, I’m happy to see that for a similar use case (about 70), I selected what seems to be the best approach by using dynamicTexture :smiley:
And in the end, my GUI for meshes, which I use for more complex interactions and design, with about 25 in total in my scene (non-cloned), does not hurt performance too much.
Now waiting on the debrief for adt.clone…

Very interesting results, thank you for sharing! Just to get a clearer picture, did you start to capture performance after all the meshes were cloned, or did you capture during cloning too? The clone method does serialize the entire content into JSON and then parse it again, but I think it’s worth diving deeper into its performance.
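One general property of serialize-then-parse cloning is worth keeping in mind here: a JSON round trip does not preserve shared references, so an object referenced from two places comes back as two separate copies. The sketch below shows only that general behavior with hypothetical `GuiControl` and `Tex` shapes; it is not the ADT clone implementation and does not claim to describe exactly what that method does internally.

```typescript
// Hypothetical shapes for illustration only.
interface Tex {
    name: string;
}
interface GuiControl {
    id: string;
    texture: Tex;
}

// Two controls that point at the same texture object.
const shared: Tex = { name: "icon-atlas" };
const controls: [GuiControl, GuiControl] = [
    { id: "a", texture: shared },
    { id: "b", texture: shared },
];

// Clone by serializing to JSON and parsing the result back.
const cloned = JSON.parse(JSON.stringify(controls)) as [GuiControl, GuiControl];

// Before the round trip the texture is shared; afterwards each control
// owns its own copy.
const sharedBefore = controls[0].texture === controls[1].texture;
const sharedAfter = cloned[0].texture === cloned[1].texture;
```

If something similar happens for resources during a clone, the cloned scene ends up with more unique objects than the original, which would cost extra in steady state and not just at clone time.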

I captured after everything was cloned. The scene with 100 meshes comes up pretty quickly, and I captured 30 sec of perf after the scene perf stabilized, which typically took about 30 sec.

Using the dynamic texture approach I did some additional testing. These were the additional strategies I tested:

  • Freezing - add the freezing materials optimization for the main mesh materials and the UI materials
  • Freezing and Texture Caching - Additionally cache textures for each combination of enabled icons and assign the same texture instance to UI instances with the same set of enabled icons.
  • Freezing and Material Caching - Similar to texture caching, but cache and reuse the material instead.
  • Instanced UI Mesh - Create a unique mesh for each combination of enabled icons and create an instance of it for each UI that has the same set of enabled icons.
  • Cloned UI Mesh - Similar to the instanced case, but use cloning instead of instancing.

The results for these approaches were:

| Strategy | Frame Time (ms) | GPU Time (ms) |
| --- | --- | --- |
| Freezing | 1.37 | 2.76 |
| Cache Texture | 1.3 | 2.83 |
| Cache Materials | 1.2 | 2.83 |
| Instanced UI Mesh | 1.41 | 6.47 |
| Cloned UI Mesh | 1.14 | 2.82 |

Some takeaways:

  • Freezing materials makes a big difference, at least with this many instances of a mesh. Previously I had experimented with freezing materials and found little difference, but this was a much more rigorous test.
  • Caching also makes a difference, though not as significant as I thought, which tells me BJS is very efficient in its handling of multiple instances of a material or texture.
  • Surprised to see instancing perform much worse than cloning and at a very high GPU cost. There must be some substantial overhead to instancing and you must need a lot of meshes for it to provide a benefit.

Hello again! I dug a bit more into the GUI cloning example and found out some things. First, the “cloning” case is creating some unnecessary textures. Here’s the number of textures in the cloning case:
[image: texture count in the cloning case]
and in the non-cloning case:
[image: texture count in the non-cloning case]

This will be partially fixed by merging PR #13807 in BabylonJS/Babylon.js ("[Texture] Add option to material cloning to not clone the same texture multiple times"), but you still have to adjust your code to guarantee that the cloning and non-cloning cases create the same number of materials and textures, or else the comparison won’t be accurate.
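The idea named in the PR title, not cloning the same texture multiple times, can be illustrated with a small memoized clone. This is only a sketch under assumed shapes: `UiTexture`, `UiMaterial`, `cloneTexture`, and the `dedupe` flag are hypothetical and are not the PR's actual code or the Babylon.js API.

```typescript
// Hypothetical stand-ins for a texture and a material with two slots.
interface UiTexture {
    name: string;
}
interface UiMaterial {
    name: string;
    diffuse: UiTexture;
    emissive: UiTexture;
}

let textureClones = 0; // counts how many texture copies were made

function cloneTexture(texture: UiTexture): UiTexture {
    textureClones++;
    return { name: texture.name };
}

// Clone a material. With dedupe = true, a texture used in several slots
// is cloned once and the copy is reused for every slot that referenced it.
function cloneMaterial(material: UiMaterial, dedupe: boolean): UiMaterial {
    const seen = new Map<UiTexture, UiTexture>();
    const cloneOnce = (texture: UiTexture): UiTexture => {
        if (!dedupe) {
            return cloneTexture(texture);
        }
        let copy = seen.get(texture);
        if (!copy) {
            copy = cloneTexture(texture);
            seen.set(texture, copy);
        }
        return copy;
    };
    return {
        name: material.name + "_clone",
        diffuse: cloneOnce(material.diffuse),
        emissive: cloneOnce(material.emissive),
    };
}

// A material that uses the same texture in both slots.
const iconTexture: UiTexture = { name: "icons" };
const uiMaterial: UiMaterial = { name: "ui", diffuse: iconTexture, emissive: iconTexture };

const naive = cloneMaterial(uiMaterial, false);
const clonesNaive = textureClones; // copies made without deduplication
const deduped = cloneMaterial(uiMaterial, true);
const clonesDeduped = textureClones - clonesNaive; // copies made with it
```

Counting textures before and after each clone, as in this sketch, is also a simple way to check that the cloning and non-cloning test cases end up with the same number of resources.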

Could you share a playground with the instancing? The extra cost might be related to frustum culling, but it’s hard to know without looking at the code.

Sorry for the late reply. I was on vacation last week. I will test again once the PR is merged and ensure that the number of textures matches the expected number. This is the test I used for instancing. Line 139 has the flag to control whether to use instances or not. I am going to be revisiting the GUI approach due to some interaction requirements (hover behaviors for icons in the UI) that will be difficult to achieve using a dynamic texture. If there is value, I can test using createInstance with the GUI to see how those numbers compare as well.

Updated test results for GUI cloning versus non-cloning. I ran three 30-second tests for each to account for the small differences I was seeing between runs.

| Strategy | Frame Time (ms) | GPU Time (ms) | PG |
| --- | --- | --- | --- |
| GUI no clone (run 1) | 1.96 | 6.20 | https://playground.babylonjs.com/?snapshot=refs/pull/13807/merge#NJQ3JC#108 |
| GUI no clone (run 2) | 1.88 | 6.16 | |
| GUI no clone (run 3) | 1.89 | 5.93 | |
| GUI no clone (avg) | 1.91 | 6.1 | |
| GUI clone (run 1) | 1.84 | 5.74 | https://playground.babylonjs.com/?snapshot=refs/pull/13807/merge#NJQ3JC#97 |
| GUI clone (run 2) | 1.79 | 5.79 | |
| GUI clone (run 3) | 1.81 | 5.95 | |
| GUI clone (avg) | 1.81 | 5.83 | |

These results show that after the fix, the clone method is slightly faster than the no clone, which matches what we would expect.
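To put "slightly faster" into numbers, the per-run values from the table can be averaged and compared. The sketch below uses only the figures reported above; the helper names `mean` and `pctFaster` are made up for the example. It works out to roughly 5% on frame time and a bit over 4% on GPU time, which is close to the run-to-run spread (for example, 0.08 ms between the fastest and slowest no-clone frame times), so the gap should be read as small.

```typescript
// Arithmetic mean of a list of measurements.
const mean = (xs: number[]): number => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// Relative improvement of `other` over `base`, in percent.
const pctFaster = (base: number, other: number): number => ((base - other) / base) * 100;

// Per-run values copied from the table above (ms).
const runs = {
    noClone: { frame: [1.96, 1.88, 1.89], gpu: [6.2, 6.16, 5.93] },
    clone: { frame: [1.84, 1.79, 1.81], gpu: [5.74, 5.79, 5.95] },
};

const frameGain = pctFaster(mean(runs.noClone.frame), mean(runs.clone.frame));
const gpuGain = pctFaster(mean(runs.noClone.gpu), mean(runs.clone.gpu));

console.log(`frame time: ${frameGain.toFixed(1)}% faster with clone`);
console.log(`GPU time: ${gpuGain.toFixed(1)}% faster with clone`);
```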

@carolhmj Any idea when we can expect a release with your PR? There have been a few releases since it was merged but it does not appear to be in the latest code.

That shouldn’t be happening. Why do you say so? Are the test times different?

I didn’t see the PR in the release notes for any of the releases and the code wasn’t in the PG last time I checked. It’s there now, though. Thanks.

I posted this in my first reply but here it is again in case you missed it. https://playground.babylonjs.com/#NJQ3JC#27 Did you have any thoughts on the perf issues with instancing in this case? Can instances even have different materials? I suspect so since as I understand it they just share geometry.

Thank you for this effort. And these new results (although not as impactful as I would have hoped :thinking:) still show an improvement over the regular/non-cloned :sweat_smile: version. I suppose we’ll take every little bit of added performance we can get, won’t we?