Coding question: reducing draw calls of multiple instanced objects

Hi guys:

I’m not a coder by trade and I’ve run into a little wall of my own here (I think it’s possible to do, but I haven’t been able to get it done :frowning: ), so I’m posting for help. In my game environment, I have a lot of different types of static meshes instanced and placed at specific coordinates. This is equivalent to:

a) cube with say X instances
b) cone with Y instances
c) sphere with Z instances
and so on. Let n represent the number of item types.

For each type of item with its instances, the draw call count is 2. Therefore, if my environment has 30 different types of items, all with their instances, I’d get 60 draw calls. The problem is that all these items are static props, and a draw call count of 60, or n x 2, will blow up pretty quickly. This will manifest as an fps hit on low-end devices. Since they are static instanced assets, I thought it might be possible to have all of them consume only 2 draw calls instead. That is to say, regardless of the value of n, the renderer/GPU would “see” every object as instanced from one group of n item types and therefore use 1 draw call for all instances from that group.
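To make the arithmetic concrete, here is a minimal sketch of the scaling described above. The helper name and the default of 2 draw calls per type are assumptions taken from the numbers in this post; this is not a Babylon.js API.

```javascript
// Hypothetical helper: estimates total draw calls when every mesh type
// (the source mesh plus all of its instances) costs a fixed number of calls.
function estimateDrawCalls(typeCount, callsPerType = 2) {
  return typeCount * callsPerType;
}

const currentCalls = estimateDrawCalls(30); // 30 item types at 2 calls each
const hopedForCalls = estimateDrawCalls(1); // everything treated as one group
console.log(currentCalls, hopedForCalls);
```

The cost grows linearly with the number of item types, not with the number of instances, which is why adding new prop types is what hurts here.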

So my question is whether it’s currently possible to do this in bjs, or am I mistaken and this is better moved to the feature request section? (My gut feeling is that it’s possible, but I can’t seem to get my head around it :frowning: ).

The Solid Particle System could better fit your needs, in particular an immutable SPS.

More discussion at Instanced Meshes and Performance

I have tested SPS; it does not give 1 draw call for all the instanced objects in my environment, which is what I’m asking for.

SPS uses a single mesh, and as such there should be 1 draw call per frame, as I understand it. Not sure what you mean by your comment, as SPS does not use instances.

Have you checked the number of draw calls in some of the SPS playgrounds in the docs?

Pinging @jerome, but he won’t be about before tomorrow.

I got 2 draw calls for the SPS mesh in my testing. I will need to redo the SPS code, but I distinctly remember not getting any fps benefit, which was why I ditched it in favor of instances.

Yes, on my low-end device I’m generally getting 10-20 fps (G210M driver). The fact that this device might be out of spec for WebGL has crossed my mind, but I’m not 100% convinced, since it could run full-world MMOs at 30 fps in its heyday.

Top is SPS, bottom is instances. I think one of them, probably SPS, is consuming a lot of memory; my browser is sluggish as of writing. In the top right you can see the perf stats: while the draw call count is very nice for SPS, the fps isn’t. The draw call count is bad for instances, but the fps is slightly better and I can get 30 fps depending on what the camera sees. The implementations are straight from the docs (maybe I should try recycling, hmm).

Are you running a single SPS across your entire scene? I think I ran into this issue, where most of the content of the SPS was outside the frustum but couldn’t be culled because some small portion was in frame. Maybe chunk up your SPS a bit?
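One way to picture the chunking idea is to bucket prop positions into grid cells and build one SPS per bucket, so that a chunk entirely outside the view can be culled on its own. The sketch below only does the bucketing; the function name, the cell size, and the sample positions are all made up for illustration, and building the SPS per bucket is left out.

```javascript
// Groups positions into square grid cells on the XZ plane.
// Each bucket would then be built as its own SPS (not shown here),
// so chunks outside the camera frustum can be culled independently.
function chunkByGrid(positions, cellSize) {
  const chunks = new Map();
  for (const p of positions) {
    const key = `${Math.floor(p.x / cellSize)},${Math.floor(p.z / cellSize)}`;
    if (!chunks.has(key)) chunks.set(key, []);
    chunks.get(key).push(p);
  }
  return chunks;
}

// Made-up sample data: two props close together, one far away.
const chunks = chunkByGrid(
  [{ x: 1, z: 1 }, { x: 3, z: 2 }, { x: 40, z: -5 }],
  16
);
console.log(chunks.size); // number of separate SPS meshes this would produce
```

The trade-off is that every extra chunk is at least one extra draw call when it is visible, so the cell size is something to tune rather than a fixed answer.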

Project looks impressive. Hope you succeed in getting the performance you want whichever method you use :slightly_smiling_face:

Unless you have already tried these, they may be worth a read:

https://doc.babylonjs.com/how_to/solid_particles#sps-management
https://doc.babylonjs.com/how_to/optimizing_your_scene

Also, I see you have 40 meshes in the SPS scene. Are you building just one SPS?

@withADoveInOneHand Well, just for the quick test, I made a single SPS for all the flora assets, which contains the 30 different imported meshes. Therefore, in the pics, everything other than the terrain is 1 SPS. I’m not sure if this is the correct method. I could break the single SPS into 30 separate ones, but I doubt the fps will improve.

@JohnK Thanks a lot for all the docs, help and advice! The screenshots are nowhere near what my regular working rig shows. I had to pull out a chunk of code for testing. Yes, as I said above, it is all 1 SPS at the moment, and I have tested much of what was in the docs (freezeWorldMatrix, material freeze, etc.). The perf differences have been minor.

fwiw, and for anyone else interested, I did some hunting and believe that it is indeed possible to instance multiple meshes at once in 1 draw call via OpenGL ARB_multi_draw_indirect. But it seems like WebGL2 does not support it?
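As far as I know, core WebGL2 has no indirect draw calls. Some browsers expose a separate WEBGL_multi_draw extension that can submit several draws in one call, but it is not the same thing as ARB_multi_draw_indirect, and nothing in this thread establishes whether bjs makes use of it. A minimal sketch for checking whether a context offers it (the helper name and the stand-in context are hypothetical):

```javascript
// Returns true when the given WebGL context exposes WEBGL_multi_draw.
// `gl` is expected to be a WebGLRenderingContext or WebGL2RenderingContext.
function supportsMultiDraw(gl) {
  return gl.getExtension("WEBGL_multi_draw") !== null;
}

// Stand-in context for illustration only. In a browser you would pass
// the result of canvas.getContext("webgl2") instead.
const fakeGl = {
  getExtension: (name) => (name === "WEBGL_multi_draw" ? {} : null),
};
console.log(supportsMultiDraw(fakeGl));
```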

It is almost always informative to run some profiling on a desktop. Chrome seems to be the best for that. Bottlenecks, when pronounced, are usually shared across platforms.

Sometimes you are not looking in the right place for the low-hanging fruit.

Good idea, but I literally have nothing to profile :frowning: : createscene, importmesh, set up SPS/instances, assign materials, render. Instead I went with a WebGL2 conformance check, which ran for half a day and threw a whole bunch of errors.

  "failures": [
    "conformance/canvas/webgl-to-2d-canvas.html",
    "conformance/extensions/webgl-compressed-texture-s3tc.html",
    "conformance/extensions/webgl-compressed-texture-s3tc-srgb.html",
    "conformance/reading/read-pixels-test.html",
    "conformance/rendering/bind-framebuffer-flush-bug.html",
    "conformance/textures/misc/compressed-tex-image.html",
    "deqp/functional/gles3/shadertexturefunction/texturesize.html",
    "deqp/functional/gles3/transformfeedback/basic_types_separate_points.html",
    "deqp/functional/gles3/transformfeedback/basic_types_separate_lines.html",
    "deqp/functional/gles3/transformfeedback/basic_types_separate_triangles.html",
    "deqp/functional/gles3/transformfeedback/random_separate_points.html",
    "deqp/functional/gles3/transformfeedback/random_separate_lines.html",
    "deqp/functional/gles3/transformfeedback/random_separate_triangles.html",
    "conformance2/extensions/ext-texture-filter-anisotropic.html",
    "conformance2/glsl3/sampler-array-indexing.html",
    "conformance2/glsl3/tricky-loop-conditions.html",
    "conformance2/rendering/blitframebuffer-resolve-to-back-buffer.html",
    "conformance2/rendering/draw-buffers.html",
    "conformance2/rendering/framebuffer-render-to-layer-angle-issue.html",
    "conformance2/rendering/framebuffer-texture-changing-base-level.html",
    "conformance2/rendering/instanced-arrays.html",
    "conformance2/rendering/vertex-id.html",
    "conformance2/textures/misc/copy-texture-image-same-texture.html",
    "conformance2/textures/misc/tex-3d-mipmap-levels-intel-bug.html",
    "conformance2/textures/misc/tex-input-validation.html",
    "conformance2/textures/misc/tex-mipmap-levels.html",
    "conformance2/textures/misc/tex-storage-compressed-formats.html",
    "conformance2/textures/misc/tex-unpack-params-with-flip-y-and-premultiply-alpha.html",
    "conformance2/textures/canvas/tex-2d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/canvas/tex-3d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/canvas_sub_rectangle/tex-2d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/canvas_sub_rectangle/tex-3d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image/tex-2d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image/tex-3d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image_data/tex-2d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image_data/tex-3d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/svg_image/tex-2d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/svg_image/tex-3d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/video/tex-2d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/video/tex-3d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/webgl_canvas/tex-2d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/webgl_canvas/tex-3d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image_bitmap_from_image_data/tex-2d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image_bitmap_from_image_data/tex-3d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image_bitmap_from_image/tex-2d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image_bitmap_from_image/tex-3d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image_bitmap_from_video/tex-2d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image_bitmap_from_video/tex-3d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image_bitmap_from_canvas/tex-2d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image_bitmap_from_canvas/tex-3d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image_bitmap_from_blob/tex-2d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image_bitmap_from_blob/tex-3d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image_bitmap_from_image_bitmap/tex-2d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/textures/image_bitmap_from_image_bitmap/tex-3d-rgb10_a2-rgba-unsigned_int_2_10_10_10_rev.html",
    "conformance2/uniforms/large-uniform-buffers.html",
  ],

Not 100% sure, but it seems like the rendering and texture failures indicate the device doesn’t support WebGL2 completely. I need a faster way to determine if a device running bjs is conformant, but I guess that’s for another day.
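A much quicker (and much weaker) check than the full conformance suite is simply asking the browser which context types it will hand out. This is only a sketch: the function name and the stand-in canvas are hypothetical, and getting a "webgl2" context says nothing about whether the driver passes the conformance tests listed above.

```javascript
// Reports the highest WebGL version a canvas-like object can provide.
// Returns 2, 1, or 0 (no WebGL context available at all).
function detectWebGLVersion(canvas) {
  if (canvas.getContext("webgl2")) return 2;
  if (canvas.getContext("webgl")) return 1;
  return 0;
}

// Stand-in canvas that only offers WebGL1, for illustration.
// In a browser you would pass document.createElement("canvas").
const webgl1OnlyCanvas = {
  getContext: (type) => (type === "webgl" ? {} : null),
};
console.log(detectWebGLVersion(webgl1OnlyCanvas));
```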

Thanks all for input and help, I have to go bang my head at another wall, cheers!

Yeah, sometimes you can build something too big for a device that is too old / small.

Profiling is all about what happens in the render loop, though, not so much one-time setup, unless that is a problem. Profiling is how it was found that freezing world matrix calculations was a good optimization. It can also give you a breakdown of CPU vs GPU time.
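Short of opening the browser profiler, a rolling average of frame durations is a cheap first look at what the render loop is doing. The class below is a hypothetical sketch, not part of bjs; in a real scene the samples would come from time deltas measured once per rendered frame. It only measures total frame time, so the CPU vs GPU split still needs a proper profiler.

```javascript
// Keeps a rolling window of frame durations (in milliseconds)
// and reports the average frame time and the matching fps.
class FrameTimer {
  constructor(windowSize = 60) {
    this.windowSize = windowSize;
    this.samples = [];
  }

  add(frameMs) {
    this.samples.push(frameMs);
    // Drop the oldest sample once the window is full.
    if (this.samples.length > this.windowSize) this.samples.shift();
  }

  averageMs() {
    if (this.samples.length === 0) return 0;
    return this.samples.reduce((a, b) => a + b, 0) / this.samples.length;
  }

  averageFps() {
    const ms = this.averageMs();
    return ms > 0 ? 1000 / ms : 0;
  }
}

// Made-up frame times: with a window of 2, only 40 and 60 are kept.
const timer = new FrameTimer(2);
[20, 40, 60].forEach((ms) => timer.add(ms));
console.log(timer.averageMs(), timer.averageFps());
```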

One final note, your meshes could have way too much vertex detail. That is going to waste resources. Your textures, if any, could also be way too big. If you are spending a lot of your time in GPU, perhaps KTX gpu compressed textures may help. Without profiling, it is all just shots in the dark though.

All good points. Imho, profiling is when you’ve got the prototype up. What I needed here was benchmarking. I wouldn’t have posted this thread IF I knew my device couldn’t support instances/particles/some bjs feature etc. Some urls to test my device against standard scenes would have made everything crystal clear. Even better if said pages had perf results against various commonly available devices. Shrugs, moot point tho, the team is as busy as it can already be…

I’m curious how your vertex density affects you. Yours seems in the running for best “standard scene”.

I’ve been personally very cautious of complex materials, so interested in your experiences there as well. (Is your SPS a single multimaterial for instance?)

fyi: terrain is from here (PBR texture splatting (up to 64 textures)). I’ve moved away from pbrcustommaterial to custommaterial (pbr gains were marginal for use case). Around 24k verts total, largely square tiles.

30 types of flora assets: vertex counts vary from 100 to a max of 1200. The total number of particles/instances is around 650 at the moment.
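For a rough sense of scale, the numbers above bound the flora vertex load as follows. This is back-of-the-envelope arithmetic only; the helper name is made up, and the real total depends on how the 650 particles are spread across the 30 types.

```javascript
// Lower and upper bound on total vertices if every particle used the
// smallest or the largest mesh, respectively.
function vertexRange(particleCount, minVerts, maxVerts) {
  return { min: particleCount * minVerts, max: particleCount * maxVerts };
}

const flora = vertexRange(650, 100, 1200);
const terrainVerts = 24000; // "around 24k verts total" from the post above
console.log(flora.min + terrainVerts, flora.max + terrainVerts);
```

With an SPS, each particle’s geometry is copied into the one big mesh, so the vertex buffer really does hold a total in this range, whereas instances store each type’s geometry once.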

All flora assets share a single 1k diffuse texture with alpha only. As simple as it gets and non-final, so it’s PNG at the moment. On mesh import, all the flora assets are assigned the same material and then var SPS = new BABYLON.SolidParticleSystem('SPS', this.scene, {useModelMaterial:true});. Nothing complex, and not using SPS multimaterial. On my regular rig, I can get 60 fps (instances or particles) without issue, tho my target is desktop only. Will have to keep monitoring as more meshes are introduced during gameplay.

Personally, I really would love to get all instances in 1 draw call as it would be tremendous down the pipeline. But we’ll see… Hope it helps.

When you create your SPS with the parameter useModelMaterial, it automatically enables MultiMaterial support:
https://doc.babylonjs.com/how_to/solid_particles#different-materials
So, from that point on, there may be more than one draw call to render the full SPS.

[EDIT] To improve the fps when using a very large SPS, just reduce the calls to setParticles() to the strictly needed level: not on all the particles if possible, and not every frame if possible.
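The "not every frame" part can be sketched with a small wrapper that only lets the update through on every Nth frame. The wrapper name is hypothetical, and the counter in the example stands in for a real SPS.setParticles() call made from the render loop.

```javascript
// Wraps an update callback so that it only runs on every Nth frame.
// The returned function is meant to be called once per rendered frame.
function everyNthFrame(n, update) {
  let frame = 0;
  return function onFrame() {
    const shouldRun = frame % n === 0;
    frame += 1;
    if (shouldRun) update();
    return shouldRun;
  };
}

// Stand-in for SPS.setParticles(): just count how often it would run.
let updates = 0;
const onFrame = everyNthFrame(4, () => { updates += 1; });
for (let i = 0; i < 12; i++) onFrame(); // simulate 12 rendered frames
console.log(updates);
```

For the "not on all the particles" part, setParticles() also accepts a start and end index, so each throttled call can be limited to the range that actually changed.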

I’m not using setParticles; it’s an immutable SPS.

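// Built once and never updated afterwards (no setParticles() calls).
var SPS = new BABYLON.SolidParticleSystem('SPS', this.scene, {useModelMaterial: true});
// One addShape() call per flora type; the positionFunction places each particle.
SPS.addShape(tree1, this.allTree1.length, {positionFunction: myBuilder1});
SPS.addShape(tree2, this.allTree2.length, {positionFunction: myBuilder2});
...
var mesh = SPS.buildMesh();
// Static props: skip per-frame world matrix and normal recomputation.
mesh.freezeWorldMatrix();
mesh.freezeNormals();
mesh.receiveShadows = true;
this.shadowGenerator.getShadowMap().renderList.push(mesh);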

May I ask: if, instead of using useModelMaterial:true, I set mesh.setMaterialByID("myMatHere");, does it also invoke MultiMaterial?

No, the MultiMaterial support in the SPS is activated only with the parameters enableMultiMaterial or useModelMaterial set to true.
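Spelled out as code, the rule in this answer looks like the sketch below. The function is hypothetical and only restates the rule; it does not inspect a real SPS, and the option names are the two mentioned above.

```javascript
// Restates the rule above: MultiMaterial support is active only when
// one of these two SPS constructor options is set to true.
function spsUsesMultiMaterial(options = {}) {
  return Boolean(options.enableMultiMaterial || options.useModelMaterial);
}

console.log(spsUsesMultiMaterial({ useModelMaterial: true })); // the setup in this thread
console.log(spsUsesMultiMaterial({}));                         // material set on the mesh afterwards
```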
