Dear all, and @Evgeni_Popov in particular,
I’m facing an issue: I suddenly realized that I have an unreasonable number of draw calls in my scene. The point is, I’m not ready to share the scene (I would like to keep the WOW factor if possible) and I cannot reproduce it in a PG, since it could come from anything in this fairly complex assembly using lots of alpha, multiple GUIs, cameras, ray picking, colliders and so on.
In particular, the main question I have regarding the current state of my WIP scene while just starting to look at optimization and finally cleaning this assembly before moving to the last steps, is:
“Why, when facing a wall (simple cube or instance) of an opaque mesh, I still get all the draw calls from everything that’s behind and is not in view of the camera?”
Whilst my second question would be:
“Why do I get two draw calls on some of my single opaque meshes, clones or instances (whilst on others of the same type, I get only one)?”
I have to say that after spending nearly 2 days dismantling the entire thing, this all still confuses me a lot and at this point, I guess I’m gonna need to have an expert look at this.
@Evgeni_Popov Since you are my favorite hero and likely the most knowledgeable person in this domain, could I ask you to kindly have a quick look at my scene, point me towards a strategy and possibly share some of your insight on my 2 questions above? I know I’m asking a lot and it’s ok if you say ‘no’. Otherwise, could I PM you the dev link to my scene?
Thanks for considering my request and meanwhile, have a great day
I’m no expert but I think that even if the geometry is not visible in frame, if the bounding box (or some portion of it) for that mesh is in frame then that counts as 2 draw calls (1 for the mesh, 1 for the material).
But I could be wrong. I’m curious to hear from the experts on this too
Yes, I understood the same and also had a look at the culling strategies. But this is somehow ‘not a satisfying answer/solution’. So (same as you) I’m willing to better understand this method, find the workaround and, in case it does not exist, understand/challenge why this is?!
Ok, fair enough, maybe. I could accept that, since I believe some of these meshes do have a unique material and are not instances. I would need to check if this is always the case (but I have 2000 meshes/clones/instances to go through if I wanted to check them all, and clearly I don’t).
If the meshes are in the frustum of the camera, they will be drawn. The fact that there’s a wall in front of them is not relevant. If you want to avoid drawing the meshes in this case, you can have a look at the hardware occlusion queries mechanism:
If shadows are enabled, or if there are some effects (like glow or highlight layer) enabled, meshes can be drawn more than once and will generate multiple draw calls.
Spector can help you understand why your meshes are drawn multiple times.
Let me first take note of the above and put some additional efforts into it. If I don’t manage, I will gladly take your generous offer. Meanwhile, thanks a lot for the info and for your time looking into this and have a great day
My attempts with occlusion queries were not successful. It appears that running queries on 2000 meshes, clones and instances with alphaIndex, depthPrePass and so on does not really work. Meshes are flickering and, in the end, the draw calls do not really drop much. Though I have to say I have little to no understanding of how this works, so it’s likely I messed it up. BTW: It seems to me that the link for the ‘advanced occlusion demo’ is the same one as for the ‘basic’ (from the doc, featured PG).
With regards to the link shared by @MarianG (thank you) and the doc page for optimization, I made a number of attempts at these. I have added all those that do not mess up the scene, like freezing meshes and materials, removing pointer move, blocking the dirty mechanism, using the geometry ID map, the material mesh map, etc. Altogether, these have only little impact.
Considering the above, the numbers and my scene, I figured that my main issue is clearly the 2k+ meshes, 300+ materials and 1k textures (although 80% of them are instances or clones) and then, surely, the fact that the scene is mostly made of alpha-blended meshes with alphaIndex. OIT did not work for me, and neither did the performance mode.
So I went back to my meshes (instances, clones and merges), keeping in mind the 1+1 mesh+material draw call, and finally made a finding which can potentially make a huge difference. I found that the mergeMeshes function I was just copying with all its parameters for all uses had its ‘multimaterial’ parameter set to true. And here comes my finding. It appears that even if all merged meshes (or clones) have a material, and although this material IS THE SAME, setting multimat to true on merge generates 1 additional draw call for each, as if all materials were different. Whereas I was thinking (expecting) that, the material being the same for all, setting multimat to true would have no impact.
But it does, it definitely does. For this test, I merged 12 meshes (8 meshes + 4 clones) with multimat set to true and ended up with 14 draw calls. Then I did the same, setting multimat to ‘false’, and I get (guess what) …1 draw call! That’s right, a single draw call instead of 14.
In conclusion: I will likely come back to you for the part where, I’m afraid to say, I don’t have a clue how it works, and to have you help me put the final touch on the optimization. Before that, my duty now is to first clean up this aspect and organize my mesh merges and the instancing of the hierarchy to drastically reduce the number of draw calls. It will likely take some time, but expect me to return here when this is done.
And meanwhile, thanks (everyone) for your help, support and input on this.
Have ALL a great day
EDIT: I guess I forgot to mention something of importance regarding my finding on mergeMeshes.
Note that assigning a new (single) material to the merged group after merge does not change the number of drawcalls if the parameter of multimat on merge is set to true! Checked and confirmed.
Yes, occlusion queries can be tricky to use and are effective only in specific cases (when you have large occluders that you know will occlude a large part of the scene whatever the position of the camera, like in a city with big buildings, for example).
Using the multi-material option in MergeMeshes is only useful if your meshes have different materials. If they all use the same material, it’s best to disable this option, as you’ve discovered!
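For anyone cleaning up merges the same way, here is a minimal sketch of one way to prepare them: bucket the meshes by the material they reference, so each bucket can be merged with the multi-material option off. `groupByMaterial` is a hypothetical helper, not a Babylon.js function, and the sample objects are stand-ins for real meshes.

```typescript
// Hypothetical helper (not part of Babylon.js): bucket meshes by the material
// object they reference, so each bucket can be merged into one mesh without
// needing the multi-material option.
function groupByMaterial<T extends { material: object | null }>(
    meshes: T[]
): Map<T["material"], T[]> {
    const groups = new Map<T["material"], T[]>();
    for (const mesh of meshes) {
        const bucket = groups.get(mesh.material);
        if (bucket) {
            bucket.push(mesh);
        } else {
            groups.set(mesh.material, [mesh]);
        }
    }
    return groups;
}

// Stand-in objects; in a real scene these would be BABYLON.Mesh instances
// and their materials.
const wood = { name: "wood" };
const metal = { name: "metal" };
const sampleMeshes = [
    { name: "bench", material: wood },
    { name: "locker", material: metal },
    { name: "shelf", material: wood },
];

const grouped = groupByMaterial(sampleMeshes);
for (const [material, bucket] of grouped) {
    console.log(material ? material.name : "(no material)", bucket.map((m) => m.name));
}
```

In a real scene, each bucket could then be passed to BABYLON.Mesh.MergeMeshes with the last (multi-material) argument left false, for example Mesh.MergeMeshes(bucket, true, true, undefined, false, false). Grouping compares material references, so two distinct materials with identical settings would still end up in separate buckets.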
It was a late finding. I would have liked to know it before. I mean, I probably did read the text from the doc multiple times but it seems like I’m recording only the parts that suit me at the time of reading
On the other hand, it helped me realize some other issues coming from the fact that I had no plan. I was simply building day after day without really knowing where I was heading. In doing this, I also wrongly assumed that instancing everything possible was the safe option for performance, but in the end this might just not be the case. Since the scene has multiple cameras and views where eventually all objects can be in the frustum, instancing single mini-meshes generates just too many draw calls, for little to no benefit when some of these objects are not in the frustum. So, I guess now, with this understanding, I can plan a strategy which will also be a ‘compromise’ on the use of resources across the different views of the scene. Time to start organizing things a little and, finally, make a plan.
Thanks again for your support and likely… I’ll be back
Also, note that generally, on the Web, the limiting factor is the CPU, not the GPU. In this report:
As long as GPU frame time is below 16ms, if your FPS is below 60fps, it means you are limited by the CPU (so, Frame total is > 16ms).
You said you had 2k meshes in your scene, and it could be a problem, as those 2000 meshes must be looped over for culling, dispatching, etc every frame. Try to lower as much as possible the number of meshes the system must deal with every frame (and that will also decrease the number of draw calls at the same time!).
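The rule of thumb above about GPU frame time can be written down as a tiny check. This is only a sketch: `classifyBottleneck` and its labels are hypothetical names, and the two inputs are assumed to be read by hand from the Inspector or a similar report.

```typescript
type Bottleneck = "on-target" | "cpu-bound" | "gpu-bound";

// Hypothetical helper: applies the rule of thumb from the post above.
// If the target FPS is missed while the GPU frame time still fits in the
// frame budget, the CPU is the limiting factor; otherwise the GPU is.
function classifyBottleneck(
    fps: number,
    gpuFrameTimeMs: number,
    targetFps: number = 60
): Bottleneck {
    const budgetMs = 1000 / targetFps; // about 16.7 ms at 60 fps
    if (fps >= targetFps) {
        return "on-target";
    }
    return gpuFrameTimeMs < budgetMs ? "cpu-bound" : "gpu-bound";
}

// Made-up sample readings, for illustration only.
console.log(classifyBottleneck(25, 9));
console.log(classifyBottleneck(25, 38));
```

The sample numbers are invented; the point is only that the same low FPS can have two different causes depending on the GPU frame time.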
Yes, that’s what I will be working on now.
On the other hand, I actually got these numbers (SS below), although these results are ‘twisted’ because they come from my relic of a 2008 Mac Pro with a mid-range 1GB GPU from 2013. CPU is not a problem because it’s still an 8-core, just with slow memory. I use this rig to test my BJS scenes, thus making sure that the scene will run even on the lowest rig reasonably still in service. On my MBP 2019 and my two other rigs less than 3y old with a 4GB+ GPU, the numbers are of course very different. But I still do not achieve a constant 60 FPS (likely due to this issue of the number of meshes and draw calls), even though I avoided using gl, hl and even shadows, and I have just two lights. So apart from the alpha, the multiple GUIs (for meshes + 3D + FS) and the large number of meshes, materials and textures, there isn’t much that can impact perf, I believe.
Edit: SS below from MBP 2019, 4GB Radeon Pro 5300M. The resolution is @retina (size of screenshot, 3584x2240px). Avg FPS with all or most objects in the frustum is ~25 (it was around 40-45 before I created the main sphere object featuring 60 thumbs and stuff). I might have just hit a limit with this.
Yes, in this case it’s clearly a problem with the GPU, but that’s because there are too many meshes, which leads to too many draw calls.
Also, if blending is enabled for all materials, it can suck a bit of performance. You should try to simply disable blending on all materials to see the impact of the setting.
Regarding instancing, try to use thin instances over InstancedMesh as much as possible, as thin instances are way faster than mesh instances on the CPU side (on the GPU side it’s the same thing, as all visible instanced meshes will be grouped to generate a single draw call).
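For small, manually placed groups, a possible starting point for thin instances is to collect the positions into one flat matrix buffer. The sketch below only builds that buffer; `buildTranslationMatrices` and the sample positions are hypothetical, and it assumes the Babylon.js matrix layout where the translation sits in elements 12 to 14 of each 16-float matrix.

```typescript
type Position = [number, number, number];

// Hypothetical helper: builds a flat buffer of 4x4 translation matrices,
// 16 floats per instance. Layout assumption: translation is stored in
// elements 12, 13 and 14 of each matrix, as Babylon.js does.
function buildTranslationMatrices(positions: Position[]): Float32Array {
    const data = new Float32Array(positions.length * 16);
    positions.forEach(([x, y, z], i) => {
        const offset = i * 16;
        // Identity diagonal (no rotation, unit scale).
        data[offset] = 1;
        data[offset + 5] = 1;
        data[offset + 10] = 1;
        data[offset + 15] = 1;
        // Translation part.
        data[offset + 12] = x;
        data[offset + 13] = y;
        data[offset + 14] = z;
    });
    return data;
}

// Made-up positions for a small, manually placed group.
const lockerPositions: Position[] = [
    [0, 0, 0],
    [1.5, 0, 0],
    [3, 0, 0],
];
const matrixBuffer = buildTranslationMatrices(lockerPositions);

// In a Babylon.js scene, the buffer would then be handed to the source mesh:
// sourceMesh.thinInstanceSetBuffer("matrix", matrixBuffer, 16);
```

This only covers translation. Instances that need their own rotation, scale, animation or material would need more than this sketch, which is probably where the cloning mentioned in this thread still comes in.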
Thanks for all the info. I’m continuing my journey with optimization (a long journey, as it seems).
Yes, well, one of the issues in this scene is the numbers (of objects that could possibly be instanced). If I had 200 or even 50, randomly placed or placed with maths, there would be little question about the best option. But I rather have groups of 7, 10 or 12, and eventually the entire group is then instanced from the hierarchy. It’s all small groups, mostly manually placed, with some needing cloning because of animation and/or material (or it would require creating a map for the matrices and, at this ‘late’ stage, would likely have my head banging against the wall).
Though, quick update:
I made some other late findings. Fact is, I have some ‘interiors’: toilets, showers and lockers that can only be seen and interacted with when inside these closed spaces. So, since ‘occlusion’ doesn’t seem to be an option for my use case, I’ve been looking at a workaround. I think (I hope) I found a suitable approach working through intersects. I create a box-like intersect object to detect intersection with the camera (or rather, with the mesh parented to the camera). Then, I show the interior objects on intersect detection and hide them when not intersecting. Here we are talking about 400+ objects and draw calls when all are in the frustum. This is currently WIP but I have high hopes for this approach. I hope you won’t give me information that will break my expectations, will you?
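A minimal sketch of the interior toggle described above, under a few assumptions: it swaps the mesh intersection test for a simple point-in-box test on the camera position, each interior is assumed to hang under one root node, and the toggle uses setEnabled (a real Babylon.js node method) only when the state actually changes. `Room`, `contains` and `updateRooms` are hypothetical names.

```typescript
interface Vec3 { x: number; y: number; z: number; }

// Anything with setEnabled, e.g. a Babylon.js TransformNode or Mesh.
interface Toggleable { setEnabled(value: boolean): void; }

// Hypothetical description of one interior: an axis-aligned box plus the
// root node of everything that lives inside it.
interface Room {
    min: Vec3;
    max: Vec3;
    root: Toggleable;
    active: boolean; // last state applied to root; roots start disabled
}

function contains(min: Vec3, max: Vec3, p: Vec3): boolean {
    return p.x >= min.x && p.x <= max.x &&
           p.y >= min.y && p.y <= max.y &&
           p.z >= min.z && p.z <= max.z;
}

// Call once per frame (or less often). Returns how many rooms were toggled.
function updateRooms(rooms: Room[], cameraPosition: Vec3): number {
    let toggled = 0;
    for (const room of rooms) {
        const inside = contains(room.min, room.max, cameraPosition);
        if (inside !== room.active) {
            room.root.setEnabled(inside); // only touch the node on a change
            room.active = inside;
            toggled++;
        }
    }
    return toggled;
}

// Stand-in root that just records the calls it receives.
const toggleLog: boolean[] = [];
const showerRoom: Room = {
    min: { x: 0, y: 0, z: 0 },
    max: { x: 4, y: 3, z: 4 },
    root: { setEnabled: (value: boolean) => { toggleLog.push(value); } },
    active: false,
};

updateRooms([showerRoom], { x: 2, y: 1.7, z: 2 });  // camera walks in
updateRooms([showerRoom], { x: 2, y: 1.7, z: 3 });  // still inside, no call
updateRooms([showerRoom], { x: 10, y: 1.7, z: 2 }); // camera leaves
```

Checking only the state change keeps the per-frame cost to a few comparisons per room, instead of touching 400+ objects every frame.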
Else, apart from working on heavily merging meshes and reducing draw calls, I also figured that I had implemented a ‘draft’ collision detection. Basically detecting collisions on all meshes and just excluding some. So, I’m currently working towards a collision map which should drastically reduce the number of colliding meshes and I hope this will also help towards the time needed between each frame, will it?
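One way the ‘collision map’ idea could look, as a sketch only: keep an explicit list of collider names and switch checkCollisions (a real Babylon.js mesh property) off for everything else. `applyCollisionMap` and the mesh names are hypothetical, and the objects below are stand-ins for real meshes.

```typescript
// Minimal shape needed from a mesh; Babylon.js meshes have both fields.
interface Collidable {
    name: string;
    checkCollisions: boolean;
}

// Hypothetical helper: enables collisions only for meshes whose name is in
// the map, and disables them for all others. Returns the number of colliders.
function applyCollisionMap<T extends Collidable>(
    meshes: T[],
    colliderNames: ReadonlySet<string>
): number {
    let enabled = 0;
    for (const mesh of meshes) {
        const collide = colliderNames.has(mesh.name);
        mesh.checkCollisions = collide;
        if (collide) {
            enabled++;
        }
    }
    return enabled;
}

// Stand-in meshes with made-up names; previously everything collided.
const sceneMeshes: Collidable[] = [
    { name: "wall_north", checkCollisions: true },
    { name: "soap_dispenser", checkCollisions: true },
    { name: "floor", checkCollisions: true },
    { name: "towel_hook", checkCollisions: true },
];
const colliders = new Set(["wall_north", "floor"]);
const colliderCount = applyCollisionMap(sceneMeshes, colliders);
console.log(`${colliderCount} of ${sceneMeshes.length} meshes still collide`);
```

An allow-list like this is the reverse of ‘collide with everything and exclude some’: new meshes do not collide unless they are added to the map on purpose.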
Thanks for following and for your continuous and invaluable support and meanwhile, have a great day
I’m gonna mark the above as a solution although there’s no single solution/answer to this.
I’m still working on the entire thing and (again) made some findings on the way, this time regarding the ‘inter-frame’ load, increased by redundant calls on animations and stuff.
Quick update: I’m done now with my ‘rooms/interiors cleaning’ phase and got some very good results on the number of draw calls in most situations (30-50% fewer draw calls). At this point, I’m still wondering (QUESTION) whether it’s better to make the hierarchy not visible or not enabled, while having to get all descendants to set the state, OR to simply push the parent beyond the maxZ of the camera? Maybe @Evgeni_Popov you could once again enlighten me on this?
Otherwise, as per the title of the post, and for anyone reading this, I will try to summarize some of my findings:
DRAW CALLS in a scene can be added to the original single draw call of mesh or instance by:
Post-process. Anything in PP that creates another pass in the depth renderer, such as the glow layer (gl) or highlight layer (hl)
Depth renderer. A custom depth renderer to sort objects in scene
Showing bounding boxes (one of my very late findings, when I suddenly realized that showing the bounding box of a mesh in the Inspector to help me identify the mesh actually creates an additional draw call)
Merged meshes with multimat set to ‘true’ (even though all meshes would use the same material)
… And then the list most probably extends.
For the sake of others finding their way around this (rather complex) topic, I’m also gonna link this post with this previous one:
At this moment, as a conclusion - and whilst I still have loads to learn about perf gain and scene optimization - my thoughts are that:
There is no single, one-fits-all, way of optimizing a scene.
I also think that in most cases, the strategy and the methods used are “a compromise”. E.g. older and low-perf GPUs may deal better with clones than with instances (up to a certain amount). E.g. it might be better to try to balance the FPS between the case where all objects are in the frustum and the case where only a few are. Seeing FPS drop from 200 to 15 or the other way round doesn’t feel like a good experience to me.
Multiple calls on pseudo scene-optimization features with exceptions on each frame might not always be a good method (and could even harm perf). Colliders and complex checks on runtime should also be avoided if possible.
It’s sometimes better to have fewer meshes (even if a bit bigger) rather than lots of instances.
There seems to be a kind of limit on the number of meshes and draw calls (above which normal/average GPU perf will start to drop quite dramatically).
Of course, not discussed here - the load on textures and materials also needs to be considered…
So, ‘Yes’, a fairly complex topic for a simple designer like me… and to be honest, one I do not really enjoy working on. Though I agree it is of great importance to the overall experience, which is why I will continue to work on it until I get at least a reasonable quality in this respect.
Again, Thanks (Everyone) for your insight and support on this.
It’s better for performance to disable a mesh (mesh.setEnabled(false)) because there’s an early test in scene._evaluateActiveMeshes to skip all processing if a mesh is disabled.