Mission to improve render performance

mise · September 3, 2020, 12:17pm

Hi all,

I’m almost ready to go beta with my game tuggowar.io and finally have time to work on the rendering performance (the goal is to get at least a 4x boost).

For this post I’m looking for ideas on where to look next.

How I’m measuring performance
I boot into a performance test which prints out some stats:
#/s | ms
updates 30 0.81 (capped to 30; this is where meshes get created, removed and positioned)
renders 52 2.93 (just calls to babylon render(), so that is: 52 renders/s(fps), 2.93 ms per render call)
meshes 111
render time / mesh: 0.026334

Of course, in the full game there can be 4x as many meshes in the scene, and on my Samsung S7 I currently get 35 renders/s on average, which is the biggest issue.

Info & what I’ve done so far

The camera is static and in orthographic mode, meshes move around on 3 axes but are flat and (all polygons) are always facing the camera
There’s only ambient lighting
There is a lot of text, so a lot of meshes with opacity textures
Almost everything you see is its own mesh (a card for example is already around 10 meshes)
Most rounded things are actual round meshes because opacity textures on square meshes perform worse
I’m caching a lot of generated Meshes and Textures
mesh settings:
mesh.convertToUnIndexedMesh();
mesh.freezeNormals();
mesh.cullingStrategy = AbstractMesh.CULLINGSTRATEGY_BOUNDINGSPHERE_ONLY;
mesh.doNotSyncBoundingInfo = true;
mesh.ignoreNonUniformScaling = true;

Instances
My first thought was, for cached meshes, to use .createInstance() instead of .clone(), expecting the rendering to become faster.
Unfortunately this has the opposite effect of what I expected:
updates 30 0.8
renders 52 3.15
meshes 75
instances 36
render time / mesh (incl instances): 0.028363

Does anyone understand why this could be?

Throttling
If I set 4x CPU throttling in Chrome the stats become
updates 24 2.59
renders 21 15.75
meshes 71
instances 34
render time / mesh: 0.14944

In my mind, 4x CPU throttling should not have that much effect on render time, which should be handled by the GPU. It definitely should not cause as much as a 5x slowdown. Surely I’m doing something very inefficiently (GPU is enabled for sure).
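As a sanity check on the figures above, here is a small sketch that recomputes the per-mesh cost and the slowdown factor from the posted stats. The `Stats` shape and the function names are made up for illustration. The posted per-mesh values presumably come from unrounded timings, so they differ slightly from what the rounded figures give.

```typescript
// Hypothetical shape for one run of the posted stats (not a Babylon.js type).
interface Stats {
  renderMs: number;  // average ms per render() call
  meshes: number;    // regular meshes in the scene
  instances: number; // instanced meshes in the scene
}

// Average render time attributed to each drawn object, in ms.
function renderMsPerMesh(s: Stats): number {
  return s.renderMs / (s.meshes + s.instances);
}

// How many times slower the throttled run is than the baseline run.
function slowdown(baseline: Stats, throttled: Stats): number {
  return throttled.renderMs / baseline.renderMs;
}

// Figures copied from the instanced run and the 4x-throttled run above.
const unthrottled: Stats = { renderMs: 3.15, meshes: 75, instances: 36 };
const throttled4x: Stats = { renderMs: 15.75, meshes: 71, instances: 34 };

const perMeshBase = renderMsPerMesh(unthrottled);      // roughly 0.028 ms
const perMeshThrottled = renderMsPerMesh(throttled4x); // roughly 0.15 ms
const factor = slowdown(unthrottled, throttled4x);     // roughly 5
```

A render cost that scales about as much as the CPU throttle, or more, is what one would expect if the frame is bound by JavaScript work rather than by the GPU.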

To reiterate, I am looking for all tips to improve render performance. I know WebGL is slow, but it seems to me this is a pretty basic scene that should be able to render much faster, especially considering the static camera and 2D nature of it, but I have no idea how to exploit that.

I can show code later of course. It’s pretty basic, there’s only a couple lines of code that interact with Babylon.

Thanks for your help,
Mise

mise · September 3, 2020, 2:59pm

I’m now looking into _evaluateActiveMeshes.
Every mesh in my scene that isVisible should always be active and vice versa, but when I freezeEvaluateActiveMeshes a lot of the meshes are not rendered anymore.
Can I simply call setActive somewhere to control it myself?

sebavan · September 3, 2020, 3:00pm

This has tons of tips you might want to look at: Optimize your scene - Babylon.js Documentation

In your case you could rely on alwaysSelectAsActiveMesh to bypass the evaluation for meshes that are always on screen, and probably also freeze materials and/or static meshes.

Instances should definitely help as they reduce the draw calls, so it is utterly strange that they do not.
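A minimal sketch of those freeze tips, assuming a mesh that never moves and is always on screen. The interfaces are structural stand-ins so the snippet runs without the library, and `markStatic` is a hypothetical helper name. The member names `alwaysSelectAsActiveMesh`, `freezeWorldMatrix()` and `material.freeze()` are the Babylon.js ones.

```typescript
// Structural stand-ins for the Babylon.js objects involved (not the real types).
interface FreezableMaterial {
  freeze(): void;
}
interface StaticMeshLike {
  alwaysSelectAsActiveMesh: boolean;
  material: FreezableMaterial | null;
  freezeWorldMatrix(): void;
}

// Hypothetical helper: apply the freeze tips to one mesh that never moves.
function markStatic(mesh: StaticMeshLike): void {
  mesh.alwaysSelectAsActiveMesh = true; // always treat the mesh as active
  mesh.freezeWorldMatrix();             // stop recomputing its world matrix
  if (mesh.material) {
    mesh.material.freeze();             // stop re-evaluating the material
  }
}
```

Meshes that still move each update would need their world matrix left unfrozen, so this only applies to the static part of a scene.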

Evgeni_Popov · September 3, 2020, 3:37pm

Regarding instances not being faster, there is a small penalty on the CPU side to using instances because some buffers have to be refilled each frame. With only a few instances, the penalty on the CPU side may be greater than the boost on the GPU side, all the more if the GPU is not the bottleneck.

If possible, you should try to use thin instances: there’s no penalty to using them, even if there are only a few. But if the GPU is not your bottleneck, I think you should focus more on the JS side (freeze all you can as @sebavan suggested, merge meshes if possible, etc.).
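A rough sketch of what feeding thin instances could look like, assuming each copy only needs a translation. `ThinInstanceHost` is a structural stand-in for a Babylon.js Mesh so the snippet runs without the library, and `Placement`, `packTranslations` and `applyThinInstances` are hypothetical names. `thinInstanceSetBuffer("matrix", buffer, 16)` is the Babylon.js call that takes one 4x4 matrix per copy.

```typescript
// Structural stand-in for a Babylon.js Mesh; only the one method used here.
interface ThinInstanceHost {
  thinInstanceSetBuffer(kind: string, buffer: Float32Array, stride: number): void;
}

// Hypothetical position for one copy of the mesh.
interface Placement {
  x: number;
  y: number;
  z: number;
}

// Pack one 4x4 translation matrix per placement into a flat buffer.
// Babylon.js matrices keep the translation in elements 12, 13 and 14.
function packTranslations(placements: Placement[]): Float32Array {
  const data = new Float32Array(placements.length * 16);
  placements.forEach((p, i) => {
    const o = i * 16;
    data[o + 0] = 1; // identity diagonal
    data[o + 5] = 1;
    data[o + 10] = 1;
    data[o + 15] = 1;
    data[o + 12] = p.x;
    data[o + 13] = p.y;
    data[o + 14] = p.z;
  });
  return data;
}

// Hand the packed matrices to the mesh: 16 floats per thin instance.
function applyThinInstances(mesh: ThinInstanceHost, placements: Placement[]): void {
  mesh.thinInstanceSetBuffer("matrix", packTranslations(placements), 16);
}
```

All copies share the source mesh's geometry and material, so this fits repeated elements better than meshes that each carry their own texture.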

mise · September 3, 2020, 3:38pm

Yes, I’ve gone through the tips a couple of times already.
I agree that the problem seems to be the CPU.

I don’t think alwaysSelectAsActiveMesh is relevant for me as I don’t have frustum clipping enabled. It has no effect either way: when I use freezeEvaluateActiveMeshes, new meshes I add to the scene are not rendered until suddenly they all are (I haven’t figured out yet what causes this change). In principle what I want is very easy: isVisible means render it.

sebavan · September 3, 2020, 3:42pm

You could potentially also freezeEvaluateActiveMeshes and unfreeze on demand, only when you add/remove meshes, getting the best of both?
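A minimal sketch of that unfreeze-on-demand idea. `FreezableScene` is a structural stand-in for a Babylon.js Scene so the snippet runs without the library, and `withActiveMeshUpdate` is a hypothetical helper name. It assumes the Scene methods are `freezeActiveMeshes()` and `unfreezeActiveMeshes()`.

```typescript
// Structural stand-in for a Babylon.js Scene; only the two methods used here.
interface FreezableScene {
  freezeActiveMeshes(): void;
  unfreezeActiveMeshes(): void;
}

// Run a change that adds or removes meshes with the active-mesh list
// unfrozen, then freeze again so later frames skip the evaluation.
function withActiveMeshUpdate(scene: FreezableScene, change: () => void): void {
  scene.unfreezeActiveMeshes();
  try {
    change();
  } finally {
    // Re-freeze even if the change throws, so the scene is not left unfrozen.
    scene.freezeActiveMeshes();
  }
}
```

A caller would wrap each batch of additions and removals, for example `withActiveMeshUpdate(scene, () => spawnCard())`, where `spawnCard` stands for whatever game code creates the meshes.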

mise · September 3, 2020, 3:50pm

I don’t see why it should ever need to evaluate it, because I can just tell the engine which meshes are active.

mise · September 3, 2020, 4:04pm

Ok, let’s make this a little simpler.
Here’s a profile of a static scene (so no meshes added/removed, no animation) with 6x CPU throttling.
I have frozen the evaluation.
On paper the CPU shouldn’t have to do much here, yet we’re dropping 65% of frames.

I can try thin instances, I’ll have to look into that more.
I can also try merging meshes if you think it will give me a lot of benefit.
I am also going to try sharing more Materials.

By the way: if someone knows how to get rid of the Update Layer Tree step, that would be great, because it seems like a waste of time to me in a DOM with only a canvas.

sebavan · September 3, 2020, 4:47pm

Evaluate actually does more than build a list. It also takes care of dispatching every mesh to the proper rendering step (opaque, alpha test and alpha blend) plus sorting, which explains the heavy CPU usage on large scene graphs.

sebavan · September 3, 2020, 4:48pm

and yup in your case material sharing will help

mise · September 3, 2020, 4:56pm

I’m not setting any parent meshes, so every mesh is on its own, could that be something that matters a lot for performance?

sebavan · September 3, 2020, 4:57pm

This should not impact much

JCPalmer · September 7, 2020, 11:28am

Actually there is more work needed to compute the world matrix of a mesh with a parent. Heavy nesting of parenting is something I have had to undo in the past.

mise · September 8, 2020, 1:11pm

In reference to my last CPU profile picture:
this is taken on a static scene, so afaik I’m not running any code other than render(). Can someone explain why a function like setTexture is called every frame? Looking at the Babylon source, this seems to be making a call to the GPU, and I don’t understand why.

sebavan · September 8, 2020, 1:47pm

setTexture is used to set the texture associated with a shader program, so it only points the current texture index at the actual texture. That has a really tiny cost, but it is necessary as soon as some of the active textures change between draw calls.

The texture itself is not transferred on this call, so the bandwidth involved is small.

mise · September 9, 2020, 4:54pm

Thank you for sticking with me
It’s good to know that it is not a costly action, but forgive me if I inquire some more.

As I’m not changing anything between draw calls, I hope to learn what the CPU is doing exactly and why in order to know how I can do less work, or more work on the GPU. (If needed I can work more directly against WebGL)

I made sure to call freezeMaterials(), freezeActiveMeshes() and freezeWorldMatrix().
Then we’re left with the following functions being called each frame:

gl.uniformMatrix4fv
gl.bindBuffer
gl.bindTexture
gl.bufferSubData

Note that drawElements is not taking a lot of time at all; the GPU is not busy at all, and still the FPS is only 47.

So the main question is: why are these calls needed?

I have learned a lot already in the past days, but faced with tough decisions I’m still missing some insight.

Deltakosh · September 9, 2020, 5:16pm

They are required because WebGL is a state machine. You set the states, do the rendering, set the new states, do the rendering, etc.

so all the gl.xxx functions are about setting the state for a given mesh
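To make the state-machine point concrete, here is a toy model (not WebGL and not Babylon.js code) in which a bind is only issued when the requested texture differs from the one currently bound. The class and function names are made up for illustration.

```typescript
// Toy state machine: tracks the bound texture and counts the calls made.
class ToyGlState {
  private boundTexture: string | null = null;
  bindCalls = 0;
  drawCalls = 0;

  // Draw one object that needs the given texture.
  draw(texture: string): void {
    if (texture !== this.boundTexture) {
      // State differs from what the draw needs, so it has to be set first.
      this.boundTexture = texture;
      this.bindCalls++;
    }
    this.drawCalls++;
  }
}

// Count the binds needed to draw objects in the given texture order.
function countBinds(textureOrder: string[]): number {
  const gl = new ToyGlState();
  textureOrder.forEach((t) => gl.draw(t));
  return gl.bindCalls;
}

const interleaved = countBinds(["text", "icon", "text", "icon"]); // 4 binds
const grouped = countBinds(["text", "text", "icon", "icon"]);     // 2 binds
```

The same four draws need half the state changes when objects that share a texture are drawn back to back, which is the effect that sharing materials or merging meshes aims for.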

Deltakosh · September 9, 2020, 5:17pm

You can reduce these calls by reducing the draw calls.
To do so you can:

use instances or thin instances
merge meshes
share materials
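One possible way to share materials is a small cache keyed by whatever distinguishes them, such as a texture URL. This is a generic sketch: `SharedCache` and `FakeMaterial` are hypothetical, and the factory is where real code would construct a Babylon.js material.

```typescript
// Generic cache: build a value once per key and return the same object after.
class SharedCache<T> {
  private readonly items = new Map<string, T>();
  private readonly create: (key: string) => T;

  constructor(create: (key: string) => T) {
    this.create = create;
  }

  get(key: string): T {
    let item = this.items.get(key);
    if (item === undefined) {
      item = this.create(key);
      this.items.set(key, item);
    }
    return item;
  }

  get size(): number {
    return this.items.size;
  }
}

// Hypothetical stand-in for a material; real code would build one here.
interface FakeMaterial {
  textureUrl: string;
}

let materialsBuilt = 0;
const materials = new SharedCache<FakeMaterial>((url) => {
  materialsBuilt++;
  return { textureUrl: url };
});

const cardFront = materials.get("card.png");
const cardFrontAgain = materials.get("card.png"); // same object, not rebuilt
const icon = materials.get("icon.png");
```

Meshes that receive the same material object can then be drawn with fewer state changes between them.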

Evgeni_Popov · September 9, 2020, 5:23pm

Also, RenderGroup.renderSorted takes quite a bit of time: do you use a lot of transparent objects?

You should expand this item in the profiling report above to see what’s really going on.

mise · September 10, 2020, 12:04pm

Yes, I am using a lot of alpha blending (text/icons), and there’s also just a lot of different text/textures, which all makes using instances harder as well. I’m looking into using less alpha blending and merging meshes.

I’m also seeing some opportunities in the function you mentioned:

I don’t know why the getter on alphaIndex takes so long
the distance to camera for me is always exactly analogous to alphaIndex, so this can be skipped in my case

It seems like both these could in principle be 0 ms if the getter on alpha index is fast.
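A sketch of the shortcut described above, assuming the transparent ordering compares alphaIndex first and camera distance second. `SortableItem` and both comparators are hypothetical stand-ins rather than Babylon.js code. Babylon.js has a `scene.setRenderingOrder(...)` hook for custom sort functions, which might be where such a comparator would go, but that wiring is an assumption and is not shown.

```typescript
// Stand-in for what the transparent sort compares; field names are illustrative.
interface SortableItem {
  alphaIndex: number;
  distanceToCamera: number;
}

// Assumed full ordering: alphaIndex ascending, then farthest from camera first.
function compareAlphaThenDistance(a: SortableItem, b: SortableItem): number {
  if (a.alphaIndex !== b.alphaIndex) {
    return a.alphaIndex - b.alphaIndex;
  }
  return b.distanceToCamera - a.distanceToCamera;
}

// Cheaper ordering for a scene where alphaIndex already encodes depth.
function compareAlphaOnly(a: SortableItem, b: SortableItem): number {
  return a.alphaIndex - b.alphaIndex;
}

// Sample items where a higher alphaIndex always means closer to the camera.
const items: SortableItem[] = [
  { alphaIndex: 2, distanceToCamera: 8 },
  { alphaIndex: 0, distanceToCamera: 10 },
  { alphaIndex: 1, distanceToCamera: 9 },
];

const fullOrder = items.slice().sort(compareAlphaThenDistance).map((i) => i.alphaIndex);
const cheapOrder = items.slice().sort(compareAlphaOnly).map((i) => i.alphaIndex);
```

With depth fully encoded in alphaIndex, both comparators give the same order, so the distance computation adds nothing for this kind of scene.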

I need to think about whether I should optimize my scene for Babylon or go with a fork, or if Babylon is just not optimized for this special case and I should completely rethink my rendering strategy. It could also be that the team wants to support more feature flags that would benefit this orthographic, static-camera type of scene. Let me know!

In either case I super appreciate your help because I already know so much more.
