Curious about this.totalVertices.addCount() in the evaluate active mesh loop

Hi there! We are doing some performance debugging and of course we ended up staring at our good friend the evaluate active mesh loop quite a bit. 🙂

We are wondering what the purpose of this.totalVertices.addCount is. It runs on every mesh, even before the loop has determined which ones are active, so it does work for meshes that we wouldn’t expect to need much evaluation at all.

Is there any way that calculation could run only for actually active meshes as determined a little bit further down in the loop? Curious to know more about the purpose of this calculation and why it’s done for every mesh in the loop instead of just the active ones!

And on this topic: we’re similarly curious about the mesh._internalAbstractMeshDataInfo._currentLODIsUpToDate = false line.

This is taking up a lot of time and happens quite early in the mix, for every mesh whether it’s enabled or not!

Pinging @Cedric the king of Meshes

I’ve checked the code and have no idea why this exists. Maybe it’s some legacy code kept for backward compatibility. The history points to a 5-year-old bug fix that doesn’t seem related.
Do you remember something @sebavan ?

As you can see a few lines down, mesh._internalAbstractMeshDataInfo._currentLODIsUpToDate = true; takes 1.0ms. So I don’t think it’s the line itself that’s taking time (it’s a simple variable assignment - there’s no setter involved).

If you see 470.4ms for mesh._internalAbstractMeshDataInfo._currentLODIsUpToDate = false;, I think it’s because you have a large number of meshes being culled, so this line is executed many more times than the line that sets the variable to true.
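To illustrate the execution-count argument, here is a toy model of the loop (a sketch, not the Babylon.js source): the "reset" line runs for every mesh, while the "set" line only runs for meshes that survive the earlier checks. The plain mesh objects and their culled field are hypothetical stand-ins.

```javascript
// Toy model of the evaluate active meshes loop: NOT the Babylon.js source.
// `culled` is a hypothetical stand-in for whatever check rejects a mesh
// before its LOD gets computed.
function countLineHits(meshes) {
  const hits = { resetLine: 0, setLine: 0 };
  for (const mesh of meshes) {
    // Runs unconditionally, like `_currentLODIsUpToDate = false`.
    mesh.lodUpToDate = false;
    hits.resetLine++;
    if (mesh.culled) continue;
    // Only reached by meshes that pass the checks, like `= true`.
    mesh.lodUpToDate = true;
    hits.setLine++;
  }
  return hits;
}

// 10,000 meshes, of which only 100 survive.
const meshes = Array.from({ length: 10000 }, (_, i) => ({ culled: i >= 100, lodUpToDate: false }));
console.log(countLineHits(meshes)); // { resetLine: 10000, setLine: 100 }
```

A sampling profiler attributes samples to whatever line is executing, so the same cheap assignment collects far more samples when it runs 100 times more often.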

In any case, I’m always a bit suspicious of timing at line level…

As for the total number of vertices, the aim is to give the number of vertices for all the meshes in the scene, not the number of vertices for the visible meshes. “Active” counters are counters for visible meshes.

Hmm, thanks @Evgeni_Popov! Any chance one can opt out of “count all the vertices of meshes that aren’t active”, or have that happen less frequently? It’s just not a metric that we look at, and it’s not worth the performance hit given how frequently it runs. We would prefer not to have those counted if we were given a choice!

And that’s curious about the LOD setter - almost all of our meshes are set to alwaysActive, so I don’t think a lot of meshes are being culled. We do have lots of disabled meshes in the scene though.

Thanks! Agree that the line timings are suspect, especially after learning _currentLODIsUpToDate = false is not a setter.

For the counters, we are also seeing significant self time in getTotalVertices, so that’s a more believable number for the line timing. Can the counters be disabled, or are they necessary for the engine? I think we’re mostly just surprised that meshes with setEnabled(false) can still have a measurable impact on our render loop, although this measurement comes from an extreme case where disabled meshes outnumber enabled meshes by orders of magnitude (something we’ll be addressing as well).

Can’t find the docs, but, based on it being a public property and this note from old release notes, it seems like we could set scene.getActiveMeshCandidates to a custom method if we wanted to exclude some meshes from that evaluateActiveMeshes loop, is that correct?

Replacing IActiveMeshCandidateProvider and the according scene setter by a set of custom predicates scene.getActiveMeshCandidates, scene.getActiveSubMeshCandidates, scene.getIntersectingSubMeshCandidates and scene.getCollidingSubMeshCandidates (sebavan). This helps opening more customization to everybody.

Even if setting a custom method like that works, it seems like it would take non-trivial effort to maintain the array of meshes to pass in. We’re going to experiment with it, but ideally we could truly exclude inactive meshes from as much of the loop as possible, especially this vertex count.
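A minimal sketch of what such a provider could look like, assuming (as we understand the Babylon.js source) that the loop only reads the returned object’s data array and length, and that the default provider hands back all of scene.meshes. The scene and meshes here are plain mocks; createEnabledMeshCandidates and markDirty are hypothetical names, and you would have to call markDirty() yourself whenever a mesh is enabled, disabled, added or removed.

```javascript
// Hypothetical helper, NOT a Babylon.js API: caches the enabled subset of
// scene.meshes and only rebuilds it after markDirty() has been called.
function createEnabledMeshCandidates(scene) {
  const candidates = { data: [], length: 0 }; // shape assumed to match ISmartArrayLike
  let dirty = true;

  function provider() {
    if (dirty) {
      candidates.data = scene.meshes.filter((mesh) => mesh.isEnabled());
      candidates.length = candidates.data.length;
      dirty = false;
    }
    return candidates;
  }

  return {
    provider,
    markDirty() {
      dirty = true;
    },
  };
}

// Usage with mock objects standing in for a scene and its meshes.
const mockMesh = (enabled) => {
  let state = enabled;
  return { isEnabled: () => state, setEnabled: (v) => { state = v; } };
};
const scene = { meshes: [mockMesh(true), mockMesh(false), mockMesh(true)] };
const candidates = createEnabledMeshCandidates(scene);
scene.getActiveMeshCandidates = candidates.provider; // property name from the release note

console.log(scene.getActiveMeshCandidates().length); // 2
scene.meshes[1].setEnabled(true);
candidates.markDirty(); // forgetting this call is the maintenance burden
console.log(scene.getActiveMeshCandidates().length); // 3
```

Rebuilding lazily keeps the per-frame cost to one flag check; the trade-off is that every code path that toggles a mesh must remember to mark the cache dirty.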

“If a tree falls in a forest and no one hears it, does it make a sound?”
“Is a mesh really inactive if its verts are still being counted?” 😆

Before going any further, I’d like to make sure that mesh._internalAbstractMeshDataInfo._currentLODIsUpToDate = false is really a problem, because I can’t understand how it could be…

So, are you able to use a Babylon.js package without this line and run your tests again?

If you are using the UMD packages, it should be a matter of finding the line in the package and simply commenting it out. If you are using the ES6 packages, you should be able to do the same thing by modifying the file node_modules/@babylonjs/core/scene.js.

We will try to confirm that, but I think the more problematic one is the vertex count, @Evgeni_Popov!

For this one, you could simply do:

scene.totalVerticesPerfCounter.addCount = () => void(0);

@Evgeni_Popov, would that work even though the problem is not the counter function itself but the mesh.getTotalVertices() call in the evaluate active mesh loop? Would overriding addCount as you suggest somehow skip that call in the loop?

No, if getTotalVertices is the problem, it won’t fix it…
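The reason is that JavaScript evaluates call arguments before the call happens, so turning addCount into a no-op still runs mesh.getTotalVertices() on every iteration. A toy demonstration with mock objects (the counter and mesh below are stand-ins, not Babylon.js classes):

```javascript
// Stand-ins for the perf counter and a mesh; not Babylon.js classes.
let getTotalVerticesCalls = 0;
const mesh = {
  getTotalVertices() {
    getTotalVerticesCalls++;
    return 24;
  },
};
const totalVertices = { addCount: (count, fetchResult) => { /* normally accumulates */ } };

// The suggested override: addCount becomes a no-op...
totalVertices.addCount = () => void 0;

// ...but the argument is still evaluated before the (empty) call.
for (let i = 0; i < 1000; i++) {
  totalVertices.addCount(mesh.getTotalVertices(), false);
}
console.log(getTotalVerticesCalls); // 1000
```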

It’s true that this._totalVertices.addCount(mesh.getTotalVertices(), false) makes a few calls and a number of checks each time… Guarding the line with a condition would probably help when counting is turned off, but for backwards compatibility counting would stay on by default, which would mean an extra conditional check for everyone…

cc @sebavan for his opinion, but be patient, he is currently on vacation.

In the meantime, are you able to test the impact for your use case by adding a condition like:

if (!this.disableTotalVerticesPerfCounter) {
    this._totalVertices.addCount(mesh.getTotalVertices(), false);
}

and add a disableTotalVerticesPerfCounter property, set to true, to the scene class? Don’t replace !this.disableTotalVerticesPerfCounter with false, because the JavaScript engine’s JIT compiler could then remove the whole conditional block as dead code, and you would no longer be measuring the cost of the extra check…
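A sketch of the same experiment outside the engine, using a mock scene class: disableTotalVerticesPerfCounter is the flag proposed above (not an existing Babylon.js property), and the mock only models the guarded counter line, with a plain number standing in for the perf counter.

```javascript
// Mock scene, NOT Babylon.js: only models the guarded counter line.
class MockScene {
  constructor(meshes) {
    this.meshes = meshes;
    this.totalVertices = 0;
    // Hypothetical flag from the discussion; false keeps today's behavior.
    this.disableTotalVerticesPerfCounter = false;
  }

  evaluateActiveMeshes() {
    this.totalVertices = 0;
    for (const mesh of this.meshes) {
      if (!this.disableTotalVerticesPerfCounter) {
        this.totalVertices += mesh.getTotalVertices();
      }
      // ...the rest of the per-mesh work would follow here
    }
  }
}

let vertexQueries = 0;
const meshes = Array.from({ length: 500 }, () => ({
  getTotalVertices() {
    vertexQueries++;
    return 8;
  },
}));

const scene = new MockScene(meshes);
scene.evaluateActiveMeshes();
console.log(scene.totalVertices, vertexQueries); // 4000 500

scene.disableTotalVerticesPerfCounter = true;
scene.evaluateActiveMeshes();
console.log(scene.totalVertices, vertexQueries); // 0 500
```

With the flag set, getTotalVertices() is never reached, which is the part the addCount override cannot skip.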

I made some simple changes to the function and ran some tests.

With 10000 instances (all with alwaysSelectAsActiveMesh set to true) and half of them disabled, I got the following:
Normal function: 113-128 abs fps
Modified function that moves the mesh.isEnabled() check and its continue higher in the loop and removes the vertex count: 130-144 abs fps
A second variant of the modified function that also removes the mesh._internalAbstractMeshDataInfo._currentLODIsUpToDate = false line: 165-180 abs fps
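Since fps is the reciprocal of frame time, converting these ranges to milliseconds per frame can make the absolute savings easier to compare. A small sketch that just applies 1000 / fps to the numbers reported above:

```javascript
// Frame time in milliseconds for a given frames-per-second value.
const frameTimeMs = (fps) => 1000 / fps;

// fps ranges as reported above.
const runs = [
  { label: "normal", fps: [113, 128] },
  { label: "isEnabled first + no vertex count", fps: [130, 144] },
  { label: "same + LOD reset removed", fps: [165, 180] },
];

for (const { label, fps } of runs) {
  // Higher fps means shorter frames, so the range endpoints swap.
  const [slow, fast] = fps.map(frameTimeMs);
  console.log(`${label}: ${fast.toFixed(2)}-${slow.toFixed(2)} ms/frame`);
}
```

This gives roughly 7.8-8.8 ms per frame for the normal function versus 5.6-6.1 ms for the second variant, i.e. about 2 to 3 ms per frame in this 10000-instance test.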

So it looks like there are some gains to be had.

I don’t see any difference whether or not the line mesh._internalAbstractMeshDataInfo._currentLODIsUpToDate = false is commented out (in any case, commenting it out would be a bug): 180-185 fps on my computer (Windows 11, Chrome), with this._totalVertices.addCount(...) commented out.

“Interestingly”, with the normal code (nothing commented out), I have better performances: 195-200 fps…

Now, doing several runs in sequence:

200 fps when both lines are commented out
175 fps when both lines are not commented out

New run:

185 fps when both lines are commented out
185 fps when both lines are not commented out

New run:

167 fps when both lines are commented out
185 fps when both lines are not commented out

Each time, I wait at least 20 to 30 s and monitor the fps to (try to) pick the right value.

As you can see, the timings are not reliable, sometimes one is better than the other, and sometimes not. To me, it means there’s no real difference between both cases.
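One way to make “no real difference” concrete is to compare the gap between the means with the run-to-run spread. A small sketch over the three runs listed above, using the sample standard deviation (with only three runs per case, this is indicative at best):

```javascript
// Arithmetic mean of an array of numbers.
const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// Sample standard deviation (n - 1 in the denominator).
function sampleStdDev(xs) {
  const m = mean(xs);
  const squaredDiffs = xs.map((x) => (x - m) ** 2);
  return Math.sqrt(squaredDiffs.reduce((sum, d) => sum + d, 0) / (xs.length - 1));
}

// fps from the three runs above.
const commentedOut = [200, 185, 167];
const notCommentedOut = [175, 185, 185];

for (const [label, runs] of [["commented out", commentedOut], ["not commented out", notCommentedOut]]) {
  console.log(`${label}: mean ${mean(runs).toFixed(1)} fps, std dev ${sampleStdDev(runs).toFixed(1)} fps`);
}
```

The means differ by about 2 fps, while a single configuration varies by up to 33 fps between runs (167 to 200), which is consistent with the measurements not being able to separate the two cases.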

I have the same behavior in Firefox, except that the fps is much lower (78-80 fps).

@Evgeni_Popov - I instrumented the loop to get better timings. The framerate was really low due to the extra profiling, so take these with a grain of salt, but it should still be safe to compare the relative times of these console.time blocks.
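For reference, one possibly lighter alternative to console.time around every block is to accumulate performance.now() deltas into per-section totals and log them once per frame. A minimal sketch; SectionTimer and the section names are hypothetical, and performance.now() is the standard high-resolution clock in browsers and Node.js 16+.

```javascript
// Hypothetical helper: accumulates time per named section across many
// iterations, so nothing is logged inside the hot loop itself.
class SectionTimer {
  constructor() {
    this.totals = new Map();
  }

  // Runs fn, adds its elapsed time to the named section, returns fn's result.
  measure(section, fn) {
    const start = performance.now();
    const result = fn();
    this.totals.set(section, (this.totals.get(section) ?? 0) + (performance.now() - start));
    return result;
  }

  // Logs the accumulated totals and resets them (e.g. once per frame).
  report() {
    for (const [section, ms] of this.totals) console.log(`${section}: ${ms.toFixed(3)} ms`);
    this.totals.clear();
  }
}

// Usage with stand-in work instead of real mesh methods.
const timer = new SectionTimer();
const fakeMeshes = Array.from({ length: 1000 }, (_, i) => ({ vertices: i % 50 }));
let total = 0;
for (const mesh of fakeMeshes) {
  total += timer.measure("vertex count", () => mesh.vertices);
  timer.measure("world matrix", () => Math.sqrt(mesh.vertices));
}
timer.report();
```

The closures and clock reads still add overhead of their own, so the totals are best compared relative to each other rather than read as absolute costs.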

So yes, the _currentLODIsUpToDate assignment is not actually taking time; it was just a sampling error in the profiling.

The vertex counting, on the other hand, is taking as much or more time than computing world matrices.

The isReady/isEnabled/isZeroScale checks are also taking significant time, with isReady even showing up on its own for self time (note that all meshes were ready in this scenario). Since isEnabled is a simpler check than isReady, we could benefit from short-circuiting if we put the isEnabled check first, although that may be specific to our situation with many disabled meshes.

Hmmm, that’s really interesting, because in my tests it was clearly different. I was never able to get to 200, though, so you must have a beast of a system.

Thanks for taking a look <3

In my experience, per line timings are simply unreliable and can’t be trusted (or maybe it is that we don’t interpret them correctly, I don’t know).

Your first screenshot:

You said the wrong timing for mesh._internalAbstractMeshDataInfo._currentLODIsUpToDate = false is a sampling error, but I don’t see how Chrome could make such sampling errors… When it samples the running code, it is either running the line or not, and should account for the time of this line or not.

In this screenshot, this line takes 3x more time to execute than mesh.computeWorldMatrix, whereas in your latest screenshot:

it is computeWorldMatrix that is taking 3.6x more time to execute… Same thing for totalVertices.addCount, which is faster than computeWorldMatrix in this screenshot, whereas it is 2.5x slower in the first one.

The test you should perform is to compare the fps with and without these lines commented out (thanks to @Pryme8, this is now easy to do) and see whether you can detect a significant difference (which is not easy either; see my other post above). Note that you should keep the if (!this.disableTotalVerticesPerfCounter) guard around _totalVertices, because we won’t be able to simply remove this line.

I agree we should change the order of the if (!mesh.isReady() || !mesh.isEnabled() || mesh.scaling.hasAZeroComponent) test, though. I will make a PR for that.
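A toy illustration of why the order matters with ||: evaluation stops at the first true operand, so putting the cheaper isEnabled() check first avoids calling isReady() on disabled meshes. The mocks below only count isReady() calls; they are not Babylon.js meshes, and the reorder is only equivalent if nothing relies on isReady() being called for disabled meshes.

```javascript
// Mock mesh, NOT a Babylon.js class: counts how often isReady() is called.
function makeMesh(enabled, counters) {
  return {
    isEnabled: () => enabled,
    isReady: () => {
      counters.isReady++;
      return true; // all meshes ready, as in the scenario described earlier
    },
    scaling: { hasAZeroComponent: false },
  };
}

// Returns how many meshes would be skipped and how many isReady() calls were made.
function countSkips(meshes, counters, enabledFirst) {
  counters.isReady = 0;
  let skipped = 0;
  for (const mesh of meshes) {
    const skip = enabledFirst
      ? !mesh.isEnabled() || !mesh.isReady() || mesh.scaling.hasAZeroComponent
      : !mesh.isReady() || !mesh.isEnabled() || mesh.scaling.hasAZeroComponent;
    if (skip) skipped++;
  }
  return { skipped, isReadyCalls: counters.isReady };
}

const counters = { isReady: 0 };
// 10,000 meshes, half of them disabled.
const meshes = Array.from({ length: 10000 }, (_, i) => makeMesh(i % 2 === 0, counters));
console.log(countSkips(meshes, counters, false)); // { skipped: 5000, isReadyCalls: 10000 }
console.log(countSkips(meshes, counters, true)); // { skipped: 5000, isReadyCalls: 5000 }
```

Both orders skip exactly the same meshes; only the number of isReady() calls changes, so the benefit scales with how many meshes are disabled.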

Regarding the mesh.isReady function, it has to check a number of things before returning true or false; there’s really no way around it…

In my experience, performance profiling of JS code is always difficult, if not impossible, even when you try to get rid of everything that could interfere with the results (close all running apps on your computer, use an anonymous window in your browser, run your tests several times and take the mean, …). I spent hours trying to do this, and often the end result was that… there were no results; I couldn’t make reproducible test cases!

In the present case, as we know what to test / change in the code base, we can run a real world scenario (PG, project) and measure the impact: that’s probably the best thing to do.