Curious about this._totalVertices.addCount() in the evaluate active mesh loop

Here is one for you @Evgeni_Popov
I nerfed out the mesh.computeWorldMatrix in the evaluation loop, and as far as I can tell everything still works.

I'm assuming this would break some of the picking methods though, or some of the frustum culling things.

Is there a way to know when it would be ok to skip this or not?

The function went from 4ms to 2ms of calculation time, which is pretty sizable, but with something like this it would be safe to assume it's important for things that I'm not thinking about.

It is never ok :slight_smile:

In your example, computeWorldMatrix is called indirectly by the instance rendering code, which at some point is doing instance.getWorldMatrix(), which is what triggers the call to computeWorldMatrix.

You can use the world matrix freezing feature, if you know you won’t change some world matrices.
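For example, a minimal sketch assuming the mesh is static once placed:

mesh.freezeWorldMatrix();   // stop recomputing the world matrix every frame

// ...and if the mesh ever needs to move again:
mesh.unfreezeWorldMatrix();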

So then for instances does that mean it's being called twice?

Because going off that idea I tried:
[screenshot: code change]
got me to:
[screenshot: profiler timings]

while:
[screenshot: alternative code change]
got me to:
[screenshot: profiler timings]

Of course this is a scene with a huge number of instances, but if getWorldMatrix gets called anyway, is there anything before that point which would need the calculation?

I agree about the line timings, which is why I instrumented the code with console.time and provided the accurate timings for you to review instead:

The screenshot with the code was just for reference so you could see what the console.time labels corresponded to.
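For reference, a minimal sketch of this style of instrumentation (the label is made up here; for per-mesh code in a hot loop you'd accumulate a performance.now() delta and log once per frame rather than calling console.time per mesh):

console.time("evaluateActiveMeshes");
// ... the instrumented block runs here ...
console.timeEnd("evaluateActiveMeshes"); // prints e.g. "evaluateActiveMeshes: 145ms"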

Since the profiler works by sampling the location of the code pointer from time to time, there is always some sampling error where lines of code are under- or overrepresented in the sample due to random chance. Don't know why you're nitpicking on this point when I've been trying to agree with you on it the entire time: there is no problem with the _currentLODIsUpToDate assignment.

It’s not useful to use the line timings that we both agree are unreliable, nor to compare across profiles that weren’t made under identical conditions. Use the console.time tags in the bottom-up chart. We spent 145ms counting vertices for runtime statistics versus 16.1ms computing world matrices. That means we’re spending a really meaningful amount of time performing non-rendering tasks in the render loop.

Fully in agreement about running real world scenarios - that’s what I’ve provided for you here.

As I said above, I couldn’t reproduce the slowdown when the line isn’t commented versus when it is.

This is something I’d like to see, a real PG (or better: several PGs) where the framerate is significantly different in both cases => it’s not only me, other people in the team will also run the tests, to get a wider range of results.

Performance snapshot figures are not sufficient to justify code modification. They’re great for pointing out code that might be a bottleneck / pain point / in need of improvement (as you’ve done in this thread for this._totalVertices.addCount), but the next step is to test what happens when the code is updated to fix / improve what was found in the first step.

Don’t get me wrong: it’s obvious that this._totalVertices takes time, and if it were possible to simply delete the line, there’d be no discussion and we’d do it! But that’s not possible, the only thing we can do is something like:

if (!this.disableTotalVerticesPerfCounter) {
    this._totalVertices.addCount(mesh.getTotalVertices(), false);
}

So, what should be tested are PGs when:

  • A: using the existing code
  • B: using the code with the if condition and disableTotalVerticesPerfCounter = false
  • C: using the code with the if condition and disableTotalVerticesPerfCounter = true

Then compare the results.
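For scenario C, assuming the flag ships as a public Scene property with the name used in the snippet above (it doesn't exist yet), the PG would just add:

scene.disableTotalVerticesPerfCounter = true; // proposed flag, not yet in the engine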

C will probably be equal to or faster than A (still, we need numbers, because if we can’t prove that it’s faster in some cases, there’s no point in making the change).

The concern is A versus B, B being what we’ll get by default if we make the change. If B is significantly slower than A, we probably won’t make the change, because that’s what people will experience with the default configuration.

We need to be very careful because this code is a hot path, it’s called at every frame and for every mesh, so we need to be very sure that we’re not hurting performance when we make changes.

I’ll try to do some testing on my own, but I’m very busy at the moment, so I can’t commit to a timetable.

[EDIT] I should answer this one:

I meant real world scenarios that I and others in the team can run and test for ourselves :slight_smile:


One change we can make now is to use a counter that holds the current total number of vertices, instead of calling this._totalVertices.addCount on each loop iteration:

This won’t solve everything (especially if the mesh.getTotalVertices() call is the most time-consuming part), but it may help a little.
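As a rough sketch of that idea (the shape of it, not the actual change):

// Accumulate into a plain number inside the loop...
let verticesCount = 0;
for (let i = 0; i < meshes.length; i++) {
    verticesCount += meshes[i].getTotalVertices();
}
// ...then update the perf counter once per frame instead of once per mesh
this._totalVertices.addCount(verticesCount, true);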

Actually I think it could be a viable solution. You could implement the frustum check in getActiveMeshCandidates and keep only the meshes that pass the test. To avoid doing the frustum check again for the meshes that you know are visible, you should set mesh.alwaysSelectAsActiveMesh = true:
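A sketch of what that override could look like, assuming getActiveMeshCandidates must return a smart-array-like object with data and length properties:

const candidates = { data: [], length: 0 };
scene.getActiveMeshCandidates = () => {
    candidates.data.length = 0;
    const planes = scene.frustumPlanes;
    for (const mesh of scene.meshes) {
        if (mesh.isInFrustum(planes)) {
            candidates.data.push(mesh);
            // the scene's own frustum test is skipped for these meshes
            mesh.alwaysSelectAsActiveMesh = true;
        }
    }
    candidates.length = candidates.data.length;
    return candidates;
};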

It is the same PG as the one I used in my tests above. This time it is running at ~240 fps.


Unless you are looking at all the meshes, in which case it drops in performance quite a bit, to about ~60-70 fps, vs this one that does the instance check I have been asking about, which holds ~170-180 fps.

With this one that filters the meshes in the eval step, I'm getting an even faster ms reading on the mesh selection process.

Indeed, overriding getActiveMeshCandidates is only a win if a lot of the meshes are out of the frustum.

Not only that, but you’ve merged two conditions into one and the line mesh._internalAbstractMeshDataInfo._currentLODIsUpToDate = false; isn’t executed as often as it was before your changes.

This may work in this PG, but may be broken in others. I wouldn’t risk changing the order of operations in this loop, the stakes are too high.