Hardware instancing and sprite rendering performance (or lack thereof)

Hello… again! It’s me, with yet another sprite-related issue.

I finally managed to finish my animation system, using the APIs provided by BABYLON.SpritePackedManager and BABYLON.Sprite. Unfortunately, there's a huge problem: it's incredibly slow once I add more than a couple dozen sprite instances. Performance degrades rapidly after just 100-150 sprites (even on my 4.4 GHz desktop PC), which is nowhere near enough for what I had in mind.

Some of that might be lack of optimization on my part, but seeing how other engines and games can render tens of thousands of sprite instances without issues, it appears there’s something fundamentally “wrong” (for lack of a better word) with the way my animations are implemented.

After doing a lot of research, this is my understanding so far:

  • All instantiated sprites are part of the same mesh owned by the sprite manager, and they are all rendered in one draw call (since it uses just one texture)

  • Whenever the position of any one of those instances changes, does BJS have to rebuild the mesh and resend the vertex data for the entire mesh? With a large number of animated sprites, that would mean doing it virtually every frame, which would explain the serious slowdown I'm experiencing

  • It should be possible to use hardware instancing for sprites just like it’s used for regular meshes and particles; if sprites are simply rendered on a plane and “animation” is merely changing the position and/or UV data of the mesh’s vertices, then it’s quite similar, no?

  • From some old forum threads, it appears BJS doesn’t support instancing for sprites, and implementing it would require some shady, I mean, shader stuff which I didn’t really understand
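To make the "animation is just changing UV data" idea concrete, here is a minimal sketch of how a cell index could map to a UV rectangle. It assumes a regular grid sheet (fixed cell width and height, cells laid out left to right, top to bottom, v = 0 at the top row); a packed atlas like the one SpritePackedManager reads stores per-frame rectangles instead, so there the lookup would be a table rather than arithmetic. `cellToUv` and `UvRect` are hypothetical names, not Babylon.js APIs.

```typescript
// Hypothetical helper: maps a cell index in a regular grid sprite sheet
// to the UV rectangle (u0, v0)-(u1, v1) a quad would sample.
// Assumes cells run left-to-right, top-to-bottom and v = 0 is the top row;
// flip v if your texture convention differs.
interface UvRect {
  u0: number;
  v0: number;
  u1: number;
  v1: number;
}

function cellToUv(
  cellIndex: number,
  sheetWidth: number,
  sheetHeight: number,
  cellWidth: number,
  cellHeight: number
): UvRect {
  const columns = Math.floor(sheetWidth / cellWidth);
  const col = cellIndex % columns;
  const row = Math.floor(cellIndex / columns);
  const u0 = (col * cellWidth) / sheetWidth;
  const v0 = (row * cellHeight) / sheetHeight;
  return { u0, v0, u1: u0 + cellWidth / sheetWidth, v1: v0 + cellHeight / sheetHeight };
}
```

Changing a sprite's frame then only changes these four numbers, which is why per-sprite animation can in principle be as cheap as per-particle updates.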

Please do correct me if I got something wrong!

Now, this makes me wonder:

  1. Is it in fact possible to do exactly what BJS’s sprite APIs do, except using instances?

  2. If so, why does the sprite manager use just one mesh instead of proper instancing?

  3. If not, how could this be achieved? It must be possible since I’ve seen it in other engines, but I’m unsure how they did it

  4. How is this different from the SolidParticleSystem? Clearly it’s capable of rendering lots of particles, but it doesn’t seem to have any support for sprite-like animations

The goal would be one draw call per texture AND benefiting from instanced meshes to allow rendering thousands of individual sprites [edit: at 60 FPS, obviously].

All the sprites I’m using are similar in the sense that they represent the same entities, but not identical, meaning they need to be animated independently by changing UVs/cellIndex, position, alpha, visibility, rotation, and scaling/size.
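As a rough illustration of what "one draw call per texture plus instancing" could look like as data, here is a sketch of an interleaved per-instance record covering the properties listed above. This is an assumed layout for a hypothetical custom instanced shader (which would expand a shared static quad and compute UVs from `cellIndex`), not the layout Babylon.js's SpriteManager actually uses; `SpriteInstance`, `FLOATS_PER_INSTANCE` and `writeInstance` are illustrative names.

```typescript
// Hypothetical per-instance record: the quad's 4 corners would live in a
// shared static buffer; everything below is written once per sprite.
interface SpriteInstance {
  x: number;
  y: number;
  z: number;
  width: number;
  height: number;
  angle: number;     // rotation in radians
  cellIndex: number; // frame in the sprite sheet
  alpha: number;
  visible: boolean;
}

const FLOATS_PER_INSTANCE = 9;

// Writes one sprite into an interleaved Float32Array at slot `index`.
function writeInstance(buffer: Float32Array, index: number, s: SpriteInstance): void {
  const o = index * FLOATS_PER_INSTANCE;
  buffer[o] = s.x;
  buffer[o + 1] = s.y;
  buffer[o + 2] = s.z;
  buffer[o + 3] = s.width;
  buffer[o + 4] = s.height;
  buffer[o + 5] = s.angle;
  buffer[o + 6] = s.cellIndex;
  buffer[o + 7] = s.alpha;
  // Visibility stored as 0/1; a shader could collapse the quad when it is 0.
  buffer[o + 8] = s.visible ? 1 : 0;
}
```

Keeping each sprite in a fixed slot means an update touches only that sprite's nine floats on the CPU side, even if the whole buffer still gets uploaded afterwards.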

Ideally, if there’s a way to do this it should be just as easy as using the sprite manager/sprite APIs, since that’s (presumably) why they exist in the first place. Is there anything like it, and if not how would one best go about creating it? Hopefully one of you has some ideas :slight_smile:

Thank you for your time!

For your needs, I think there are a few options:

  • Make the sprites smarter by updating only a subset of the buffer rather than all of it. The catch is that you may need to update all of it anyway if every sprite is moving
  • Use SPS
  • Use meshes with billboarding: https://www.babylonjs-playground.com/#854CEJ#6
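The first option above (updating only a subset of the buffer) can be sketched as dirty-range tracking: remember the lowest and highest sprite slot that changed this frame and upload only that contiguous span. This is a hypothetical helper, not something Babylon.js exposes under this name; in raw WebGL the resulting range would map onto `gl.bufferSubData` with a byte offset of `start * 4`.

```typescript
// Hypothetical helper: tracks which sprite records changed since the last
// upload so only that contiguous float range needs to be sent to the GPU.
class DirtyRangeTracker {
  private readonly floatsPerSprite: number;
  private minIndex = Number.POSITIVE_INFINITY;
  private maxIndex = -1;

  constructor(floatsPerSprite: number) {
    this.floatsPerSprite = floatsPerSprite;
  }

  markDirty(spriteIndex: number): void {
    this.minIndex = Math.min(this.minIndex, spriteIndex);
    this.maxIndex = Math.max(this.maxIndex, spriteIndex);
  }

  // Returns the half-open float range [start, end) covering every dirty
  // sprite, or null if nothing changed, then resets for the next frame.
  takeDirtyRange(): { start: number; end: number } | null {
    if (this.maxIndex < 0) return null;
    const range = {
      start: this.minIndex * this.floatsPerSprite,
      end: (this.maxIndex + 1) * this.floatsPerSprite,
    };
    this.minIndex = Number.POSITIVE_INFINITY;
    this.maxIndex = -1;
    return range;
  }
}
```

A single min/max span is the simplest design but also shows the caveat mentioned above: if the first and last sprites both change, the range covers everything in between, and if every sprite moves it degenerates to a full upload.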

Anyway, we should start with a repro in the Playground so we can see how to help you.


Interestingly, while trying to reproduce the problem in a PG, I managed to render about 35000 sprites without comparable FPS drops, even though their animation isn't as elaborate as in my app. PG: https://playground.babylonjs.com/#35U9I8#14

This clearly means my animation code is too slow, and indeed, after a lot of optimization, I can now render about 15000 sprites. The biggest issue seems to be that I need to synchronize the positions of many sprites (using some as anchors for others) every frame, as otherwise moving the camera causes visible glitches.
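One way to keep anchored sprites from lagging behind their anchors is to derive their positions in a single pass, once per frame, after the anchors have moved and before rendering (in Babylon.js, `scene.onBeforeRenderObservable` is one plausible place to hook such a pass). The sketch below assumes a single level of anchoring with a fixed offset; `AnchoredSprite` and `syncToAnchors` are hypothetical names, and deeper anchor chains would need the anchors processed in parent-before-child order.

```typescript
interface Vec3 {
  x: number;
  y: number;
  z: number;
}

// Hypothetical: a sprite whose position is defined relative to an anchor.
interface AnchoredSprite {
  anchorIndex: number; // index into the anchors array
  offset: Vec3;        // local offset from the anchor
  position: Vec3;      // world position, rewritten every frame
}

// Recomputes every anchored sprite from its anchor in one pass. Running it
// once per frame, after anchors move and before rendering, prevents the
// one-frame lag that shows up as jitter when the camera moves.
function syncToAnchors(anchors: readonly Vec3[], sprites: AnchoredSprite[]): void {
  for (const s of sprites) {
    const a = anchors[s.anchorIndex];
    s.position.x = a.x + s.offset.x;
    s.position.y = a.y + s.offset.y;
    s.position.z = a.z + s.offset.z;
  }
}
```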

I’m still interested in alternative solutions, so how would I go about this?

  • Make the sprites smarter by updating only a subset of the buffer and not all of it. The thing is that you may need to update it all anyway if all sprites are moving

As for the other suggestions, I assume I’d have to re-implement the sprite API on regular (instanced) meshes or particles in order to use the same functionality? I think that’s a bit inconvenient and there’s a non-zero chance I might get it wrong and make things even slower since I don’t yet understand how it all works internally.

Perhaps the sprite manager and sprite classes could be extended to (optionally) use instancing if that’s actually possible? Surely others might rather use their APIs over creating particles and managing it all manually, as well? :slight_smile:

I can take a stab at it if I know what needs to be done. I’m not asking anyone to do all the work by themselves :smiley:

Well, technically the problem will be the same with instances, because we will still need to update a big buffer filled with per-sprite information.

That being said, instead of sending 4 vertices per sprite, I could send only one + 3 instances. I'll make that update (I'll also check whether we can be smarter with the update).
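The trade-off described above can be put into numbers. Assuming, purely for illustration, that each sprite needs 16 floats of data and that every sprite changes every frame, the per-vertex approach duplicates that data on all 4 corners of the quad, while instancing stores it once and keeps the corners in a static buffer. The buffer still has to be re-uploaded, as noted above, but it is a quarter of the size. `uploadBytesPerFrame` is a hypothetical helper, and the model ignores any per-corner data.

```typescript
const BYTES_PER_FLOAT = 4; // Float32Array element size

// Bytes re-uploaded per frame when every sprite changes.
// Non-instanced: per-sprite data is duplicated on each of the quad's 4 corners.
// Instanced: per-sprite data is stored once; corners live in a static buffer.
function uploadBytesPerFrame(spriteCount: number, floatsPerSprite: number, instanced: boolean): number {
  const copies = instanced ? 1 : 4;
  return spriteCount * floatsPerSprite * copies * BYTES_PER_FLOAT;
}
```

With 10,000 sprites at 16 floats each, that works out to 2,560,000 bytes per frame without instancing versus 640,000 bytes with it.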

OK, so instances are in. But before evaluating how to improve the appendvertex step, we need to measure.

Can you run a profile on your system with the new preview build (it will be live in a couple of hours)?

I need to see which code is the biggest offender: your code, the sprite update, or something else.
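The browser's DevTools performance profiler is the main tool for this, but a small in-app timer can also split each frame into "app update" and "render" buckets. Below is a minimal sketch of such a hypothetical `FrameProfiler`; the clock is injectable so it can be tested, and defaults to `performance.now()`. In a Babylon.js app one might wrap the animation logic in one label and `scene.render()` in another.

```typescript
// Hypothetical helper: accumulates elapsed time per labelled section so a
// frame can be split into e.g. "update" vs "render".
class FrameProfiler {
  private readonly totals: Record<string, number> = {};
  private frames = 0;
  private readonly now: () => number;

  constructor(now: () => number = () => performance.now()) {
    this.now = now;
  }

  // Runs fn and adds its elapsed time (in clock units, ms by default) to label.
  time(label: string, fn: () => void): void {
    const start = this.now();
    fn();
    const elapsed = this.now() - start;
    this.totals[label] = (this.totals[label] ?? 0) + elapsed;
  }

  endFrame(): void {
    this.frames++;
  }

  // Average time per completed frame for a label (0 if nothing recorded).
  averageMs(label: string): number {
    const total = this.totals[label] ?? 0;
    return this.frames > 0 ? total / this.frames : 0;
  }
}
```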


Thank you! It took some time for me to get around to looking into this more, as I had to fix some unrelated issues before I could update.

I’ve done some profiling and already noticed quite the improvement :slight_smile:

In this playground, I'm getting approximately 12-15 FPS now, after drastically increasing the number of sprites. While it doesn't have the same advanced animations that I use in my app, the profile might still be of interest?

As for my app, the profile indicates most of the processing time is spent on the actual animation state changes, which I would probably need to optimize further if more units are to be rendered. It should be noted that, before upgrading, I was able to render at most ~50000 fully animated sprites at 35 FPS, while now I get roughly 60 FPS with the same amount. In fact, simply managing the units now seems to be the bottleneck (in my not-very-optimized app), while BJS doesn't seem to have any trouble actually rendering them.
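Since the animation state changes are now the main cost, one common way to reduce them is to advance each animation with a time accumulator and only touch the sprite (for example, assigning `sprite.cellIndex`) when the displayed frame actually changes. The sketch below is a hypothetical `advanceAnimation` over an assumed `AnimState` shape, not the app's actual code; the returned flag tells the caller whether any write is needed at all.

```typescript
// Hypothetical animation state for one sprite.
interface AnimState {
  from: number;      // first cell of the animation
  to: number;        // last cell of the animation
  frameMs: number;   // duration of one cell
  loop: boolean;
  cell: number;      // currently displayed cell
  elapsedMs: number; // time accumulated since the last cell change
}

// Advances the animation by dtMs and returns true only if the displayed
// cell changed, so callers can skip writing sprite data otherwise.
function advanceAnimation(s: AnimState, dtMs: number): boolean {
  s.elapsedMs += dtMs;
  if (s.elapsedMs < s.frameMs) return false;
  const steps = Math.floor(s.elapsedMs / s.frameMs);
  s.elapsedMs -= steps * s.frameMs;
  const length = s.to - s.from + 1;
  const offset = s.cell - s.from + steps;
  const previous = s.cell;
  s.cell = s.loop ? s.from + (offset % length) : Math.min(s.from + offset, s.to);
  return s.cell !== previous;
}
```

Because most frames at 60 FPS fall between cell changes for typical sprite animations, most calls return early without writing anything.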

I’m amazed at the number of sprites I can display, so again, thank you so much! :slight_smile:

My pleasure! Feel free to ping me if you see something that can be done at engine level