GUI/ADT - linkWithMesh, but link with 3D tracking

Preface: (I’m not sure if this is possible, but I figure it is; it just might be difficult. Sorry for the length, I just want to make sure I convey why individual ADTs aren’t a good solution here, the things I’ve tried, and the many problems I believe this would solve if the request is possible.)

Request: linkWithMesh, but instead of linking in 2D space, the GUI node would use the rotation, size on screen, and culling from the linked mesh, which would basically be a plane impostor, so that all the impostors in the scene could exist in a single full screen ADT (or just 1 additional ADT, versus many)
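To make the request a bit more concrete, here is a rough sketch of the per-frame math such a link would need: project the plane's center to screen pixels and derive its on-screen height from its depth. It assumes a simple perspective camera with square pixels and a plane center already expressed in camera space; `projectImpostor`, `ScreenPlacement` and the parameter names are hypothetical and not part of the Babylon.js GUI API. Rotation and depth testing against other meshes are not covered.

```typescript
// Hypothetical helper, not a Babylon.js API: where would a GUI control have to be
// drawn, and how large, to match a plane impostor seen by a perspective camera?
interface Vec3 { x: number; y: number; z: number; }

interface ScreenPlacement {
  left: number;      // pixels from the left edge of the viewport
  top: number;       // pixels from the top edge of the viewport
  heightPx: number;  // on-screen height of the plane in pixels
  visible: boolean;  // false when the plane center is behind the camera
}

// `centerInView` is the plane center in camera (view) space, +Z pointing away from the camera.
function projectImpostor(
  centerInView: Vec3,
  planeHeight: number,     // world-space height of the plane
  fovY: number,            // vertical field of view in radians
  viewportWidth: number,
  viewportHeight: number
): ScreenPlacement {
  if (centerInView.z <= 0) {
    return { left: 0, top: 0, heightPx: 0, visible: false };
  }
  // Pixels covered by one world unit at distance z.
  const pixelsPerUnit = viewportHeight / (2 * Math.tan(fovY / 2) * centerInView.z);
  return {
    left: viewportWidth / 2 + centerInView.x * pixelsPerUnit,
    top: viewportHeight / 2 - centerInView.y * pixelsPerUnit, // screen Y grows downward
    heightPx: planeHeight * pixelsPerUnit,
    visible: true,
  };
}
```

The 2D control would then be positioned at `left`/`top` and scaled to `heightPx`; the hard part raised later in the thread (depth testing against the rest of the scene) is exactly what this sketch leaves out.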

Reason/Use case: Performance, and ease of building 2D-heavy games in Babylon, e.g. card games.

While I’m just about to finally release the (card) game I’m working on, optimizing performance has been a massive time sink for me. Originally all my cards were individual ADTs on planes, including the cards in play (which made it really easy to program the prototype initially), but doing that I couldn’t get above 30 fps on a mid-range iMac. That was the start of a long and seemingly endless series of optimizations and workarounds to get a consistent 60 fps. To give you an idea of the main things I’ve had to do to optimize (initially I had an average of 200 or so draw calls):

  1. Make custom sprite shaders which allow instancing meshes with their own instance buffers to control which sprite frame is rendered, for virtually everything in the game (card images, names, numbers, and all the pieces that make up the card: frames, attribute wrappers, card backs, generic UI elements, etc.). But this is really, really time consuming to do and hard to program, as it requires you to have all these different sprite shaders (card attributes, numbers, frames, card names, card type names, card images, card frames on board); then you need to create instances from the source mesh shaders, piece them all together, etc.

A picture will be easier to understand, so: the following is a hover state from one of my card fronts. Note that there are 7 unique shader materials/source meshes here (8 draw calls), along with one 512x512 ADT per card front for the descriptions, since the descriptions are dynamic. (But adding another card front, i.e. X cards in the player’s hand, will now only add 1 draw call per card displayed, since only the description mesh is not instanced.)
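As a small worked example of the counts above: the 7 instanced source meshes cost one draw call each no matter how many card fronts are shown, and every card front adds one more for its description mesh. The function name and the zero-card behavior are assumptions made for illustration.

```typescript
// Illustrative count only: 7 instanced source meshes shared by all card fronts,
// plus one non-instanced description mesh (its own 512x512 ADT) per card front.
const INSTANCED_SOURCE_MESHES = 7;

function cardFrontDrawCalls(visibleCards: number): number {
  // Assumption: nothing is drawn when no card front is visible.
  if (visibleCards <= 0) return 0;
  return INSTANCED_SOURCE_MESHES + visibleCards;
}

const oneCard = cardFrontDrawCalls(1);   // 7 shared + 1 description
const fiveCards = cardFrontDrawCalls(5); // 7 shared + 5 descriptions
```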

The difficulty isn’t just managing all that, but also creating the assets upfront (rendering the card names and types via HTML or another Babylon scene and saving them to disk, grouping them together into reasonably sized textures and turning them into sprites, etc.). After doing the above, I got down to 40 average draw calls or so, which is about where I’m at now. But performance still wasn’t great on my 2015 iMac with a Radeon R9 / 2048 MB card (not great but still a decent card), especially when heavy animations were going on. Which led me to the next step:

  2. Use Basis for everything possible rather than raw JPG or PNG. After doing this, I was finally able to get 60 fps almost always while no heavy animations/game state changes are taking place, with animations still feeling smooth and close to 60 when meshes are being disposed and garbage collection and all that is occurring.

However: I am only able to run the game in non-retina mode for most graphics cards. Setting hardware scaling to 0.5 for most graphics cards still results in 30 or so FPS, which I believe is somehow due to the card descriptions, even though I only ever create them when a card front is shown, and there are generally never more than 1-10 visible at a given time (cards in the player’s hand). Note that I’ve experimented with many ways of solving that problem: grouping the descriptions together into 1 giant ADT and passing that to a sprite shader (for 1 draw call), which made performance significantly worse, grouping them into 1024x1024 shaders, etc. But for whatever reason individual 512 ADTs for the card descriptions have been the least bad route performance-wise, even though using 1024x1024 ADTs for the entire card fronts was killing performance entirely.

Anyways, the thing I realized the other day, and the reason I am bringing this up, is that if this is possible, I believe performance could be massively improved, along with being able to skip texture compression, since:
The browser would be keeping the images in memory, but they wouldn’t be sitting individually on the GPU, since the fullscreen texture is sent to the GPU on each frame as far as I understand it. (I’m not sure if it’s expensive to read the GUI images that end up as part of the ADT when the scene is combined, but since browsers are optimized for that kind of thing I’d imagine it wouldn’t be?)
It would also reduce the whole process back to 1 draw call, and save me (and anyone else doing 2D/3D hybrid games) from having to do all this complicated sprite work.

@Evgeni_Popov what do you think about it?

That requires a z-buffer for the plane to be rendered correctly with the other objects of the scene, so that the GUI element can be rendered as a regular 3D object. What would be the difference with a regular plane?

Uploading texture data to the GPU is notoriously slow, and it is a synchronous process in that the data needs to be uploaded before the texture can be used for rendering.

However, once the ADT is created, if you don’t modify any of its controls, it is essentially free and used as a regular texture. Do you modify some controls in a way that would lead to the ADT being redrawn each frame, and so being uploaded each frame? Otherwise I don’t really understand how the ADT can be the bottleneck…

If you generate the ADTs once and reuse them “as is”, then you could also try to create a regular texture from the data bits of the ADT to see if ADT itself is the problem (I think a dynamic texture created on the GPU is less performant than a static one, but @sebavan will know better).

Also, have you tried a tool like Spector to see what’s really going on when a frame is rendered in your game? Sometimes you can be surprised…

However, only being able to run in non-retina mode (still 30 or so FPS at 0.5 hardware scaling) means you are fill rate bound (fillrate = screen pixels * shader complexity * overdraw). I guess your ADTs are still 512x512 in retina mode (?), in which case ADTs are not the problem, as it will take the same amount of time to handle them in both cases.
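A quick back-of-the-envelope version of the fillrate formula, as a sketch: it assumes the render size is the canvas size divided by the hardware scaling level (so 0.5 doubles the resolution in each dimension) and treats shader complexity as a constant that cancels out. The canvas size and overdraw factor are made-up example values, and `shadedPixelsPerFrame` is a hypothetical helper.

```typescript
// Rough fill-rate estimate: pixels shaded per frame = rendered pixels * overdraw.
// Assumes render size = canvas size / hardware scaling level.
function shadedPixelsPerFrame(
  canvasWidth: number,
  canvasHeight: number,
  hardwareScalingLevel: number,
  overdraw: number // average number of times each pixel is drawn (assumed value)
): number {
  const renderWidth = canvasWidth / hardwareScalingLevel;
  const renderHeight = canvasHeight / hardwareScalingLevel;
  return renderWidth * renderHeight * overdraw;
}

const nonRetina = shadedPixelsPerFrame(1280, 720, 1.0, 2);
const retina = shadedPixelsPerFrame(1280, 720, 0.5, 2);
const ratio = retina / nonRetina; // four times as many pixels to shade
```

With the same shaders and the same overdraw, retina mode asks the GPU to shade four times the pixels, while the number of draw calls stays exactly the same.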

It could be your shaders, doing too many/heavy computations or texture reads. You could try to disable them one by one (meaning just return a fixed static color) and see if that helps to identify the culprit (if any).

Also, this doc from Unity can help (even if it’s for mobiles): https://docs.unity3d.com/Manual/MobileOptimisation.html

Sorry for slow reply - The difference is that it would all be grouped into one single draw call for the single texture, rather than many draw calls / many source meshes and many textures. - Like the vast majority of a simple plane heavy game like mine could be programmed using this strategy, which would make it incredibly easy to program as well as efficient. - But would also allow for the GPU not to be overloaded with images, since there would be just 1 texture (being compiled from the canvas elements/dom images etc), which would be sent once per render. (At least theoretically)

But: to make it less theoretical, here’s an example scenario I’ve encountered while building a deckbuilder type scene. I’ve used lots of large (1024x1024, about 1MB average per image) GUI images in fullscreen mode, and have never really run into any performance issues doing that (while showing 4-5 of those planes at a time). But originally I was trying to use those same 1MB images on regular textures, and the GPU would start to show signs of choking or serious slowdown once the cards were being lazy loaded 5 at a time and, after a few scrolls, 30+ of those had been sent to the GPU. Some of this is from memory, as it was a while ago, so I’d have to go back and do a full “load as many images as I can” performance test, but I do know that I’ve never had any problems with a single fullscreen ADT no matter how many images I was using or loading, which I assume is because the images are sitting in browser memory and only being read when necessary.
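One possible contributor to the slowdown with regular textures, sketched as arithmetic: a JPG or PNG is decoded before upload, so its GPU footprint depends on its dimensions rather than its file size. The helper below is hypothetical and assumes uncompressed RGBA8 storage (4 bytes per pixel), with a full mipmap chain adding roughly one third; actual driver behavior may differ.

```typescript
// Approximate GPU memory for an uncompressed RGBA8 texture (4 bytes per pixel).
// A full mipmap chain adds roughly one third on top of the base level.
function rgba8TextureBytes(width: number, height: number, withMipmaps: boolean): number {
  const base = width * height * 4;
  return withMipmaps ? Math.round((base * 4) / 3) : base;
}

const MIB = 1024 * 1024;
const perCardMiB = rgba8TextureBytes(1024, 1024, false) / MIB; // one 1024x1024 card image
const thirtyCardsMiB = 30 * perCardMiB;                        // 30 lazy-loaded cards
```

Under these assumptions a "1MB" 1024x1024 image occupies about 4 MiB once decoded, so 30 of them take about 120 MiB before mipmaps, which is a noticeable share of a 2048 MB card.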

  • Regarding the description meshes issue, I never did figure it out exactly. But no, I was only changing the ADT when the game state was updated with a change that would cause the description to change, which is a rare occurrence, and I was using the ADT as a regular texture, passed to a custom instanceable shader that could use the textures as a sprite sheet. I am doing nothing but really simple sprite math in the vast majority of my shaders, with no loops or multiple texture fetches, so I don’t think that was it. I’ll probably revisit the issue at some point, but 512 ADTs seem to be working fine for now, in 1.0 scaling mode at least.
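For readers unfamiliar with the kind of "simple sprite math" mentioned here, a minimal sketch on the CPU side might look like the following. The layout (frames stored row by row from the top-left cell, V = 0 at the top of the sheet) and the names `frameToUV` and `SpriteUV` are assumptions for illustration, not the poster's actual shader code.

```typescript
// Hypothetical sprite-sheet math: map a frame index to the UV offset and scale of its cell.
// Frames are assumed to be laid out row by row, starting at the top-left cell,
// with V = 0 at the top of the sheet.
interface SpriteUV { offsetU: number; offsetV: number; scaleU: number; scaleV: number; }

function frameToUV(frame: number, columns: number, rows: number): SpriteUV {
  const col = frame % columns;
  const row = Math.floor(frame / columns);
  return {
    offsetU: col / columns,
    offsetV: row / rows,
    scaleU: 1 / columns,
    scaleV: 1 / rows,
  };
}
```

In a shader, the same computation would typically run per instance, with the frame index read from an instance buffer and the final lookup being `offset + uv * scale`, which is a single texture fetch.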

I would say that 40 draw calls is already a relatively low number of calls, so reducing it further may not be the solution to your problem: the fact that you experience slowdowns at 0.5 hardware scaling points to a fillrate problem (as described previously) and not a problem with the number of draw calls issued (except maybe if a call is exceptionally slow on your GPU/driver (?)).