Designing for low end GPUs

Anyone have any tips for designing a game for low end GPUs?

Here is my current understanding of roughly optimal rendering of a simple cube mesh:

https://playground.babylonjs.com/#17ABFT#2

Does anyone know tricks beyond what’s in that PG? It is just instances with a frozen material. I like to pretend that it is a ballpark estimate of how many really simple entities could be in a game. If anyone knows a general approach that is even more lo-fi, please chime in :smiley:
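
For anyone who can't open the PG, here is a rough sketch of that setup (instances plus a frozen material). It assumes the Babylon.js global `BABYLON` as in the Playground; the `gridPosition` helper is my own, just to spread the cubes out.

```javascript
// Lay instance i out on an XZ grid centred on the origin.
function gridPosition(i, cols, spacing) {
  const x = (i % cols) - (cols - 1) / 2;
  const z = Math.floor(i / cols) - (cols - 1) / 2;
  return { x: x * spacing, y: 0, z: z * spacing };
}

// Sketch only: one source cube, many instances, everything frozen.
function buildCubes(scene, count) {
  const mat = new BABYLON.StandardMaterial("cubeMat", scene);
  mat.freeze(); // skip per-frame material dirty checks

  const source = BABYLON.MeshBuilder.CreateBox("cube", { size: 1 }, scene);
  source.material = mat;

  const cols = Math.ceil(Math.sqrt(count));
  for (let i = 1; i < count; i++) {
    const inst = source.createInstance("cube" + i);
    const p = gridPosition(i, cols, 2);
    inst.position.set(p.x, p.y, p.z);
    inst.freezeWorldMatrix(); // static instances: stop recomputing matrices
  }
  source.freezeWorldMatrix();
  return source;
}
```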

I’ve figured that low poly models would matter, and I’m sure that they do to an extent… but it isn’t so clear cut. For example, changing from cubes to spheres did not always drop the FPS as drastically as I thought it would.

I’m curious how many cubes others can render before consistently dropping below their maximum frame rate. It would be nice to get numbers from some Chromebooks, or anything that typically isn’t known for 3D power.

I did not actually fullscreen any of these, so the resolution may not be very relevant.

Onboard gpu:
300 cubes @ 60 FPS, 1280x800 - ASUS T100T (2015 tablet) - sometimes can only do 100 cubes; may have some funky temperature/power throttling

Discrete gpu:
1100 cubes @ 60 FPS, 1920x1080 - GTX 660M (2015 gaming laptop)
600 cubes @ 144 FPS, 2560x1440 - Ryzen 1700 + GTX 1070 (2017 gaming desktop)
700 spheres @ 60 FPS (?? faster for some reason)
2200 cubes @ 60 FPS, 1680x1050 - same as above, different screen
1900 spheres @ 60 FPS


There’s documentation on various techniques and tools for optimising performance here: Optimizing Your Scene, and also the SceneOptimizer class.
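
A minimal sketch of that SceneOptimizer in use. The preset options (Low/Moderate/HighDegradationAllowed) trade visual quality for frame rate automatically; the target FPS here is just an example value, not a recommendation.

```javascript
// Sketch only: assumes the Babylon.js global `BABYLON` and an existing scene.
function startOptimizer(scene) {
  // Ask the optimizer to degrade quality until the scene holds ~30 FPS.
  const options = BABYLON.SceneOptimizerOptions.ModerateDegradationAllowed(30);
  BABYLON.SceneOptimizer.OptimizeAsync(
    scene,
    options,
    () => console.log("target FPS reached"),
    () => console.log("ran out of optimizations before hitting the target")
  );
}
```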


Performance targeting is almost always project-specific.

You can benchmark 1000 instanced cubes and get nice perf, but then in your actual project have only 50 objects, each with its own material and textures, and see a significant FPS drop.

As inteja links above, you can adapt your project using some of these techniques, but don’t forget to be sober from the very beginning of your project if you know you will target low-end devices: polycount, texture size, number of draw calls, physics use, standard material vs. PBR, etc.


@timetocode, I agree with @Vinc3r about designing for targeted devices and thinking in terms of polycount, number and size of textures, number of loaded meshes, the type of animation you are using (node vs. skinned vs. morph target), lighting calculations for analytical lights and IBL versus unlit materials.

However, trying to figure out what you can build starting with script-generated cubes sans textures is a hard way to understand what your assets can look like. Unless, of course, you are designing a game where your art is cubes. I find that when designing for any device, low or high, I need to look at the concept art of the game I am making and then put a little time into grey-boxing assets to get into the engine. This is the easiest way to know if I am overshooting the budget.

It can be as simple as generating a few spheres in a DCC package, subdivided to appropriate counts (making a few different options), and exporting several versions, both as a single mesh and as multiple meshes (say 1, 3, and 5). I make sure to add a material per mesh so that I can test multiple materials, and assign a full PBR texture set to each so I can test loading multiple textures. You can start with large 4K textures if you really want to set a floor and see where your frame rate ends up when using these files.

That gives you levers to play with by dropping the number of materials or texture size, reducing the number of meshes per object, and playing with poly count. Yes, it does take some time to set up, but I see this time as super valuable because it helps you set your standards for your assets quickly. It’s also easier to find optimizations for assets that don’t perform well than to start from a low bar that runs well and try to figure out how much headroom you have to raise the quality bar.

And if you want a quick set of assets with animations, you can always test your scene with some of the assets from https://www.mixamo.com/ which have complex models and skinned animations.


Thanks everyone. Some great info from the specific engine flags to the thought process behind designing for certain hardware.

In theory what are the performance differences between node animations and skinned animations? I can say from testing that skinned animation on a gaming rig performs better than node animations… but what should we expect on low end gpus? My limited tests weren’t finding a difference and I wasn’t sure why.

I’m making an 8-12 player FPS at the moment, but I’d like to do something wilder on the next project. I’ve even been wondering if using the particle system to render entities would be viable…I’ve got a network engine that can do 50-150 players, and while that can be limiting as to the type of game that can result, I do want to try to make something at a larger multiplayer scale in 3D.

I can totally see what @PatrickRyan means about starting from a low bar and worrying about headroom - I began this most recent project with cubes that could shoot each other via raycasts, and now every nice-looking thing I add is quite a bit more complex than a cube, and I’m re-benchmarking and wondering. I’m definitely beginning the next project with a clearer aesthetic/performance range in mind.

For others who come across this thread in the future, I do have quite a few scaling notes. They’re specific to a graphically simple multiplayer first-person shooter which is heavily server-side (much NullEngine), with client-side prediction of movement and weapon handling.

Rendering:

  • hardwareScaling - huge performance changes
  • instances - modest difference on the low end, pretty large difference on the higher end, downsides are very minimal so should be used whenever possible
  • merging meshes for static objects - big difference
  • objects made out of merged voxels whose texture is just a 1x256 strip of pixels - low poly, and it is almost like not having textures at all. I’ve got nothing to compare it to (such as standard low poly, since I’ve never made a 3D game before), but it seems like there are advantages here
  • culling methods / octrees for rendering or picking - no difference for my particular game
  • reflections - surprisingly reflecting a skybox seems viable on low end gpus (and looks great!), reflecting a whole bunch of other meshes is too much
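
A sketch of the two biggest wins in that list. `pickScaling` is my own heuristic, not a Babylon API: the idea is to render at a lower internal resolution on low-power hardware, and `MergeMeshes` collapses static geometry into a single draw call.

```javascript
// Hypothetical heuristic: a hardware scaling level of 2 renders at half
// resolution in each dimension; 1/devicePixelRatio gives crisp output.
function pickScaling(devicePixelRatio, lowEnd) {
  if (lowEnd) return Math.max(2, devicePixelRatio);
  return 1 / devicePixelRatio;
}

// Sketch only: assumes the Babylon.js global `BABYLON` and a browser window.
function applyLowEndSettings(engine, staticMeshes) {
  engine.setHardwareScalingLevel(pickScaling(window.devicePixelRatio, true));
  // Merge static level geometry into one mesh (meshes should share a
  // material); `true` disposes the source meshes afterwards.
  return BABYLON.Mesh.MergeMeshes(staticMeshes, true);
}
```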

Collisions / custom physics / linear alg

  • moveWithCollisions - in multiplayer, or in naively constructed scenes with lots of meshes, use intersectsMesh paired with computeWorldMatrix, or raycasts, instead
  • intersectsMesh - after some spatial logic, this is a solid and scalable tool
  • raycasts - same as intersectsMesh; with a little bit of spatial logic these scale nicely too
  • voxel collisions/raycasts - also very fast with the right algos… certainly viable for several hundreds or thousands of checks per frame… not really BJS related here
  • Vector3 toRef functions are worth using in loops that would otherwise create many vectors - the update loop of my classic first-person-shooter character involved ~50 vector operations. The difference is sometimes visible in profiled garbage collection, though JS has some pretty amazing optimizations for looping code that does not change the shape of the memory it uses.
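
The allocation-free pattern that last bullet describes, using a stripped-down `Vec3` of my own for illustration (Babylon’s Vector3 exposes the same `scaleToRef`/`addToRef` style methods). The idea: preallocate scratch vectors once and reuse them every frame instead of creating garbage inside the update loop.

```javascript
// Minimal stand-in for BABYLON.Vector3, just the toRef pattern.
class Vec3 {
  constructor(x = 0, y = 0, z = 0) { this.x = x; this.y = y; this.z = z; }
  scaleToRef(s, ref) { ref.x = this.x * s; ref.y = this.y * s; ref.z = this.z * s; return ref; }
  addToRef(o, ref) { ref.x = this.x + o.x; ref.y = this.y + o.y; ref.z = this.z + o.z; return ref; }
}

// Scratch vector allocated once, outside the hot loop.
const tmpStep = new Vec3();

// position += velocity * dt, without allocating a new vector per frame.
function integratePosition(position, velocity, dt) {
  velocity.scaleToRef(dt, tmpStep);
  position.addToRef(tmpStep, position);
  return position;
}
```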

Sound

  • wavs are big, mp3s are small but not great when looped
  • Sound constructor - if the same sound can be played more than once at the same time, it may need to be cloned (depending on how many times in total this will happen in a game)
  • Sound attachToMesh - only use a few per scene. If many meshes can be the sources of many sounds, then the sounds need to come from a pool and probably be positioned once (not moved along with a mesh).
  • Concurrency - a game that can play many sounds at the same time has all sorts of considerations. It often needs to prevent similar sounds from playing at roughly the same time, as well as stop already-playing sounds when a similar sound needs to be played closer to the player. Creating an unbounded number of sounds will introduce bit-crunch artifacts and eventually break all the audio.
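
A sketch of the pooling idea above. The `SoundPool` class is hypothetical (not a Babylon API): it caps how many copies of one sound can exist, hands out an idle copy when one is available, and refuses to grow unbounded.

```javascript
// Hypothetical voice pool; createSound would be a factory such as
// () => new BABYLON.Sound(...) in a real scene.
class SoundPool {
  constructor(createSound, maxVoices) {
    this.createSound = createSound;
    this.maxVoices = maxVoices;
    this.voices = [];
  }
  // Returns a sound that is free to play, or null if the pool is saturated.
  acquire() {
    const idle = this.voices.find(v => !v.isPlaying);
    if (idle) return idle;
    if (this.voices.length < this.maxVoices) {
      const v = this.createSound();
      this.voices.push(v);
      return v;
    }
    return null; // at the cap: drop the request rather than stack up audio
  }
}
```

Dropping the request at the cap (rather than queueing) matches the note above: past a point, more simultaneous copies of the same sound only add artifacts.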

@timetocode, the main trade-off for animation, in terms of resources, between node animation vs. skinned animation vs. morph target animation is in the amount of data you have in your files. It can be broken down into two main categories.

File Size

  • Node animation will be the smallest amount of data, as you only have TRS (translate/rotate/scale) data for each mesh in your file.
  • Skinned animation will be larger in terms of nodes, curves, and skinning data. Each vertex in the file must carry normalized skin weights for up to four bones. You will have an increase in nodes, as you carry all of the joint nodes, which can be a significant number even for simple rigs. And each joint can carry TRS data as well, so the number of stored curves increases too.
  • Morph target animation will likely be the largest, as you need to carry multiple copies of the same mesh with deformations on each copy to change the positions of the vertices. The number of curves isn’t significant, in that you carry one curve per morph target, but the duplicate meshes mean a larger file. The real problem is that most animation can’t get by with just morph target animation, so these files often contain skinned and/or node animation as well.
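
To put a rough number on the skinning point: per-vertex skinning adds four weights plus four joint indices on top of the usual position/normal/UV attributes. The sizes below assume a typical glTF-style layout (float32 weights, uint16 joint indices); real files vary with the exporter.

```javascript
// Back-of-envelope vertex size, in bytes, with and without skinning data.
function bytesPerVertex(skinned) {
  const base = 3 * 4 /* position (float32 x3) */
             + 3 * 4 /* normal (float32 x3) */
             + 2 * 4 /* uv (float32 x2) */;
  const skin = skinned
    ? 4 * 4 /* weights (float32 x4) */ + 4 * 2 /* joints (uint16 x4) */
    : 0;
  return base + skin;
}
// A skinned vertex is ~75% larger than an unskinned one under these assumptions.
```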

Frame Update Overhead

  • Skinned animation is likely the most complex in terms of resources needed per frame: you are dealing with a complex parent structure for the joints, each of which can have animation, and then layered on top of that is the skinning information per vertex, for up to 4 bone influences, to compute the final position of each vertex per frame.
  • The other two aren’t as complicated: morph target animation is a linear interpolation between two positions per curve. It gets a little more complex with multiple morph targets, but not as complex as skinned animation.
  • Node animation is the simplest and cheapest of the animation types per frame.
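
The per-frame morph-target math mentioned above is just a lerp per vertex component: `final = base + weight * (target - base)`. A minimal illustration, with plain arrays standing in for vertex buffers:

```javascript
// Blend one morph target into a base position array at the given weight.
// weight = 0 gives the base mesh, weight = 1 gives the full target shape.
function morphLerp(base, target, weight) {
  return base.map((b, i) => b + weight * (target[i] - b));
}
```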

Complexity of the file is the main trade-off here. Larger files mean more load time, which magnifies as rigs get more complex and more animations are used. Add to that the times when you need multiple meshes in one file skinned to the same rig, and you are adding draw calls to the mix.

Starting with your concept art and then diagramming things like the kind of animation needed and how complex the rig needs to be can start you thinking about where you need to optimize from the start. After that, it’s likely easier to simplify a rig and mesh to gain performance than to start from simple node animation on a primitive and then try to scale up from there.
