Optimizing VAT Baking to near-instant and using Float16 for 50% memory savings

Hi all, I’m currently working on optimizing every part of my game, and next up was VAT (vertex animation texture) baking. Even though I pre-bake the VATs and fetch them over HTTP at runtime, the time it took to bake them stuck out as ripe for optimization.

Part 1: Speed (~300x faster)

Final numbers, from the current VAT baking to this implementation: ~1.5 minutes down to ~300 ms
Output from the playground: Took 297.89999997615814ms to bake 72 animations and a total of 6802 frames

It took a bit of tweaking to dial this in, but we should be able to programmatically render the scene in a blocking way to get the skeleton matrix transformations for each AnimationGroup frame in a loop, as fast as possible. There’s no need to defer to the engine loop here, since it’s all contiguous data.
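Stripped of the Babylon specifics, the idea is a plain synchronous loop. For every animation and every frame, it poses the skeleton, reads back the bone matrices, and copies them into one contiguous buffer. Below is a minimal sketch under stated assumptions. `FrameSampler`, `AnimationRange` and `bakeSync` are hypothetical names, not the playground’s code. In a real scene the sampler would be backed by the mesh’s AnimationGroup and Skeleton rather than the mock used here.

```typescript
// Hypothetical abstraction over "pose the skeleton at a frame, read back its bone matrices".
interface FrameSampler {
  /** Floats produced per frame (16 per 4x4 matrix times the matrix count). */
  readonly floatsPerFrame: number;
  /** Poses animation `animIndex` at `frame` and returns the flattened matrices. */
  sample(animIndex: number, frame: number): Float32Array;
}

/** Inclusive frame range of one animation (hypothetical shape). */
interface AnimationRange {
  from: number;
  to: number;
}

/** Bakes every frame of every animation into one preallocated buffer, synchronously. */
function bakeSync(sampler: FrameSampler, ranges: AnimationRange[]): Float32Array {
  const totalFrames = ranges.reduce((n, r) => n + (r.to - r.from + 1), 0);
  const out = new Float32Array(totalFrames * sampler.floatsPerFrame);
  let offset = 0;
  ranges.forEach((range, animIndex) => {
    for (let frame = range.from; frame <= range.to; frame++) {
      // Blocking: sample and copy immediately, no waiting on a render loop tick.
      const matrices = sampler.sample(animIndex, frame);
      if (matrices.length !== sampler.floatsPerFrame) {
        throw new Error(`unexpected data size at animation ${animIndex}, frame ${frame}`);
      }
      out.set(matrices, offset);
      offset += sampler.floatsPerFrame;
    }
  });
  return out;
}

// Mock sampler for illustration: 2 matrices per frame, every float = animIndex * 1000 + frame.
const FLOATS_PER_FRAME = 2 * 16;
const mock: FrameSampler = {
  floatsPerFrame: FLOATS_PER_FRAME,
  sample: (animIndex, frame) =>
    new Float32Array(FLOATS_PER_FRAME).fill(animIndex * 1000 + frame),
};

const baked = bakeSync(mock, [{ from: 0, to: 2 }, { from: 0, to: 1 }]);
console.log(baked.length); // 5 frames * 32 floats = 160
```

Because the total frame count is known up front, the output buffer is allocated once. Each frame then becomes a single `set` into it.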

Part 2: Size (50% reduction)

As the logs above show, the VATs I’m baking aren’t small. Many humanoid models have upwards of 72 animations. The usual method of baking through the core API’s baker produces a Float32Array of around 11 MB of VAT data (6 MB compressed on disk), which nonetheless has to be decompressed in memory and uploaded to the GPU.
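For a rough sense of where those bytes go, each baked frame stores a set of 4x4 matrices, and each matrix component costs 4 bytes as Float32 or 2 bytes as Float16. The sketch below is illustrative only. `vatByteSize` is a hypothetical helper. The "bones + 1" matrices per frame reflects my assumption about how the baker sizes its buffer. The 24-bone rig is made up, since the real bone count isn’t stated above.

```typescript
/** Bytes needed for `frames` frames of `matricesPerFrame` 4x4 matrices. */
function vatByteSize(matricesPerFrame: number, frames: number, bytesPerComponent: 2 | 4): number {
  const componentsPerMatrix = 16; // a 4x4 matrix occupies 4 RGBA texels
  return matricesPerFrame * componentsPerMatrix * frames * bytesPerComponent;
}

// Illustrative only: a hypothetical 24-bone rig (+1 assumed extra matrix) over 6802 frames.
const matricesPerFrame = 24 + 1;
const f32Bytes = vatByteSize(matricesPerFrame, 6802, 4);
const f16Bytes = vatByteSize(matricesPerFrame, 6802, 2);
console.log((f32Bytes / 1e6).toFixed(1), "MB as Float32"); // 10.9
console.log((f16Bytes / 1e6).toFixed(1), "MB as Float16"); // 5.4
```

Whatever the rig, halving the bytes per component halves the total. With these made-up numbers, the Float32 size happens to land near the ~11 MB mentioned above.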

For my purposes (this may not suit all needs), Float16 precision is perfectly fine. It carries roughly three significant decimal digits, and there is no noticeable change in the animations going from 32-bit to 16-bit precision. The savings, of course, are exactly 50% on disk, in memory and in GPU memory.
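The Float16 conversion boils down to turning each 32-bit float into IEEE 754 binary16 bits and storing them in a Uint16Array. Here is a self-contained sketch with round-to-nearest-even. `toHalfBits`, `fromHalfBits` and `packHalf` are illustrative names of my own, not the functions from the playground. It also shows the precision caveat: binary16 has an 11-bit significand, so the gap between representable values grows with magnitude.

```typescript
// Scratch views for reinterpreting a float32 as its raw bits.
const f32 = new Float32Array(1);
const u32 = new Uint32Array(f32.buffer);

/** Converts a number to IEEE 754 binary16 bits (round to nearest, ties to even). */
function toHalfBits(value: number): number {
  f32[0] = value;
  const x = u32[0];
  const sign = (x >>> 16) & 0x8000;
  const exp = (x >>> 23) & 0xff;
  const mant = x & 0x7fffff;

  if (exp === 0xff) return sign | 0x7c00 | (mant ? 0x200 : 0); // Inf / NaN
  const e = exp - 112; // rebias: float32 bias 127 -> float16 bias 15
  if (e >= 0x1f) return sign | 0x7c00; // too large: overflow to Inf

  if (e <= 0) {
    // Result is a float16 subnormal (or zero).
    if (e < -10) return sign; // below half the smallest subnormal: rounds to ±0
    const m = mant | 0x800000; // restore the implicit leading 1
    const shift = 14 - e; // 14..24
    let h = m >>> shift;
    const rem = m & ((1 << shift) - 1);
    const halfway = 1 << (shift - 1);
    if (rem > halfway || (rem === halfway && (h & 1) === 1)) h++;
    return sign | h;
  }

  let h = (e << 10) | (mant >>> 13); // keep the top 10 mantissa bits
  const rem = mant & 0x1fff; // the 13 dropped bits decide rounding
  if (rem > 0x1000 || (rem === 0x1000 && (h & 1) === 1)) h++; // a carry into the exponent is correct
  return sign | h;
}

/** Decodes binary16 bits back to a number (for checking precision). */
function fromHalfBits(h: number): number {
  const sign = h & 0x8000 ? -1 : 1;
  const e = (h >>> 10) & 0x1f;
  const m = h & 0x3ff;
  if (e === 0) return sign * m * 2 ** -24; // subnormal / zero
  if (e === 0x1f) return m ? NaN : sign * Infinity;
  return sign * (1 + m / 1024) * 2 ** (e - 15);
}

/** Packs baked Float32 data into half-float bits, halving the byte size. */
function packHalf(src: Float32Array): Uint16Array {
  const out = new Uint16Array(src.length);
  for (let i = 0; i < src.length; i++) out[i] = toHalfBits(src[i]);
  return out;
}

const packed = packHalf(new Float32Array([1, -2, 0.1, 1000.3]));
console.log(packed.byteLength); // 8 bytes vs 16 for the Float32Array
console.log(fromHalfBits(packed[3])); // 1000.5: values are 0.5 apart at this magnitude
```

Rotation and scale terms in bone matrices are usually small, so the relative error stays around 0.05% at most. Large translation values are where the coarser steps would show up first.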

Here is the playground. There are some other artifacts around instancing and texture atlasing, but I mainly wanted to showcase:

bakeVat - fast VAT baking
textureFromBakedVertexDataHalfFloat - RawTexture for the VAT from half-float data

Note: The baking here is specific to Float16. Baking standard Float32 data would need a config parameter or a separate function. That version would also be a tad quicker, since the matrix values wouldn’t need to be converted. This is just a proof of concept.


cc @Raggar


Woot! We must introduce it in the engine. Wanna do a PR? We could, for instance, offer an option to switch to f16 while keeping the current precision available.


Absolutely! I probably don’t want to fudge with the existing method, to avoid breakage. I’ll throw something up and field some design questions, like whether we want it to be sync, i.e. stop the world on the main thread. I think that actually makes sense, given the benchmarks vs. the overhead of moving this to a worker. It might be a few days until I get to it, since I’m on vacation right now.

I made a simpler PG, based on the current VAT docs PG, that compares the two side by side and loads in the sync-baked VAT.

Here are the numbers:

1013.8999999761581 ms to render async
18.5 ms to render sync

Got my first Babylon PR up 🙂 Add faster synchronous method for VAT baking on vertexAnimationBaker by knervous · Pull Request #16749 · BabylonJS/Babylon.js · GitHub
