GPUParticleSystem syncing with VAT animations on mesh instances

So this was a fun deep dive into what drives the particle systems in Babylon, particularly the GPU particle systems. Here’s the setup and the PG:

  • Models with animations that use pre-generated VATs (vertex animation textures) to create entity instances at scale
  • Item models that the entities hold in their hands (swords, axes, shields, etc.) that use the same VAT manager to follow the animations
  • Particles that emit around the item (like a flaming sword) and follow the animations
  • Particles that emit from the left and right hands of the entity for a spell-casting effect

There’s a good bit of code in this PG, and the entity/item instancing has been showcased in other projects. What I wanted to demonstrate here is a lean way to have GPUParticleSystem emitters follow and conform to bone positions and rotations.

The relevant code:

// Alignment matrix that rotates the emitter shape to point UP (-90° about Z)
// and offsets it slightly toward the hilt of the weapon
const qAlign = BABYLON.Quaternion.RotationAxis(
    new BABYLON.Vector3(0, 0, 1),
    -Math.PI / 2
);
const alignMat = new BABYLON.Matrix();
BABYLON.Matrix.FromQuaternionToRef(qAlign, alignMat);
alignMat.addTranslationFromFloats(-1.65, 0, 0);

// One reused world matrix to avoid per-frame allocations (GC pressure)
const worldMat = new BABYLON.Matrix();

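// Hack: give the particle system a plain emitter object that conforms to the
// interface it expects, so it is recognized as an object that has a world matrix
// (a truthy `position` makes it look like a mesh rather than a plain Vector3).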
itemParticleSystem.emitter = {
    position: true,
    isEnabled() {
        return true;
    },
    // Entry point for on-demand CPU matrix computation,
    // kept as lean as possible
    getWorldMatrix() {
        // Determine the current frame of the VAT animation
        const fromFrame = animationBuffer.x;
        const toFrame = animationBuffer.y;
        const total = toFrame - fromFrame + 1;
        const t = manager.time * animationBuffer.w;
        // Cycle through the bone anchors based on time so particles are emitted
        // from multiple points over time (deterministic, not random)
        const anchorIdx = (t % boneAnchors.length) | 0;
        const offsetBase = boneAnchors[anchorIdx];
        const off = (fromFrame + Math.floor(t % total)) * floatsPerFrame + offsetBase;
        // Fill in the matrix from the raw VAT data at this offset,
        // decoding half floats when the VAT is stored as Float16
        if (canUseFloat16) {
            for (let i = 0; i < 16; i++) {
                (worldMat.m as any)[i] = BABYLON.FromHalfFloat(vatData[off + i]);
            }
        } else {
            BABYLON.Matrix.FromArrayToRef(vatData as any, off, worldMat);
        }
        alignMat.multiplyToRef(worldMat, worldMat);
        // Add the entity's transformNode position so the emitter moves with the entity
        worldMat.addTranslationFromFloats(
            transformNode.position.x,
            transformNode.position.y,
            transformNode.position.z
        );
        return worldMat;
    }
} as any;
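The frame lookup inside getWorldMatrix() boils down to a bit of index arithmetic. Here is a minimal, engine-free sketch of it. It assumes the VAT data is a flat array with `floatsPerFrame` floats per frame, and that a bone anchor is the float offset of that bone's 4x4 matrix within one frame. `AnimationRange`, `pickAnchorIndex`, and `vatOffset` are hypothetical names for illustration, not Babylon APIs.

```typescript
// Engine-free sketch of the VAT frame lookup used in getWorldMatrix().
// Hypothetical names, not Babylon APIs. Assumes a flat VAT array with
// `floatsPerFrame` floats per frame and bone anchors given as float offsets.

interface AnimationRange {
  fromFrame: number;       // first frame of the clip (animationBuffer.x)
  toFrame: number;         // last frame of the clip, inclusive (animationBuffer.y)
  framesPerSecond: number; // playback speed (animationBuffer.w)
}

// Cycles through the anchors as time advances; `| 0` truncates to an integer index.
function pickAnchorIndex(elapsedFrames: number, anchorCount: number): number {
  return (elapsedFrames % anchorCount) | 0;
}

// Float offset of a bone matrix for the current frame of a looping clip.
function vatOffset(
  timeSeconds: number,
  range: AnimationRange,
  floatsPerFrame: number,
  boneAnchor: number
): number {
  const total = range.toFrame - range.fromFrame + 1; // frames in the clip
  const elapsedFrames = timeSeconds * range.framesPerSecond;
  const frame = range.fromFrame + Math.floor(elapsedFrames % total); // wrap within the clip
  return frame * floatsPerFrame + boneAnchor;
}
```

Take a 10-frame clip (frames 10 to 19) playing at 30 frames per second. At `timeSeconds = 0.5`, 15 frames have elapsed, which wraps to VAT frame 15. With a hypothetical 480 floats per frame and an anchor at float 32, the bone matrix then starts at float 7232.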

Here is a short clip rendering 105 separate, uniquely animating entities with random held items, alternating between a held flame effect and a spell-casting effect.


OMG this is really good
Totally deserve a tweet from @PirateJC :smiley:


So I’ve been hanging around this Playground for a while, using it to benchmark certain points of scale. I realized it will be possible to use thin instances in my project’s setup, so I’m using this thread to prove it out. Additionally, I had an interesting epiphany:

Previously, weapons and shields were attached to an existing mesh by piggybacking on the parent’s VAT system. That meant reusing its skeleton and VAT manager, and manually setting an instance’s VertexBuffers for matricesIndices and matricesWeights.
I had two points of concern with this approach:

  • Each base item instance needs to be assigned a VAT manager, which is specific to the parent model. So n parent models (human, elf, dwarf, etc.) each need their own base copy of the item to keep it aligned with that VAT skeleton.
  • The VAT is duplicated in another material. This might seem trivial, but it takes up real space at scale: 30+ unique entities with VAT data on the order of 5-10 MB.
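Here is a rough number for that, assuming each of 30 entity types carries 5 to 10 MB of VAT data and that data ends up duplicated exactly once for its items:

$$
30 \times (5 \text{ to } 10\ \text{MB}) = 150 \text{ to } 300\ \text{MB of duplicated VAT data}
$$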

The solution: attach items as thin instances and update each thin instance’s matrix every frame, so it stays in sync with the parent’s bone position/rotation and root position. This trades some light CPU matrix math and a loop over all managed item instances per frame.

I ran this at scale on my machine and measured the following per-frame update times:

  • 50 instances: Not measurable with performance.now()
  • 500 instances: <1ms
  • 1500 instances: 1-2ms
  • 2500 instances: 1-3ms
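At the small end, one frame's work can fall below what a single `performance.now()` delta can resolve, since browsers may coarsen the timer's resolution. Timing many frames in one batch and averaging gives a steadier number. Below is a minimal sketch of such a harness. `measureAverageMs` is a hypothetical helper, and the matrix copy is only a stand-in workload, not the Playground's actual update.

```typescript
// Hypothetical timing harness: times a whole batch of per-frame updates with one
// pair of performance.now() calls, then averages, so sub-resolution work still
// produces a meaningful per-frame figure.
function measureAverageMs(update: () => void, frames: number): number {
  if (!Number.isInteger(frames) || frames <= 0) {
    throw new RangeError("frames must be a positive integer");
  }
  const start = performance.now();
  for (let i = 0; i < frames; i++) {
    update();
  }
  return (performance.now() - start) / frames;
}

// Usage: a stand-in workload that copies 500 matrices' worth of floats per "frame".
const src = new Float32Array(500 * 16).fill(1);
const dst = new Float32Array(500 * 16);
const avgMs = measureAverageMs(() => dst.set(src), 200);
console.log(`average update: ${avgMs.toFixed(4)} ms`);
```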

This was convincing enough to use this solution in my project, since the practical number of instances there will always be below 300.

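// View the raw VAT bytes as Float16 when supported, so reads need no manual decoding
// (byteOffset/length keep the view correct if vatData is a subarray)
const vatView = canUseFloat16
    ? new ((window as any).Float16Array as any)(vatData.buffer, vatData.byteOffset, vatData.length)
    : vatData;
// One reused matrix to avoid per-frame allocations
const worldMat = new BABYLON.Matrix();
// Every distinct mesh whose thin-instance buffer needs one upload per frame
const meshSet = Array.from(new Set(items.map(i => i.mesh)));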
scene.registerBeforeRender(() => {
    manager.time += scene.getEngine().getDeltaTime() / 1000.0;
    const now = performance.now();
    for (const { animationBuffer, boneAnchor, mesh, thinInstanceIdx, transformNode } of items) {
        const fromFrame = animationBuffer[0];
        const toFrame = animationBuffer[1];
        const total = toFrame - fromFrame + 1;
        const t = manager.time * animationBuffer[3];
        const off = (fromFrame + Math.floor(t % total)) * floatsPerFrame + boneAnchor;
        BABYLON.Matrix.FromArrayToRef(vatView as any, off, worldMat);

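        // Offset by the entity's root position so the item moves with it
        worldMat.addTranslationFromFloats(
            transformNode.position.x,
            transformNode.position.y,
            transformNode.position.z
        );
        // refresh = false: only write into the CPU-side matrix array here;
        // the GPU upload happens once per mesh after this loop
        mesh.thinInstanceSetMatrixAt(thinInstanceIdx, worldMat, false);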
    }
    // Time spent on the CPU matrix math only
    (window as any).perfMath = performance.now() - now;
    // This is the fastest practical way I've found to update a GPU buffer inline:
    // it avoids creating array copies or doing any math on the vertex count,
    // and simply uploads the full byte length of the matrix data
    for (const mesh of meshSet) {
        mesh._thinInstanceDataStorage.matrixBuffer?.updateDirectly(mesh._thinInstanceDataStorage.matrixData!, 0);
    }

    // Total time, including the GPU buffer uploads
    (window as any).perf = performance.now() - now;
});
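The loop above is essentially a packing step: copy one bone matrix per item out of the VAT, shift its translation by the entity root, and write it into the mesh's matrix array. Here is an engine-free sketch of that data flow. It assumes the thin-instance matrix data holds 16 floats per instance, with instance i starting at i * 16. It also assumes the VAT matrices are laid out like Babylon's, with translation in elements 12 to 14. `ItemInstance` and `packItemMatrices` are hypothetical names.

```typescript
// Engine-free sketch of the per-frame packing step (hypothetical names).
// Assumes 16 floats per thin instance (instance i at i * 16) and VAT bone
// matrices with their translation stored in elements 12..14.

interface ItemInstance {
  vatOffset: number;                      // float offset of this frame's bone matrix
  thinInstanceIdx: number;                // slot in the destination matrix array
  rootPosition: [number, number, number]; // entity root position in world space
}

function packItemMatrices(
  vat: ArrayLike<number>,
  items: readonly ItemInstance[],
  matrixData: Float32Array
): void {
  for (const item of items) {
    const dst = item.thinInstanceIdx * 16;
    // Copy the 16 floats of the bone matrix for the current frame.
    for (let i = 0; i < 16; i++) {
      matrixData[dst + i] = vat[item.vatOffset + i];
    }
    // Shift the translation by the entity's root position.
    matrixData[dst + 12] += item.rootPosition[0];
    matrixData[dst + 13] += item.rootPosition[1];
    matrixData[dst + 14] += item.rootPosition[2];
  }
}
```

Keeping the per-item work to CPU-side writes and doing a single upload per mesh afterwards is what keeps the GPU traffic to one buffer update per mesh per frame. That holds no matter how many items share the mesh.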

Here is the PG. It was fun working on this again. Until next time!


OMG :slight_smile: runs at 120fps on my laptop lol