Draft: Compat Animation System

Note

This is a very early draft; it may change in the future and might never be implemented.

Background

This is a redesign of the long-rejected proposal, this time as a plugin.

Motivation

Currently, the animation system in babylon.js is very flexible, but it is not optimized for simple animations like the glTF ones.
glTF animations are the main source of animations running in babylon.js, and glTF defines a much simpler animation model.
A glTF model can contain more than 10k animation channels/samplers and millions of keyframes, which can hit the limits of heap memory and CPU performance.
So users could be given the choice to trade flexibility for performance, in both CPU and memory.

Goals

  • Compact: animations should be stored in a binary format whenever possible. This applies not only to keyframes but also to samplers and runtime data.
  • WASM-first: the main computation should run in wasm, SIMD-accelerated whenever possible.
  • One call per frame: all animation sampling should be done in a single JS-to-wasm call, no more.
  • JS objects only where required: everything that can go into the wasm heap should live there.
  • glTF-compatible: it should support the animations declared in the core glTF 2.0 spec. KHR_animation_pointer support is left for the future.
  • Babylon.js-compatible: it can run like an AnimationGroup in babylon.js (as long as advanced features are not used).
  • Optional: users can choose to patch babylon.js (e.g. the glTF loader and serialization) to enable it by default, but only after explicitly calling the patch.
  • Minimal runtime: the JS runtime should be minimal (no Emscripten), and the wasm ABI (and memory layout) should be stable. If the wasm binary can be kept under 4 KB, it can be created synchronously on Chrome. The size limit was raised a few versions ago, but we cannot assume that all users run the latest browsers.
  • Self-contained: no external runtime dependency except babylon.js.
  • Immutable: the animation system is immutable once constructed.
  • Zero heap growth once the animation system is constructed.
  • Serializable: the wasm heap can be serialized to base64 and loaded back. Optionally, it can be serialized to raw data if the user needs to process the serialized data later.

Non-goals

It is not a goal to:

  • Replace the current animation system of babylon.js.
  • Be compatible with the animation curve editor.
  • Allow channels/samplers/keyframe data to be changed at runtime.
  • Support old browsers without wasm support.
  • Provide a multithreaded runtime, since browser vendors restrict threading too heavily.
  • Optimize animation channels (resampling, duplicate-frame removal, channel target merging, constant channel purging). All of this should be done at the model level with gltfpack or gltf-transform, before import. (Constant samplers, if detected, could be evaluated at construction time and moved out of the per-frame update list, but they are kept in memory.)
  • Align keyframe data, since unaligned access is fast on modern browsers.
  • Support per-channel loopMode or animation offset.
  • Support non-float inputs/outputs (these should be dequantized/denormalized during construction, if present).
  • Control the playback of individual channels; all channels must start/stop/reset at once.
  • Support all the advanced animation features (blending, weights, etc.).
  • Run animations on the GPU, like Baked Texture Animations.
  • Support sparse or interleaved accessors; the runtime sampler only contains tightly packed values (stride == element size).
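The list mentions that constant samplers could be detected and evaluated at construction time. Below is a minimal sketch of that detection.

It rests on a few assumptions:

  • Output values are tightly packed floats, as required above.
  • Only LINEAR or STEP samplers are handled. A CUBICSPLINE sampler would also need zero tangents to be truly constant.
  • A sampler counts as constant when every keyframe value is bit-for-bit equal to the first one.

The function name `sampler_is_constant` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns true when all keyframes of a LINEAR or STEP
 * sampler hold the same value, so the sampler could be evaluated once at
 * construction time and removed from the per-frame update list.
 * `values` holds frame_count * value_size tightly packed floats.
 * The comparison is exact (memcmp): -0.0f and 0.0f count as different. */
bool sampler_is_constant(const float *values, uint32_t frame_count,
                         uint32_t value_size) {
    if (frame_count <= 1) {
        return true; /* zero or one keyframe can never change */
    }
    const size_t bytes = (size_t)value_size * sizeof(float);
    for (uint32_t i = 1; i < frame_count; ++i) {
        /* compare keyframe i against keyframe 0 */
        if (memcmp(values, values + (size_t)i * value_size, bytes) != 0) {
            return false;
        }
    }
    return true;
}
```

An exact comparison is the conservative choice. A tolerance-based check would also merge nearly constant channels, but that kind of lossy cleanup is listed above as a model-level job for gltfpack or gltf-transform.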

Data structure

// for each animation group, there should be an animation system like this
struct animation_system_header {
    uint32_t version;
    size_t byte_length;
    struct animation_system * animation_system;
};

struct animation_system {
    size_t frame_data_length;
    float * sampler_frame_data;
    struct animation_sampler *samplers;
    size_t vec3_linear_count;
    struct animation_sampler *vec3_linear_samplers; // fast path for most-used samplers (branchless)
    uint32_t approximate_slerp; // zeux's onlerp4
    size_t quat_slerp_count;
    struct animation_sampler *quat_slerp_samplers;
    size_t other_count; // fallback path for all other samplers (with branches)
    struct animation_sampler *other_samplers;
};

// should mostly be cgltf compatible
typedef enum animation_interpolation_type {
    animation_interpolation_type_linear,
    animation_interpolation_type_step,
    animation_interpolation_type_cubic_spline,
    // cgltf_interpolation_type_max_enum, used to represent const sampler
    animation_interpolation_type_const
} animation_interpolation_type;

typedef enum animation_value_type {
    animation_value_type_vec3,
    animation_value_type_quaternion,
    animation_value_type_vec4,
    animation_value_type_weights
} animation_value_type;

struct animation_sampler {
    animation_interpolation_type interpolation;
    animation_value_type value_type;
    uint32_t frame_count;
    uint32_t value_size; // do we need a stride here?
    float min_frame;
    float max_frame;
    uint32_t curr_frame_offset;
    uint32_t value_changed;
    float *curr_value;
    // input
    float * frames;
    // output
    void * values;
};
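The `curr_frame_offset` field suggests caching the last keyframe interval between calls. Playback is mostly monotonic, so the search can usually resume where it stopped. Below is a minimal sketch of that idea for the vec3 + linear case.

It rests on a few assumptions:

  • Keyframe times are ascending.
  • Values are tightly packed (3 floats per keyframe).
  • Time is clamped outside the keyframe range.
  • The trimmed struct and the function name are hypothetical.

It uses plain branches for clarity and does not attempt the branchless formulation the draft targets.

```c
#include <stdint.h>

/* Hypothetical trimmed-down view of animation_sampler (vec3 + linear only). */
struct vec3_linear_sampler {
    uint32_t frame_count;       /* number of keyframes, >= 1 */
    uint32_t curr_frame_offset; /* index of the last keyframe interval used */
    const float *frames;        /* ascending keyframe times */
    const float *values;        /* frame_count * 3 floats, tightly packed */
    float curr_value[3];        /* result of the last evaluation */
};

/* Evaluates the sampler at time t, clamping outside the keyframe range.
 * The search resumes from curr_frame_offset, so forward or repeated playback
 * usually finds the interval in O(1); a backward jump rescans from 0. */
void sample_vec3_linear(struct vec3_linear_sampler *s, float t) {
    const uint32_t n = s->frame_count;
    const float *v = s->values;
    if (n == 1 || t <= s->frames[0]) {          /* clamp to first keyframe */
        s->curr_frame_offset = 0;
        for (int c = 0; c < 3; ++c) s->curr_value[c] = v[c];
        return;
    }
    if (t >= s->frames[n - 1]) {                /* clamp to last keyframe */
        s->curr_frame_offset = n - 2;
        for (int c = 0; c < 3; ++c) s->curr_value[c] = v[(n - 1) * 3 + c];
        return;
    }
    uint32_t i = s->curr_frame_offset;
    if (i > n - 2 || t < s->frames[i]) i = 0;   /* cache miss: restart */
    while (t >= s->frames[i + 1]) ++i;          /* stops before n - 1 */
    s->curr_frame_offset = i;                   /* frames[i] <= t < frames[i+1] */

    const float t0 = s->frames[i], t1 = s->frames[i + 1];
    const float u = (t - t0) / (t1 - t0);       /* t1 > t0 by construction */
    for (int c = 0; c < 3; ++c) {
        const float a = v[i * 3 + c], b = v[(i + 1) * 3 + c];
        s->curr_value[c] = a + (b - a) * u;
    }
}
```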

Data layout (low to high addresses)

  • 1 KB of unused memory for wasm
  • 1 KB stack
  • a 12-byte header to help with relocation/serialization
  • frame data (input/output data, as in glTF)
  • current value data for the samplers
  • animation system metadata
  • sampler data (sorted by type and interpolation)

The last five parts should be in one contiguous memory block. If a wasm heap hosts multiple animation systems, their blocks (and the pointers into them) must not overlap with those of other animation systems.
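Below is a sketch of how the offsets inside one such block could be computed. It follows the order listed above and assumes the wasm32 model, where size_t, pointers, and enums are all 4 bytes. Under that model the header is 3 × 4 = 12 bytes, which matches the 12-byte header above. It also assumes every region is rounded up to 4-byte alignment, and that offsets are relative to the block start (after the reserved 1 KB and the stack).

Under wasm32, the draft's `animation_system` and `animation_sampler` would be 40 and 44 bytes (ten and eleven 4-byte fields). All names are hypothetical.

```c
#include <stdint.h>

/* wasm32 model: 4-byte size_t and pointers, so 3 fields * 4 bytes. */
#define WASM32_HEADER_SIZE 12u

/* Hypothetical: byte offsets of each region, relative to the block start. */
struct block_layout {
    uint32_t header;      /* animation_system_header */
    uint32_t frame_data;  /* keyframe inputs/outputs (floats) */
    uint32_t curr_values; /* current value of every sampler */
    uint32_t system;      /* struct animation_system */
    uint32_t samplers;    /* struct animation_sampler[], sorted by type */
    uint32_t byte_length; /* total size of the block */
};

/* Rounds x up to the next multiple of 4. */
static uint32_t align4(uint32_t x) { return (x + 3u) & ~3u; }

struct block_layout plan_block(uint32_t frame_floats, uint32_t value_floats,
                               uint32_t system_size, uint32_t sampler_size,
                               uint32_t sampler_count) {
    struct block_layout l;
    l.header = 0;
    l.frame_data = align4(l.header + WASM32_HEADER_SIZE);
    l.curr_values = align4(l.frame_data + frame_floats * 4u);
    l.system = align4(l.curr_values + value_floats * 4u);
    l.samplers = align4(l.system + system_size);
    l.byte_length = align4(l.samplers + sampler_size * sampler_count);
    return l;
}
```

For example, `plan_block(100, 10, 40, 44, 2)` places frame data at 12, current values at 412, the system struct at 452, the samplers at 492, and yields a 580-byte block.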

API design

C API

// this allows multiple animation systems in one heap, but this could cause memory leaks
size_t process_frame(float curr_frame, struct animation_system * animation_system);
// for deserialization into a different memory block
void relocate(struct animation_system * animation_system, intptr_t offset);

JS then reads the results directly from the heap.
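The "one call per frame" goal means `process_frame` evaluates every sampler in a single pass and leaves results in the heap for JS to read. Below is a minimal sketch of such a pass. It assumes, hypothetically, that:

  • the `size_t` return value is the number of samplers whose value changed this frame;
  • `value_changed` is the per-sampler flag JS checks before touching a target.

To stay short, it handles STEP interpolation only and finds the keyframe with a linear scan. The trimmed struct and the name `process_frame_step` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical trimmed sampler: STEP interpolation only, any value size. */
struct step_sampler {
    uint32_t frame_count;   /* >= 1 */
    uint32_t value_size;    /* floats per keyframe value */
    uint32_t value_changed; /* written here, read by JS */
    const float *frames;    /* ascending keyframe times */
    const float *values;    /* frame_count * value_size floats */
    float *curr_value;      /* value_size floats, read by JS */
};

/* Evaluates all samplers at time t in one call and returns how many of them
 * changed, so JS can skip untouched targets. */
size_t process_frame_step(float t, struct step_sampler *samplers, size_t count) {
    size_t changed = 0;
    for (size_t s = 0; s < count; ++s) {
        struct step_sampler *sp = &samplers[s];
        uint32_t k = 0; /* last keyframe with frames[k] <= t (0 if t is before all) */
        while (k + 1 < sp->frame_count && sp->frames[k + 1] <= t) ++k;
        const float *src = sp->values + (size_t)k * sp->value_size;
        const size_t bytes = sp->value_size * sizeof(float);
        sp->value_changed = memcmp(sp->curr_value, src, bytes) != 0;
        if (sp->value_changed) {
            memcpy(sp->curr_value, src, bytes);
            ++changed;
        }
    }
    return changed;
}
```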

JS API

class AnimationSystem {
    ins: WebAssembly.Instance;
    heap: WebAssembly.Memory;
    samplers: Uint32Array;
    currentValues: Float32Array;
    pointer: number;

    targets: NodeTarget[];

    animation: BABYLON.Animation; // contains 2 keyframes

    // AnimationSystem itself is the target
    set frame(value: number); // triggers the wasm computation and sets values on the targets

    // patch AnimationGroup.Parse / AnimationGroup.prototype.serialize for serialization instead of subclassing
    // it should be possible to append babylon.js animation channels to this animation group
    animationGroup: AnimationGroup;
}

// Not a class, to avoid overhead
interface NodeTarget {
    node: Node;
    morph?: MorphTargetManager;
    // each sampler is evaluated once, even when a sampler drives many channels
    // this might be worth moving into the wasm heap
    translation?: number; // index into samplers, undefined if there is no channel
    rotation?: number;// index of samplers
    scale?: number;// index of samplers
    weights?: number;// index of samplers
}

Serialization

AnimationSystem should be serializable: the used memory block and its base pointer are serialized. On deserialization, the memory block is placed in a new area, and the relocate function shifts the pointers inside the block by the difference between the new and old base addresses.

Deserialization is not supported until the user explicitly patches AnimationGroup.Parse.
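Below is a minimal sketch of what `relocate` could do. It reuses the struct definitions from the data structure section, with the enums replaced by `uint32_t` so the block stands alone. It also assumes the three typed lists are contiguous slices of `samplers`, so the total sampler count is the sum of the three counts; this follows from sampler data being sorted by type and interpolation.

Null pointers are left as null. The helper macro `RELOCATE_PTR` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct animation_sampler {
    uint32_t interpolation; /* enum in the draft */
    uint32_t value_type;    /* enum in the draft */
    uint32_t frame_count;
    uint32_t value_size;
    float min_frame;
    float max_frame;
    uint32_t curr_frame_offset;
    uint32_t value_changed;
    float *curr_value;
    float *frames;
    void *values;
};

struct animation_system {
    size_t frame_data_length;
    float *sampler_frame_data;
    struct animation_sampler *samplers;
    size_t vec3_linear_count;
    struct animation_sampler *vec3_linear_samplers;
    uint32_t approximate_slerp;
    size_t quat_slerp_count;
    struct animation_sampler *quat_slerp_samplers;
    size_t other_count;
    struct animation_sampler *other_samplers;
};

/* Hypothetical helper: shifts a non-null pointer by `offset` bytes using
 * integer arithmetic (the block has moved, so the old pointer is stale). */
#define RELOCATE_PTR(p, offset) \
    ((p) ? (void *)((uintptr_t)(p) + (uintptr_t)(offset)) : NULL)

/* `sys` must already point into the copied block; `offset` is new - old. */
void relocate(struct animation_system *sys, intptr_t offset) {
    sys->sampler_frame_data = RELOCATE_PTR(sys->sampler_frame_data, offset);
    sys->samplers = RELOCATE_PTR(sys->samplers, offset);
    sys->vec3_linear_samplers = RELOCATE_PTR(sys->vec3_linear_samplers, offset);
    sys->quat_slerp_samplers = RELOCATE_PTR(sys->quat_slerp_samplers, offset);
    sys->other_samplers = RELOCATE_PTR(sys->other_samplers, offset);

    /* samplers now points into the new block; fix each sampler's pointers */
    const size_t total =
        sys->vec3_linear_count + sys->quat_slerp_count + sys->other_count;
    for (size_t i = 0; i < total; ++i) {
        struct animation_sampler *s = &sys->samplers[i];
        s->curr_value = RELOCATE_PTR(s->curr_value, offset);
        s->frames = RELOCATE_PTR(s->frames, offset);
        s->values = RELOCATE_PTR(s->values, offset);
    }
}
```

Because every pointer inside the block is shifted by the same offset, the serialized bytes only need to carry the old base address alongside the block.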

Animation process

scene._animate() →
animatable._animate() →
RuntimeAnimation.animate() →
animation._interpolate() (a dummy animation whose keys are babylon.js frames and whose values are glTF frame times) →
RuntimeAnimation.setValue() →
AnimationSystem.set frame() (setter implicitly called by setValue) →
wasm.process_frame() →
Iterate over the targets, fetch each sampler value, and set it on the babylon.js object
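The dummy animation in the chain above only has to map a babylon.js frame to a glTF time. Below is a sketch of what a two-key linear mapping would compute. It assumes, hypothetically, that:

  • glTF input times are in seconds;
  • the Babylon animation runs at `fps` frames per second;
  • key 0 is (frame 0, time `t_min`) and key 1 is (frame `(t_max - t_min) * fps`, time `t_max`).

With these keys, linear interpolation reduces to `t_min + f / fps`, clamped at the ends. The function name is hypothetical.

```c
/* Hypothetical: value of the 2-key dummy Animation at babylon.js frame f.
 * Key 0: (frame 0, time t_min); key 1: (frame (t_max - t_min) * fps, time t_max). */
float dummy_animation_time(float f, float fps, float t_min, float t_max) {
    const float end_frame = (t_max - t_min) * fps;
    if (end_frame <= 0.0f) return t_min; /* degenerate: a single time value */
    float u = f / end_frame;             /* normalized position between the keys */
    if (u < 0.0f) u = 0.0f;              /* before key 0: hold the first value */
    if (u > 1.0f) u = 1.0f;              /* after key 1: hold the last value */
    return t_min + u * (t_max - t_min);
}
```

For example, at 60 fps a glTF clip spanning 0 to 2 s maps babylon.js frame 30 to glTF time 0.5 s.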

Concept Mapping

1 WebAssembly.Memory – 1 WebAssembly.Instance – 1 or many AnimationSystem

1 GLTF animation – 1 AnimationGroup – 1 TargetedAnimation { target: 1 AnimationSystem, animation: 1 Animation } – 1 RuntimeAnimation – 1 Animatable

Also note that if the glTF contains currently unsupported channels or samplers, the AnimationGroup might contain additional BABYLON.TargetedAnimation entries for those channels or samplers.

Benchmarking

Use the stress-test model with minimal draw calls, and collect FPS and heap memory (Firefox heap memory can only be measured via devtools; Chrome can use the performance.memory API).

  • Could advanced animation features (blending, weights, etc.) still work?
  • Any clue how this will compare versus VATs (perf, memory)?

This could be a problem. Currently, if you have a mixamo rig and around 50 animation groups (running, walking, jumping, etc.), you cannot load all of these animations on startup because it takes way too long. So you end up requesting single animations on demand (which load surprisingly fast). If I remember correctly, I was at around 12 seconds per skeleton with maybe 50 animations.

Do I read the bullet point right, that lazy-loading animations is not possible anymore?

I don’t think so; actually, I’ve never used these features in production. But since the animation group is still a babylon.js AnimationGroup, it’s possible to combine the compat animation system and babylon.js animation channels that use these advanced features in one animation group.

This is planned to support all core animations in the glTF 2.0 spec, while Baked Texture Animations support only skeletal animations.
This is more like a fast path for most common animation cases.

Yeah, that makes sense. Let’s change it to one animation system per animation group, with an append-only wasm heap. Users can choose to use one wasm heap and append animation groups to it. These groups cannot be deleted later, since the wasm spec does not allow freeing or discarding allocated memory. Alternatively, users can use one wasm instance and heap per animation group, which is less efficient but gives more control.
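An append-only heap can be as simple as a bump allocator. Each appended animation group gets its own non-overlapping block, and nothing is ever freed. This matches the fact that wasm linear memory can grow but never shrink.

Below is a minimal sketch over a fixed buffer. It assumes 4-byte alignment for block offsets (floats and pointers on wasm32). `struct arena` and `arena_append` are hypothetical names. In a real wasm build, a full arena would mean growing the memory (e.g. `WebAssembly.Memory.prototype.grow()` from JS) and retrying.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical append-only arena: blocks are handed out in order and never freed. */
struct arena {
    uint8_t *base;   /* start of the managed region */
    size_t capacity; /* bytes available */
    size_t used;     /* bytes handed out so far (including padding) */
};

/* Returns a block at a 4-byte aligned offset from base, or NULL when the
 * arena is full (in which case `used` is left unchanged). */
void *arena_append(struct arena *a, size_t size) {
    const size_t start = (a->used + 3u) & ~(size_t)3u;
    if (start > a->capacity || size > a->capacity - start) return NULL;
    a->used = start + size;
    return a->base + start;
}
```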

Very cool project, I like the idea of trying to compact and optimize everything in BJS hehe. I’m curious about your SIMD implementation in C - are you going to provide runtime checks for processor capabilities? Are you going to bootstrap from a library that abstracts intrinsics or target each processor type for SSE2, AVX2, AVX-512 etc?

I’ve been working with SIMD on a few projects recently. One big WIP I have going is called “ShaderObject”, which has one underlying buffer/arena that projects into WASM (through dynamic AssemblyScript emission/compilation) and into structs in WebGL2 and WebGPU. AssemblyScript is very quick to work with, and I’m surprised by its efficiency: it can run ops on 150k instances in a hot loop in under 1 ms. I’m also working in Rust on another project, using the wide crate, which sits on top of the intrinsics and gives more out-of-the-box control with runtime checking.

Here’s one part of this for reference, the parent class does ASC compilation on the fly here.

On the web things are simple: since simd128 is the only SIMD widely supported in browsers, there is little that can be manually optimized.
The C implementation would likely use cglm, a library that has supported simd128 for years.