Note
This is a very early draft; it could change in the future and might not be implemented.
Background
This is a redesign of the long-rejected proposal as a plugin.
Motivation
Currently, the animation system in babylon.js is very flexible, but it is not yet optimized for simple animations such as the GLTF ones.
GLTF animations are the main source of animations running in babylon.js, and GLTF defines a much simpler animation model.
A GLTF model can sometimes contain more than 10k animation channels/samplers and millions of keyframes, which can hit the limits of heap memory and CPU performance.
So users should be given the choice to trade flexibility for performance, in both CPU and memory.
Goals
- Compact, animations should be stored in binary format whenever possible, not only keyframes, but also samplers and runtime data.
- WASM-first, main compute should be in wasm, and SIMD accelerated whenever possible.
- One call per frame, all animation sampling should be done in a single js-wasm call, no more.
- Use js objects only if required, everything possible to go into the wasm heap should be there.
- GLTF-compatible, it should support the animations declared in the core GLTF 2.0 spec, while KHR_animation_pointer is kept for the future.
- Babylon.js-compatible, it can be run like an AnimationGroup in babylon.js (as long as advanced features are not used).
- Optional, the user can choose to patch babylon.js so that it is used by default in places such as the GLTF loader and serialization, but only after an explicit call by the user.
- Minimal runtime, the js runtime should be minimal (no emscripten), and the wasm ABI (and memory layout) should be stable. Also, if the wasm binary can be reduced to less than 4k, it can be created synchronously on chrome (the size limit was increased a few versions ago, but it cannot be assumed that all users run the latest browsers).
- To reduce it to less than 4k, libc’s math functions (acosf, sinf) cannot be used: either a polynomial approximation is used, or slerp is dropped entirely in favor of onlerp. Both cost precision.
- Self-contained, no external runtime dependency except for babylon.js
- Immutable, the animation system is immutable once constructed.
- Zero heap growth after the animation system is constructed.
- Serializable, the wasm heap can be serialized to base64 and loaded back, or optionally to raw data if the user needs to process the serialized data later.
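To make the precision cost mentioned in the slerp bullet concrete, here is a small TypeScript sketch that compares a reference slerp with plain normalized lerp (nlerp). It is only an illustration under stated assumptions: onlerp adds a polynomial time correction on top of nlerp that is not reproduced here, and all function names are invented for this example.

```typescript
type Quat = [number, number, number, number]; // x, y, z, w

function dot(a: Quat, b: Quat): number {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Plain nlerp: lerp the components, then renormalize. No trig functions needed.
function nlerp(a: Quat, b: Quat, t: number): Quat {
  const s = dot(a, b) < 0 ? -t : t; // flip b so the shortest arc is taken
  const q: Quat = [
    (1 - t) * a[0] + s * b[0],
    (1 - t) * a[1] + s * b[1],
    (1 - t) * a[2] + s * b[2],
    (1 - t) * a[3] + s * b[3],
  ];
  const inv = 1 / Math.sqrt(dot(q, q));
  return [q[0] * inv, q[1] * inv, q[2] * inv, q[3] * inv];
}

// Reference slerp: relies on acos and sin, i.e. the libc-style math functions.
function slerp(a: Quat, b: Quat, t: number): Quat {
  const d = dot(a, b);
  const theta = Math.acos(Math.min(Math.abs(d), 1));
  if (theta < 1e-6) return nlerp(a, b, t); // nearly identical rotations
  const wa = Math.sin((1 - t) * theta) / Math.sin(theta);
  const wb = ((d < 0 ? -1 : 1) * Math.sin(t * theta)) / Math.sin(theta);
  return [
    wa * a[0] + wb * b[0],
    wa * a[1] + wb * b[1],
    wa * a[2] + wb * b[2],
    wa * a[3] + wb * b[3],
  ];
}

// Rotation angle (in degrees) that separates two unit quaternions.
function angleBetweenDeg(a: Quat, b: Quat): number {
  return (2 * Math.acos(Math.min(Math.abs(dot(a, b)), 1)) * 180) / Math.PI;
}

const identity: Quat = [0, 0, 0, 1];
const quarterTurnZ: Quat = [0, 0, Math.SQRT1_2, Math.SQRT1_2]; // 90 degrees about Z
const errorDeg = angleBetweenDeg(
  slerp(identity, quarterTurnZ, 0.25),
  nlerp(identity, quarterTurnZ, 0.25),
);
```

For this 90 degree example, errorDeg comes out a bit under one degree at t = 0.25, while both functions agree at t = 0.5. The error shrinks as adjacent keyframes get closer together, which is the usual case for densely sampled animation data.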
Non-goals
It is not a goal to:
- Replace the current animation system of babylon.js.
- Be compatible with the animation curve editor.
- Be able to change channels/samplers/keyframe data at runtime.
- Support old browsers without wasm support.
- Make a multithreaded runtime, which is too heavily restricted by browser vendors.
- Optimize animation channels (resampling, duplicated frame cleaning, channel target merging, constant channel purging); this should all be done at the model level, via the gltfpack or gltf-transform tools, before the model is imported. (Constant samplers, if detected, could be evaluated at construction time and moved out of the per-frame update list, but they are kept in memory.)
- Align keyframe data, since unaligned access is pretty fast in modern browsers.
- Have per-channel loopMode or animation offset.
- Support non-float inputs/outputs (they should be dequantized/denormalized during construction, if any).
- Control the playback of each channel individually; all channels must start / stop / reset at once.
- Support all the advanced animation features (blending, weights, etc.).
- Run animations on the GPU like Baked Texture Animations.
- Support sparse or interleaved accessors; the runtime sampler will only contain tightly packed values (stride == element size).
Data structure
// for each animation group, there should be an animation system like this
// 12 bytes on wasm32
struct animation_system_header {
uint32_t version;
uint32_t byte_length; // total byte length of the continuous data block
struct animation_system * animation_system;
};
// 44 bytes on wasm32
struct animation_system {
uint32_t frame_data_length;
float * sampler_frame_data;
uint32_t sampler_count; // total sampler count, = vec3_linear_count + quat_slerp_count + other_count
struct animation_sampler *samplers; // base pointer to all samplers (contiguous)
uint32_t vec3_linear_count;
struct animation_sampler *vec3_linear_samplers; // fast path for most-used samplers (branchless)
uint32_t approximate_slerp; // zeux's onlerp (polynomial approximation of slerp via adjusted nlerp)
uint32_t quat_slerp_count;
struct animation_sampler *quat_slerp_samplers;
uint32_t other_count; // fallback path (step, cubic spline, vec4, weights)
struct animation_sampler *other_samplers;
};
// should mostly be cgltf compatible
typedef enum animation_interpolation_type {
animation_interpolation_type_linear,
animation_interpolation_type_step,
animation_interpolation_type_cubic_spline,
// cgltf_interpolation_type_max_enum, used to represent const sampler
// const samplers are pre-evaluated at construction, curr_value is set,
// and they are excluded from the three processing lists
animation_interpolation_type_const
} animation_interpolation_type;
typedef enum animation_value_type {
animation_value_type_vec3,
animation_value_type_quaternion,
animation_value_type_vec4,
animation_value_type_weights
} animation_value_type;
// Per-sampler runtime state (hot-write, contiguous stream).
// Split from animation_sampler to separate hot-write state from readonly metadata,
// reducing false sharing and improving cache utilization (see cache-analysis.md).
// Allocator over-allocates curr_value to value_size floats (padded to 4 for SIMD).
struct animation_sampler_state {
// current keyframe index hint for temporal-coherent linear scan
// process_frame walks forward/backward from this index
uint32_t curr_key_index;
// set to 1 when the sampled value changes during the current process_frame()
// call, explicitly reset to 0 when the sampled output is unchanged
uint32_t value_changed;
float curr_value[]; // interpolated output (C99 flexible array member, size = value_size)
};
// 36 bytes on wasm32 (was 44 before split-state)
// Readonly after construction — all per-frame mutation goes through state pointer.
struct animation_sampler {
animation_interpolation_type interpolation;
animation_value_type value_type;
uint32_t frame_count;
// output element count per keyframe:
// linear/step: component count (3 for vec3, 4 for quat, N for weights)
// cubic_spline: 3 * component count (in-tangent + value + out-tangent per GLTF spec)
uint32_t value_size;
float min_frame;
float max_frame;
// pointer to runtime state (curr_key_index, value_changed, curr_value)
struct animation_sampler_state *state;
// input: keyframe timestamps (float seconds, sorted ascending)
float *frames;
// output: keyframe values (tightly packed, stride = value_size)
// vec3 linear SIMD fast paths use 16-byte glmm_load(), so construction must
// guarantee one extra float of safe overread at the end of the vec3 stream
float *values;
};
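The curr_key_index comment describes a temporal-coherent linear scan. A minimal TypeScript sketch of that lookup is shown below, assuming the timestamps are sorted ascending as stated for frames; the function name findKeyIndex and the clamping behaviour at both ends are assumptions made for this illustration, not the kernel's actual code.

```typescript
// Returns i such that frames[i] <= time < frames[i + 1],
// clamped to [0, frames.length - 2] when time is out of range.
function findKeyIndex(frames: Float32Array, time: number, hint: number): number {
  const last = frames.length - 2;
  if (last < 0) return 0; // zero or one keyframe: nothing to search
  let i = Math.min(Math.max(hint, 0), last);
  // Walk forward while the next keyframe is still at or before `time`.
  while (i < last && time >= frames[i + 1]) i++;
  // Walk backward if playback jumped back or looped.
  while (i > 0 && time < frames[i]) i--;
  return i;
}
```

During normal playback the hint is either still valid or off by one, so the scan usually ends after zero or one step; only seeks and loop wrap-arounds pay for a longer walk.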
Recommended vec3 overread-safe packing strategy
For a first TS implementation, the simplest safe rule is:
- Keep vec3 keyframes tightly packed as 3 floats per keyframe for ABI compatibility.
- When allocating the values block for a vec3 linear sampler, reserve one extra float after the final keyframe.
- Initialize that extra float to 0.
- Do this per vec3 sampler, not just once globally, so every sampler remains independently relocatable and serializable.
This preserves the current runtime ABI while making every 16-byte glmm_load() stay within allocated memory.
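The packing rule above can be sketched in TypeScript as follows. The helper name packVec3Values is hypothetical, and the sketch only builds the padded values array; copying it into wasm memory is left out.

```typescript
// Packs vec3 keyframe values tightly (3 floats per keyframe) and appends one
// zero float so a 16-byte load at the last keyframe stays inside the block.
function packVec3Values(keyframes: ArrayLike<number>): Float32Array {
  if (keyframes.length % 3 !== 0) {
    throw new Error("vec3 keyframe data must contain a multiple of 3 floats");
  }
  // Float32Array is zero-initialized, so the padding float is already 0.
  const packed = new Float32Array(keyframes.length + 1);
  packed.set(keyframes);
  return packed;
}

const packedExample = packVec3Values([1, 2, 3, 4, 5, 6]); // 2 keyframes, 7 floats
```

With n keyframes, the last keyframe starts at float index 3(n - 1), so a 4-float load touches indices up to 3n, which is exactly the padding slot.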
Sampler array layout
The samplers array is contiguous and sorted by evaluation category:
samplers[0 .. vec3_linear_count-1] → vec3_linear
samplers[vec3_linear_count .. vec3_linear_count+quat_slerp_count-1] → quat_slerp
samplers[... .. ...+other_count-1] → other
The three sub-pointers point into this array:
vec3_linear_samplers = &samplers[0]
quat_slerp_samplers = &samplers[vec3_linear_count]
other_samplers = &samplers[vec3_linear_count + quat_slerp_count]
sampler_count = vec3_linear_count + quat_slerp_count + other_count
This contiguous-array invariant is required by relocate(): it iterates samplers[0..sampler_count) and assumes the three category pointers are subranges into that single array, not separately allocated sampler blocks.
Const samplers (animation_interpolation_type_const) are pre-evaluated at construction time: their curr_value is set once and they are excluded from all three processing lists. They remain in the samplers array for relocation but are never re-evaluated.
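Seen from the JS side, the three category pointers are plain byte addresses derived from the base of the samplers array. The sketch below assumes the 36-byte sampler size stated for wasm32; the function and field names are made up for illustration.

```typescript
const SAMPLER_SIZE = 36; // bytes per animation_sampler on wasm32

interface SamplerRanges {
  vec3Linear: number; // byte address of vec3_linear_samplers
  quatSlerp: number;  // byte address of quat_slerp_samplers
  other: number;      // byte address of other_samplers
  samplerCount: number;
}

// Derives the category pointers from the base of the contiguous samplers array.
function samplerSubranges(
  samplersPtr: number,
  vec3LinearCount: number,
  quatSlerpCount: number,
  otherCount: number,
): SamplerRanges {
  return {
    vec3Linear: samplersPtr,
    quatSlerp: samplersPtr + vec3LinearCount * SAMPLER_SIZE,
    other: samplersPtr + (vec3LinearCount + quatSlerpCount) * SAMPLER_SIZE,
    samplerCount: vec3LinearCount + quatSlerpCount + otherCount,
  };
}

const ranges = samplerSubranges(1000, 2, 3, 1);
// ranges.quatSlerp === 1072, ranges.other === 1180, ranges.samplerCount === 6
```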
Data layout (low-high)
Inside wasm linear memory, each animation system occupies a contiguous block:
[stack] ← WASM stack (256 bytes, addresses [0, 256), grows downward)
[header] ← animation_system_header (12 bytes)
[frame data] ← shared keyframe timestamps (sampler_frame_data)
[state stream] ← contiguous sampler states (16-byte aligned entries)
[system] ← animation_system struct (44 bytes)
[samplers] ← sampler array (sorted: vec3_linear, quat_slerp, other)
[keyframes] ← per-sampler frames[] and values[] arrays
The WASM stack is configured to 256 bytes (-z stack-size=256), used only for float prev[4] arrays in vec3/quat evaluators (64-byte stack frame) when comparing old vs new values via pointer-taking vec3_equal/vec4_equal. The evaluate_other SIMD path avoids the stack entirely by caching old values in v128 locals (wasm operand stack → JIT registers).
All parts of a single animation system must be in a contiguous memory block. Multiple animation systems can share the same wasm memory (and instance) as long as their blocks don’t overlap. Data starts at address 256 (after the stack).
API design
C API
// Evaluate all samplers at curr_frame (in seconds).
// Returns number of samplers whose value_changed flag was set to 1 during this
// call; samplers whose sampled output is unchanged are explicitly reset to 0.
// Processes vec3_linear, quat_slerp, then other. Const samplers are skipped.
uint32_t process_frame(float curr_frame, struct animation_system *sys);
// Adjust all internal pointers by offset (new_base - old_base).
// Used after deserializing/copying a memory block to a different address.
// Fixes: animation_system ptr, sampler_frame_data, all sampler sub-arrays,
// and for each sampler: state, frames, values pointers.
// Requires sys->samplers to be the base of one contiguous sampler array that
// contains all vec3_linear/quat_slerp/other sampler subranges.
void relocate(struct animation_system_header *header, intptr_t offset);
JS should then fetch the sampled values directly from the wasm heap.
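To illustrate which pointers relocate() has to fix, here is a TypeScript mirror of the same walk over a DataView. This is only a sketch: the real function is part of the wasm kernel, and the byte offsets below are derived from the struct definitions above under the assumption of wasm32, little-endian, 4-byte enums and no padding. The name relocateBlock is hypothetical.

```typescript
const HEADER_SYSTEM_PTR = 8; // animation_system_header.animation_system
// sampler_frame_data, samplers, vec3_linear_samplers, quat_slerp_samplers, other_samplers
const SYSTEM_POINTER_FIELDS = [4, 12, 20, 32, 40];
const SYSTEM_SAMPLER_COUNT = 8;
const SYSTEM_SAMPLERS = 12;
const SAMPLER_BYTES = 36;
const SAMPLER_POINTER_FIELDS = [24, 28, 32]; // state, frames, values

// headerPtr is the header's address AFTER the copy; offset = new_base - old_base.
function relocateBlock(view: DataView, headerPtr: number, offset: number): void {
  const patch = (addr: number): void =>
    view.setUint32(addr, view.getUint32(addr, true) + offset, true);

  patch(headerPtr + HEADER_SYSTEM_PTR);
  const sys = view.getUint32(headerPtr + HEADER_SYSTEM_PTR, true);
  for (const field of SYSTEM_POINTER_FIELDS) patch(sys + field);

  // One pass over the single contiguous sampler array covers all categories.
  const count = view.getUint32(sys + SYSTEM_SAMPLER_COUNT, true);
  const samplers = view.getUint32(sys + SYSTEM_SAMPLERS, true);
  for (let i = 0; i < count; i++) {
    for (const field of SAMPLER_POINTER_FIELDS) {
      patch(samplers + i * SAMPLER_BYTES + field);
    }
  }
}
```

The loop shows why the contiguous-array invariant matters: if the category arrays were allocated separately, iterating from samplers for sampler_count entries would patch the wrong memory.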
JS API
class CompactAnimationSystem {
private _instance: WebAssembly.Instance;
private _memory: WebAssembly.Memory;
private _headerPtr: number; // pointer to animation_system_header in wasm memory
private _systemPtr: number; // pointer to animation_system
private _targets: NodeTarget[];
// Shared typed array views over wasm memory (invalidated on memory.grow)
private _u32: Uint32Array;
private _f32: Float32Array;
// Called by RuntimeAnimation.setValue via the animation's value setter
set frame(value: number) {
// calls wasm process_frame, then iterates targets
}
// Owned AnimationGroup created by createAnimationGroup().
// Null before creation and after disposal.
animationGroup: AnimationGroup | null;
}
// Not a class, to avoid per-instance overhead
// IMPORTANT: Create() mutates the target's sampler index fields (translation,
// rotation, scale, weights) to store sorted indices. Callers must pass
// transient/cloned target objects, not shared originals.
interface NodeTarget {
node: Node;
morph?: MorphTargetManager;
// sampler index into the global `samplers` array, -1 if no channel.
// These are NOT indices local to vec3_linear/quat_slerp/other subarrays.
translation: number;
rotation: number;
scale: number;
weights: number;
}
Ownership / disposal semantics
createAnimationGroup() establishes bidirectional lifetime coupling between the returned Babylon.js AnimationGroup and the CompactAnimationSystem:
- Disposing the CompactAnimationSystem disposes its owned AnimationGroup
- Disposing the returned AnimationGroup also disposes the CompactAnimationSystem
- The coupling must be recursion-safe: CompactAnimationSystem.dispose() marks the system disposed before it calls AnimationGroup.dispose(), while the hooked group-dispose path only calls back into the system when the system is not already disposing
- This is required for GLTF loader integration, because loader users normally only receive the AnimationGroup
- This is also required for scene teardown, because Babylon.js scene disposal releases animation groups through AnimationGroup.dispose() and does not know about the compact-system WASM allocation directly
If createAnimationGroup() is called more than once on the same system, it returns the already-owned group instead of creating a second AnimationGroup. This preserves the one-system ↔ one-group ownership model.
As a consequence, releasing the returned AnimationGroup is sufficient to free the compact animation WASM block and target references.
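The recursion-safe coupling described above could look roughly like the following sketch. GroupStub stands in for the Babylon.js AnimationGroup, and its onDispose hook is a hypothetical placeholder for however the real group-dispose path is hooked; freeCount stands in for releasing the WASM block and target references.

```typescript
class GroupStub {
  disposed = false;
  private _hooks: Array<() => void> = [];
  onDispose(hook: () => void): void {
    this._hooks.push(hook);
  }
  dispose(): void {
    if (this.disposed) return;
    this.disposed = true;
    for (const hook of this._hooks) hook();
  }
}

class SystemSketch {
  freeCount = 0; // how many times the (stand-in) WASM block was released
  private _disposed = false;
  private _group: GroupStub | null = null;

  createAnimationGroup(): GroupStub {
    if (this._group) return this._group; // one system, one group
    const group = new GroupStub();
    // Group-initiated disposal only calls back when the system is still alive.
    group.onDispose(() => {
      if (!this._disposed) this.dispose();
    });
    this._group = group;
    return group;
  }

  dispose(): void {
    if (this._disposed) return;
    this._disposed = true; // mark first, so the group hook cannot re-enter
    const group = this._group;
    this._group = null;
    this.freeCount++;
    group?.dispose();
  }
}
```

Whichever side is disposed first, the block is released exactly once and both objects end up disposed.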
For glTF weights, the current loader implementation may create multiple weight-only targets that all reference the same compact sampler index — one per primitive mesh with a compatible morphTargetManager. This preserves Babylon.js glTF-loader behavior where one node-level weights track fans out to every compatible primitive under that node.
Serialization
AnimationSystem should be serializable: the used memory block and its base pointer are serialized together. On deserialization, the memory block is copied to its new location and the relocate function is used to adjust the pointers inside the block.
Deserialization is not supported until the user explicitly calls the patch for AnimationGroup.Parse.
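A minimal sketch of the raw-data variant of this round trip is shown below. It assumes that the header is the first thing in the block and that byte_length (at offset 4 in the header) covers the whole block starting at the header; SystemSnapshot, serializeSystem and deserializeSystem are names invented for this example, and base64 encoding of the bytes is left out.

```typescript
interface SystemSnapshot {
  base: number;      // address of the header when the snapshot was taken
  bytes: Uint8Array; // copy of the contiguous animation system block
}

function serializeSystem(memory: ArrayBuffer, headerPtr: number): SystemSnapshot {
  const byteLength = new DataView(memory).getUint32(headerPtr + 4, true);
  return {
    base: headerPtr,
    bytes: new Uint8Array(memory.slice(headerPtr, headerPtr + byteLength)),
  };
}

// Copies the block to newBase and returns the offset to hand to relocate().
function deserializeSystem(
  memory: ArrayBuffer,
  newBase: number,
  snapshot: SystemSnapshot,
): number {
  new Uint8Array(memory).set(snapshot.bytes, newBase);
  return newBase - snapshot.base;
}
```

The returned offset is the new_base - old_base value that relocate() expects; until relocate() has run, every pointer inside the copied block still refers to the old address.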
Animation process
scene._animate() →
animatable._animate() →
RuntimeAnimation.animate() →
animation._interpolate() (this drives a dummy animation whose frame is the babylon.js frame and whose value is the gltf frame) →
RuntimeAnimation.setValue() →
AnimationSystem.set frame() (setter implicitly called by setValue) →
kernel.process_frame() →
Iterate targets and fetch sampler value and set to babylon.js object
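The last three steps of this chain can be sketched as below for vec3 channels. The kernel is passed in as an object so the sketch stays self-contained; the byte offsets are derived from the struct definitions above (wasm32, little-endian, no padding), and skipping samplers whose value_changed flag is 0 is an assumption about how the JS side would use that flag. KernelSketch, Vec3TargetSketch and setFrame are hypothetical names.

```typescript
interface KernelSketch {
  process_frame(currFrame: number, sysPtr: number): number;
}

interface Vec3TargetSketch {
  samplerIndex: number; // index into the global samplers array, -1 if no channel
  apply(x: number, y: number, z: number): void;
}

const SYS_SAMPLERS = 12;       // animation_system.samplers
const SAMPLER_STRIDE = 36;     // sizeof(animation_sampler)
const SAMPLER_STATE = 24;      // animation_sampler.state
const STATE_VALUE_CHANGED = 4; // animation_sampler_state.value_changed
const STATE_CURR_VALUE = 8;    // animation_sampler_state.curr_value

function setFrame(
  kernel: KernelSketch,
  view: DataView,
  sysPtr: number,
  time: number,
  targets: Vec3TargetSketch[],
): number {
  // Single js-wasm call per frame; returns how many samplers changed.
  if (kernel.process_frame(time, sysPtr) === 0) return 0;
  const samplersPtr = view.getUint32(sysPtr + SYS_SAMPLERS, true);
  let applied = 0;
  for (const target of targets) {
    if (target.samplerIndex < 0) continue;
    const sampler = samplersPtr + target.samplerIndex * SAMPLER_STRIDE;
    const state = view.getUint32(sampler + SAMPLER_STATE, true);
    if (view.getUint32(state + STATE_VALUE_CHANGED, true) === 0) continue;
    const value = state + STATE_CURR_VALUE;
    target.apply(
      view.getFloat32(value, true),
      view.getFloat32(value + 4, true),
      view.getFloat32(value + 8, true),
    );
    applied++;
  }
  return applied;
}
```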
Concept Mapping
1 WebAssembly.Memory – 1 WebAssembly.Instance – 1 or many AnimationSystem
1 GLTF animation – 1 AnimationGroup – 1 TargetedAnimation { target: 1 AnimationSystem, animation: 1 Animation } – 1 RuntimeAnimation – 1 Animatable
Also note that if the GLTF contains channels or samplers that are currently unsupported, the AnimationGroup might contain additional BABYLON.TargetedAnimation entries for them.
WASM Feature Flags
- SIMD128: 128-bit SIMD for vectorized interpolation (vec3_lerp, quat_slerp, vecn_lerp, cubic_spline)
- Bulk Memory Operations: memory.copy replaces byte-loop memcpy, used in value comparison and state updates
- Non-trapping Float-to-Int Conversions: i32.trunc_sat_f32_s eliminates trap-check branches in float-to-int casts
- Sign-extension Operators: i32.extend8_s, i64.extend32_s etc. for efficient sign extension
All four features are baseline and supported in all modern browsers (Chrome 91+, Firefox 89+, Safari 16.4+; SIMD128 is the limiting feature).
Benchmarking
Use the stress test model with minimal draw calls, and collect FPS and heap memory (Firefox heap memory can only be measured via devtools; Chrome can use the performance.memory API).