Draft: Fast BoundingInfo

Note

This is a very early draft, and could be changed in future, and might not be implemented. This post is crafted with the help of AI, especially the algorithm part.

Motivation

BoundingInfo is the most frequently called object in the render loop.
update() runs once per mesh per frame, and isInFrustum() runs once per
submesh per frame. For a scene with 10,000 meshes at 60 fps, these two methods
alone account for 1.2 million calls per second.

The current implementation has two costs:

  1. Allocation overhead. Each BoundingInfo eagerly constructs a BoundingBox
    (8 local Vector3 corners, 8 world Vector3 corners, center, extendSize,
    directions, etc.) and a BoundingSphere (center, centerWorld, minimum, maximum).
    That is ~40 Vector3 heap objects per mesh, putting GC pressure on scenes with
    many meshes.

  2. Algorithmic overhead. The BoundingBox._update method transforms all 8
    corners through the world matrix using Vector3.TransformCoordinatesToRef, then
    reduces them with minimizeInPlace / maximizeInPlace. The frustum check in
    BoundingBox.isInFrustum tests each of the 8 world corners against each of the
    6 frustum planes (up to 48 dot products).

Both costs are well-studied in real-time graphics. The cglm library (used by
hundreds of C/OpenGL projects) provides battle-tested replacements:

  • Arvo’s AABB transform (glm_aabb_transform): works directly on the min/max
    pair and the three matrix column vectors, computing the new world AABB in ~50 ops
    instead of transforming 8 corners (~190 ops).
  • P-vertex frustum test (glm_aabb_frustum): for each frustum plane, tests only
    the single AABB corner most aligned with the plane normal (the “p-vertex”),
    requiring exactly 6 tests instead of up to 48.

Also check here for some previous study on this:

Goals

  • Reduce per-mesh bounding data from ~40 heap-allocated Vector3 objects and two
    class instances down to a single Float32Array(12) plus two scalars.
  • Replace the 8-corner AABB transform (~190 floating-point ops) with Arvo’s method
    (~50 ops), yielding 3.3x measured speedup on the update() hot path.
  • Replace the up-to-48-dot-product frustum check with the p-vertex method (6 plane
    tests), yielding 4.1x measured speedup on the isInFrustum() hot path.
  • Maintain full structural API compatibility with BoundingInfo so that existing
    user code – including mesh.getBoundingInfo().boundingBox,
    boundingInfo.minimum, isInFrustum(), intersects(), and all culling
    strategies – continues to work without changes.
  • Enable adoption as a standalone side-effect import (prototype patch) with no
    changes to Babylon.js core source files.

Non-Goals

  • Replacing BoundingInfo as the engine default. This should be an opt-in optimization;
    users who do not import the patch keep the existing behavior.
  • Changing the ICullable interface or any public engine API signatures.
  • Optimizing cold paths such as SubMesh.clone(), Geometry.clone(), or
    Geometry.Parse(), which construct bounding info once during loading.
  • SIMD / WebAssembly acceleration. The algorithms are scalar JavaScript tuned for
    JIT inlining. SIMD could be a follow-up.

Description

Data Layout

All bounding state is stored in a single Float32Array(12):

Offset  Content
------  ----------------------------
[0..2]  Local AABB min (x, y, z)
[3..5]  Local AABB max (x, y, z)
[6..8]  World AABB min (x, y, z)
[9..11] World AABB max (x, y, z)

Two additional scalars are stored as class fields:

  • _localRadius: number – half-diagonal of the local AABB
  • _radiusWorld: number – world-space bounding sphere radius

A reference to the current world matrix is held for proxy sync and OBB fallback.

Core Algorithms

AABB Transform (Arvo’s method)

For each output axis, scales each matrix column vector by both the local min and
local max component, then accumulates translation + min(a, b) for the new
world min and translation + max(a, b) for the new world max:

worldMin[i] = T[i] + min(col0[i]*lMin.x, col0[i]*lMax.x)
                    + min(col1[i]*lMin.y, col1[i]*lMax.y)
                    + min(col2[i]*lMin.z, col2[i]*lMax.z)

This is branch-free and uses only multiplies, min, max, and add – all of which
JIT-compile efficiently.

P-vertex Frustum Test

For each of the 6 frustum planes, selects the AABB corner most in the direction
of the plane normal (the p-vertex) using conditional selection:

pVertex[i] = normal[i] > 0 ? worldMax[i] : worldMin[i]

If dot(pVertex, normal) + d < 0, the entire box is outside that plane and the
test returns false immediately. This is the cglm glm_aabb_frustum algorithm.

N-vertex Completely-in-Frustum Test

The dual: selects the corner least aligned with the normal (the n-vertex). If
any n-vertex is outside its plane, the box is not completely inside.

Sphere Radius Heuristic

Matches the existing BoundingSphere._update formula:
radiusWorld = max(|col0+col1+col2|) * localRadius, where each col is the
corresponding matrix column sum component.

Hot Path Analysis

Call Site Method Frequency
scene._evaluateSubMesh isInFrustum() per submesh per frame
abstractMesh._updateBoundingInfo update() per mesh per frame
renderingGroup transparent sort boundingSphere.centerWorld per transparent mesh per frame
scene active mesh evaluation isInFrustum() per mesh per frame

Alternatives Considered

  1. Optimize BoundingBox._update in place. Would improve transform speed but
    cannot reduce memory footprint without breaking the public BoundingBox API
    that exposes vectors, vectorsWorld, center, extendSize, etc. as
    Vector3 instances.

  2. SoA (Structure of Arrays) layout across all meshes. Maximum cache
    efficiency but requires invasive changes to the scene graph, violates the
    per-mesh ownership model, and is incompatible with the existing
    mesh.getBoundingInfo() API.

  3. WebAssembly / SIMD. Could yield further gains but adds build complexity,
    has a call overhead for small batches, and doesn’t address the memory
    bloat. Can be layered on top of this proposal later.

  4. Do nothing. The current implementation is correct and well-tested. However,
    scenes with 10K+ meshes spend a measurable fraction of frame time in bounding
    info update and culling. The 3-4x speedup on these paths directly translates
    to more headroom for application logic and rendering.

4 Likes

The (AI?) text mentions “measured speedup”. So does that mean you have working code? If so, can you put it into some benchmark playgrounds? I think this would strongly strengthen your case for a PR.

You refer to this textual, non-code post, here, right? The actual code is/will be written entirely be you or taken from known human sources**?

**FYI: this is for copyright reasons. If AI used copyrighted code (e.g. proprietary, GPL), but this is not disclosed by AI, and this code ends up in the Babylon repo, there would be a copyright violation. Worse, AFAIK, we users would “inherit” this violation.

Tested with nodejs for the algorithm, code is generated by AI.
Node.js 22.16.0 (x64)

Test: Arvo vs 8-corner (identity) - OK
Test: Arvo vs 8-corner (translation) - OK
Test: Arvo vs 8-corner (scaling) - OK
Test: Arvo vs 8-corner (90° rotation) - OK
Test: Arvo vs 8-corner (SRT) - OK
Test: frustum check (inside) - OK
Test: frustum check (outside) - OK
Test: frustum check (partial) - OK
Test: random stress (10000 transforms) - OK

Benchmark (1000000 transforms):
  8-corner: 412.30 ms
  Arvo:     103.93 ms
  Speedup:  3.97x

Benchmark (1000000 frustum checks):
  8-corner: 399.25 ms
  p-vertex: 41.55 ms
  Speedup:  9.61x

34 passed, 0 failed

test-fast-bounding-info-standalone.zip (2.9 KB)

3 Likes

I explictly asked AI to reference code of cglm to make this test, which is MIT licensed.

Nice idea. I really like the plan so far

Just for reference, three.js uses 8-corner AABB transform with P-vertex frustum test

Another thing to consider that there is so many code accessed BoundingInfo.boundingBox for only minimumWorld and maximumWorld, and BoundingInfo.boundingSphere for only centerWorld and center, this proposal would have more performance gain after these moved to direct getters on BoundingInfo

1 Like

You are proposing some drastic gains, I hope this is valid and can land.

Given the purposed speed increase this should be given some consideration because it’s impact would be massive.

Wanna do a PR?

I’m kind of hesitate on this, adding getters makes the code using BoundingInfo less dependent on underlying boundingBox/ boundingSphere, allowing more opportunity for future optimizations like this draft, but adding getters adds more performance cost before optimizations being done, and could make existing code slower because of that. Modern jit can inline these small functions but it’s not stable, deoptimizations happen all the time, no #[inline(always)] or @ForceInline instruction can ensure it, so potential performance loss would be expected.

1 Like