Draft: Fast BoundingInfo

kzhsw · March 13, 2026, 8:09am

Note

This is a very early draft, and could be changed in future, and might not be implemented. This post is crafted with the help of AI, especially the algorithm part.

Motivation

BoundingInfo is the most frequently called object in the render loop.
update() runs once per mesh per frame, and isInFrustum() runs once per
submesh per frame. For a scene with 10,000 meshes at 60 fps, these two methods
alone account for 1.2 million calls per second.

The current implementation has two costs:

Allocation overhead. Each BoundingInfo eagerly constructs a BoundingBox
(8 local Vector3 corners, 8 world Vector3 corners, center, extendSize,
directions, etc.) and a BoundingSphere (center, centerWorld, minimum, maximum).
That is ~40 Vector3 heap objects per mesh, putting GC pressure on scenes with
many meshes.

A BoundingInfo retained 2.4KB on chrome heap1308×201 7.38 KB
Algorithmic overhead. The BoundingBox._update method transforms all 8
corners through the world matrix using Vector3.TransformCoordinatesToRef, then
reduces them with minimizeInPlace / maximizeInPlace. The frustum check in
BoundingBox.isInFrustum tests each of the 8 world corners against each of the
6 frustum planes (up to 48 dot products).

_updateBoundingInfo costs a total of 15% the CPU time with Animation Stress Test Model1335×377 31.3 KB

Both costs are well-studied in real-time graphics. The cglm library (used by
hundreds of C/OpenGL projects) provides battle-tested replacements:

Arvo’s AABB transform (glm_aabb_transform): works directly on the min/max
pair and the three matrix column vectors, computing the new world AABB in ~50 ops
instead of transforming 8 corners (~190 ops).
P-vertex frustum test (glm_aabb_frustum): for each frustum plane, tests only
the single AABB corner most aligned with the plane normal (the “p-vertex”),
requiring exactly 6 tests instead of up to 48.

Also check here for some previous study on this:

Goals

Reduce per-mesh bounding data from ~40 heap-allocated Vector3 objects and two
class instances down to a single Float32Array(12) plus two scalars.
Replace the 8-corner AABB transform (~190 floating-point ops) with Arvo’s method
(~50 ops), yielding 3.3x measured speedup on the update() hot path.
Replace the up-to-48-dot-product frustum check with the p-vertex method (6 plane
tests), yielding 4.1x measured speedup on the isInFrustum() hot path.
Maintain full structural API compatibility with BoundingInfo so that existing
user code – including mesh.getBoundingInfo().boundingBox,
boundingInfo.minimum, isInFrustum(), intersects(), and all culling
strategies – continues to work without changes.
Enable adoption as a standalone side-effect import (prototype patch) with no
changes to Babylon.js core source files.

Non-Goals

Replacing BoundingInfo as the engine default. This should be an opt-in optimization;
users who do not import the patch keep the existing behavior.
Changing the ICullable interface or any public engine API signatures.
Optimizing cold paths such as SubMesh.clone(), Geometry.clone(), or
Geometry.Parse(), which construct bounding info once during loading.
SIMD / WebAssembly acceleration. The algorithms are scalar JavaScript tuned for
JIT inlining. SIMD could be a follow-up.

Description

Data Layout

All bounding state is stored in a single Float32Array(12):

Offset  Content
------  ----------------------------
[0..2]  Local AABB min (x, y, z)
[3..5]  Local AABB max (x, y, z)
[6..8]  World AABB min (x, y, z)
[9..11] World AABB max (x, y, z)

Two additional scalars are stored as class fields:

_localRadius: number – half-diagonal of the local AABB
_radiusWorld: number – world-space bounding sphere radius

A reference to the current world matrix is held for proxy sync and OBB fallback.

Core Algorithms

AABB Transform (Arvo’s method)

For each output axis, scales each matrix column vector by both the local min and
local max component, then accumulates translation + min(a, b) for the new
world min and translation + max(a, b) for the new world max:

worldMin[i] = T[i] + min(col0[i]*lMin.x, col0[i]*lMax.x)
                    + min(col1[i]*lMin.y, col1[i]*lMax.y)
                    + min(col2[i]*lMin.z, col2[i]*lMax.z)

This is branch-free and uses only multiplies, min, max, and add – all of which
JIT-compile efficiently.

P-vertex Frustum Test

For each of the 6 frustum planes, selects the AABB corner most in the direction
of the plane normal (the p-vertex) using conditional selection:

pVertex[i] = normal[i] > 0 ? worldMax[i] : worldMin[i]

If dot(pVertex, normal) + d < 0, the entire box is outside that plane and the
test returns false immediately. This is the cglm glm_aabb_frustum algorithm.

N-vertex Completely-in-Frustum Test

The dual: selects the corner least aligned with the normal (the n-vertex). If
any n-vertex is outside its plane, the box is not completely inside.

Sphere Radius Heuristic

Matches the existing BoundingSphere._update formula:
radiusWorld = max(|col0+col1+col2|) * localRadius, where each col is the
corresponding matrix column sum component.

Hot Path Analysis

Call Site	Method	Frequency
`scene._evaluateSubMesh`	`isInFrustum()`	per submesh per frame
`abstractMesh._updateBoundingInfo`	`update()`	per mesh per frame
`renderingGroup` transparent sort	`boundingSphere.centerWorld`	per transparent mesh per frame
`scene` active mesh evaluation	`isInFrustum()`	per mesh per frame

Alternatives Considered

Optimize BoundingBox._update in place. Would improve transform speed but
cannot reduce memory footprint without breaking the public BoundingBox API
that exposes vectors, vectorsWorld, center, extendSize, etc. as
Vector3 instances.
SoA (Structure of Arrays) layout across all meshes. Maximum cache
efficiency but requires invasive changes to the scene graph, violates the
per-mesh ownership model, and is incompatible with the existing
mesh.getBoundingInfo() API.
WebAssembly / SIMD. Could yield further gains but adds build complexity,
has a call overhead for small batches, and doesn’t address the memory
bloat. Can be layered on top of this proposal later.
Do nothing. The current implementation is correct and well-tested. However,
scenes with 10K+ meshes spend a measurable fraction of frame time in bounding
info update and culling. The 3-4x speedup on these paths directly translates
to more headroom for application logic and rendering.

Joe_Kerr · March 13, 2026, 9:02am

The (AI?) text mentions “measured speedup”. So does that mean you have working code? If so, can you put it into some benchmark playgrounds? I think this would strongly strengthen your case for a PR.

You refer to this textual, non-code post, here, right? The actual code is/will be written entirely be you or taken from known human sources**?

**FYI: this is for copyright reasons. If AI used copyrighted code (e.g. proprietary, GPL), but this is not disclosed by AI, and this code ends up in the Babylon repo, there would be a copyright violation. Worse, AFAIK, we users would “inherit” this violation.

kzhsw · March 13, 2026, 9:05am

Tested with nodejs for the algorithm, code is generated by AI.
Node.js 22.16.0 (x64)

Test: Arvo vs 8-corner (identity) - OK
Test: Arvo vs 8-corner (translation) - OK
Test: Arvo vs 8-corner (scaling) - OK
Test: Arvo vs 8-corner (90° rotation) - OK
Test: Arvo vs 8-corner (SRT) - OK
Test: frustum check (inside) - OK
Test: frustum check (outside) - OK
Test: frustum check (partial) - OK
Test: random stress (10000 transforms) - OK

Benchmark (1000000 transforms):
  8-corner: 412.30 ms
  Arvo:     103.93 ms
  Speedup:  3.97x

Benchmark (1000000 frustum checks):
  8-corner: 399.25 ms
  p-vertex: 41.55 ms
  Speedup:  9.61x

34 passed, 0 failed

test-fast-bounding-info-standalone.zip (2.9 KB)

kzhsw · March 16, 2026, 8:22am

I explictly asked AI to reference code of cglm to make this test, which is MIT licensed.

Deltakosh · March 16, 2026, 3:41pm

Nice idea. I really like the plan so far

kzhsw · March 17, 2026, 12:38am

Just for reference, three.js uses 8-corner AABB transform with P-vertex frustum test

kzhsw · March 27, 2026, 3:16am

Another thing to consider that there is so many code accessed BoundingInfo.boundingBox for only minimumWorld and maximumWorld, and BoundingInfo.boundingSphere for only centerWorld and center, this proposal would have more performance gain after these moved to direct getters on BoundingInfo

Pryme8 · March 27, 2026, 5:13am

You are proposing some drastic gains, I hope this is valid and can land.

Given the purposed speed increase this should be given some consideration because it’s impact would be massive.

Deltakosh · March 27, 2026, 4:05pm

Wanna do a PR?

kzhsw · March 30, 2026, 1:37am

I’m kind of hesitate on this, adding getters makes the code using BoundingInfo less dependent on underlying boundingBox/ boundingSphere, allowing more opportunity for future optimizations like this draft, but adding getters adds more performance cost before optimizations being done, and could make existing code slower because of that. Modern jit can inline these small functions but it’s not stable, deoptimizations happen all the time, no #[inline(always)] or @ForceInline instruction can ensure it, so potential performance loss would be expected.

Topic		Replies	Views
Frustum check in simd Off topic performance , math , boundingbox , simd	2	92	December 26, 2024
New feature: BoundingInfoHelper Announcements	14	397	April 3, 2025
How to update bounding info properly? Questions	10	2384	June 21, 2021
Correct/performant way to build my own mesh/geometry? Questions	12	1923	April 28, 2023
Check `isInFrustum` but with a percentage of the camera view Questions	16	2353	October 12, 2022