Note
This is a very early draft, and could be changed in future, and might not be implemented. This post is crafted with the help of AI, especially the algorithm part.
Motivation
BoundingInfo is the most frequently called object in the render loop.
update() runs once per mesh per frame, and isInFrustum() runs once per
submesh per frame. For a scene with 10,000 meshes at 60 fps, these two methods
alone account for 1.2 million calls per second.
The current implementation has two costs:
-
Allocation overhead. Each
BoundingInfoeagerly constructs aBoundingBox
(8 localVector3corners, 8 worldVector3corners, center, extendSize,
directions, etc.) and aBoundingSphere(center, centerWorld, minimum, maximum).
That is ~40Vector3heap objects per mesh, putting GC pressure on scenes with
many meshes.
-
Algorithmic overhead. The
BoundingBox._updatemethod transforms all 8
corners through the world matrix usingVector3.TransformCoordinatesToRef, then
reduces them withminimizeInPlace/maximizeInPlace. The frustum check in
BoundingBox.isInFrustumtests each of the 8 world corners against each of the
6 frustum planes (up to 48 dot products).
Both costs are well-studied in real-time graphics. The cglm library (used by
hundreds of C/OpenGL projects) provides battle-tested replacements:
- Arvo’s AABB transform (
glm_aabb_transform): works directly on the min/max
pair and the three matrix column vectors, computing the new world AABB in ~50 ops
instead of transforming 8 corners (~190 ops). - P-vertex frustum test (
glm_aabb_frustum): for each frustum plane, tests only
the single AABB corner most aligned with the plane normal (the “p-vertex”),
requiring exactly 6 tests instead of up to 48.
Also check here for some previous study on this:
Goals
- Reduce per-mesh bounding data from ~40 heap-allocated
Vector3objects and two
class instances down to a singleFloat32Array(12)plus two scalars. - Replace the 8-corner AABB transform (~190 floating-point ops) with Arvo’s method
(~50 ops), yielding 3.3x measured speedup on theupdate()hot path. - Replace the up-to-48-dot-product frustum check with the p-vertex method (6 plane
tests), yielding 4.1x measured speedup on theisInFrustum()hot path. - Maintain full structural API compatibility with
BoundingInfoso that existing
user code – includingmesh.getBoundingInfo().boundingBox,
boundingInfo.minimum,isInFrustum(),intersects(), and all culling
strategies – continues to work without changes. - Enable adoption as a standalone side-effect import (prototype patch) with no
changes to Babylon.js core source files.
Non-Goals
- Replacing
BoundingInfoas the engine default. This should be an opt-in optimization;
users who do not import the patch keep the existing behavior. - Changing the
ICullableinterface or any public engine API signatures. - Optimizing cold paths such as
SubMesh.clone(),Geometry.clone(), or
Geometry.Parse(), which construct bounding info once during loading. - SIMD / WebAssembly acceleration. The algorithms are scalar JavaScript tuned for
JIT inlining. SIMD could be a follow-up.
Description
Data Layout
All bounding state is stored in a single Float32Array(12):
Offset Content
------ ----------------------------
[0..2] Local AABB min (x, y, z)
[3..5] Local AABB max (x, y, z)
[6..8] World AABB min (x, y, z)
[9..11] World AABB max (x, y, z)
Two additional scalars are stored as class fields:
_localRadius: number– half-diagonal of the local AABB_radiusWorld: number– world-space bounding sphere radius
A reference to the current world matrix is held for proxy sync and OBB fallback.
Core Algorithms
AABB Transform (Arvo’s method)
For each output axis, scales each matrix column vector by both the local min and
local max component, then accumulates translation + min(a, b) for the new
world min and translation + max(a, b) for the new world max:
worldMin[i] = T[i] + min(col0[i]*lMin.x, col0[i]*lMax.x)
+ min(col1[i]*lMin.y, col1[i]*lMax.y)
+ min(col2[i]*lMin.z, col2[i]*lMax.z)
This is branch-free and uses only multiplies, min, max, and add – all of which
JIT-compile efficiently.
P-vertex Frustum Test
For each of the 6 frustum planes, selects the AABB corner most in the direction
of the plane normal (the p-vertex) using conditional selection:
pVertex[i] = normal[i] > 0 ? worldMax[i] : worldMin[i]
If dot(pVertex, normal) + d < 0, the entire box is outside that plane and the
test returns false immediately. This is the cglm glm_aabb_frustum algorithm.
N-vertex Completely-in-Frustum Test
The dual: selects the corner least aligned with the normal (the n-vertex). If
any n-vertex is outside its plane, the box is not completely inside.
Sphere Radius Heuristic
Matches the existing BoundingSphere._update formula:
radiusWorld = max(|col0+col1+col2|) * localRadius, where each col is the
corresponding matrix column sum component.
Hot Path Analysis
| Call Site | Method | Frequency |
|---|---|---|
scene._evaluateSubMesh |
isInFrustum() |
per submesh per frame |
abstractMesh._updateBoundingInfo |
update() |
per mesh per frame |
renderingGroup transparent sort |
boundingSphere.centerWorld |
per transparent mesh per frame |
scene active mesh evaluation |
isInFrustum() |
per mesh per frame |
Alternatives Considered
-
Optimize
BoundingBox._updatein place. Would improve transform speed but
cannot reduce memory footprint without breaking the publicBoundingBoxAPI
that exposesvectors,vectorsWorld,center,extendSize, etc. as
Vector3instances. -
SoA (Structure of Arrays) layout across all meshes. Maximum cache
efficiency but requires invasive changes to the scene graph, violates the
per-mesh ownership model, and is incompatible with the existing
mesh.getBoundingInfo()API. -
WebAssembly / SIMD. Could yield further gains but adds build complexity,
has a call overhead for small batches, and doesn’t address the memory
bloat. Can be layered on top of this proposal later. -
Do nothing. The current implementation is correct and well-tested. However,
scenes with 10K+ meshes spend a measurable fraction of frame time in bounding
info update and culling. The 3-4x speedup on these paths directly translates
to more headroom for application logic and rendering.

