Summary
The thin-instance picking path in InternalPick() and InternalMultiPick() currently calls thinInstanceGetWorldMatrices(), which lazily allocates a cached Matrix[] by calling Matrix.FromArray() once per thin instance. That hidden allocation cost shows up the first time thin-instance picking is used on a mesh and can materially increase GC pressure and peak memory for large thin-instance sets.
This proposal recommends adding an internal fast path for picking that reads _thinInstanceDataStorage.matrixData directly, hydrating a temporary matrix with Matrix.FromArrayToRef() and then multiplying it with the mesh world matrix into another temporary matrix from TmpVectors.
That change should:
- remove implicit Matrix[] allocation from the thin-instance picking path,
- reduce GC churn and peak memory when thin-instance picking is enabled,
- make picking observe the current CPU-side thin-instance matrix buffer instead of a stale cached worldMatrices snapshot,
- better match dynamic thin-instance workflows that mutate matrixData in place and then call thinInstanceBufferUpdated("matrix").
Current behavior
thinInstanceGetWorldMatrices allocates on first use
thinInstanceGetWorldMatrices() currently works like this:
- Read matrixData.
- If worldMatrices is missing, allocate a Matrix[].
- Fill that array by calling Matrix.FromArray() once per instance.
- Return the cached array.
That means a pick against a mesh with many thin instances can trigger many implicit heap allocations, even though the picker only needs one matrix at a time.
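The lazy-cache pattern described above can be sketched as follows. Mat4 and ThinInstanceStorage are hypothetical stand-ins, not the Babylon.js classes, and the construction counter exists only to make the allocations visible; the real implementation may differ in detail.

```typescript
// Minimal stand-in for a 4x4 matrix; NOT the Babylon.js Matrix class.
class Mat4 {
    static created = 0; // counts constructions so allocations are visible
    readonly m = new Float32Array(16);

    constructor() {
        Mat4.created++;
    }

    // Allocating variant, analogous in spirit to Matrix.FromArray().
    static fromArray(src: ArrayLike<number>, offset: number): Mat4 {
        const out = new Mat4();
        for (let i = 0; i < 16; i++) {
            out.m[i] = src[offset + i];
        }
        return out;
    }
}

// Hypothetical storage mirroring the matrixData / worldMatrices pair.
class ThinInstanceStorage {
    matrixData: Float32Array;
    worldMatrices: Mat4[] | null = null;

    constructor(matrixData: Float32Array) {
        this.matrixData = matrixData;
    }

    // Lazily builds and caches one Mat4 per instance on first use.
    getWorldMatrices(): Mat4[] {
        if (!this.worldMatrices) {
            const count = this.matrixData.length / 16;
            this.worldMatrices = [];
            for (let i = 0; i < count; i++) {
                this.worldMatrices.push(Mat4.fromArray(this.matrixData, i * 16));
            }
        }
        return this.worldMatrices;
    }
}

const lazyStorage = new ThinInstanceStorage(new Float32Array(16 * 1000));
const firstCall = lazyStorage.getWorldMatrices();  // builds 1000 Mat4 objects
const secondCall = lazyStorage.getWorldMatrices(); // returns the cached array
console.log(Mat4.created, firstCall === secondCall);
```

In this sketch the whole cost lands on the first call, which mirrors the first-pick spike described above: a caller that only ever inspects one matrix at a time still pays for all of them.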
ray.core.ts currently materializes the full matrix list
The thin-instance branches in InternalPick() and InternalMultiPick() currently do this:
- Call mesh.thinInstanceGetWorldMatrices().
- Loop through the returned Matrix[].
- Multiply each thin-instance matrix by the mesh world matrix into one temporary matrix.
- Call the picker.
That is convenient, but it eagerly exposes the entire thin-instance matrix set as Matrix objects when the picking loop only needs the current matrix.
PickingInfo also goes through thinInstanceGetWorldMatrices
PickingInfo still uses thinInstanceGetWorldMatrices() when transforming normals for a picked thin instance.
So even if the main pick loop stops allocating, later calls such as getNormal() can still instantiate the cached Matrix[] unless this path is updated too.
The cache is not the authoritative dynamic source of truth
The cached worldMatrices array is only partially maintained:
- thinInstanceSetMatrixAt() patches worldMatrices[index] if the cache already exists.
- thinInstanceSetBuffer("matrix", ...) resets worldMatrices to null.
- thinInstanceBufferUpdated("matrix") uploads the current CPU buffer, but does not rebuild cached Matrix objects.
So if user code mutates matrixData in place and then calls thinInstanceBufferUpdated("matrix"), rendering sees the updated transforms, but thinInstanceGetWorldMatrices() can still return stale Matrix objects that were created earlier.
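The staleness scenario can be illustrated with a small model. Everything here is a hypothetical stand-in: StaleCacheStorage is not a Babylon.js type, bufferUpdated() models thinInstanceBufferUpdated("matrix") only as far as the behavior described above (the GPU upload is elided and the cache is left untouched), and the translation is assumed to live at indices 12 to 14 of each 16-float matrix.

```typescript
// Hypothetical model of the matrixData / worldMatrices pair; not Babylon.js code.
class StaleCacheStorage {
    matrixData: Float32Array;
    worldMatrices: Float32Array[] | null = null;

    constructor(matrixData: Float32Array) {
        this.matrixData = matrixData;
    }

    // Lazily snapshots each instance's 16 floats into its own object.
    getWorldMatrices(): Float32Array[] {
        if (!this.worldMatrices) {
            this.worldMatrices = [];
            for (let offset = 0; offset < this.matrixData.length; offset += 16) {
                // slice() copies, so later edits to matrixData are not reflected.
                this.worldMatrices.push(this.matrixData.slice(offset, offset + 16));
            }
        }
        return this.worldMatrices;
    }

    // Models the described behavior: upload elided, cached objects not rebuilt.
    bufferUpdated(): void {}
}

// One instance holding an identity matrix.
const instanceData = new Float32Array([
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 1, 0,
    0, 0, 0, 1,
]);
const staleStorage = new StaleCacheStorage(instanceData);
staleStorage.getWorldMatrices(); // cache is created here

instanceData[12] = 5;            // in-place edit of the x translation
staleStorage.bufferUpdated();

const cachedX = staleStorage.getWorldMatrices()[0][12]; // read through the cache
const liveX = staleStorage.matrixData[12];              // read from the CPU buffer
console.log(cachedX, liveX); // 0 5
```

Under these assumptions, a picker reading through the cache would test the instance at its old position, while a picker reading matrixData directly would see the new one.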
Goal
Add an internal fast path for picking-time access to thin-instance transforms, using direct reads from matrixData, without changing or deprecating the public thinInstanceGetWorldMatrices() API.
Recommendation
For thin-instance picking internals, add a direct-buffer fast path that reads _thinInstanceDataStorage.matrixData and hydrates only the current thin-instance matrix via Matrix.FromArrayToRef().
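The first implementation should update these internal call sites:
- the thin-instance loops in InternalPick() and InternalMultiPick() in ray.core.ts,
- the thin-instance normal transform in PickingInfo (pickingInfo.ts).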
The public thinInstanceGetWorldMatrices() method should remain available and supported for callers that explicitly want a Matrix[] view.
Recommended implementation approach
ray.core.ts fast path
In the thin-instance loops in InternalPick() and InternalMultiPick():
- Resolve const matrixData = mesh._thinInstanceDataStorage.matrixData.
- If matrixData is missing, preserve current fallback behavior.
- Use one temporary matrix from TmpVectors.Matrix for the current thin-instance local matrix.
- Use one temporary matrix from TmpVectors.Matrix for the combined thin-instance world matrix.
- For each instance:
  - load the current instance transform with Matrix.FromArrayToRef(matrixData, index * 16, thinMatrix),
  - multiply into the combined matrix with thinMatrix.multiplyToRef(world, combinedWorld),
  - call the picker with combinedWorld.
Conceptually:
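const matrixData = mesh._thinInstanceDataStorage.matrixData;
// If matrixData is missing, keep the existing fallback path (not shown here).
const thinMatrix = TmpVectors.Matrix[0];    // current thin-instance local matrix
const combinedWorld = TmpVectors.Matrix[1]; // thin-instance matrix * mesh world matrix
for (let index = 0; index < mesh.thinInstanceCount; index++) {
    // Copy 16 floats into the reusable temporary; no new Matrix is allocated.
    Matrix.FromArrayToRef(matrixData, index * 16, thinMatrix);
    thinMatrix.multiplyToRef(world, combinedWorld);
    const result = picker(pickingInfo, rayFunction, mesh, combinedWorld, fastCheck, onlyBoundingInfo, trianglePredicate, true);
    // existing result handling
}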
The exact TmpVectors.Matrix slots can be chosen during implementation, but the important part is to reuse temporaries instead of materializing a Matrix[].
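To check the arithmetic of the loop above without depending on Babylon.js, here is a self-contained sketch. All names (fromArrayToRef, multiplyToRef, forEachThinInstanceWorld, translation) are hypothetical stand-ins, matrices are assumed to be 16 floats in row-major order with the translation in the last row, and the boolean return value is just one possible way to let the caller keep its existing fallback when matrixData is missing.

```typescript
type Mat4 = Float32Array; // 16 floats, row-major, translation at 12..14 (assumed)

// Stand-in for Matrix.FromArrayToRef(): copies into an existing target.
function fromArrayToRef(src: ArrayLike<number>, offset: number, target: Mat4): void {
    for (let i = 0; i < 16; i++) {
        target[i] = src[offset + i];
    }
}

// Stand-in for a.multiplyToRef(b, result): result = a * b.
// result must not alias a or b.
function multiplyToRef(a: Mat4, b: Mat4, result: Mat4): void {
    for (let row = 0; row < 4; row++) {
        for (let col = 0; col < 4; col++) {
            let sum = 0;
            for (let k = 0; k < 4; k++) {
                sum += a[row * 4 + k] * b[k * 4 + col];
            }
            result[row * 4 + col] = sum;
        }
    }
}

// Two reusable temporaries, playing the role of TmpVectors.Matrix slots.
const thinMatrix: Mat4 = new Float32Array(16);
const combinedWorld: Mat4 = new Float32Array(16);

// Visits each instance's combined world matrix without building an array.
// Returns false when matrixData is missing so the caller can fall back.
function forEachThinInstanceWorld(
    matrixData: Float32Array | null,
    instanceCount: number,
    world: Mat4,
    visit: (index: number, combined: Mat4) => void
): boolean {
    if (!matrixData) {
        return false;
    }
    for (let index = 0; index < instanceCount; index++) {
        fromArrayToRef(matrixData, index * 16, thinMatrix);
        multiplyToRef(thinMatrix, world, combinedWorld);
        visit(index, combinedWorld); // the picker call would go here
    }
    return true;
}

function translation(x: number, y: number, z: number): number[] {
    return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, z, 1];
}

// Two instances offset along x; the mesh itself is lifted by 10 along y.
const thinData = new Float32Array([...translation(1, 0, 0), ...translation(2, 0, 0)]);
const meshWorld: Mat4 = new Float32Array(translation(0, 10, 0));
const seenX: number[] = [];
const seenY: number[] = [];
const handled = forEachThinInstanceWorld(thinData, 2, meshWorld, (_index, combined) => {
    seenX.push(combined[12]);
    seenY.push(combined[13]);
});
console.log(handled, seenX, seenY); // true [ 1, 2 ] [ 10, 10 ]
```

Because the callback receives the shared temporary, it must consume the matrix before the next iteration overwrites it, which is the same constraint the real loop has when it reuses TmpVectors.Matrix slots.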
PickingInfo normal-transform fast path
PickingInfo should also stop calling thinInstanceGetWorldMatrices() when thinInstanceIndex !== -1.
Instead:
- Read matrixData from the picked mesh.
- Load the picked thin-instance matrix into a temporary matrix with Matrix.FromArrayToRef.
- Use that temporary matrix in Vector3.TransformNormalToRef.
That avoids a follow-up lazy Matrix[] allocation after a successful pick.
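The three steps above can be sketched with plain arrays. The helpers copyMatrixToRef, transformNormalToRef, and transformPickedNormal are hypothetical stand-ins for Matrix.FromArrayToRef, Vector3.TransformNormalToRef, and the PickingInfo code path; the sketch assumes row-major storage and that a normal is transformed by the upper-left 3x3 only, ignoring translation.

```typescript
type Vec3 = { x: number; y: number; z: number };

// Reusable scratch matrix, standing in for a TmpVectors.Matrix slot.
const normalScratch = new Float32Array(16);

// Stand-in for Matrix.FromArrayToRef(): copies 16 floats into an existing target.
function copyMatrixToRef(src: ArrayLike<number>, offset: number, target: Float32Array): void {
    for (let i = 0; i < 16; i++) {
        target[i] = src[offset + i];
    }
}

// Stand-in for Vector3.TransformNormalToRef(): applies only the upper-left
// 3x3 block, so the translation row has no effect on the result.
function transformNormalToRef(n: Vec3, m: Float32Array, result: Vec3): void {
    const { x, y, z } = n;
    result.x = x * m[0] + y * m[4] + z * m[8];
    result.y = x * m[1] + y * m[5] + z * m[9];
    result.z = x * m[2] + y * m[6] + z * m[10];
}

// Hypothetical helper: transform a normal by one picked instance's matrix,
// reading that matrix straight from the CPU-side buffer.
function transformPickedNormal(matrixData: Float32Array, thinInstanceIndex: number, normal: Vec3): Vec3 {
    const result: Vec3 = { x: 0, y: 0, z: 0 };
    copyMatrixToRef(matrixData, thinInstanceIndex * 16, normalScratch);
    transformNormalToRef(normal, normalScratch, result);
    return result;
}

// Instance 0 is identity; instance 1 maps +x to +z and carries a translation.
const pickedData = new Float32Array([
    1, 0, 0, 0,   0, 1, 0, 0,   0, 0, 1, 0,   0, 0, 0, 1,
    0, 0, 1, 0,   0, 1, 0, 0,  -1, 0, 0, 0,  10, 20, 30, 1,
]);
const pickedNormal = transformPickedNormal(pickedData, 1, { x: 1, y: 0, z: 0 });
console.log(pickedNormal.x, pickedNormal.y, pickedNormal.z); // 0 0 1
```

Only the picked instance's 16 floats are touched, so the cost of this path is independent of how many thin instances the mesh has.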
Why FromArrayToRef is the right primitive
Matrix.FromArray() allocates a new Matrix.
Matrix.FromArrayToRef() copies the same data into an existing Matrix.
For picking, only the second behavior is needed.
Behavior and compatibility
No public API change or deprecation
This proposal does not remove, deprecate, or redesign the public thinInstanceGetWorldMatrices() API.
It only adds a faster internal path for picking code that does not need a full Matrix[] materialization.
Better alignment with dynamic CPU-side thin-instance updates
By reading matrixData directly, picking will more closely track:
- in-place edits to matrixData,
- followed by thinInstanceBufferUpdated("matrix"),
- without depending on whether worldMatrices was already created and then left stale.
That is a correctness improvement for dynamic thin-instance workflows that treat matrixData as the CPU-side source of truth.
Explicit limitation: GPU-only partial updates remain out of scope
thinInstancePartialBufferUpdate() explicitly allows GPU updates that do not also update the CPU-side buffer.
If a caller updates only GPU data and not matrixData, direct CPU-side reads in picking still cannot see that change. This proposal does not attempt to change that contract.
Scope and non-goals
In scope
- internal thin-instance picking in ray.core.ts,
- thin-instance normal transform in pickingInfo.ts,
- removing implicit Matrix[] allocation from those paths.
Out of scope
- removing or deprecating the public thinInstanceGetWorldMatrices() API,
- changing non-picking users such as the navigation plugin in recastJSPlugin.ts,
- changing the semantics of thinInstancePartialBufferUpdate(),
- redesigning thin-instance rendering or buffer ownership.
Risks
Key implementation risks are:
- relying on internal storage layout through _thinInstanceDataStorage,
- using TmpVectors.Matrix slots that accidentally conflict with nearby temporary-matrix usage,
- missing a remaining picking-related thinInstanceGetWorldMatrices() call site and still allocating through that path,
- incorrectly handling the matrixData == null fallback,
- overstating the dynamic-sync improvement in cases where only GPU state was updated.
Performance impact
Expected runtime impact:
- lower GC pressure because the fast path no longer materializes one Matrix per thin instance,
- lower peak memory because picking no longer needs to hold a full cached Matrix[] solely to inspect current thin-instance transforms,
- neutral or slightly positive CPU cost from replacing Matrix.FromArray() object creation with Matrix.FromArrayToRef() into reusable temporaries,
- better practical behavior for dynamic thin-instance scenes that mutate matrixData directly.
Memory impact
Expected memory impact:
- reduced transient allocation on first pick through the new fast path,
- reduced retained memory for scenes that pick thin instances but do not otherwise need worldMatrices,
- no new steady-state per-instance storage,
- reuse of existing TmpVectors.Matrix slots rather than new heap allocations.
Implementation handoff
Implementation mode should execute the following steps:
- Add a fast path for thin-instance iteration in InternalPick() using direct matrixData reads plus Matrix.FromArrayToRef().
- Add the same fast path in InternalMultiPick().
- Add a matching fast path for the thin-instance normal-transform path in PickingInfo.
- Preserve all existing picking semantics, including fastCheck, onlyBoundingInfo, predicates, and thinInstanceIndex.
- Add tests for dynamic matrixData updates followed by thinInstanceBufferUpdated("matrix").
- Benchmark allocation, GC, and peak memory in large thin-instance picking scenarios.