Use matrixData for thin instance picking

Summary

The thin-instance picking path in InternalPick() and InternalMultiPick() currently calls thinInstanceGetWorldMatrices(), which lazily allocates a cached Matrix[] by calling Matrix.FromArray() once per thin instance. That hidden allocation cost shows up the first time thin-instance picking is used on a mesh and can materially increase GC pressure and peak memory for large thin-instance sets.

This proposal recommends adding an internal fast path for picking that reads _thinInstanceDataStorage.matrixData directly, hydrating a temporary matrix with Matrix.FromArrayToRef() and then multiplying it with the mesh world matrix into another temporary matrix from TmpVectors.

That change should:

  • remove implicit Matrix[] allocation from the thin-instance picking path,
  • reduce GC churn and peak memory when thin-instance picking is enabled,
  • make picking observe the current CPU-side thin-instance matrix buffer instead of a stale cached worldMatrices snapshot,
  • better match dynamic thin-instance workflows that mutate matrixData in place and then call thinInstanceBufferUpdated("matrix").

Current behavior

thinInstanceGetWorldMatrices allocates on first use

thinInstanceGetWorldMatrices() currently works like this:

  1. Read matrixData.
  2. If worldMatrices is missing, allocate a Matrix[].
  3. Fill that array by calling Matrix.FromArray() once per instance.
  4. Return the cached array.

That means a pick against a mesh with many thin instances can trigger many implicit heap allocations, even though the picker only needs one matrix at a time.
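
The lazy-cache pattern described above can be modeled in a few lines. This is a simplified sketch, not the engine's code: SketchMatrix, ThinInstanceStorageSketch, and the allocation counter are hypothetical stand-ins, introduced only to make the first-call cost visible.

```typescript
// Hypothetical stand-in for a 4x4 matrix class. It counts constructions
// so the allocation behavior of the cache is observable.
class SketchMatrix {
    static allocations = 0;
    readonly m = new Float32Array(16);

    constructor() {
        SketchMatrix.allocations++;
    }

    // Allocating factory, analogous in spirit to Matrix.FromArray().
    static fromArray(source: Float32Array, offset: number): SketchMatrix {
        const result = new SketchMatrix();
        for (let i = 0; i < 16; i++) {
            result.m[i] = source[offset + i];
        }
        return result;
    }
}

// Hypothetical model of the lazily built worldMatrices cache.
class ThinInstanceStorageSketch {
    matrixData: Float32Array;
    instanceCount: number;
    worldMatrices: SketchMatrix[] | null = null;

    constructor(matrixData: Float32Array, instanceCount: number) {
        this.matrixData = matrixData;
        this.instanceCount = instanceCount;
    }

    getWorldMatrices(): SketchMatrix[] {
        if (!this.worldMatrices) {
            const built: SketchMatrix[] = [];
            for (let i = 0; i < this.instanceCount; i++) {
                built.push(SketchMatrix.fromArray(this.matrixData, i * 16));
            }
            this.worldMatrices = built;
        }
        return this.worldMatrices;
    }
}

const sketchInstanceCount = 1000;
const sketchStorage = new ThinInstanceStorageSketch(
    new Float32Array(sketchInstanceCount * 16),
    sketchInstanceCount
);

sketchStorage.getWorldMatrices(); // first call builds one object per instance
const allocationsAfterFirstCall = SketchMatrix.allocations;

sketchStorage.getWorldMatrices(); // later calls reuse the cached array
const allocationsAfterSecondCall = SketchMatrix.allocations;
```

In this model the whole cost lands on the first call and the array stays retained afterwards, which is the behavior the picking path inherits today.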

ray.core.ts currently materializes the full matrix list

The thin-instance branches in InternalPick() and InternalMultiPick() currently do this:

  1. Call mesh.thinInstanceGetWorldMatrices().
  2. Loop through the returned Matrix[].
  3. Multiply each thin-instance matrix by the mesh world matrix into one temporary matrix.
  4. Call the picker.

That is convenient, but it eagerly exposes the entire thin-instance matrix set as Matrix objects when the picking loop only needs the current matrix.

PickingInfo also goes through thinInstanceGetWorldMatrices

PickingInfo still uses thinInstanceGetWorldMatrices() when transforming normals for a picked thin instance.

So even if the main pick loop stops allocating, later calls such as getNormal() can still instantiate the cached Matrix[] unless this path is updated too.

The cache is not the authoritative dynamic source of truth

The cached worldMatrices array is only partially maintained; not every way of changing matrixData refreshes it.

So if user code mutates matrixData in place and then calls thinInstanceBufferUpdated("matrix"), rendering sees the updated transforms, but thinInstanceGetWorldMatrices() can still return stale Matrix objects that were created earlier.
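
The staleness can be illustrated with a small model. This is a sketch under stated assumptions: snapshotMatrices is a hypothetical stand-in for building the cached Matrix[] (each entry is a copy of 16 floats), and element 12 is assumed to hold the x translation of an instance.

```typescript
// Hypothetical model: each cached "matrix" is a 16-float copy taken when the
// cache is built, so it no longer shares memory with matrixData.
function snapshotMatrices(matrixData: Float32Array, instanceCount: number): Float32Array[] {
    const snapshot: Float32Array[] = [];
    for (let i = 0; i < instanceCount; i++) {
        snapshot.push(matrixData.slice(i * 16, (i + 1) * 16));
    }
    return snapshot;
}

// One thin instance, initially an identity transform at the origin.
const liveMatrixData = new Float32Array(16);
liveMatrixData[0] = liveMatrixData[5] = liveMatrixData[10] = liveMatrixData[15] = 1;

// The cache is built once, before any mutation.
const cachedSnapshot = snapshotMatrices(liveMatrixData, 1);

// In-place mutation: move the instance to x = 5.
// Assumption: element 12 of each 16-float block is the x translation.
liveMatrixData[12] = 5;

const cachedTranslationX = cachedSnapshot[0][12]; // captured before the mutation
const liveTranslationX = liveMatrixData[12]; // reflects the mutation
```

A picker that reads the snapshot tests the old position, while one that reads the buffer tests the new one.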

Goal

Add an internal fast path for picking-time access to thin-instance transforms, using direct reads from matrixData, without changing or deprecating the public thinInstanceGetWorldMatrices() API.

Recommendation

For thin-instance picking internals, add a direct-buffer fast path that reads _thinInstanceDataStorage.matrixData and hydrates only the current thin-instance matrix via Matrix.FromArrayToRef().

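The first implementation should update these internal call sites:

  • the thin-instance loop in InternalPick(),
  • the thin-instance loop in InternalMultiPick(),
  • the thin-instance normal-transform path in PickingInfo.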

The public thinInstanceGetWorldMatrices() method should remain available and supported for callers that explicitly want a Matrix[] view.

Recommended implementation approach

ray.core.ts fast path

In the thin-instance loops in InternalPick() and InternalMultiPick():

  1. Resolve const matrixData = mesh._thinInstanceDataStorage.matrixData.
  2. If matrixData is missing, preserve current fallback behavior.
  3. Use one temporary matrix from TmpVectors.Matrix for the current thin-instance local matrix.
  4. Use one temporary matrix from TmpVectors.Matrix for the combined thin-instance world matrix.
  5. For each instance:
    • load the current instance transform with Matrix.FromArrayToRef(matrixData, index * 16, thinMatrix),
    • multiply into the combined matrix with thinMatrix.multiplyToRef(world, combinedWorld),
    • call the picker with combinedWorld.

Conceptually:

const matrixData = mesh._thinInstanceDataStorage.matrixData;

if (matrixData) {
    // Reused temporaries: no per-instance Matrix allocation.
    const thinMatrix = TmpVectors.Matrix[0];
    const combinedWorld = TmpVectors.Matrix[1];

    for (let index = 0; index < mesh.thinInstanceCount; index++) {
        // Each instance occupies 16 consecutive floats in matrixData.
        Matrix.FromArrayToRef(matrixData, index * 16, thinMatrix);
        thinMatrix.multiplyToRef(world, combinedWorld);
        const result = picker(pickingInfo, rayFunction, mesh, combinedWorld, fastCheck, onlyBoundingInfo, trianglePredicate, true);
        // existing result handling
    }
} else {
    // matrixData is missing: preserve current fallback behavior
}

The exact TmpVectors.Matrix slots can be chosen during implementation, but the important part is to reuse temporaries instead of materializing a Matrix[].
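
For readers who want to run the idea outside the engine, here is a self-contained sketch of the same loop. Everything in it is a hypothetical stand-in: matrices are plain 16-float arrays, copyMatrixToRef and multiplyMatrixToRef only mimic the roles of Matrix.FromArrayToRef() and multiplyToRef(), and the row-major layout with translation in elements 12 to 14 is an assumption of the sketch.

```typescript
// Hypothetical stand-in: a matrix is 16 floats, row-major, translation in 12..14.
type Mat4 = Float32Array;

function makeTranslation(x: number, y: number, z: number): Mat4 {
    const m = new Float32Array(16);
    m[0] = m[5] = m[10] = m[15] = 1;
    m[12] = x;
    m[13] = y;
    m[14] = z;
    return m;
}

// Copies 16 floats from the shared buffer into an existing matrix (no allocation).
function copyMatrixToRef(source: Float32Array, offset: number, result: Mat4): void {
    for (let i = 0; i < 16; i++) {
        result[i] = source[offset + i];
    }
}

// result = a * b. The result must not alias a or b.
function multiplyMatrixToRef(a: Mat4, b: Mat4, result: Mat4): void {
    for (let row = 0; row < 4; row++) {
        for (let col = 0; col < 4; col++) {
            let sum = 0;
            for (let k = 0; k < 4; k++) {
                sum += a[row * 4 + k] * b[k * 4 + col];
            }
            result[row * 4 + col] = sum;
        }
    }
}

// Two reusable temporaries, playing the role of TmpVectors.Matrix slots.
const thinMatrixTmp: Mat4 = new Float32Array(16);
const combinedWorldTmp: Mat4 = new Float32Array(16);

// Visits every instance with its combined world matrix. The callback receives
// a reused temporary and must not keep a reference to it.
// Returns false when there is no CPU-side buffer, so the caller can fall back.
function forEachThinInstanceWorld(
    matrixData: Float32Array | null,
    instanceCount: number,
    meshWorld: Mat4,
    visit: (index: number, combinedWorld: Mat4) => void
): boolean {
    if (!matrixData) {
        return false;
    }
    for (let index = 0; index < instanceCount; index++) {
        copyMatrixToRef(matrixData, index * 16, thinMatrixTmp);
        multiplyMatrixToRef(thinMatrixTmp, meshWorld, combinedWorldTmp);
        visit(index, combinedWorldTmp);
    }
    return true;
}

// Usage: two instances at x = 2 and x = -3 on a mesh whose world matrix moves it to x = 10.
const instanceBuffer = new Float32Array(32);
instanceBuffer.set(makeTranslation(2, 0, 0), 0);
instanceBuffer.set(makeTranslation(-3, 0, 0), 16);

const visitedWorldX: number[] = [];
const usedFastPath = forEachThinInstanceWorld(instanceBuffer, 2, makeTranslation(10, 0, 0), (_index, combined) => {
    visitedWorldX.push(combined[12]);
});
const usedFastPathWithoutBuffer = forEachThinInstanceWorld(null, 2, makeTranslation(10, 0, 0), () => {});
```

The callback form is only one way to structure it. An inline loop, as in the conceptual snippet, avoids the closure and is probably closer to what the real call sites would want.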

PickingInfo normal-transform fast path

PickingInfo should also stop calling thinInstanceGetWorldMatrices() when thinInstanceIndex !== -1.

Instead:

  1. Read matrixData from the picked mesh.
  2. Load the picked thin-instance matrix into a temporary matrix with Matrix.FromArrayToRef.
  3. Use that temporary matrix in Vector3.TransformNormalToRef.

That avoids a follow-up lazy Matrix[] allocation after a successful pick.
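
The three steps above can be sketched as follows. The helpers here are hypothetical stand-ins, not the engine's API: loadInstanceMatrixToRef mimics the role of Matrix.FromArrayToRef, applyNormalTransformToRef mimics Vector3.TransformNormalToRef under the assumption that a normal transform uses only the upper 3x3 part of a row-major matrix and ignores translation.

```typescript
type Vec3 = { x: number; y: number; z: number };

// One reusable temporary, playing the role of a TmpVectors.Matrix slot.
const normalTmpMatrix = new Float32Array(16);

// Step 2: copy the picked instance's 16 floats into the temporary.
function loadInstanceMatrixToRef(matrixData: Float32Array, instanceIndex: number, result: Float32Array): void {
    const offset = instanceIndex * 16;
    for (let i = 0; i < 16; i++) {
        result[i] = matrixData[offset + i];
    }
}

// Step 3: transform a direction with the upper 3x3 block only (translation ignored).
function applyNormalTransformToRef(normal: Vec3, matrix: Float32Array, result: Vec3): void {
    const { x, y, z } = normal; // read first so normal and result may be the same object
    result.x = x * matrix[0] + y * matrix[4] + z * matrix[8];
    result.y = x * matrix[1] + y * matrix[5] + z * matrix[9];
    result.z = x * matrix[2] + y * matrix[6] + z * matrix[10];
}

function transformPickedNormal(normal: Vec3, matrixData: Float32Array, thinInstanceIndex: number, result: Vec3): void {
    loadInstanceMatrixToRef(matrixData, thinInstanceIndex, normalTmpMatrix);
    applyNormalTransformToRef(normal, normalTmpMatrix, result);
}

// Usage: instance 0 is identity, instance 1 is a quarter-turn in the XY plane
// (written with exact 0/1 entries) plus a translation that must not affect normals.
const normalMatrixData = new Float32Array(32);
normalMatrixData[0] = normalMatrixData[5] = normalMatrixData[10] = normalMatrixData[15] = 1;
normalMatrixData[16 + 1] = 1;
normalMatrixData[16 + 4] = -1;
normalMatrixData[16 + 10] = 1;
normalMatrixData[16 + 15] = 1;
normalMatrixData[16 + 12] = 5; // x translation of instance 1

const pickedNormal: Vec3 = { x: 0, y: 0, z: 0 };
transformPickedNormal({ x: 1, y: 0, z: 0 }, normalMatrixData, 1, pickedNormal);
```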

Why FromArrayToRef is the right primitive

Matrix.FromArray() allocates a new Matrix.
Matrix.FromArrayToRef() copies the same data into an existing Matrix.

For picking, only the second behavior is needed.
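
The difference is easy to see by counting object constructions. This is a minimal sketch with a hypothetical CountingMatrix stand-in, not the engine's Matrix class.

```typescript
// Hypothetical stand-in matrix that counts how many objects are constructed.
class CountingMatrix {
    static constructed = 0;
    readonly m = new Float32Array(16);

    constructor() {
        CountingMatrix.constructed++;
    }

    // Allocating variant: a new object per call.
    static fromArray(source: Float32Array, offset: number): CountingMatrix {
        const result = new CountingMatrix();
        CountingMatrix.fromArrayToRef(source, offset, result);
        return result;
    }

    // Non-allocating variant: copies into an object the caller already owns.
    static fromArrayToRef(source: Float32Array, offset: number, result: CountingMatrix): void {
        for (let i = 0; i < 16; i++) {
            result.m[i] = source[offset + i];
        }
    }
}

const contrastCount = 500;
const contrastData = new Float32Array(contrastCount * 16);

// Allocating loop: one matrix object per instance.
CountingMatrix.constructed = 0;
for (let i = 0; i < contrastCount; i++) {
    CountingMatrix.fromArray(contrastData, i * 16);
}
const allocatingTotal = CountingMatrix.constructed;

// Reusing loop: a single matrix object serves every instance.
CountingMatrix.constructed = 0;
const reusableMatrix = new CountingMatrix();
for (let i = 0; i < contrastCount; i++) {
    CountingMatrix.fromArrayToRef(contrastData, i * 16, reusableMatrix);
}
const reusingTotal = CountingMatrix.constructed;
```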

Behavior and compatibility

No public API change or deprecation

This proposal does not remove, deprecate, or redesign the public thinInstanceGetWorldMatrices() API.

It only adds a faster internal path for picking code that does not need a full Matrix[] materialization.

Better alignment with dynamic CPU-side thin-instance updates

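By reading matrixData directly, picking will more closely track the current CPU-side thin-instance matrix buffer, including in-place matrixData mutations followed by thinInstanceBufferUpdated("matrix").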

That is a correctness improvement for dynamic thin-instance workflows that treat matrixData as the CPU-side source of truth.

Explicit limitation: GPU-only partial updates remain out of scope

thinInstancePartialBufferUpdate() explicitly allows GPU updates that do not also update the CPU-side buffer.

If a caller updates only GPU data and not matrixData, direct CPU-side reads in picking still cannot see that change. This proposal does not attempt to change that contract.

Scope and non-goals

In scope

  • internal thin-instance picking in ray.core.ts,
  • thin-instance normal transform in pickingInfo.ts,
  • removing implicit Matrix[] allocation from those paths.

Out of scope

  • removing or deprecating the public thinInstanceGetWorldMatrices() API,
  • changing non-picking users such as the navigation plugin in recastJSPlugin.ts,
  • changing the semantics of thinInstancePartialBufferUpdate(),
  • redesigning thin-instance rendering or buffer ownership.

Risks

Key implementation risks are:

  • relying on internal storage layout through _thinInstanceDataStorage,
  • using TmpVectors.Matrix slots that accidentally conflict with nearby temporary-matrix usage,
  • missing a remaining picking-related thinInstanceGetWorldMatrices() call site and still allocating through that path,
  • incorrectly handling the matrixData == null fallback,
  • overstating the dynamic-sync improvement in cases where only GPU state was updated.
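
The temporary-slot risk is the easiest one to hit by accident, so a small illustration may help. This is a hypothetical model of a shared pool, not TmpVectors itself: sharedTmpPool, helperBorrowingSlotZero, and loadTranslationIntoSlot are names invented for the sketch.

```typescript
// Hypothetical shared pool of temporaries, in the spirit of TmpVectors.Matrix.
const sharedTmpPool: Float32Array[] = [new Float32Array(16), new Float32Array(16), new Float32Array(16)];

// A nested helper that happens to borrow slot 0 for its own scratch work.
function helperBorrowingSlotZero(): void {
    sharedTmpPool[0].fill(0);
}

// Loads a translation matrix into the given slot and returns that slot.
function loadTranslationIntoSlot(slot: number, x: number): Float32Array {
    const m = sharedTmpPool[slot];
    m.fill(0);
    m[0] = m[5] = m[10] = m[15] = 1;
    m[12] = x;
    return m;
}

// Conflict: the caller holds slot 0, then calls a helper that also uses slot 0.
const conflictingMatrix = loadTranslationIntoSlot(0, 4);
helperBorrowingSlotZero();
const xAfterConflict = conflictingMatrix[12]; // clobbered by the helper

// No conflict: the caller picks a slot the helper does not touch.
const isolatedMatrix = loadTranslationIntoSlot(2, 4);
helperBorrowingSlotZero();
const xWithDistinctSlot = isolatedMatrix[12]; // still intact
```

In the real code, the thing to check is which slots the picker and anything it calls already use before the loop's temporaries are chosen.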

Performance impact

Expected runtime impact:

  • lower GC pressure because the fast path no longer materializes one Matrix per thin instance,
  • lower peak memory because picking no longer needs to hold a full cached Matrix[] solely to inspect current thin-instance transforms,
  • neutral or slightly lower CPU cost from replacing Matrix.FromArray() object creation with Matrix.FromArrayToRef() copies into reusable temporaries,
  • better practical behavior for dynamic thin-instance scenes that mutate matrixData directly.

Memory impact

Expected memory impact:

  • reduced transient allocation on first pick through the new fast path,
  • reduced retained memory for scenes that pick thin instances but do not otherwise need worldMatrices,
  • no new steady-state per-instance storage,
  • reuse of existing TmpVectors.Matrix slots rather than new heap allocations.

Implementation handoff

Implementation mode should execute the following steps:

  1. Add a fast path for thin-instance iteration in InternalPick() using direct matrixData reads plus Matrix.FromArrayToRef().
  2. Add the same fast path in InternalMultiPick().
  3. Add a matching fast path for the thin-instance normal-transform path in PickingInfo.
  4. Preserve all existing picking semantics, including fastCheck, onlyBoundingInfo, predicates, and thinInstanceIndex.
  5. Add tests for dynamic matrixData updates followed by thinInstanceBufferUpdated("matrix").
  6. Benchmark allocation, GC, and peak memory in large thin-instance picking scenarios.
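
For step 5, the shape of a regression test could look roughly like this. It is a framework-free sketch with a deliberately simplified picker: pickInstanceByX is a hypothetical stand-in that only compares x translations (assumed to live in element 12 of each 16-float block). A real test would use actual rays and meshes and would call thinInstanceBufferUpdated("matrix") after the mutation.

```typescript
// Hypothetical picking stand-in: returns the index of the first instance whose
// x translation lies within `radius` of pickX, or -1 when nothing is hit.
// It reads the buffer directly on every call, like the proposed fast path.
function pickInstanceByX(matrixData: Float32Array, instanceCount: number, pickX: number, radius: number): number {
    for (let index = 0; index < instanceCount; index++) {
        const instanceX = matrixData[index * 16 + 12];
        if (Math.abs(instanceX - pickX) <= radius) {
            return index;
        }
    }
    return -1;
}

// Writes an identity matrix with the given x translation for one instance.
function writeInstanceTranslation(matrixData: Float32Array, index: number, x: number): void {
    const offset = index * 16;
    matrixData.fill(0, offset, offset + 16);
    matrixData[offset] = matrixData[offset + 5] = matrixData[offset + 10] = matrixData[offset + 15] = 1;
    matrixData[offset + 12] = x;
}

// Arrange: two instances, at x = 0 and x = 10.
const testMatrixData = new Float32Array(2 * 16);
writeInstanceTranslation(testMatrixData, 0, 0);
writeInstanceTranslation(testMatrixData, 1, 10);

// Act 1: pick at the original position of instance 0.
const hitBeforeMove = pickInstanceByX(testMatrixData, 2, 0, 0.5);

// Act 2: mutate matrixData in place, moving instance 0 to x = 20.
testMatrixData[12] = 20;

const hitAtOldPosition = pickInstanceByX(testMatrixData, 2, 0, 0.5);
const hitAtNewPosition = pickInstanceByX(testMatrixData, 2, 20, 0.5);
```

The assertions that matter are that the old position no longer hits and the new position does, without any call that rebuilds a Matrix[] in between.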

It looks good to me: do you want to create a PR for it?

(And if you do not like that AI-generated unit test, I can remove it.)
