GaussianSplattingMesh: reducing buffer copy of removePart

Summary

Add view-backed raw splat source support, starting with compound rebuild internals. Preserve aligned views as-is, and copy only if required by alignment, size, or ownership constraints. The main expected benefit is lower peak memory and GC pressure in compound removePart() / rebuild flows; loader impact is small.

Motivation

GaussianSplattingMeshBase._updateData() is already zero-copy for raw splat bytes when the input is a standalone ArrayBuffer, so the largest remaining avoidable copies are not in the normal load path.

The real problem is in compound retained-source handling:

  • GaussianSplattingMesh._createRetainedPartSource() uses slice(), which copies surviving part data during rebuilds,
  • several read paths assume offset 0 and would misread a true ArrayBufferView,
  • removePart() therefore pays unnecessary temporary copy cost today, especially for large compounds.

The guiding rule for this proposal is:

  • preserve the original view whenever alignment is already valid,
  • copy only if required by typed-array alignment, exact-length constraints, or an ownership boundary that truly needs an isolated buffer.

This leads to a two-phase approach:

  1. add low-risk internal view support for compound rebuilds,
  2. optionally widen public updateData() / updateDataAsync() to accept ArrayBufferView later.

Expected outcome:

  • high memory and GC improvement for removePart() on large compounds,
  • moderate improvement for some addPart() and retained-source rebuild flows,
  • low impact for loader-generated data, because the SPLAT loaders already mostly synthesize fresh packed ArrayBuffers before calling updateData(),
  • no change to the mandatory per-splat processing and texture upload cost in _updateData() / _addPartsInternal().

Current behavior

_updateData() is ArrayBuffer-only and already zero-copy for raw splat bytes

GaussianSplattingMeshBase._updateData() currently accepts data: ArrayBuffer and immediately creates:

const uBuffer = new Uint8Array(data);
const fBuffer = new Float32Array(uBuffer.buffer);

It then stores uBuffer.buffer into _splatsData when RAM retention is enabled.

That means:

  • raw splat data is not copied today when the source is already a standalone ArrayBuffer,
  • but callers cannot safely pass a subrange view without first repacking it into a dedicated buffer,
  • and _shData is still cloned on retain via sh.map((arr) => new Uint8Array(arr)).

So the raw _updateData() path is not the biggest current internal copy site. The larger current win is in compound rebuilds.
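As a quick illustration of why the raw path is zero-copy while the SH retain path is not, the sketch below contrasts the two Uint8Array constructor forms involved. The variable names are made up for the example and are not taken from the Babylon.js source.

```typescript
// Illustrative only: `rawBuffer` stands in for the ArrayBuffer passed to _updateData().
const rawBuffer = new ArrayBuffer(64);

// new Uint8Array(arrayBuffer) creates a view over the same memory (no byte copy).
const overBuffer = new Uint8Array(rawBuffer);

// new Uint8Array(typedArray) allocates a new buffer and copies the elements,
// which is what sh.map((arr) => new Uint8Array(arr)) does for each SH texture.
const fromTyped = new Uint8Array(overBuffer);

// Write through the view to see which object shares memory with it.
overBuffer[0] = 7;

const viewSharesMemory = overBuffer.buffer === rawBuffer;
const cloneSharesMemory = fromTyped.buffer === rawBuffer;
const cloneSawWrite = fromTyped[0] === 7;
```

Only the first form aliases the input; the second is an independent clone that does not observe later writes.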

Compound rebuilds currently materialize copied per-part source buffers

GaussianSplattingMesh._createRetainedPartSource() currently does this:

_splatsData: this._splatsData.slice(splatByteOffset, splatByteOffset + splatByteLength),
_shData: this._shData?.map((texture) => texture.slice(shByteOffset, shByteOffset + shByteLength)) ?? null,

That is a full byte copy for every surviving part during removePart().

Later, _retainMergedPartData() allocates a new merged retained buffer and copies those bytes again into the compound’s new authoritative retained storage.

So removePart() currently pays for:

  • one full temporary retained-source copy of all surviving parts,
  • one full merged retained-buffer copy that is still required by the current single-buffer design.
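The difference between the copying call and an aliasing view can be checked in isolation. This is a minimal standalone sketch; `retained` is a stand-in for a compound's retained splat buffer, not the real field.

```typescript
// Assumption for the example: four splats of 32 bytes each.
const BYTES_PER_SPLAT = 32;
const retained = new ArrayBuffer(4 * BYTES_PER_SPLAT);

// ArrayBuffer.prototype.slice allocates a new buffer and copies the byte range.
const copied = retained.slice(BYTES_PER_SPLAT, 3 * BYTES_PER_SPLAT);

// Uint8Array.prototype.subarray creates a new view over the same buffer.
const aliased = new Uint8Array(retained).subarray(BYTES_PER_SPLAT, 3 * BYTES_PER_SPLAT);

// Write through the original buffer and observe which result sees the change.
new Uint8Array(retained)[BYTES_PER_SPLAT] = 255;

const copySeesWrite = new Uint8Array(copied)[0] === 255;
const viewSeesWrite = aliased[0] === 255;
const viewSharesBuffer = aliased.buffer === retained;
```

The view costs a small object allocation, while the slice costs a full copy of the selected range.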

Current “typed array tolerated” code is not true view support

Some comments already acknowledge that callers may have stored a typed array in _splatsData, but the implementation is still buffer-centric.

Examples:

  • _appendSourceToArrays() extracts srcRaw.buffer and then creates new Uint8Array(srcBuffer) and new Float32Array(srcBuffer).
  • _retainMergedPartData() uses getSourceBuffer(data).buffer and then copies from offset 0.
  • compound rebuild paths use new Uint8Array(this._splatsData) and new Float32Array(this._splatsData) directly.

Those patterns work only when the data starts at byte offset 0 and spans the whole underlying buffer. They are incorrect for a view with a non-zero byteOffset or with a byteLength shorter than its buffer.

So widening _splatsData to ArrayBufferView without central helper changes would risk silent data corruption.
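The failure mode is easy to reproduce outside the mesh code. The sketch below uses two splat-sized records filled with marker values; the names are invented for the example.

```typescript
// Two 32-byte records of eight floats each, filled with distinct marker values.
const packed = new Float32Array(16);
packed.fill(1, 0, 8); // record 0
packed.fill(2, 8, 16); // record 1

// A byte view that covers only record 1.
const part = new Uint8Array(packed.buffer, 32, 32);

// Buffer-centric read: ignores byteOffset and byteLength, so it starts at
// record 0 and spans the whole parent buffer.
const wrong = new Float32Array(part.buffer);

// View-aware read: carries the offset and the element count across.
const right = new Float32Array(part.buffer, part.byteOffset, part.byteLength / 4);
```

The buffer-centric read does not throw; it returns record 0's values, which is why this class of bug would show up as silent corruption rather than an exception.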

Loader outputs are mostly already fresh packed buffers

The SPLAT loaders are not the main copy hotspot:

  • packages/dev/loaders/src/SPLAT/splatDefs.ts defines IParsedSplat.data as ArrayBuffer.
  • ParseSpz() allocates a new packed splat ArrayBuffer.
  • ParseSogDatas() allocates a new packed splat ArrayBuffer.
  • SPLATFileLoader._ConvertPLYToSplat() returns the original raw-splat ArrayBuffer for .splat input, and fresh converted buffers for parsed PLY paths.
  • SPLATFileLoader then forwards that data to gaussianSplatting.updateData(parsed.data, ...).

So loader-side type widening is reasonable for API consistency, but it is not where the largest memory win lives today.

Goal

Allow exact byte-range views to represent retained raw splat data where that removes avoidable copies, while preserving correct byteOffset handling and keeping the current merged-buffer ownership model intact.

Recommendation

Phase 1: internal retained-part views for compound rebuilds

This phase should be the first implementation target.

Use Uint8Array byte views for transient retained part sources created during compound rebuilds, without changing the public mesh API yet.

Concretely:

  • Keep the mesh-owned _splatsData field as-is for now.
  • Change the internal retained-part source contract so _createRetainedPartSource() can return a byte view instead of a copied ArrayBuffer.
  • Replace slice() with subarray() in _createRetainedPartSource() for both splat bytes and SH bytes.
  • Add central helpers that preserve both byteOffset and byteLength.
  • Keep all downstream reads view-based and only repack if a consumer cannot legally interpret the current offset/length.

Conceptually:

type SplatBytes = Uint8Array;

function getSplatBytes(data: ArrayBuffer | ArrayBufferView): Uint8Array {
    return ArrayBuffer.isView(data) ? new Uint8Array(data.buffer, data.byteOffset, data.byteLength) : new Uint8Array(data);
}

function getSplatFloats(bytes: Uint8Array): Float32Array {
    return new Float32Array(bytes.buffer, bytes.byteOffset, bytes.byteLength / 4);
}

Then update these paths to use the helpers instead of .buffer or direct new Float32Array(raw):

  • GaussianSplattingMeshBase._appendSourceToArrays()
  • GaussianSplattingMesh._retainMergedPartData()
  • all direct rebuild reads in GaussianSplattingMesh._addPartsInternal()

This phase removes the most expensive avoidable copy in removePart() and does not require loader parser changes.

Alignment and “copy only if required”

This should be an explicit implementation rule, not an implicit side effect.

For raw splat data:

  • splat records are 32 bytes each,
  • any view that starts on a splat boundary is automatically 4-byte aligned,
  • any view whose length is an integer number of splats is automatically a multiple of 4.

That means:

  • subviews produced from proxy._splatsDataOffset * 32 and proxy._vertexCount * 32 do not need copying before creating Float32Array overlays,
  • the compound rebuild path can stay zero-copy for transient retained-part sources as long as it preserves the exact byteOffset.

For SH data:

  • each retained SH texture stores 16 bytes per splat,
  • subviews created on SH-splat boundaries are also naturally 4-byte aligned,
  • Uint32Array overlays used by texture upload can therefore stay view-based for those subranges.
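A small sketch of the alignment argument for both record sizes follows. `recordView` is a hypothetical helper introduced only for this example; the 32-byte and 16-byte sizes are the ones stated above.

```typescript
const SPLAT_BYTES = 32; // bytes per splat record
const SH_BYTES = 16; // bytes per splat in one SH texture

// Hypothetical helper: a view over `count` records starting at record `first`.
function recordView(bytes: Uint8Array, first: number, count: number, recordBytes: number): Uint8Array {
    return bytes.subarray(first * recordBytes, (first + count) * recordBytes);
}

// Ten records of each kind, then a sub-range of four records starting at record 3.
const splats = new Uint8Array(10 * SPLAT_BYTES);
const sh = new Uint8Array(10 * SH_BYTES);
const partSplats = recordView(splats, 3, 4, SPLAT_BYTES);
const partSh = recordView(sh, 3, 4, SH_BYTES);

// Both offsets are multiples of 4, so the overlays can be created in place.
const floats = new Float32Array(partSplats.buffer, partSplats.byteOffset, partSplats.byteLength / 4);
const words = new Uint32Array(partSh.buffer, partSh.byteOffset, partSh.byteLength / 4);
```

Because 32 and 16 are both multiples of 4, any record-boundary offset into an offset-0 parent is a valid Float32Array or Uint32Array offset.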

Copying should happen only when one of these is true:

  • byteOffset % 4 !== 0,
  • byteLength % 4 !== 0,
  • the byte range is not an exact integer number of splat or SH records,
  • a public API contract still requires returning an owned standalone ArrayBuffer,
  • the implementation deliberately wants snapshot ownership instead of aliasing mutable caller memory.

In other words: misalignment should trigger a fallback copy, not a blanket copy policy.
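One way to keep that rule explicit is a single predicate for the mechanical conditions. The sketch below covers only the first three items of the list (offset, length, whole records); the API-contract and ownership conditions are policy decisions and are deliberately left out. `requiresCopy` is a hypothetical name.

```typescript
// Hypothetical predicate: true when a view cannot be reinterpreted in place.
function requiresCopy(bytes: Uint8Array, recordBytes: number): boolean {
    return (
        bytes.byteOffset % 4 !== 0 || // typed-array alignment
        bytes.byteLength % 4 !== 0 || // whole 32-bit elements
        bytes.byteLength % recordBytes !== 0 // whole splat or SH records
    );
}

const backingStore = new ArrayBuffer(256);

// Starts on a splat boundary and covers exactly two 32-byte splats.
const alignedView = new Uint8Array(backingStore, 32, 64);
// Starts two bytes into the buffer.
const offsetView = new Uint8Array(backingStore, 2, 64);
// 4-byte aligned, but 40 bytes is not a whole number of 32-byte splats.
const partialView = new Uint8Array(backingStore, 32, 40);

const decisions = [alignedView, offsetView, partialView].map((v) => requiresCopy(v, 32));
```

Only the first view stays on the zero-copy path; the other two would take the fallback.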

Phase 2: public updateData(ArrayBufferView) support

This phase is feasible, but it is a broader API decision.

Recommended changes:

  • widen _updateData(), updateData(), and updateDataAsync() to accept ArrayBuffer | ArrayBufferView,
  • normalize the input into a Uint8Array view that preserves byteOffset,
  • retain the exact view when RAM retention is enabled,
  • optionally retain SH views instead of cloning them,
  • copy only on the misaligned fallback path.

Important constraint:

  • zero-copy float reinterpretation needs byteOffset % 4 === 0 and byteLength % 4 === 0,
  • for splat-aligned subranges this is naturally true because each splat is 32 bytes,
  • for arbitrary caller-provided views, a fallback copy may still be needed when alignment is invalid.

Recommended normalization shape:

function normalizeSplatBytes(data: ArrayBuffer | ArrayBufferView): Uint8Array {
    return ArrayBuffer.isView(data) ? new Uint8Array(data.buffer, data.byteOffset, data.byteLength) : new Uint8Array(data);
}

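function ensureFloat32Readable(bytes: Uint8Array): Uint8Array {
    // Fast path: the view can be reinterpreted as floats in place.
    if (bytes.byteOffset % 4 === 0 && bytes.byteLength % 4 === 0) {
        return bytes;
    }

    // Fallback: repack into a fresh buffer so the bytes start at offset 0.
    const copy = new Uint8Array(bytes.byteLength);
    copy.set(bytes);
    return copy;
}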

That keeps the fast path zero-copy and makes the fallback explicit and local.
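A usage sketch of the two helpers, repeated here so the example is self-contained. The input views and variable names are invented for illustration.

```typescript
function normalizeSplatBytes(data: ArrayBuffer | ArrayBufferView): Uint8Array {
    return ArrayBuffer.isView(data) ? new Uint8Array(data.buffer, data.byteOffset, data.byteLength) : new Uint8Array(data);
}

function ensureFloat32Readable(bytes: Uint8Array): Uint8Array {
    if (bytes.byteOffset % 4 === 0 && bytes.byteLength % 4 === 0) {
        return bytes;
    }
    const copy = new Uint8Array(bytes.byteLength);
    copy.set(bytes);
    return copy;
}

const backing = new ArrayBuffer(128);

// One view on a splat boundary, one starting two bytes into the buffer.
const alignedInput = normalizeSplatBytes(new Uint8Array(backing, 32, 64));
const misalignedInput = normalizeSplatBytes(new Uint8Array(backing, 2, 64));

const fast = ensureFloat32Readable(alignedInput); // same object, no copy
const fallback = ensureFloat32Readable(misalignedInput); // repacked at offset 0

// Both results can now be overlaid with floats without a RangeError.
const fastFloats = new Float32Array(fast.buffer, fast.byteOffset, fast.byteLength / 4);
const fallbackFloats = new Float32Array(fallback.buffer, fallback.byteOffset, fallback.byteLength / 4);
```

The aligned input keeps aliasing the caller's buffer, while the misaligned input ends up in its own buffer.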

Public API compatibility options

Phase 2 has one real compatibility question: the public splatsData getter.

If mesh-owned _splatsData becomes a view, the getter can no longer safely pretend the data is always a standalone ArrayBuffer.

There are two reasonable options:

  • next major: widen splatsData to return ArrayBuffer | Uint8Array | null,
  • additive path: introduce a new splatsDataView getter and defer any change to splatsData.

The lower-risk path is to keep phase 1 internal-only and make phase 2 a deliberate API follow-up.

Why Uint8Array is the right view type

If view support is added, the preferred byte-view type is Uint8Array, not a generic ArrayBufferView.

Reasons:

  • raw splat storage is byte-addressed,
  • Uint8Array naturally preserves byteOffset and byteLength,
  • SH data is already Uint8Array[],
  • serialization already accepts ArrayBufferView,
  • derived Float32Array views can be created from the byte view when alignment is valid.

Loader impact

The loaders should be updated only for API consistency, not because they are the primary optimization target.

Recommended loader changes:

  • widen IParsedSplat.data in packages/dev/loaders/src/SPLAT/splatDefs.ts to ArrayBuffer | ArrayBufferView,
  • keep existing parser implementations unchanged for now,
  • let SPLATFileLoader forward whichever binary type it receives to GaussianSplattingMesh.updateData(),
  • if a future loader ever returns a view into a larger parent buffer, preserve that view and rely on the core alignment fallback instead of eagerly repacking.

Why the impact is low:

  • ParseSpz() always allocates a new packed output buffer,
  • ParseSogDatas() always allocates a new packed output buffer,
  • converted PLY paths already allocate packed output buffers,
  • only raw .splat input naturally reuses the incoming source buffer.

So loader-side widening is mainly a forward-compatible plumbing change.

Estimated impact

Per-splat retained payload

Retained raw source payload is:

  • 32 bytes per splat for base splat data,
  • plus 16 bytes per SH texture per splat.

Common reference sizes:

| Splats | Base payload | With 3 SH textures |
|---|---|---|
| 100,000 | 3.1 MiB | 7.6 MiB |
| 1,000,000 | 30.5 MiB | 76.3 MiB |
| 5,000,000 | 152.6 MiB | 381.5 MiB |
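The reference sizes follow directly from the per-splat byte counts. A minimal sketch of the arithmetic, with a hypothetical helper name:

```typescript
const BASE_BYTES_PER_SPLAT = 32;
const SH_BYTES_PER_TEXTURE = 16;
const MIB = 1024 * 1024;

// Hypothetical helper: retained raw payload in MiB for a given splat count.
function retainedPayloadMiB(splatCount: number, shTextureCount: number): number {
    const bytesPerSplat = BASE_BYTES_PER_SPLAT + shTextureCount * SH_BYTES_PER_TEXTURE;
    return (splatCount * bytesPerSplat) / MIB;
}

// Recompute the reference sizes, rounded to one decimal place.
const rows = [100000, 1000000, 5000000].map((n) => ({
    splats: n,
    baseMiB: retainedPayloadMiB(n, 0).toFixed(1),
    withThreeShMiB: retainedPayloadMiB(n, 3).toFixed(1),
}));
```

With 3 SH textures the per-splat payload is 32 + 3 × 16 = 80 bytes, which is 2.5 times the base payload.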

CPU impact

Expected CPU effect:

  • _updateData() with loader-generated ArrayBuffer input: low for raw splat bytes, because that path is already zero-copy today.
  • _updateData() with caller-provided subviews: low to moderate, because it removes the need to repack subranges into standalone buffers before calling updateData().
  • removePart(): moderate to high, because it removes one full temporary retained-source copy of the surviving payload.
  • addPart() / _addPartsInternal(): low to moderate, because the required merged-buffer copy still remains, but source reads become view-safe and any transient copied retained-source slices disappear.

In practice, the biggest CPU win is reduced typed-array copy bandwidth during compound rebuilds.

Memory impact

Expected memory effect:

  • removePart() peak retained-source memory drops by roughly one surviving-payload copy.
  • With the current flow, peak transient retained raw memory is approximately:
    • old retained buffer + survivor slices + new merged retained buffer
  • With phase 1 view-backed survivors, that becomes approximately:
    • old retained buffer + new merged retained buffer

If the surviving set is close to the old total, that is roughly a one-third reduction in peak transient retained raw memory.
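The one-third figure can be checked with a simple model of the peak. The function below is a sketch that counts only the three buffers named above and ignores everything else on the heap.

```typescript
// oldBytes: retained buffer before the rebuild.
// survivorBytes: total payload of the parts that survive removePart().
function peakTransientBytes(oldBytes: number, survivorBytes: number, viewBacked: boolean): number {
    // Survivor slices only exist when retained sources are copied.
    const sliceBytes = viewBacked ? 0 : survivorBytes;
    // The new merged retained buffer is the size of the surviving payload.
    return oldBytes + sliceBytes + survivorBytes;
}

// 1M splats without SH, with nearly everything surviving.
const payload = 1000000 * 32;
const currentPeak = peakTransientBytes(payload, payload, false); // 3 x payload
const phase1Peak = peakTransientBytes(payload, payload, true); // 2 x payload
const reduction = 1 - phase1Peak / currentPeak;
```

When fewer parts survive, the old retained buffer dominates the peak and the relative saving shrinks accordingly.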

Examples for the surviving payload that can be removed from the temporary peak:

  • about 30.5 MiB per 1M surviving splats without SH,
  • about 76.3 MiB per 1M surviving splats with 3 SH textures,
  • about 152.6 MiB per 5M surviving splats without SH,
  • about 381.5 MiB per 5M surviving splats with 3 SH textures.

GC impact

Expected GC effect:

  • fewer large temporary ArrayBuffer allocations during removePart(),
  • lower risk of promoting large copied survivor buffers into longer-lived generations,
  • lower major-collection pressure and fewer pause spikes around compound rebuilds,
  • only a small number of extra view objects per part remain transiently alive.

The object-count increase from views is negligible compared with the current byte churn.

Non-goals

This proposal does not try to:

  • remove the required merged retained-buffer copy in _retainMergedPartData(),
  • reduce the mandatory _makeSplat() processing work,
  • reduce the mandatory texture upload work,
  • redesign compound retained-source ownership as a segmented or piece-table structure,
  • optimize SPZ/SOG parsing algorithms beyond type widening.

Risks

Key risks are:

  • missing one remaining .buffer-based read path and corrupting subview offsets,
  • accepting misaligned public views and failing when creating Float32Array overlays,
  • accidentally copying aligned views “for safety” and losing most of the intended gain,
  • unintentionally pinning a large parent buffer longer than intended if a small subview escapes the rebuild scope,
  • turning the public splatsData getter into an accidental breaking change,
  • overstating _updateData() gains when the real hot path is still _makeSplat() plus texture upload.
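The parent-buffer pinning risk in particular is easy to overlook, so here is a minimal sketch of it. The sizes are arbitrary and the names are invented for the example.

```typescript
// A large parent holding 1000 splats, and a view over just the first one.
const parentBytes = new Uint8Array(1000 * 32);
const smallView = parentBytes.subarray(0, 32);

// The view references the whole parent buffer, so the parent stays reachable
// for as long as the view does.
const pinnedBytes = smallView.buffer.byteLength;

// TypedArray.prototype.slice copies into a new buffer sized to the range,
// which is the explicit way to cut that link when a view must outlive the rebuild.
const detached = smallView.slice();
const detachedBytes = detached.buffer.byteLength;
```

Keeping retained-part views strictly scoped to the rebuild avoids needing this copy at all.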

Implementation handoff

Implementation should proceed in this order:

  1. Add shared byte-view helpers in gaussianSplattingMeshBase.ts that preserve byteOffset and derive float views safely.
  2. Change transient compound retained part sources to use byte views instead of copied slice() buffers.
  3. Update all compound rebuild and merge paths to consume exact views instead of .buffer at offset 0.
  4. Validate removePart() and addPart() correctness and memory behavior.
  5. If desired after phase 1, widen updateData() and updateDataAsync() to accept ArrayBufferView.
  6. Decide separately whether to widen the public splatsData getter or add a new splatsDataView API.
  7. Widen packages/dev/loaders/src/SPLAT/splatDefs.ts only after the core API shape is settled.

Just out of curiosity, which AI was used for that?

Best, Werner

gpt-5.4 xhigh


And a PR: