GaussianSplattingMesh: reducing buffer copy of removePart

kzhsw · April 15, 2026, 5:44am

Summary

Add view-backed raw splat source support, starting with compound rebuild internals. Preserve aligned views as-is, and copy only if required by alignment, size, or ownership constraints. The main expected benefit is lower peak memory and GC pressure in compound removePart() / rebuild flows; loader impact is small.

Motivation

GaussianSplattingMeshBase._updateData() is already zero-copy for raw splat bytes when the input is a standalone ArrayBuffer, so the largest remaining avoidable copies are not in the normal load path.

The real problem is in compound retained-source handling:

GaussianSplattingMesh._createRetainedPartSource() uses slice(), which copies surviving part data during rebuilds,
several read paths assume offset 0 and would misread a true ArrayBufferView,
removePart() therefore pays unnecessary temporary copy cost today, especially for large compounds.

The guiding rule for this proposal is:

preserve the original view whenever alignment is already valid,
copy only if required by typed-array alignment, exact-length constraints, or an ownership boundary that truly needs an isolated buffer.

This leads to a two-phase approach:

add low-risk internal view support for compound rebuilds,
optionally widen public updateData() / updateDataAsync() to accept ArrayBufferView later.

Expected outcome:

high memory and GC improvement for removePart() on large compounds,
moderate improvement for some addPart() and retained-source rebuild flows,
low impact for loader-generated data, because the SPLAT loaders already mostly synthesize fresh packed ArrayBuffers before calling updateData(),
no change to the mandatory per-splat processing and texture upload cost in _updateData() / _addPartsInternal().

Current behavior

`_updateData()` is ArrayBuffer-only and already zero-copy for raw splat bytes

GaussianSplattingMeshBase._updateData() currently accepts data: ArrayBuffer and immediately creates:

const uBuffer = new Uint8Array(data);
const fBuffer = new Float32Array(uBuffer.buffer);

It then stores uBuffer.buffer into _splatsData when RAM retention is enabled.

That means:

raw splat data is not copied today when the source is already a standalone ArrayBuffer,
but callers cannot safely pass a subrange view without first repacking it into a dedicated buffer,
and _shData is still cloned on retain via sh.map((arr) => new Uint8Array(arr)).

So the raw _updateData() path is not the biggest current internal copy site. The larger current win is in compound rebuilds.
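The difference between the two constructor forms can be checked in isolation. The following is a minimal sketch using only standard typed-array behavior; the variable names are illustrative and not taken from the Babylon.js source.

```typescript
// Two 32-byte splat records in a standalone buffer.
const standalone = new ArrayBuffer(64);

// Typed arrays constructed from an ArrayBuffer are views: no bytes are copied.
const uBuffer = new Uint8Array(standalone);
const fBuffer = new Float32Array(uBuffer.buffer);
const rawIsShared = uBuffer.buffer === standalone && fBuffer.buffer === standalone;

// A typed array constructed from another typed array copies the elements,
// which is what sh.map((arr) => new Uint8Array(arr)) does for SH data.
const shSource = new Uint8Array([1, 2, 3, 4]);
const shRetained = new Uint8Array(shSource);
shRetained[0] = 99;
const shIsCloned = shSource[0] === 1 && shRetained.buffer !== shSource.buffer;

console.log(rawIsShared, shIsCloned); // true true
```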

Compound rebuilds currently materialize copied per-part source buffers

GaussianSplattingMesh._createRetainedPartSource() currently does this:

_splatsData: this._splatsData.slice(splatByteOffset, splatByteOffset + splatByteLength),
_shData: this._shData?.map((texture) => texture.slice(shByteOffset, shByteOffset + shByteLength)) ?? null,

That is a full byte copy for every surviving part during removePart().

Later, _retainMergedPartData() allocates a new merged retained buffer and copies those bytes again into the compound’s new authoritative retained storage.

So removePart() currently pays for:

one full temporary retained-source copy of all surviving parts,
one full merged retained-buffer copy that is still required by the current single-buffer design.
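The two copies listed above can be simulated outside the engine. This is a rough sketch, assuming a hypothetical compound of three parts packed back to back in one retained buffer; it does not reproduce the actual _createRetainedPartSource() or _retainMergedPartData() code.

```typescript
const SPLAT_BYTE_SIZE = 32;

// Hypothetical compound: three parts of 2, 3 and 1 splats in one retained buffer.
const partSplatCounts = [2, 3, 1];
const removedPartIndex = 1;
const totalSplats = partSplatCounts.reduce((sum, n) => sum + n, 0);
const oldRetained = new ArrayBuffer(totalSplats * SPLAT_BYTE_SIZE);
const oldBytes = new Uint8Array(oldRetained);
for (let i = 0; i < oldBytes.length; i++) {
    oldBytes[i] = i & 0xff;
}

// Copy 1: slice() materializes every surviving part as its own ArrayBuffer.
let offsetSplats = 0;
const survivorSlices: ArrayBuffer[] = [];
partSplatCounts.forEach((count, index) => {
    const start = offsetSplats * SPLAT_BYTE_SIZE;
    if (index !== removedPartIndex) {
        survivorSlices.push(oldRetained.slice(start, start + count * SPLAT_BYTE_SIZE));
    }
    offsetSplats += count;
});

// Copy 2: the survivors are copied again into the new merged retained buffer.
const sliceBytes = survivorSlices.reduce((sum, s) => sum + s.byteLength, 0);
const mergedRetained = new Uint8Array(sliceBytes);
let writeOffset = 0;
for (const slice of survivorSlices) {
    mergedRetained.set(new Uint8Array(slice), writeOffset);
    writeOffset += slice.byteLength;
}

console.log(sliceBytes, mergedRetained.byteLength); // 96 96
```

Both numbers are the surviving payload (3 splats × 32 bytes), so the same bytes are allocated twice before the old buffer can be released.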

Current “typed array tolerated” code is not true view support

Some comments already acknowledge that callers may have stored a typed array in _splatsData, but the implementation is still buffer-centric.

Examples:

_appendSourceToArrays() extracts srcRaw.buffer and then creates new Uint8Array(srcBuffer) and new Float32Array(srcBuffer).
_retainMergedPartData() uses getSourceBuffer(data).buffer and then copies from offset 0.
compound rebuild paths use new Uint8Array(this._splatsData) and new Float32Array(this._splatsData) directly.

Those patterns work only when the data starts at byte offset 0. They are incorrect for a view with a non-zero byteOffset.

So widening _splatsData to ArrayBufferView without central helper changes would risk silent data corruption.
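The failure mode is easy to reproduce with plain typed arrays. The sketch below uses illustrative names and assumes 32-byte splats (eight floats each); it is not the engine code, only the pattern it describes.

```typescript
const FLOATS_PER_SPLAT = 8; // 32 bytes per splat / 4 bytes per float

// Two splats in one backing buffer: floats 0..7, then floats 100..107.
const backingFloats = new Float32Array(2 * FLOATS_PER_SPLAT);
for (let i = 0; i < FLOATS_PER_SPLAT; i++) {
    backingFloats[i] = i;
    backingFloats[FLOATS_PER_SPLAT + i] = 100 + i;
}

// A byte view covering only the second splat (byteOffset 32, byteLength 32).
const secondSplat = new Uint8Array(backingFloats.buffer, 32, 32);

// Buffer-centric read: ignores byteOffset and silently reads from the first splat.
const wrongFloats = new Float32Array(secondSplat.buffer);

// View-preserving read: carries byteOffset and byteLength through.
const rightFloats = new Float32Array(secondSplat.buffer, secondSplat.byteOffset, secondSplat.byteLength / 4);

console.log(wrongFloats[0], wrongFloats.length); // 0 16
console.log(rightFloats[0], rightFloats.length); // 100 8
```

No exception is thrown on the buffer-centric path, which is why this kind of bug shows up as wrong splats rather than as a crash.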

Loader outputs are mostly already fresh packed buffers

The SPLAT loaders are not the main copy hotspot:

packages/dev/loaders/src/SPLAT/splatDefs.ts defines IParsedSplat.data as ArrayBuffer.
ParseSpz() allocates a new packed splat ArrayBuffer.
ParseSogDatas() allocates a new packed splat ArrayBuffer.
SPLATFileLoader._ConvertPLYToSplat() returns the original raw-splat ArrayBuffer for .splat input, and fresh converted buffers for parsed PLY paths.
SPLATFileLoader then forwards that data to gaussianSplatting.updateData(parsed.data, ...).

So loader-side type widening is reasonable for API consistency, but it is not where the largest memory win lives today.

Goal

Allow exact byte-range views to represent retained raw splat data where that removes avoidable copies, while preserving correct byteOffset handling and keeping the current merged-buffer ownership model intact.

Recommendation

Phase 1: internal retained-part views for compound rebuilds

This phase should be the first implementation target.

Use Uint8Array byte views for transient retained part sources created during compound rebuilds, without changing the public mesh API yet.

Concretely:

Keep the mesh-owned _splatsData field as-is for now.
Change the internal retained-part source contract so _createRetainedPartSource() can return a byte view instead of a copied ArrayBuffer.
Replace slice() with subarray() in _createRetainedPartSource() for both splat bytes and SH bytes.
Add central helpers that preserve both byteOffset and byteLength.
Keep all downstream reads view-based and only repack if a consumer cannot legally interpret the current offset/length.

Conceptually:

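// Byte-addressed view type used for retained raw splat data.
type SplatBytes = Uint8Array;

// Wraps the exact byte range of the input without copying; byteOffset and byteLength are preserved.
function getSplatBytes(data: ArrayBuffer | ArrayBufferView): SplatBytes {
    return ArrayBuffer.isView(data) ? new Uint8Array(data.buffer, data.byteOffset, data.byteLength) : new Uint8Array(data);
}

// Reinterprets the same byte range as floats. Requires byteOffset and byteLength to be multiples of 4.
function getSplatFloats(bytes: SplatBytes): Float32Array {
    return new Float32Array(bytes.buffer, bytes.byteOffset, bytes.byteLength / 4);
}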

Then update these paths to use the helpers instead of .buffer or direct new Float32Array(raw):

GaussianSplattingMeshBase._appendSourceToArrays()
GaussianSplattingMesh._retainMergedPartData()
all direct rebuild reads in GaussianSplattingMesh._addPartsInternal()

This phase removes the most expensive avoidable copy in removePart() and does not require loader parser changes.
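As a rough illustration of the slice() to subarray() change, the sketch below builds a view-backed part source for both splat and SH bytes. The RetainedPartView interface and createRetainedPartView() function are hypothetical stand-ins, not the actual Babylon.js types or signatures.

```typescript
const SPLAT_RECORD_BYTES = 32;
const SH_RECORD_BYTES = 16;

// Hypothetical shape of a transient retained part source (not the real Babylon.js type).
interface RetainedPartView {
    splats: Uint8Array;
    sh: Uint8Array[] | null;
}

function createRetainedPartView(
    retainedSplats: Uint8Array,
    retainedSh: Uint8Array[] | null,
    splatOffset: number,
    splatCount: number
): RetainedPartView {
    const splatStart = splatOffset * SPLAT_RECORD_BYTES;
    const shStart = splatOffset * SH_RECORD_BYTES;
    return {
        // subarray() returns a view on the same backing buffer; slice() would copy.
        splats: retainedSplats.subarray(splatStart, splatStart + splatCount * SPLAT_RECORD_BYTES),
        sh: retainedSh?.map((texture) => texture.subarray(shStart, shStart + splatCount * SH_RECORD_BYTES)) ?? null,
    };
}

// A compound of 4 splats with one SH texture; the part covers splats 1 and 2.
const compoundSplats = new Uint8Array(4 * SPLAT_RECORD_BYTES);
const compoundSh = [new Uint8Array(4 * SH_RECORD_BYTES)];
const partView = createRetainedPartView(compoundSplats, compoundSh, 1, 2);

console.log(partView.splats.byteOffset, partView.splats.byteLength); // 32 64
console.log(partView.splats.buffer === compoundSplats.buffer); // true
```

Because the returned views carry a non-zero byteOffset, every consumer downstream has to read through the offset-preserving helpers rather than through .buffer.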

Alignment and “copy only if required”

This should be an explicit implementation rule, not an implicit side effect.

For raw splat data:

splat records are 32 bytes each,
any view that starts on a splat boundary is automatically 4-byte aligned,
any view whose length is an integer number of splats has a byte length that is automatically a multiple of 4.

That means:

subviews produced from proxy._splatsDataOffset * 32 and proxy._vertexCount * 32 do not need copying before creating Float32Array overlays,
the compound rebuild path can stay zero-copy for transient retained-part sources as long as it preserves the exact byteOffset.

For SH data:

each retained SH texel is 16 bytes per splat,
subviews created on SH-splat boundaries are also naturally 4-byte aligned,
Uint32Array overlays used by texture upload can therefore stay view-based for those subranges.

Copying should happen only when one of these is true:

byteOffset % 4 !== 0,
byteLength % 4 !== 0,
the byte range is not an exact integer number of splat or SH records,
a public API contract still requires returning an owned standalone ArrayBuffer,
the implementation deliberately wants snapshot ownership instead of aliasing mutable caller memory.

In other words: misalignment should trigger a fallback copy, not a blanket copy policy.
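The three mechanical conditions can be folded into one predicate. This is a sketch only: needsFallbackCopy() is a hypothetical helper name, and the two ownership-related conditions are policy decisions that a byte-level check like this cannot cover.

```typescript
// Returns true when a byte view cannot be reinterpreted in place and must be repacked.
// recordBytes is 32 for raw splat data and 16 for SH data.
function needsFallbackCopy(bytes: Uint8Array, recordBytes: number): boolean {
    const misalignedStart = bytes.byteOffset % 4 !== 0;
    const misalignedLength = bytes.byteLength % 4 !== 0;
    const partialRecord = bytes.byteLength % recordBytes !== 0;
    return misalignedStart || misalignedLength || partialRecord;
}

const sampleBacking = new ArrayBuffer(128);

// Splat-boundary subview: stays zero-copy.
console.log(needsFallbackCopy(new Uint8Array(sampleBacking, 32, 64), 32)); // false
// SH-boundary subview: stays zero-copy.
console.log(needsFallbackCopy(new Uint8Array(sampleBacking, 16, 48), 16)); // false
// Start is not 4-byte aligned: needs a copy.
console.log(needsFallbackCopy(new Uint8Array(sampleBacking, 2, 64), 32)); // true
// Length is not a whole number of splats: needs a copy.
console.log(needsFallbackCopy(new Uint8Array(sampleBacking, 32, 40), 32)); // true
```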

Phase 2: public `updateData(ArrayBufferView)` support

This phase is feasible, but it is a broader API decision.

Recommended changes:

widen _updateData(), updateData(), and updateDataAsync() to accept ArrayBuffer | ArrayBufferView,
normalize the input into a Uint8Array view that preserves byteOffset,
retain the exact view when RAM retention is enabled,
optionally retain SH views instead of cloning them,
copy only on the misaligned fallback path.

Important constraint:

zero-copy float reinterpretation needs byteOffset % 4 === 0 and byteLength % 4 === 0,
for splat-aligned subranges this is naturally true because each splat is 32 bytes,
for arbitrary caller-provided views, a fallback copy may still be needed when alignment is invalid.

Recommended normalization shape:

function normalizeSplatBytes(data: ArrayBuffer | ArrayBufferView): Uint8Array {
    return ArrayBuffer.isView(data) ? new Uint8Array(data.buffer, data.byteOffset, data.byteLength) : new Uint8Array(data);
}

function ensureFloat32Readable(bytes: Uint8Array): Uint8Array {
    if (bytes.byteOffset % 4 === 0 && bytes.byteLength % 4 === 0) {
        return bytes;
    }

    const copy = new Uint8Array(bytes.byteLength);
    copy.set(bytes);
    return copy;
}

That keeps the fast path zero-copy and makes the fallback explicit and local.
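To see both paths side by side, the sketch below repeats the two helpers so that it is self-contained, then feeds them one aligned and one deliberately misaligned view. The sample buffer and variable names are made up for illustration.

```typescript
function normalizeSplatBytes(data: ArrayBuffer | ArrayBufferView): Uint8Array {
    return ArrayBuffer.isView(data) ? new Uint8Array(data.buffer, data.byteOffset, data.byteLength) : new Uint8Array(data);
}

function ensureFloat32Readable(bytes: Uint8Array): Uint8Array {
    if (bytes.byteOffset % 4 === 0 && bytes.byteLength % 4 === 0) {
        return bytes;
    }
    const copy = new Uint8Array(bytes.byteLength);
    copy.set(bytes);
    return copy;
}

const packed = new ArrayBuffer(72);
new Uint8Array(packed)[3] = 7;

// Fast path: a splat-aligned view is returned over the same buffer.
const fastPath = ensureFloat32Readable(normalizeSplatBytes(new Uint8Array(packed, 32, 32)));
console.log(fastPath.buffer === packed, fastPath.byteOffset); // true 32

// Fallback path: a view starting at byte 3 is repacked into a fresh buffer.
const slowPath = ensureFloat32Readable(normalizeSplatBytes(new Uint8Array(packed, 3, 32)));
console.log(slowPath.buffer === packed, slowPath.byteOffset, slowPath[0]); // false 0 7

// Without the fallback, the float overlay itself would be rejected.
let misalignedThrows = false;
try {
    new Float32Array(packed, 3, 8);
} catch (e) {
    misalignedThrows = e instanceof RangeError;
}
console.log(misalignedThrows); // true
```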

Public API compatibility options

Phase 2 has one real compatibility question: the public splatsData getter.

If mesh-owned _splatsData becomes a view, the getter can no longer safely pretend the data is always a standalone ArrayBuffer.

There are two reasonable options:

next major: widen splatsData to return ArrayBuffer | Uint8Array | null,
additive path: introduce a new splatsDataView getter and defer any change to splatsData.

The lower-risk path is to keep phase 1 internal-only and make phase 2 a deliberate API follow-up.

Why `Uint8Array` is the right view type

If view support is added, the preferred byte-view type is Uint8Array, not a generic ArrayBufferView.

Reasons:

raw splat storage is byte-addressed,
Uint8Array naturally preserves byteOffset and byteLength,
SH data is already Uint8Array[],
serialization already accepts ArrayBufferView,
derived Float32Array views can be created from the byte view when alignment is valid.

Loader impact

The loaders should be updated only for API consistency, not because they are the primary optimization target.

Recommended loader changes:

widen IParsedSplat.data in packages/dev/loaders/src/SPLAT/splatDefs.ts to ArrayBuffer | ArrayBufferView,
keep existing parser implementations unchanged for now,
let SPLATFileLoader forward whichever binary type it receives to GaussianSplattingMesh.updateData(),
if a future loader ever returns a view into a larger parent buffer, preserve that view and rely on the core alignment fallback instead of eagerly repacking.

Why the impact is low:

ParseSpz() always allocates a new packed output buffer,
ParseSogDatas() always allocates a new packed output buffer,
converted PLY paths already allocate packed output buffers,
only raw .splat input naturally reuses the incoming source buffer.

So loader-side widening is mainly a forward-compatible plumbing change.

Estimated impact

Per-splat retained payload

Retained raw source payload is:

32 bytes per splat for base splat data,
plus 16 bytes per SH texture per splat.

Common reference sizes:

Splats	Base payload	With 3 SH textures
100,000	3.1 MiB	7.6 MiB
1,000,000	30.5 MiB	76.3 MiB
5,000,000	152.6 MiB	381.5 MiB
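The table values follow directly from the per-splat sizes. A small sketch of the arithmetic, with retainedPayloadMiB() as an illustrative helper name:

```typescript
const BASE_BYTES_PER_SPLAT = 32;
const SH_BYTES_PER_TEXTURE_PER_SPLAT = 16;
const BYTES_PER_MIB = 1024 * 1024;

// Retained raw payload in MiB for a given splat count and number of SH textures.
function retainedPayloadMiB(splatCount: number, shTextureCount: number): number {
    const bytesPerSplat = BASE_BYTES_PER_SPLAT + SH_BYTES_PER_TEXTURE_PER_SPLAT * shTextureCount;
    return (splatCount * bytesPerSplat) / BYTES_PER_MIB;
}

for (const splatCount of [100000, 1000000, 5000000]) {
    console.log(splatCount, retainedPayloadMiB(splatCount, 0).toFixed(1), retainedPayloadMiB(splatCount, 3).toFixed(1));
}
// 100000 3.1 7.6
// 1000000 30.5 76.3
// 5000000 152.6 381.5
```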

CPU impact

Expected CPU effect:

_updateData() with loader-generated ArrayBuffer input: low for raw splat bytes, because that path is already zero-copy today.
_updateData() with caller-provided subviews: low to moderate, because it removes the need to repack subranges into standalone buffers before calling updateData().
removePart(): moderate to high, because it removes one full temporary retained-source copy of the surviving payload.
addPart() / _addPartsInternal(): low to moderate, because the required merged-buffer copy still remains, but source reads become view-safe and any transient copied retained-source slices disappear.

In practice, the biggest CPU win is reduced typed-array copy bandwidth during compound rebuilds.

Memory impact

Expected memory effect:

removePart() peak retained-source memory drops by roughly one surviving-payload copy.
With the current flow, peak transient retained raw memory is approximately:
- old retained buffer + survivor slices + new merged retained buffer
With phase 1 view-backed survivors, that becomes approximately:
- old retained buffer + new merged retained buffer

If the surviving set is close to the old total, that is roughly a one-third reduction in peak transient retained raw memory.
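The one-third figure can be checked with a simple model of the peak. The sketch below assumes the peak is just the sum of the buffers listed above; the 90% survival ratio is an arbitrary example, not a measured workload.

```typescript
// Peak transient retained bytes during removePart():
// old retained buffer + optional survivor slices + new merged retained buffer.
function peakRetainedBytes(oldBytes: number, survivingBytes: number, viewBacked: boolean): number {
    const temporarySurvivorCopies = viewBacked ? 0 : survivingBytes;
    return oldBytes + temporarySurvivorCopies + survivingBytes;
}

// Example: 1,000,000 splats without SH, of which 900,000 survive.
const oldPayload = 1000000 * 32;
const survivingPayload = 900000 * 32;

const peakBefore = peakRetainedBytes(oldPayload, survivingPayload, false);
const peakAfter = peakRetainedBytes(oldPayload, survivingPayload, true);
const reduction = 1 - peakAfter / peakBefore;

console.log(peakBefore, peakAfter, reduction.toFixed(3)); // 89600000 60800000 0.321
```

As the surviving share approaches 100%, the reduction approaches exactly one third; removing a large part from a compound gives a smaller relative saving.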

Examples for the surviving payload that can be removed from the temporary peak:

about 30.5 MiB per 1M surviving splats without SH,
about 76.3 MiB per 1M surviving splats with 3 SH textures,
about 152.6 MiB per 5M surviving splats without SH,
about 381.5 MiB per 5M surviving splats with 3 SH textures.

GC impact

Expected GC effect:

fewer large temporary ArrayBuffer allocations during removePart(),
lower risk of promoting large copied survivor buffers into longer-lived generations,
smaller major-collection pressure and fewer pause spikes around compound rebuilds,
only a small number of extra view objects per part remain transiently alive.

The object-count increase from views is negligible compared with the current byte churn.

Non-goals

This proposal does not try to:

remove the required merged retained-buffer copy in _retainMergedPartData(),
reduce the mandatory _makeSplat() processing work,
reduce the mandatory texture upload work,
redesign compound retained-source ownership as a segmented or piece-table structure,
optimize SPZ/SOG parsing algorithms beyond type widening.

Risks

Key risks are:

missing one remaining .buffer-based read path and corrupting subview offsets,
accepting misaligned public views and failing when creating Float32Array overlays,
accidentally copying aligned views “for safety” and losing most of the intended gain,
unintentionally pinning a large parent buffer longer than intended if a small subview escapes the rebuild scope,
turning the public splatsData getter into an accidental breaking change,
overstating _updateData() gains when the real hot path is still _makeSplat() plus texture upload.
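The pinning risk in particular is worth seeing concretely, since it is the one downside that views introduce and copies do not have. A minimal sketch with made-up sizes:

```typescript
// A retained buffer for 1,000 splats and a view over just the first one.
const compoundBytes = new Uint8Array(1000 * 32);
const escapedView = compoundBytes.subarray(0, 32);

// The 32-byte view keeps the entire 32,000-byte backing buffer reachable.
console.log(escapedView.byteLength, escapedView.buffer.byteLength); // 32 32000

// If a part source must outlive the rebuild, an owned copy lets the parent be collected.
const ownedCopy = escapedView.slice();
console.log(ownedCopy.byteLength, ownedCopy.buffer.byteLength); // 32 32
```

Keeping the views strictly transient, as phase 1 proposes, avoids this without reintroducing the copy.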

Implementation handoff

Implementation should proceed in this order:

Add shared byte-view helpers in gaussianSplattingMeshBase.ts that preserve byteOffset and derive float views safely.
Change transient compound retained part sources to use byte views instead of copied slice() buffers.
Update all compound rebuild and merge paths to consume exact views instead of .buffer at offset 0.
Validate removePart() and addPart() correctness and memory behavior.
If desired after phase 1, widen updateData() and updateDataAsync() to accept ArrayBufferView.
Decide separately whether to widen the public splatsData getter or add a new splatsDataView API.
Widen packages/dev/loaders/src/SPLAT/splatDefs.ts only after the core API shape is settled.

yamaciller · April 15, 2026, 10:56am

Just out of curiosity,

which AI was used for that?

Best. Werner

kzhsw · April 16, 2026, 12:38am

gpt-5.4 xhigh

kzhsw · April 22, 2026, 5:28am

And a PR:

github.com/BabylonJS/Babylon.js

GaussianSplatting: use view-backed retained part sources for compound rebuilds (#18361)

master ← kzhsw:patch-1

opened 05:27AM - 22 Apr 26 UTC

kzhsw

+241 -26

## Summary

This change implements phase 1 of the view-backed raw splat source… proposal for compound Gaussian Splatting rebuilds.

The main goal is to remove avoidable transient retained-source copies during compound `addPart()` and `removePart()` flows while keeping the current merged-buffer ownership model intact.

In particular:

- retained part sources created during compound rebuilds now use byte views instead of copying with `slice()`,
- compound rebuild reads now preserve `byteOffset` correctly for raw splat data,
- SH retained-part sources are also view-backed,
- empty-compound rebuilds now preserve SH correctly when all incoming parts provide SH.

## What Changed

### 1. Centralized view-safe splat reads

`GaussianSplattingMeshBase` now has internal helpers that:

- produce a `Uint8Array` covering the exact byte range of an `ArrayBuffer` or `ArrayBufferView`,
- produce a `Float32Array` reinterpretation over that exact range,
- copy only if float alignment is invalid.

This is the key correctness fix for non-zero-offset retained sources.

### 2. Retained part sources are now true views

`GaussianSplattingMesh._createRetainedPartSource()` previously copied both splat and SH bytes with `slice()`.

It now returns:

- `_splatsData` as a `Uint8Array.subarray(...)`,
- `_shData` entries as `Uint8Array.subarray(...)`.

That removes the temporary retained-source copy that `removePart()` used to pay for before the final merged retained-buffer copy.

### 3. Compound rebuild paths now respect `byteOffset`

The rebuild paths that previously assumed offset `0` now use the centralized helpers instead of constructing typed arrays directly from the whole backing buffer.

This applies to:

- `_appendSourceToArrays()`,
- `_retainMergedPartData()`,
- direct rebuild reads in `_addPartsInternal()`.

### 4. SH retention on empty-compound rebuilds is fixed

While adding SH coverage, a correctness bug surfaced:

- when rebuilding from an empty compound, `_addPartsInternal()` could drop SH even if all incoming parts had SH.

This is now fixed by computing `hasSH` correctly for the empty-compound case.

## Copy Model After This Change

### Add/remove retained-source behavior

For compound add/remove rebuilds, the retained-source copy model is now:

- no temporary retained-part copy for splat bytes,
- no temporary retained-part copy for SH bytes,
- one final authoritative merged retained-buffer copy for splat bytes,
- one final authoritative merged retained-buffer copy for SH bytes.

That final merged copy is still intentional and required by the current single-buffer retained ownership design.

### What is not counted here

This change does not remove the normal CPU processing and GPU upload work in `_updateData()` / `_addPartsInternal()`.

Texture staging arrays like `covA`, `covB`, `colorArray`, and temporary SH upload arrays are still allocated as part of normal processing. Those are not retained-source copies and are outside the scope of this phase.

## Loader Impact

This should not change SPLAT loader behavior.

Current loaders still parse into packed standalone `ArrayBuffer`s and call `updateData()` with `ArrayBuffer`, not `ArrayBufferView`.

So:

- normal `.splat` / `.ply` / `.spz` / `.sog` loading behavior is unchanged,
- the new view-handling logic is primarily exercised by internal compound retained-source rebuilds.

## Tests Added / Updated

Unit coverage now verifies:

- retained part sources alias the compound's merged retained splat buffer and merged retained SH buffer,
- the aliasing uses the expected non-zero `byteOffset` for later parts,
- `removePart()` rebuild preserves the surviving splat bytes exactly,
- `removePart()` rebuild preserves the surviving SH bytes exactly,
- the rebuilt surviving retained part source still aliases the authoritative retained buffers after rebuild,
- existing serialization / proxy reconnection behavior still works.

Focused test command used during development:

```bash
npx vitest run packages/dev/core/test/unit/Meshes/babylon.gaussianSplatting.serialization.test.ts
```

## Reviewer Focus

The most important things to review are:

1. Byte-offset correctness in the new helper-based splat reads.
2. Whether the retained-source copy model now matches the intended "single retained merge copy" behavior.
3. Whether the empty-compound SH fix matches expected API behavior.
4. Whether any adjacent code paths still incorrectly assume offset `0` for retained splat or SH data.

## Deliberately Out of Scope

This PR does not widen the public mesh API to accept `ArrayBufferView` in `updateData()` / `updateDataAsync()`.

It also does not change the public `splatsData` getter contract.

Those are phase 2 API-level changes and still need separate review because broader base-class upload code would need to be audited for offset-safe `ArrayBufferView` support.

## Forum post

<https://forum.babylonjs.com/t/gaussiansplattingmesh-reducing-buffer-copy-of-removepart/63164>