Performance of PointsCloudSystem?

I’m trying to create a set of “trace points” and wondering whether PointsCloudSystem is the most efficient.

I update the position of a single point per frame and don’t need to change color; all points share the same, unchanging color. I update rather than recreate points because I believe this avoids allocating new buffers or reallocating existing ones.

When I get the mesh, it contains positions, colors, and UVs.

Are there any other optimizations that I can apply besides:

pcs.computeParticleRotation = false;
pcs.computeParticleTexture = false;
pcs.computeParticleColor = false;
pcs.computeBoundingBox = false;

I note that the mesh has positions, uv, and color VerticesData. Are all of these required?

I use the counter to track the next available point and update it:

// copy position in updateParticle:
pcs.updateParticle = (cp) => cp.position.copyFrom(emitterPosition);

// per render frame:
pcs.setParticles(pcs.counter, pcs.counter);
pcs.counter = (pcs.counter + 1) % pcs.nbParticles;
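Outside of Babylon.js, the ring-buffer logic above can be sketched with a plain Float32Array standing in for the position VertexBuffer (the names `nbParticles` and the write of the emitter position mirror the snippet above; everything else is an illustrative stand-in):

```javascript
// Stand-in for the PCS position VertexBuffer: 3 floats (12 bytes) per point.
const nbParticles = 4;
const positions = new Float32Array(nbParticles * 3);
let counter = 0;

// Write the "emitter position" into the next slot, recycling the oldest point.
function writeTracePoint(x, y, z) {
  const base = counter * 3;
  positions[base] = x;
  positions[base + 1] = y;
  positions[base + 2] = z;
  counter = (counter + 1) % nbParticles;
}

// Five writes into a 4-slot buffer: the fifth wraps around and reuses slot 0.
writeTracePoint(1, 0, 0);
writeTracePoint(2, 0, 0);
writeTracePoint(3, 0, 0);
writeTracePoint(4, 0, 0);
writeTracePoint(5, 0, 0);
console.log(positions[0], positions[3], counter); // → 5 2 1
```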

Can I do any better in terms of performance or footprint?

Any chance to get a Playground to see how it behaves?

From a more general standpoint, given how little you need to update per call, I think you have a very fast solution there.


Here’s a quick playground.

The meat of it is in addTrace() (with a call to getTracePCS()) and tracePerFrame().

You can change the number of points on line 73 (currently 1000, but I’ll likely want 10000 in the final implementation). Points are initialized to position (0, 0, 0) and are eventually recycled/repositioned when the counter reaches them (one per frame).

In the playground in my previous comment, is there a way to measure CPU-to-GPU transfers as well as GPU allocated buffers?

My hope is that the colors and uv VerticesData are not transferred in tracePerFrame. I hope “pcs.computeParticleColor = false;” disables the color transfer, but I’m not sure whether the uv data is transferred. Maybe “pcs.computeParticleTexture” controls the uv VerticesData transfers?

I also hope that “pcs.setParticles(pcs.counter,pcs.counter)” only transfers the single position (12 bytes) at index pcs.counter. If so, the fact that color and uv exist means they take up space on the CPU (JavaScript objects) and GPU (VerticesData) but don’t directly affect performance.

I think I can reduce the CPU space of the colors by reusing the same Babylon color object, but I presume it is expanded into an ArrayBuffer in preparation for transfer to the GPU. I might do one better by using an ArrayBuffer directly instead of an Array() of Color4 objects, though I’m not sure whether it would still be copied to a new ArrayBuffer (I think I could replace it after creation). It would be great if I could specify a single Color4 for all particles, but I’m doubtful that has been implemented, as my use case is fairly niche.
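A minimal sketch of the “ArrayBuffer instead of an Array of Color4” idea: fill one flat Float32Array with a single repeated RGBA value, so no per-particle Color4 objects exist on the CPU side. The commented-out `setVerticesData` call mirrors the Babylon.js mesh API; the rest is plain JavaScript:

```javascript
// One shared color, written nbParticles times into a flat typed array
// (4 floats per particle), instead of an Array of Color4 objects that
// would be expanded into a typed array before upload anyway.
const nbParticles = 3;
const rgba = [0.25, 0.5, 1.0, 1.0];

const colors = new Float32Array(nbParticles * 4);
for (let i = 0; i < nbParticles; i++) {
  colors.set(rgba, i * 4);
}
// mesh.setVerticesData(BABYLON.VertexBuffer.ColorKind, colors, false);
console.log(colors[4], colors[7]); // → 0.25 1 (second particle's r and a)
```

Note this only shrinks the CPU footprint; the GPU-side ColorsKind buffer is still fully expanded, one RGBA per particle.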

The alternative Babylon objects seem heavier. A regular mesh with points needs(?) vertices in sets of 3. ThinInstance uses an array of matrices, one per thin instance, which is 16 × 4 = 64 bytes on the GPU. An Instance also costs at least an additional matrix on the GPU. The other particle-system objects also seem heavier. GPUParticleSystem is interesting, but I think it is also heavier on the GPU (though it may save some static CPU resources). And it’s unclear to me whether ParticleSystem and GPUParticleSystem transfer only the updated particle data rather than the entire buffer on each small change.

As long as I’m OK with camera-facing squares, PointsCloudSystem seems like it would be the best performing. And if I wanted an arbitrary mesh as the marker, would mesh ThinInstances be the best approach?

On that note, where does the “camera facing” portion get computed when using PointsCloudSystem? Is it implemented with a shader code path modified from the nominal mesh path? If so, is that code path shorter than the non-camera-facing one?

I guess you can find a lot of answers directly in the code:

Babylon.js/packages/dev/core/src/Particles/pointsCloudSystem.ts at master · BabylonJS/Babylon.js

For instance, you can see that UV and normals are different vertex buffers.

Modified so it only sends the position data for a single vertex to the GPU. The key call is on VertexBuffer: .updateDirectly(tempVectorArray, pcs.counter * 3, false);

To answer my questions above:

I hope “pcs.computeParticleColor = false;” disables color transfer, but not sure if uv data is transferred. Maybe “pcs.computeParticleTexture” controls uv VerticesData transfers?

This is correct. computeParticleTexture controls the update of the UvKind VertexBuffer.

I also hope that “pcs.setParticles(pcs.counter,pcs.counter)” only transfers the single position (12 bytes) at the index pcs.counter.

This is incorrect. PointsCloudSystem.setParticles() only calls .updateParticle within the specified range, but afterwards sends the entire updated buffer to the GPU.

I think I can reduce the CPU space of colors by using the same Babylon color object, but I presume it is expanded into an ArrayBuffer in prep for transfer to GPU.

This is correct.

It would be great if I could specify a single Color4 for all particles, but I’m doubtful that has been implemented (as my use case is fairly niche).

I couldn’t find where this is implemented. The full ColorsKind VertexBuffer is created, though “pcs.computeParticleColor = false;” does prevent updates to the GPU buffer.

I was able to implement the update fairly straightforwardly by retaining the PositionKind VertexBuffer and calling .updateDirectly() on it. This means the CloudPoint array is not involved at all after initial creation. Basically, I use PointsCloudSystem to create the mesh and use the material/shader of that created mesh. After creation, I bypass the PointsCloudSystem object and update the mesh’s PositionKind VertexBuffer directly. This also bypasses all the per-vertex rotation calculations in PointsCloudSystem.setParticles().
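The partial update can be sketched with a plain function standing in for the real call, which is `VertexBuffer.updateDirectly(data, offset)` in Babylon.js and uploads just that slice to the existing GPU buffer (the stand-in below only shows the offset semantics; the names `tempVectorArray` and `counter` mirror the snippets above):

```javascript
// Minimal stand-in for the partial update: copy `data` into the backing
// store at a float offset, leaving the rest of the buffer untouched.
function updateDirectly(backing, data, offset) {
  backing.set(data, offset);
}

const positions = new Float32Array(9); // 3 points, all at the origin
const tempVectorArray = new Float32Array([7, 8, 9]); // one new position (12 bytes)
const counter = 1;

updateDirectly(positions, tempVectorArray, counter * 3);
console.log(Array.from(positions)); // → [0, 0, 0, 7, 8, 9, 0, 0, 0]
```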

This works well for me (even though I don’t know how to verify the performance optimization). I’m not sure generalizing this optimization would be beneficial, because user code usually modifies a CloudPoint object, which would then need copying into VertexBuffers and sending to the GPU (using “mesh.updateVerticesData()” on lines 914-922).

However, it looks like the needed VertexBuffers are retained in an array or map on the mesh object and can thus be retrieved quickly if needed for a call to .updateDirectly(). This appears to be a possible future optimization for PointsCloudSystem.

Edit: see this thread for more info on partially updating VertexBuffer.


If your operating system is Windows, you can try using PIX to get details of what a frame does and the different timings. This page should get you started:
