Optimizing performance of GreasedLineMesh._setPoints

kzhsw · March 30, 2024, 7:12am

When creating a GreasedLineMesh, we observed that creation is very slow with ~10k points. In the example below (the creation time is printed to the console), it takes 5310ms on my laptop:

After profiling with Chrome DevTools, we found that the performance bottleneck is the function GreasedLineMesh._setPoints.

Code here (github.com/BabylonJS/Babylon.js/blob/7.0.0/packages/dev/core/src/Meshes/GreasedLine/greasedLineMesh.ts#L94):


      
    protected _setPoints(points: number[][]) {
        this._points = points;
        this._options.points = points;

        this._initGreasedLine();

        let indiceOffset = 0;

        points.forEach((p) => {
            const counters: number[] = [];
            const positions: number[] = [];
            const indices: number[] = [];

            const totalLength = GreasedLineTools.GetLineLength(p);
            for (let j = 0, jj = 0; jj < p.length; j++, jj += 3) {
                const partialLine = p.slice(0, jj + 3);
                const partialLineLength = GreasedLineTools.GetLineLength(partialLine);
                const c = partialLineLength / totalLength;

And here is the GreasedLineTools.GetLineLength code (github.com/BabylonJS/Babylon.js/blob/7.0.0/packages/dev/core/src/Misc/greasedLineTools.ts#L246):


      
          

        let points: Vector3[];
        if (typeof data[0] === "number") {
            points = GreasedLineTools.ToVector3Array(<number[]>data) as Vector3[];
        } else {
            points = data as Vector3[];
        }

        const tmp = TmpVectors.Vector3[0];
        let length = 0;
        for (let index = 0; index < points.length - 1; index++) {
            const point1 = points[index];
            const point2 = points[index + 1];
            length += point2.subtractToRef(point1, tmp).length();
        }
        return length;
    }

We analyzed the algorithm and found that its time complexity is O(n^2), where n is the number of points.

In line 101 of the GreasedLineMesh._setPoints code above, p.slice(0, jj + 3) copies the points array from the beginning up to the current index, and GetLineLength is then run over that copy. As a result, the lengths of all previous segments are recomputed for every following point.
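To put numbers on this, consider a single line of n points. At iteration j (counting from 1) the slice holds j points, so GetLineLength evaluates j - 1 segment lengths on it, and the slice itself copies 3j numbers (and, since the copy is a number[], GetLineLength first converts it with ToVector3Array):

$$
\sum_{j=1}^{n} (j - 1) = \frac{n(n-1)}{2}, \qquad \sum_{j=1}^{n} 3j = \frac{3n(n+1)}{2}
$$

For n = 10,000 that is 49,995,000 segment-length evaluations and 150,015,000 copied numbers, compared with n - 1 = 9,999 segment lengths if each one is computed only once.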

A fairly simple way to optimize this is to cache the length accumulated so far instead of recomputing it. With that, the p.slice(0, jj + 3) call can be removed as well, so less memory is allocated.
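A minimal, self-contained sketch of that caching idea (not the actual Babylon.js code), assuming a single line stored as a flat [x0, y0, z0, x1, y1, z1, ...] array like p above. segmentLength, lineLength, naiveCounters and cachedCounters are hypothetical names, and only the counter value c (the normalized distance along the line) is reproduced:

```typescript
// Distance between the point starting at flat index i and the point before it.
function segmentLength(flat: number[], i: number): number {
  const dx = flat[i] - flat[i - 3];
  const dy = flat[i + 1] - flat[i - 2];
  const dz = flat[i + 2] - flat[i - 1];
  return Math.sqrt(dx * dx + dy * dy + dz * dz);
}

// Length of a polyline stored as a flat [x0, y0, z0, x1, y1, z1, ...] array.
function lineLength(flat: number[]): number {
  let length = 0;
  for (let i = 3; i + 2 < flat.length; i += 3) {
    length += segmentLength(flat, i);
  }
  return length;
}

// O(n^2): mirrors the original loop, slicing and re-measuring every prefix.
function naiveCounters(p: number[]): number[] {
  const totalLength = lineLength(p);
  const counters: number[] = [];
  for (let jj = 0; jj < p.length; jj += 3) {
    const partialLine = p.slice(0, jj + 3);
    counters.push(lineLength(partialLine) / totalLength);
  }
  return counters;
}

// O(n): carries the length accumulated so far to the next point; no slicing.
function cachedCounters(p: number[]): number[] {
  const totalLength = lineLength(p);
  const counters: number[] = [];
  let partialLineLength = 0;
  for (let jj = 0; jj < p.length; jj += 3) {
    if (jj > 0) {
      partialLineLength += segmentLength(p, jj);
    }
    counters.push(partialLineLength / totalLength);
  }
  return counters;
}
```

Both functions add the same segment lengths in the same order, so they produce the same counters; the cached version just does it in one pass without allocating a slice per point.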

The optimized code is here; it takes ~60ms on the first run and ~30ms on subsequent runs:

mawa · March 30, 2024, 8:02am

cc @roland. Have a great Easter holiday!

kzhsw · March 30, 2024, 8:29am

github.com/BabylonJS/Babylon.js · Optimizing performance of GreasedLineMesh._setPoints
BabylonJS:master ← kzhsw:patch-1 · opened by kzhsw at 08:29AM, 30 Mar 24 UTC · +25 -3

See detailed context at forum post <https://forum.babylonjs.com/t/optimizing-per…formance-of-greasedlinemesh-setpoints/49168>. This proposal adds a static method `GreasedLineTools.GetLineLengthArray` which does almost the same as `GreasedLineTools.GetLineLength`, but instead of computing the length of the whole line, it computes the length from the beginning of the line to each point and collects the results into an array. This should simplify the loop in `GreasedLineMesh._setPoints` by eliminating the need to slice the array and compute the length on each iteration, reducing the time complexity from `O(n^2)` to `O(n)`.
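The PR diff itself isn’t reproduced here. As a rough sketch of the behavior the description gives, assuming a flat number[] input, a helper like the hypothetical getLineLengthArray below returns the cumulative length at each point in a single pass; the counter for point j is then lengths[j] / lengths[lengths.length - 1]. It returns a Float64Array so the running sum keeps double precision; the real method’s return type may differ.

```typescript
// Hypothetical stand-in for the proposed helper: cumulative distance from the
// first point to each point of a line stored as a flat [x0, y0, z0, x1, ...] array.
function getLineLengthArray(flat: number[]): Float64Array {
  const pointCount = Math.floor(flat.length / 3);
  const lengths = new Float64Array(pointCount); // lengths[0] stays 0
  for (let j = 1; j < pointCount; j++) {
    const i = j * 3;
    const dx = flat[i] - flat[i - 3];
    const dy = flat[i + 1] - flat[i - 2];
    const dz = flat[i + 2] - flat[i - 1];
    lengths[j] = lengths[j - 1] + Math.sqrt(dx * dx + dy * dy + dz * dz);
  }
  return lengths;
}

// Usage: per-point counters, i.e. the normalized distance along the line.
const lengths = getLineLengthArray([0, 0, 0, 3, 4, 0, 3, 4, 12]);
const total = lengths[lengths.length - 1];
const counters = Array.from(lengths, (l) => l / total);
```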

roland · March 30, 2024, 11:13am

You too buddy!
And of course @kzhsw to you as well!

The main bottleneck in GreasedLine is setting/updating the points and preparing the underlying GPU buffers. I was already thinking about optimizing it, but honestly my first thought was to prepare the buffers with a compute shader, and it didn’t occur to me to look at the JavaScript code first.

Very well done, @kzhsw! I tested your PG with 50,000 points:
const points = BABYLON.GreasedLineTools.GetCircleLinePoints(100, 50000)

From: [screenshot: creation time in ms]

To: [screenshot: creation time in ms]

kzhsw · April 1, 2024, 3:06am

The algorithm consists mainly of memory lookups and copy operations. To optimize it further, consider precomputing the lengths of the arrays (the instance properties this._vertexPositions and this._indices, and the local variables counters and positions) and allocating them as typed arrays such as Float32Array. This should reduce allocations by avoiding array.push, which can cause the memory to be reallocated as the array grows.
Preallocating arrays (~30ms → ~20ms):

Making positions and indices zero-copy:

Preallocating uvs and eliminating side, counters, and some branches:

Preallocating _previousAndSide and _nextAndCounters (~12ms after warmup):

Also, since there isn’t much computation here, the algorithm might not benefit much from compute shaders, because they would introduce the overhead of copying memory to the GPU and the latency of invoking the compute shader.
BTW, if I understand correctly, this code pushes each point to positions twice; if this could be indexed instead, there might be some performance gains.

github.com/BabylonJS/Babylon.js/blob/7.0.0/packages/dev/core/src/Meshes/GreasedLine/greasedLineMesh.ts#L105


      
    const counters: number[] = [];
    const positions: number[] = [];
    const indices: number[] = [];

    const totalLength = GreasedLineTools.GetLineLength(p);
    for (let j = 0, jj = 0; jj < p.length; j++, jj += 3) {
        const partialLine = p.slice(0, jj + 3);
        const partialLineLength = GreasedLineTools.GetLineLength(partialLine);
        const c = partialLineLength / totalLength;

        positions.push(p[jj], p[jj + 1], p[jj + 2]);
        positions.push(p[jj], p[jj + 1], p[jj + 2]);
        counters.push(c);
        counters.push(c);

        if (jj < p.length - 3) {
            const n = j * 2 + indiceOffset;
            indices.push(n, n + 1, n + 2);
            indices.push(n + 2, n + 1, n + 3);
        }
    }
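For comparison, a sketch of the same loop with preallocated typed arrays in place of push (and a running length in place of the slice), along the lines suggested in this post. buildLineBuffers and LineBuffers are hypothetical names and this is not Babylon.js’s actual implementation; it handles one line with a given indiceOffset and assumes the line has a non-zero total length. The sizes follow from the loop above: each point contributes 6 position floats and 2 counters, and each segment 6 indices.

```typescript
interface LineBuffers {
  positions: Float32Array;
  counters: Float32Array;
  indices: Uint32Array;
}

// Distance between the point starting at flat index jj and the previous point.
function segmentLength(p: number[], jj: number): number {
  const dx = p[jj] - p[jj - 3];
  const dy = p[jj + 1] - p[jj - 2];
  const dz = p[jj + 2] - p[jj - 1];
  return Math.sqrt(dx * dx + dy * dy + dz * dz);
}

// Hypothetical helper: same layout as the push-based loop, but every buffer
// is allocated once at its final size.
function buildLineBuffers(p: number[], indiceOffset: number): LineBuffers {
  const pointCount = Math.floor(p.length / 3);
  const segmentCount = Math.max(pointCount - 1, 0);
  const positions = new Float32Array(pointCount * 6); // each point stored twice
  const counters = new Float32Array(pointCount * 2); // one counter per stored copy
  const indices = new Uint32Array(segmentCount * 6); // two triangles per segment

  let totalLength = 0;
  for (let jj = 3; jj < pointCount * 3; jj += 3) {
    totalLength += segmentLength(p, jj);
  }

  let partialLineLength = 0;
  for (let j = 0, jj = 0; j < pointCount; j++, jj += 3) {
    if (j > 0) {
      partialLineLength += segmentLength(p, jj);
    }
    const c = partialLineLength / totalLength;

    // Write the point twice, like the two positions.push calls.
    const o = j * 6;
    positions[o] = positions[o + 3] = p[jj];
    positions[o + 1] = positions[o + 4] = p[jj + 1];
    positions[o + 2] = positions[o + 5] = p[jj + 2];
    counters[j * 2] = counters[j * 2 + 1] = c;

    // Same index pattern as the original, for every point except the last.
    if (j < segmentCount) {
      const n = j * 2 + indiceOffset;
      const k = j * 6;
      indices[k] = n;
      indices[k + 1] = n + 1;
      indices[k + 2] = n + 2;
      indices[k + 3] = n + 2;
      indices[k + 4] = n + 1;
      indices[k + 5] = n + 3;
    }
  }
  return { positions, counters, indices };
}
```

Storing counters in a Float32Array rounds them to single precision, unlike the number[] in the original loop.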

kzhsw · April 1, 2024, 3:15am

The playground targeting the PR:

mawa · April 1, 2024, 3:18pm

This all looks very promising. I’ll be following it (from a distance). Meanwhile, have a great time during these holidays, both of you.

Deltakosh · April 1, 2024, 4:20pm

woot! Massive. I love these updates

kzhsw · April 2, 2024, 5:34am

There are also a few things left for further optimization of GreasedLineMesh:

In _setPoints, building the previous and next arrays takes considerable time when run multiple times. It could benefit from eliminating that step, as was done for side and counters, but doing so would split the loop into 3 parts.
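One possible reading of the three-part split, as a sketch: if positions holds 6 floats per point (each point stored twice, as in the push calls quoted earlier), previous and next are mostly the same data shifted by one point, so they can be filled with bulk copies for the head, the body and the tail instead of a per-point loop with branches. buildPreviousNext is a hypothetical name; the sketch assumes an open line whose first point’s previous and last point’s next are the points themselves, and it leaves out the side and counter values that, judging by their names, _previousAndSide and _nextAndCounters also carry.

```typescript
// Hypothetical sketch: previous/next attributes for an open line, built with
// bulk copies (head, body, tail) instead of a per-point loop.
// Assumes positions.length is a multiple of 6 (each xyz stored twice).
function buildPreviousNext(positions: Float32Array): { previous: Float32Array; next: Float32Array } {
  const stride = 6;
  const len = positions.length;
  const previous = new Float32Array(len);
  const next = new Float32Array(len);
  if (len === 0) {
    return { previous, next };
  }

  // previous: the first point refers to itself (head),
  // every other point to the one before it (body).
  previous.set(positions.subarray(0, stride), 0);
  previous.set(positions.subarray(0, len - stride), stride);

  // next: every point but the last refers to the one after it (body),
  // the last point refers to itself (tail).
  next.set(positions.subarray(stride, len), 0);
  next.set(positions.subarray(len - stride, len), len - stride);

  return { previous, next };
}
```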

[Screenshot: profiling]
After the optimizations in the PR, _updateColorPointers and CompleteGreasedLineWidthTable take a considerable amount of time; they could also benefit from the preallocated typed array optimization.

[Screenshot: profiling]

mawa · April 2, 2024, 7:21am

Wow. I think you’re about to become @roland’s best friend… and by extension, mine as well.
I’m so eager to see what your joint forces will be able to accomplish with all this.

roland · April 2, 2024, 8:54am

Yes! I’m glad someone is optimizing GreasedLine. It has been on my table for a long time already, but I didn’t have the opportunity to start working on it…

@kzhsw can you please make the changes in the greasedLineRibbonMesh.ts file where needed as well? Thanks a lot!

mawa · April 2, 2024, 9:21am

I can hear you on that. For me, optimization is always the last thing on the ticket list, and somehow it always tends to slip through. Probably because I’m no real dev… as long as it works and looks good, fine by me.

roland · April 2, 2024, 9:32am

Yes, YOU ARE! However, in business we have to make tough decisions, and sometimes things just have to be good enough.

Fortunately, open source has a big advantage: there are plenty of coders who pushes (or push??? damn I have to practice English) your good-enough code into a masterpiece! @kzhsw

kzhsw · April 3, 2024, 1:54am

After benchmarking, the profiling results show that greasedLineRibbonMesh does not have the same time complexity issue as GreasedLineMesh:

The benchmark code:

Since GC takes a large part of the time, this could benefit from using preallocated vectors in the calculations, or maybe from inlining the calculations.
Preallocated typed arrays could work, but the algorithm here is far more complex than the one in GreasedLineMesh, so this would be a longer-term effort.
Unlike the GreasedLineMesh algorithm, this one does a lot of computation and does not use trigonometric functions at runtime, so it might also benefit from SIMD, but that could be out of the scope of Babylon.js.
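The ribbon code isn’t quoted in this thread, but the preallocated-vector idea is the same pattern the GetLineLength excerpt above uses with TmpVectors.Vector3[0]: write results into one reused scratch vector instead of allocating a new vector per step, so the GC has less to collect. A minimal self-contained sketch, with a hypothetical Vec3 class standing in for Babylon’s Vector3:

```typescript
// Hypothetical minimal vector class, mirroring the *ToRef naming pattern.
class Vec3 {
  x: number;
  y: number;
  z: number;

  constructor(x = 0, y = 0, z = 0) {
    this.x = x;
    this.y = y;
    this.z = z;
  }

  // Allocates a new vector on every call: produces garbage in hot loops.
  subtract(other: Vec3): Vec3 {
    return new Vec3(this.x - other.x, this.y - other.y, this.z - other.z);
  }

  // Writes into a caller-provided vector: no allocation.
  subtractToRef(other: Vec3, result: Vec3): Vec3 {
    result.x = this.x - other.x;
    result.y = this.y - other.y;
    result.z = this.z - other.z;
    return result;
  }

  length(): number {
    return Math.sqrt(this.x * this.x + this.y * this.y + this.z * this.z);
  }
}

// One scratch vector, allocated once and reused for every segment.
const scratch = new Vec3();

function polylineLength(points: Vec3[]): number {
  let length = 0;
  for (let i = 0; i < points.length - 1; i++) {
    length += points[i + 1].subtractToRef(points[i], scratch).length();
  }
  return length;
}
```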

roland · April 8, 2024, 3:27am

Unfortunately, this PR introduced a bug:

Produces:

Expected:

Can you check it please?

kzhsw · April 8, 2024, 7:54am

This could be the same index issue reported here; it seems this PR fixes it.
