Batched or Instanced BoundingBoxRenderer

kzhsw · November 25, 2024, 5:49am

Version: git master (73ab4d6)

Status

Merged and released in 7.37.2. Latest example: https://playground.babylonjs.com/#NRNVQA#2

Background

Currently, BoundingBoxRenderer renders in a loop, calling engine.drawElementsType for each bounding box in renderList, so every bounding box costs 1 or 2 draw calls.
Since it renders on the web, it suffers from the same performance issue as meshes when the number of draw calls increases.

github.com

BabylonJS/Babylon.js/blob/73ab4d666c0506a5fe551ce852a23e0e8ec0d15a/packages/dev/core/src/Rendering/boundingBoxRenderer.ts#L393

        } else {
            engine.setDepthFunctionToLess();
        }
        this._uniformBufferFront.bindToEffect(drawWrapperFront.effect!, "BoundingBoxRenderer");
        this._uniformBufferFront.updateColor4("color", this.frontColor, 1);
        this._uniformBufferFront.updateMatrix("world", worldMatrix);
        this._uniformBufferFront.updateMatrix("viewProjection", transformMatrix);
        this._uniformBufferFront.update();

        // Draw order
        engine.drawElementsType(Material.LineListDrawMode, 0, 24);

        this.onAfterBoxRenderingObservable.notifyObservers(boundingBox);
    }
    this._colorShader.unbind();
    engine.setDepthFunctionToLessOrEqual();
    engine.setDepthWrite(true);
}

private _createWrappersForBoundingBox(boundingBox: BoundingBox): void {
    if (!boundingBox._drawWrapperFront) {

Proposal

Since only the worldMatrix changes for different bounding boxes, it should benefit from instancing, like thin instances for meshes.

1. Allocate buffers for instancing.
2. Loop renderList and fill buffers.
3. Copy buffers to GPU.
4. Render instanced.

With instancing, the number of draw calls needed by each BoundingBoxRenderer could be reduced to 1, or 2 when this.showBackLines is enabled.
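Steps 1 and 2 can be sketched without any engine code. This is a minimal illustration, not the code from the PR: the `Box` type and `fillInstanceMatrices` are hypothetical names, a unit box centered at the origin is assumed as the base geometry, and matrices are assumed to be stored as 16 floats with the translation at indices 12 to 14.

```typescript
// Hypothetical minimal box description; Babylon.js has its own BoundingBox class.
interface Box {
    min: [number, number, number];
    max: [number, number, number];
    world: Float32Array; // 16 floats, translation at indices 12..14 (assumed layout)
}

// Steps 1 and 2: allocate one 4x4 matrix per box, then fill it in a single loop.
function fillInstanceMatrices(boxes: Box[]): Float32Array {
    const out = new Float32Array(boxes.length * 16);
    const local = new Float32Array(16);
    boxes.forEach((box, i) => {
        // local = scale(max - min) followed by translate(center)
        local.fill(0);
        for (let axis = 0; axis < 3; axis++) {
            const size = box.max[axis] - box.min[axis];
            local[axis * 5] = size; // diagonal entries 0, 5, 10
            local[12 + axis] = box.min[axis] + size * 0.5; // box center
        }
        local[15] = 1;
        // out[i] = local * world (row-vector convention: local is applied first)
        const base = i * 16;
        for (let r = 0; r < 4; r++) {
            for (let c = 0; c < 4; c++) {
                let sum = 0;
                for (let k = 0; k < 4; k++) {
                    sum += local[r * 4 + k] * box.world[k * 4 + c];
                }
                out[base + r * 4 + c] = sum;
            }
        }
    });
    return out;
}
```

The returned array would then be uploaded once as per-instance data (step 3) and consumed by a single instanced draw (step 4), for example `drawElementsInstanced` in WebGL2.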

Example: https://playground.babylonjs.com/#37HN68#14

Alternatives

Since very few[1] vertices are needed, and BoundingBox already has vectorsWorld, which is updated every time the world matrix updates, the matrix computation can be skipped: the vectorsWorld results can be reused to rebuild the vertex buffer on every render. In this case _indexBuffer needs to be reconstructed whenever the number of bounding boxes changes.

[1]: 24 points in case of CreateBoxVertexData, or 8 points if constructed manually, since uvs and normals are not needed for rendering as lines
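A rough sketch of the index buffer for this alternative, assuming 8 corners per box written back to back into one vertex buffer. The corner order used here (bit 0 selects max X, bit 1 max Y, bit 2 max Z) is an assumption made for illustration and is not necessarily the order of vectorsWorld; `buildLineIndices` is a hypothetical helper.

```typescript
// 12 edges of a box as 24 line-list indices. Two corners share an edge
// when their 3-bit codes differ in exactly one bit.
const EDGES: number[] = [];
for (let corner = 0; corner < 8; corner++) {
    for (let bit = 1; bit <= 4; bit <<= 1) {
        if ((corner & bit) === 0) {
            EDGES.push(corner, corner | bit);
        }
    }
}

// Index buffer for `boxCount` boxes whose corners are stored consecutively.
// It depends only on the count, so it needs rebuilding only when the count changes.
function buildLineIndices(boxCount: number): Uint32Array {
    const indices = new Uint32Array(boxCount * EDGES.length);
    for (let i = 0; i < boxCount; i++) {
        for (let e = 0; e < EDGES.length; e++) {
            indices[i * EDGES.length + e] = EDGES[e] + i * 8;
        }
    }
    return indices;
}
```

With this layout only the vertex positions change from frame to frame, while the indices stay valid as long as the number of boxes is the same.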

Evgeni_Popov · November 25, 2024, 9:44am

It looks like a good idea! Do you want to make a PR for it?

However, you should implement it with a flag (like useInstancing, default: false), so that it’s not a breaking change: currently, onBeforeBoxRenderingObservable and onAfterBoxRenderingObservable are notified for each bounding box, and it should still be the default behavior.

HiGreg · November 25, 2024, 1:05pm

What about this optimization of drawing boxes with 4 lines of 4 vertices each instead of 12 lines of 2 vertices each?

Evgeni_Popov · November 25, 2024, 1:19pm

I don’t think it changes anything at the GPU level, as the current bounding box renderer already issues a single draw call with 24 indices (2 indices per line), which can’t be optimized better:

(from Spector)

HiGreg · November 25, 2024, 2:13pm

Ah. Is that because drawElements doesn’t have a multiline primitive? Looks like the multidraw extension to WebGL would help, but it’s not supported everywhere (Firefox lacks support).

Evgeni_Popov · November 25, 2024, 6:14pm

drawElements is already multilines, as you give a list of indices and it will draw lines between index 0 and 1, 2 and 3, and so on. Maybe I didn’t understand your question?

HiGreg · November 25, 2024, 6:42pm

Agreed, drawElements can draw multiple single-segment lines (lines defined by two points each). By “multilines” I meant lines made up of multiple segments each. In my case, each multiline is three segments defined by four points. Points 0, 1, 2, 3 result in a multiline defined by line segments 0->1, 1->2, and 2->3.

The more “efficient” box definition uses four such multilines and is only 16 total vertices instead of 24. But if the end result is a drawElements() call, which can only draw single-segment lines (defined by exactly two points each), then there is no savings in space or time.
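The index arithmetic behind this can be checked with a small sketch. `stripToLineList` is a hypothetical helper that expands one multi-segment line into the index pairs a line-list draw consumes; it only illustrates the counts discussed here.

```typescript
// Expand a polyline (strip of point indices) into line-list pairs:
// [a, b, c, d] becomes [a, b, b, c, c, d].
function stripToLineList(strip: number[]): number[] {
    const out: number[] = [];
    for (let i = 0; i + 1 < strip.length; i++) {
        out.push(strip[i], strip[i + 1]);
    }
    return out;
}

// Four 4-point multilines: 16 points in, 4 * 3 = 12 segments out.
const multilines: number[][] = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
];
const lineList = multilines.flatMap(stripToLineList);
console.log(lineList.length); // 24 indices, i.e. 12 segments
```

Once expanded for a line-list draw, the four multilines again need 24 indices, the same count the current renderer already submits in one call.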

kzhsw · November 26, 2024, 1:17am

Since draws are batched, when should the events trigger? For example:

1. Loop renderList, make matrices and trigger onBeforeBoxRenderingObservable
2. Make draws
3. Loop renderList, make matrices and trigger onAfterBoxRenderingObservable

In this case, if someone uses the two observables to change rendering parameters for each box, it might not work as expected.

Evgeni_Popov · November 26, 2024, 10:05am

We would trigger each event only a single time (passing a dummy/undefined box), not for each box. That would be the “breaking change” part (as well as the display being potentially different, because drawing the black/white part of each box one after the other can be different from drawing the black part of all boxes and then the white part).
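The difference in notification behavior can be sketched with stand-in types. Everything below is hypothetical (`MiniObservable`, `BoxLike`, `DUMMY_BOX`, `renderBoxes`) and only mirrors the ordering described here; it is not the renderer's real code.

```typescript
// Tiny stand-in for an observable, just enough to count notifications.
class MiniObservable<T> {
    private listeners: Array<(value: T) => void> = [];
    add(listener: (value: T) => void): void {
        this.listeners.push(listener);
    }
    notify(value: T): void {
        for (const listener of this.listeners) listener(value);
    }
}

interface BoxLike {
    id: number;
}

// Placeholder passed to observers in instanced mode; it carries no meaning.
const DUMMY_BOX: BoxLike = { id: -1 };

function renderBoxes(
    boxes: BoxLike[],
    useInstancing: boolean,
    before: MiniObservable<BoxLike>,
    after: MiniObservable<BoxLike>,
    draw: (instanceCount: number) => void
): void {
    if (useInstancing) {
        // One notification pair and one draw for the whole list.
        before.notify(DUMMY_BOX);
        draw(boxes.length);
        after.notify(DUMMY_BOX);
        return;
    }
    // Default behavior: one notification pair and one draw per box.
    for (const box of boxes) {
        before.notify(box);
        draw(1);
        after.notify(box);
    }
}
```

An observer that changes per-box state in the default mode would see only the placeholder in instanced mode, which is why the flag is opt-in.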

kzhsw · November 28, 2024, 6:52am

Updated to use a dummy bounding box, but what is your opinion on whether or not to keep a renderList in DummyBoundingBox?

kzhsw · November 28, 2024, 7:43am

Comparing performance with or without SIMD:

Without SIMD (avg 3.694ms on my local chrome):

With SIMD (avg 3.277ms on my local chrome, ~11% diff):
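For reference, the quoted difference is consistent with measuring the saving relative to the non-SIMD average. The variable names below are illustrative only.

```typescript
const withoutSimdMs = 3.694; // average without SIMD, from the measurement above
const withSimdMs = 3.277; // average with SIMD, from the measurement above

// Time saved as a fraction of the non-SIMD time.
const savedFraction = (withoutSimdMs - withSimdMs) / withoutSimdMs;
const savedPercent = Math.round(savedFraction * 1000) / 10;
console.log(savedPercent); // 11.3
```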

source

#include <cglm/cglm.h>

extern unsigned char __heap_base;

uintptr_t get_heap_base() {
    // align with 64 bytes
    return (((uintptr_t) (&__heap_base)) + 63) & ~63;
}

unsigned bbox_compose(float * minmax, vec4 * mat, size_t count) {
    CGLM_ALIGN_MAT mat4 tmp_mat;
    CGLM_ALIGN_MAT vec4 diff, median;
    glm_mat4_identity(tmp_mat);
    float * m = (float *) tmp_mat;
    for (size_t i = 0; i < count; i++) {
        float * min = minmax;
        minmax += 4;
        float *  max = minmax;
        minmax += 4;
        glm_vec4_sub(max, min, diff);
        glm_vec4_scale(diff, 0.5, median);
        glm_vec4_add(min, median, median);
        // Directly update the matrix values in column-major order
        m[0] = diff[0];  // Scale X
        m[3] = median[0];  // Translate X
        
        m[5] = diff[1];  // Scale Y
        m[7] = median[1];  // Translate Y
        
        m[10] = diff[2];  // Scale Z
        m[11] = median[2];  // Translate Z
        glm_mat4_mul(mat, tmp_mat, mat);
        mat += 4;
    }
    return count;
}

Evgeni_Popov · November 28, 2024, 9:45am

I don’t think we need a renderList (I’m not sure what the user would do with it). We could simply document that when using the instanced mode, the passed bounding box has no meaning and should be ignored (would be better to be able to not pass a bounding box at all, but it would be a breaking change).

kzhsw · November 29, 2024, 12:55am

OK, I’ll remove the renderList from the events. Also, since SIMD does not give the expected performance boost, I’d prefer not to use it.

And the WebGPU/WGSL port:

kzhsw · November 29, 2024, 2:37am

PR here:

github.com/BabylonJS/Babylon.js

Instanced BoundingBoxRenderer

BabylonJS:master ← kzhsw:patch-1

opened 02:36AM - 29 Nov 24 UTC

kzhsw

+302 -9

## Background

Currently in BoundingBoxRenderer, it renders in a loop, calling… `engine.drawElementsType` for each bounding box in `renderList`, making every bounding box 1 or 2 [draw call](https://doc.babylonjs.com/features/featuresDeepDive/scene/optimize_your_scene/#reducing-draw-calls). Since it renders on web, it would suffer the same performance issue like meshes when draw calls increases.

## Proposal

Since only the `worldMatrix` changes for different bounding boxes, it should benefit from instancing, like [thin instances](https://doc.babylonjs.com/features/featuresDeepDive/mesh/copies/thinInstances) for meshes.

1. Allocate buffers for instancing.
2. Loop renderList and fill buffers.
3. Copy buffers to gpu.
4. Render instanced.

With instancing the draw call needed for each BoundingBoxRenderer could be reduced to 1 or 2 (in case `this.showBackLines` enabled)

Example: <https://playground.babylonjs.com/#37HN68#14>

## Alternatives

Since there are very few[1] vertex needed, and there is already [`vectorsWorld`](https://github.com/BabylonJS/Babylon.js/blob/73ab4d666c0506a5fe551ce852a23e0e8ec0d15a/packages/dev/core/src/Culling/boundingBox.ts#L43) in `BoundingBox`, which [updates](https://github.com/BabylonJS/Babylon.js/blob/73ab4d666c0506a5fe551ce852a23e0e8ec0d15a/packages/dev/core/src/Culling/boundingBox.ts#L166) every time world matrix updates, the computation of matrices can be skipped, and reuse computation result of `vectorsWorld` to rebuild vertex buffer every time it renders. In this case `_indexBuffer` needs to be reconstructed is count of bounding box mismatches.

[1]: 24 points in case of `CreateBoxVertexData`, or 8 points if constructed manually, since uvs and normals are not needed for rendering as lines

## Compatibility

1. It's disabled by default for backwards compatibility.
2. It could result in a difference in rendering result if `showBackLines` enabled, because drawing the black/white part of each box one after the other can be different from drawing the black part of all boxes and then the white part.
3. Events of `onBeforeBoxRenderingObservable` and `onAfterBoxRenderingObservable` would only be triggered once for one rendering, instead of once every bounding box. Events would be triggered with a dummy box to keep backwards compatibility.

## References

Forum post: <https://forum.babylonjs.com/t/batched-or-instanced-boundingboxrenderer/54977>

And playground targeting this PR:
