Low-Level Havok Debug Lines Visualization

I had difficulty using CompleteGreasedLineWidthTable as it was returning an empty array:

mesh.widths = BABYLON.CompleteGreasedLineWidthTable(pointsCount, mesh.widths, BABYLON.GreasedLineMeshWidthDistribution.REPEAT)

However, I was successful using:

mesh.setPoints(lines, { updatable: true })
// Grow the widths table to match the new points; new entries default to 1.
const oldLength = mesh.widths.length
mesh.widths.length = lines.length * 4 // pointsCount * 2
for (let i = mesh.widths.length - 1; i >= oldLength; i--)
    mesh.widths[i] = 1
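
The manual loop above can be wrapped in a small helper. This is only a sketch in plain TypeScript: `padWidths` is a hypothetical name, not a Babylon.js function, and the `lines.length * 4` sizing simply mirrors the snippet above (two width entries per point, two points per single-segment line).

```typescript
// Hypothetical helper: grow a widths table in place to `targetLength`,
// filling only the newly added slots with `fill`.
function padWidths(widths: number[], targetLength: number, fill = 1): number[] {
    const oldLength = widths.length;
    if (targetLength <= oldLength) {
        return widths; // nothing to add; existing entries are left untouched
    }
    widths.length = targetLength;
    widths.fill(fill, oldLength); // fill from the first new slot to the end
    return widths;
}

// Example: 3 single-segment lines need 3 * 4 = 12 width entries.
const widths = padWidths([2, 2, 2, 2], 3 * 4);
```

Existing widths are preserved, so lines that were already styled keep their values.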

The biggest disappointment is that CreateLineSystem accepts lines only in the format Vector3[ ][ ] whereas GreasedLine’s setPoints() only accepts Array[ ][ ]. That just means supporting both is slightly more difficult. Overall, GreasedLine seems to perform sufficiently well and is more configurable (e.g. line width), and it supports better visualization of AngularVelocity.

I’d love TypedArray support for multidimensional arrays (such as with stride), as I think a flat buffer (24 bytes representing a single-segment line) would be best. For now, implementing a set of single-segment lines as an Array() of Float32Array(6) works well.
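
To make the 24-byte layout concrete, here is a minimal sketch of a flat buffer with a stride of 6 floats per segment. The helper names are hypothetical and it is independent of any Babylon.js API; it only shows how a single allocation can still be addressed one segment at a time.

```typescript
const FLOATS_PER_SEGMENT = 6; // x0, y0, z0, x1, y1, z1
const BYTES_PER_SEGMENT = FLOATS_PER_SEGMENT * Float32Array.BYTES_PER_ELEMENT; // 24

// One contiguous allocation for `count` single-segment lines.
function createSegmentBuffer(count: number): Float32Array {
    return new Float32Array(count * FLOATS_PER_SEGMENT);
}

// Zero-copy view of segment `index`; writes go through to the flat buffer.
function segmentView(buffer: Float32Array, index: number): Float32Array {
    const start = index * FLOATS_PER_SEGMENT;
    return buffer.subarray(start, start + FLOATS_PER_SEGMENT);
}

const segments = createSegmentBuffer(2);
segmentView(segments, 1).set([0, 0, 0, 1, 2, 3]); // second line: origin to (1, 2, 3)
```

Because `subarray` shares memory with the parent, the per-segment views behave much like the Array() of Float32Array(6) approach while keeping the data contiguous.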

Because it is
WIDTH_DISTRIBUTION_REPEAT not just REPEAT :wink:

Switch to Typescript :wink: no more errors like this!

Cool idea…

Here you go:

Thank you! I’ve used stride with a 1D Float32Array in CreateGreasedLine() and it works great. However, when I then use .setPoints(), I can’t use the same buffer because setPoints uses GreasedLineMeshOptions, not GreasedLineMeshBuilderOptions.

This will fix the issue:

From now on you can use setPoints with any supported datatype for the points, not just number[][]

Looks like you fixed the setPoints options/stride issue as well as the (unreported) TypeScript-related issue for accepting Float32Array[ ]. Thank you!

Do you know where I can find the latest dev branch babylon.d.ts?

Would this work? https://preview.babylonjs.com/babylon.d.ts

This looks good. If I have some time later, I might write a shader in WebGL & WebGPU to be used with the shader material class to do the rendering as fast as possible. The limiting factor for FPS is copying data from Havok to a uniform buffer and then transferring that buffer to GPU memory… it sure would be nice to have a WebGPU refactor of Havok lol.

:crown:

That sounds great! I’m curious where a custom shader could help. Right now, I think the areas of high CPU usage or high CPU-GPU transfer are:

DebugBodies : velocity lines require an update to half of the (local-space) data-points array. I only update the GreasedLine endpoints, not the start points. No change to number of points, widths, or colors.

DebugBodies (CPU) : matrix calculations converting to rotated local space.

Edit: I originally confused the implementation of Lines and Points. Corrected:

ContactPoints : update of all thinInstance transform matrices. Frequent changes to the number of thinInstances. I implement a FIFO for the contact points and old points get rotated out. This precludes containing changes to the matrices array within a small contiguous range.

ContactLines : every frame I regenerate the entire GreasedLine points, widths, and colors to cover only the collisions occurring since the previous frame.

I’m happy to discuss advantages of a custom shader and what I could change to accommodate one.

I could be wrong, but the last time I tried this sort of strategy, in an effort to render lots of grass blades, I found updating thousands of matrices on the CPU to be inefficient. If you were front-loading this work into the Havok engine, or tfjs, then you could get the advantages of SIMD and wasm helping you to power through those calculations faster. I’m assuming we’d expect this to run in real time for easily more than 1000 vertices, since Havok can support a larger number of dynamic bodies.

Thinking about this again, I believe you could squeeze out 50k matrix updates in 3 ms on most modern mid-level laptops with wasm and SIMD enabled, but I don’t know off the top of my head how well SIMD is supported in the browser ecosystem atm. But let’s say 3 ms for the updates, and another 2.5 ms for the CPU-GPU transfer. That still leaves more than 11 ms for whatever else you need to do in that frame while keeping it at 60 fps, so this does seem feasible.
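
The budget arithmetic can be spelled out as follows. The two cost figures are the estimates quoted above, not measurements.

```typescript
const FRAME_BUDGET_MS = 1000 / 60; // about 16.67 ms per frame at 60 fps

// Estimates from the post above; guesses, not benchmark results.
const matrixUpdateMs = 3;
const cpuToGpuTransferMs = 2.5;

// Time left in the frame for everything else (physics step, rendering, etc.).
const remainingMs = FRAME_BUDGET_MS - matrixUpdateMs - cpuToGpuTransferMs;
console.log(remainingMs.toFixed(2)); // prints "11.17"
```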

In your case, if you have raw vertex data like with the contact points, you don’t need to do anything but copy/update values in a buffer to be directly rendered. However, that data buffer needs to be fed to a shader program of some sort regardless of whether you use thin instances or your own custom shader. Under the hood, I don’t know how many CPU operations happen when you update a thin instances buffer, but I think you’d save a few steps by just updating two uniform buffers:

  1. one that contains the number of spheres to update and hence the vertex index count.
  2. one that contains the updated vertices.
    Your vertex shader exits on invocations beyond that vertex count, which saves on a lot of fragment shader invocations as well.

Also, if you were receiving these contact point updates merely as an array of transform matrices, then just copying those values to the vertex shader to be computed gives you even more performance benefits. So either way it seems like having your own shader would be faster, right? Sorry if I missed something.

Regarding this:

Maybe you could have more pools with different colors and widths and use them as needed.

I’m not familiar with all that can be done within a vertex shader but I can provide detail on the data needing to be updated.

Focusing on ContactPoints:

All matrices are updated currently because that is how thinInstances are positioned. For ContactPoints, the updates are only for position, so a “Translation Matrix” (an identity matrix plus three translation values in the last row, which sit contiguously in the underlying 16-float array) is created, then the entire Matrix array of all thinInstances is pushed to the GPU. This could be greatly reduced (to 3/16 of the data) by only pushing new position vectors. If I abandon the “persistence” aspect of ContactPoints, then we’re still left with all new ContactPoint positions every frame. Further, updating a list of positions within a pre-allocated GPU buffer along with a count of valid positions (located at the start of that buffer) would minimize GPU buffer re-allocations.
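
As a rough sketch of that layout, the CPU-side staging buffer could look like the following. `PositionBuffer` is a hypothetical class, and storing the count as a float in slot 0 is an assumption made to keep everything in one Float32Array (exact for counts up to 2^24).

```typescript
const FLOATS_PER_POSITION = 3;
const FLOATS_PER_MATRIX = 16;

// Fixed-capacity staging buffer: slot 0 holds the number of valid positions,
// followed by capacity * 3 floats of x, y, z data. Allocated once, reused every frame.
class PositionBuffer {
    readonly capacity: number;
    readonly data: Float32Array;

    constructor(capacity: number) {
        this.capacity = capacity;
        this.data = new Float32Array(1 + capacity * FLOATS_PER_POSITION);
    }

    // Overwrite the buffer with this frame's points; points beyond capacity are dropped.
    write(points: ReadonlyArray<{ x: number; y: number; z: number }>): number {
        const count = Math.min(points.length, this.capacity);
        this.data[0] = count;
        for (let i = 0; i < count; i++) {
            const o = 1 + i * FLOATS_PER_POSITION;
            this.data[o] = points[i].x;
            this.data[o + 1] = points[i].y;
            this.data[o + 2] = points[i].z;
        }
        return count;
    }
}

const staging = new PositionBuffer(1024);
staging.write([{ x: 1, y: 2, z: 3 }, { x: 4, y: 5, z: 6 }]);
const dataRatio = FLOATS_PER_POSITION / FLOATS_PER_MATRIX; // 3/16 = 0.1875
```

Only the first `1 + count * 3` floats would need to be uploaded each frame, which is where the 3/16 saving comes from.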

The overall savings I think would be immense.

It doesn’t have to be thinInstances at all, but could easily be Instances with the same material. All I need to update are a multitude of mesh (sphere) positions, where each mesh represents a single contact point.

Then if persistence were needed, I could separate the updates into “blocks of positions” where a certain number would remain the same and a certain number would be new. Because the position data in this case is constantly rotating, it’s a little difficult to keep all the data in contiguous blocks, especially in the case where the new data size exceeds the data being rotated out. I’m not sure of the capabilities of a vertex shader, but I could further minimize the data sent to be only 1) number of contiguous blocks to be removed, 2) new block of position data. If the GPU moved all old position data that are not removed (i.e. data not yet rotated out) to the beginning of the buffer, then new data is always appended to that old data.
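
For reference, the rotation described above looks like this when done on the CPU. This is a sketch with a hypothetical `rotatePositions` helper; whether the same move can happen on the GPU side is exactly the open question, and this code does not answer it.

```typescript
// Drop the `removeCount` oldest positions, slide the survivors to the front,
// then append the new block. Returns the new number of valid positions.
function rotatePositions(
    buffer: Float32Array, // capacity * 3 floats, oldest position first
    validCount: number,
    removeCount: number,
    added: Float32Array, // new positions, 3 floats each
): number {
    const removed = Math.min(removeCount, validCount);
    const kept = validCount - removed;
    // Move the surviving positions to the start of the buffer.
    buffer.copyWithin(0, removed * 3, validCount * 3);
    // Append as many new positions as still fit.
    const capacity = Math.floor(buffer.length / 3);
    const addCount = Math.min(Math.floor(added.length / 3), capacity - kept);
    buffer.set(added.subarray(0, addCount * 3), kept * 3);
    return kept + addCount;
}

// Three old points in a four-point buffer; rotate one out, add two new ones.
const positions = new Float32Array([0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0]);
const newCount = rotatePositions(positions, 3, 1, new Float32Array([10, 11, 12, 20, 21, 22]));
```

With this scheme the only inputs per frame are the removal count and the new block, matching items 1) and 2) above.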

Again, I’m not exactly sure if this fits into what a vertex shader is capable of.

On the CPU I collect vertex positions one at a time, one per collision Observable notification, in an Array. But even that could perhaps be optimized if I were able to cycle once through all collisions since the last frame and skip the numerous collision observer notifications.

For your reference, the perFrame update for ContactPoints currently looks like this (in TypeScript).

perFrame(edata: any, estate: any): void {
    const previousPointCount = this.#contactSphere.thinInstanceCount;
    const newPointCount = this.#contactPoints.length;
    const smaller = Math.min(previousPointCount, newPointCount);
    // Reuse existing thin instances: overwrite their matrices in place.
    for (let i = 0; i < smaller; i++) {
        const p = this.#contactPoints.array[i];
        this.#contactSphere.thinInstanceSetMatrixAt(i,
            BABYLON.Matrix.TranslationToRef(p.x, p.y, p.z, this.#matrix),
            false);
    }
    // Add thin instances for any points beyond the previous count.
    for (let i = smaller; i < newPointCount; i++) {
        const p = this.#contactPoints.array[i];
        this.#contactSphere.thinInstanceAdd(
            BABYLON.Matrix.TranslationToRef(p.x, p.y, p.z, this.#matrix),
            false);
    }
    // Trim the count if there are now fewer points, then upload the buffer once.
    this.#contactSphere.thinInstanceCount = newPointCount;
    this.#contactSphere.thinInstanceBufferUpdated("matrix");
    // Hide the sphere mesh entirely when there are no contact points.
    if (this.#contactSphere.isVisible !== this.#contactSphere.hasThinInstances) {
        this.#contactSphere.isVisible = this.#contactSphere.hasThinInstances;
    }
    this.#contactPoints.elapse(edata.deltaTime); // removes old points
}

Would offset work if ContactLines update both endpoints? Each contact line is created from the point of contact then along a random direction (specified as the normal of the collision) to a length that is scaled from the scalar size of the impulse of the collision.
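
For clarity, this is roughly how each contact line’s two endpoints are derived. It is a sketch: `writeContactLine` and the `lengthPerImpulse` scale factor are hypothetical names, and the normal is assumed to be unit length.

```typescript
type Vec3 = { x: number; y: number; z: number };

// Write one contact line into `out` as [startX, startY, startZ, endX, endY, endZ].
// The line starts at the contact point and runs along the collision normal,
// with a length proportional to the impulse magnitude.
function writeContactLine(
    out: Float32Array,
    point: Vec3,
    normal: Vec3, // assumed to be normalized
    impulse: number,
    lengthPerImpulse = 0.5, // assumed scale factor, tune to taste
): Float32Array {
    const length = impulse * lengthPerImpulse;
    out[0] = point.x;
    out[1] = point.y;
    out[2] = point.z;
    out[3] = point.x + normal.x * length;
    out[4] = point.y + normal.y * length;
    out[5] = point.z + normal.z * length;
    return out;
}

const contactLine = writeContactLine(
    new Float32Array(6),
    { x: 1, y: 2, z: 3 },
    { x: 0, y: 1, z: 0 },
    4,
);
```

Both endpoints change from one collision to the next, which is why the question is whether offsets can cover the start and the end of a line.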

Pools seem difficult to maintain because of the various array sizes that would be needed. However, all color values and width values are constant and fixed. If the need to update those arrays at all were eliminated, it would save a lot of transfers.

You need to define offsets for each vertex. One line consists of 4 vertices.

Theoretically you could create all your lines from [0, 0, 0] to [0, 0, 0] (or [0.0001, 0.0001, 0.0001] - I’m not sure zero length would work - I’ll test it) and then set the offsets as absolute positions. With thick lines, using offsets as absolute positions would skew the line start/end a bit, because to be precise you would want a different offset for each vertex. You can simplify it by setting the same offset value for the starting vertices and the same value for the ending vertices.
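
A minimal sketch of the simplified variant, where both start vertices share one offset and both end vertices share another. `buildLineOffsets` is a hypothetical helper, and the per-line vertex order [start, start, end, end] is an assumption that should be checked against the actual GreasedLine vertex layout.

```typescript
const VERTICES_PER_LINE = 4;
const FLOATS_PER_LINE = VERTICES_PER_LINE * 3; // one xyz offset per vertex

// `segments` is a flat array with 6 floats per line (start xyz, end xyz).
// Returns one xyz offset per vertex, assuming vertex order [start, start, end, end].
function buildLineOffsets(segments: Float32Array): Float32Array {
    const lineCount = Math.floor(segments.length / 6);
    const offsets = new Float32Array(lineCount * FLOATS_PER_LINE);
    for (let line = 0; line < lineCount; line++) {
        const start = segments.subarray(line * 6, line * 6 + 3);
        const end = segments.subarray(line * 6 + 3, line * 6 + 6);
        const o = line * FLOATS_PER_LINE;
        offsets.set(start, o);     // first start vertex
        offsets.set(start, o + 3); // second start vertex
        offsets.set(end, o + 6);   // first end vertex
        offsets.set(end, o + 9);   // second end vertex
    }
    return offsets;
}

const lineOffsets = buildLineOffsets(new Float32Array([1, 2, 3, 4, 5, 6]));
```

Since the lines themselves never change, only this offsets array would need to be rebuilt each frame; widths and colors could stay fixed.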

I’ll have a look at the code.

I generally think just having a fixed size buffer is the way to go, at least for now, since the focus is just on rendering contact points and we aren’t trying to do any other sort of analysis or logic with them atm.

As far as processing all contact points at once versus on a per-contact-point basis, if you’re just rendering them, then processing all at once seems fine? If you happen to know the layout of the Havok wasm buffer that contains the updates or info regarding all contact points of that frame, then there’s no reason we can’t just copy that into the uniform buffer directly.

If I provided the example shader material implementation I mentioned, would that be enough for you to benchmark against your original approach? I haven’t run your example at all yet, so I suspect there’s a use case your current approach supports that mine wouldn’t, and I won’t see it until I’ve set something up.

I plan on adding a comparison of alternate approaches with timing etc. I also plan on extracting the core mechanisms for stress testing. When I get that accomplished, then your shader material code would be useful. I’m working on constraints and the basic core visualization of constraints mechanisms now. I’m also planning to put it all on github, but it’s not quite ready yet. Monitor this thread for progress.

I’ll post a summary of bottlenecks of my core visualization techniques, similar to the above, once I validate them as fully working and I wring out optimizations.

To summarize: I’m still getting basic functionality working, after which optimizations are very welcome.
