A better computeWorldMatrix is needed

I dunno if this is a “I’m holding a hammer so the solution must be a nail” but spitballing a bit here, instead of having to propagate changes by setting a flag, would you be able to achieve what you want with a onWorldMatrixInvalidatedObservable?

The developer (or the API) could subscribe to these change events arbitrarily, which could allow a deterministically ordered change publication that “looks like” the behavior you’re seeking, meaning you’d potentially avoid heavy CPU traversal of node hierarchies and updates of unchanged matrices since you would only have to handle events you’re interested in.

Thank you for the playground. Hopefully now there is an example to work from the experts can work on possible solutions using existing methods or new ones if needed.

This reminds me of a long time ago when I was a mathematics undergraduate at Leeds university. Our lecture was writing out a proof on the blackboard (it was that long ago). At one point he wrote on the board “from this it is obvious that our hypothesis is true.” He stopped for a moment and looked at the board, after a pause he stepped further from the board and read what he had written. He then proceeded to the back of the lecture room and studied his work further, strocking his chin as he pondered. He walked back to the front got out some sheets of paper and began to write furiously bent over at the desk. After several minutes he straightened up and looked at us with relief on his face. “Yes people,” he said “it is obvious,” and proceeded to to continue to the next step of the proof. :slightly_smiling_face:

2 Likes

I am a simple minded person and like simple solutions where possible. Given you gave a simple example in your PG it may be that my simple solution is not one to use in your more complicated situations. However here goes.

A bit of simple explanation (teaching your grandmother to suck eggs type thing)

As far as I understand it the BJS engine takes your code and

  1. Interprets it, updating values just as it say on the tin, ie A.position +=1 updates the stored position property of A by 1 but that’s all nothing to do with rendering
  2. does rendering calculations, eg updates node matrices
  3. does the rendering.

So in this PG https://playground.babylonjs.com/#3P8L7E#14 all the code rendering calculations are done before the actual rendering and the getWorldMatrix values are correctly calculated but only after all the code is updated so only final values are available.

For correct rendering changing for example a position value would need its matrix calculating after the change. Suppose (as I think you do) you want interim node matrix values immediately after a property has been changed eg

nodeA.position.x += 1
// force matrix for nodeA to be computed rather than cached value
nodeA.getWorldMatrix();

Since the team is thinking in terms of get rather than compute the matrix and since computeWorldMatrix returns the computed matrix why not create a function

getNodeMatrix(node) {
    return node.computeWorldMatrix(true);
}

along the lines of https://playground.babylonjs.com/#3P8L7E#15

You could of course tidy it up to use node.getNodeMatrix() and allow a true or false parameter if you want to

As I said I might have missed the complexity of your situation, if so I hope others can help.

The overall goal of Babylon.js is to be fast. One of the most consuming task is to keep the graph updated. Hence the decision I made to compute world matrix only when required by the rendering loop.
The system caches all the world matrices and update them only when needed. There are already flags internally to avoid unnecessary computation.

@mrlooi: We won’t change the current model because as @JohnK it is easy to simply force a computation if you are outside the rendering loop. You can also do all your operations using the onAfterRenderObservable where you have the guarantee that the graph is updated.

Your proposal is great if we want to keep the graph always up to date. Which will also KILL the performance when the animation system will update 150 nodes and 10 skeletons because we will have to go down the graph a thousand or a million times simply to propagate the dirty flag

3 Likes

+1 to what Deltakosh said. While I appreciate your perspective on this and thank you for the Playground example, I think the performance cost of a feature like this will be simply too high.

Unless I am greatly mistaken, there is no engine where in-frame logic like this, which requires updated state propagations through a very large hierarchy, will be efficient. In Unity, this will do exactly what Deltakosh mentioned, traversing the hierarchy thousands or millions of times setting dirty flags, which should show up very clearly in performance traces. While C# and C++ are fast enough that this might not impact framerate if you have enough additional bandwidth in your CPU usage, JavaScript suffers from this kind of operation much more quickly. If you have a Unity install and want to validate whether what I’m saying is true, it should not be difficult: create large and small transform hierarchies in Unity, manipulate them both at different times while taking a trace, then check the trace to see whether/how CPU usage changed.

While I respect your assessment, I disagree. As described above, large coherent hierarchy operations are complex by nature, and though engine-level implementation can certainly make such an operation slower, there is no engine-level choice which can make such an operation fast. In fact, I would suggest that Babylon’s approach can allow this to be done faster than with an auto-propagation scheme, though the onus to do this and understand it is on the developer. Consider the following hierarchy

Node1 <- Node2 <- Node3 <- ... NodeN

where you have a manual operation which needs to be performed on every node of this hierarchy using “to-the-moment correct” world matrices (as opposed to “to-the-frame correct” world matrices). Without auto-propagation, it is possible to do this with minimal cost if your operations can be ordered from the bottom of the hierarchy toward the top: operate first on Node1, then on Node2, etc., in every case calling computeWorldMatrix() and never having to call computeWorldMatrix(true). Proceeding in this way will complete the entire operation while only traversing the hierarchy once. Contrast this with an auto-propagation mechanism: the Node1 operation will traverse N nodes setting dirty flags, after which the Node2 operation will traverse N - 1 nodes, etc. A Gauss sum tells us that the overall cost of this operation in node traverses is N * (N + 1) / 2, which is O(n^2), whereas the cost of the non-auto-propagated approach which traversed the hierarchy only once was O(n). Note that achieving the linear performance requires the operations to be sorted in order of dependence; if node operations need to be doable in random order with fully correct world matrices, I know of no way to do this generally in sub-polynomial time.

In short, unless I have misunderstood something, an auto-propagation scheme will always perform an operation such as this with guaranteed low efficiency, whereas without auto-propagation it is at least possible, by taking advantage of the underlying implementation, to do it comparatively efficiently. This is exactly the scenario we ran into in the Unity project I mentioned earlier; but because we were using Unity and couldn’t prevent it from auto-propagating through the hierarchy, we were forced to resort to tricks and scene structure changes on our end to work around the inefficiency.

When you need correct world matrices to do operations, it is definitely more convenient and easier to understand to have those world matrices propagated for you. However, the performance cost of having such propagation is much higher than one might initially assume, and so a tremendous amount of performance in high-stress situations can be saved by allowing users to optimize their approach to world matrix computation. This is one of those rare scenarios where two of Babylon’s objectives come into conflict – in this case, “Powerful” and “Simple” – and as Deltakosh said, we’ve so far assessed the gains in power to be worth the cost in simplicity.

But, with all that said, if you still believe an auto-propagation mechanism would be worthwhile (whether because you believe that the performance cost would be lower than we estimate or that the usability gain would be higher), please give it a shot! Make a new branch in your Babylon.js repo, add the flag you suggest, and run a perf comparison assessing the impact of the flag on runtime performance in large hierarchy and animation scenarios. I, as I’ve stated, suspect that the increase in runtime cost you observe will be prohibitive; but I’ve been wrong many times before, and if you happen to find a way to do this that brings the simplicity benefits without incurring the penalties to power, that would be absolutely awesome and I’d love to see it!

Hope this helps in clarifying our perception of the situation, and thanks again for giving such awesome and passionate feedback on Babylon!

4 Likes

@Deltakosh @syntheticmagus Thanks for clearing that up! Definitely agree with most of the points mentioned!

I suppose in the end it depends on how often world matrices are needed. The tradeoff between propagation vs re-computation is a tricky one, and it seems that in most cases it probably won’t be an issue with the current method (after all you guys are the experts). That said I’ve seen cases where world matrices are constantly needed, and almost always it’s never a simple case of simply “operate first on Node1 down the chain without forcing computeWorldMatrix(true)”. That practically never happens due to complexity of chains and lots of moving nodes, not to mention code spread out across different parts of the project (far more than a simple PG); that at least given the current API, devs are mostly forced to use computeWorldMatrix(true) all the time.

Re the propagation, it does not always have to traverse the entire graph. In the case of A->B->C->D, if B is dirty, by extension C+D are. So if A becomes dirty, it does not have to traverse down to C+D since B is already dirty.
Then on computeWorldMatrix(true), all these dirty flags are “cleared”

Hope this helps in clarifying our perception of the situation, and thanks again for giving such awesome and passionate feedback on Babylon!

Thanks for taking the time out to provide all the feedback and discussions!! I suppose this is what makes the BJS community great, and how good progress is made :grinning:
I’m glad to be a part of it!

2 Likes

When I saw that IK was being used as a use for having ‘always correct’ matrices, I started looking at my code. Here is a shot of an IK editor I have.

ik

There are both IK controllers (disks) and bone look controllers (squares) on all limbs & neck. There is also a mesh for the root bone / pelvis which is for both translation / rotation. The mesh structure never moves.

I do not really even care about efficiency in the editor. I just want it to record locations of the IK / look meshes for a limb which are combined to constitute a pose.

When different poses are actually are actually being interpolated between, the meshes will be replaced with transform nodes, which have no draw overhead. I am performing animations with my own before render based sub-system. For performance reasons, Nothing is ever allowed to compute world matrix in the render loop. Everything is frozen.

If an IK/Look pair moved in a particular frame, I also call freezeWorldMatrix() on it, which always does a computeWorldMatrix(true), then re-freezes to lock out scene from doing it again. All of the control meshes are parented to the character. So, the character, even though it too is frozen, would get re-calced up to 11 times.

I think the force arg, true, could be removed from the call in freezeWorldMatrix() to avoid the 11 calls. I am not sure anyone even knows it is there, as I was the one who put it in.

I am not going to touch on bones for this thread, but see there is a way to calculate this when you wish without doing it twice. It is called freezeWorldMatrix().


EDIT: Since, my parent / character mesh should not be dirty, in my particular case the parent matrix should not be calc’ed 11 times, since freeze is evaluated before force. See

I am still not sure force needs to be applied. Maybe an option arg on FreezeWorldMatrix()?