Offscreen canvas and multithreading

Hi All,

I am working on a project that tries to find the right layout for planes positioned in 3D space. For each plane, I need to find an optimal position in 3D so that it doesn’t overlap other planes (there are additional criteria that aren’t relevant to this post).

For each plane, I need to perform various computations which can take some time (1s - 10s). There may be up to 30 planes (objects) for which I have to perform these computations.

I have spent a considerable amount of time coding this project in Babylon.js - what I need, essentially, is a way to perform these computations in multiple threads.

I have already looked into web workers, but it seems that what I want to achieve won’t be doable with them. For one, I am unable to post the objects to the web workers, since Babylon objects can’t be serialized by postMessage.

The other option seems to be an OffscreenCanvas, since then I would have full access to all the Babylon methods and objects in the scene (without having to send the mesh data to web workers via postMessage).

So, my question is the following: Do I spawn an “invisible” offscreen canvas that will perform the 1-10s computation for each object and will then return the ideal position in the scene for that plane? Is it possible to “spawn an invisible canvas”? Is there a “better” way to do this? Does Babylon Native support multithreading?

Any help will be greatly appreciated.

Hi semirel,

Very cool problem! Can you share a little more detail about what the 1-to-10 second operation is doing? Parallelism sometimes isn’t the easiest thing to do in JavaScript, but knowing some more specifics of the problem might help give ideas for what the options are. For example, WebWorkers might be the best bet, or it might be possible to solve the problem largely on the GPU, or it might be a complex enough use case to make diving down into WebAssembly (where you can have real threads) worth the effort. It all depends on the nature of the problem; lots of exciting possibilities. :smiley:

1 Like

For each of the objects, which can be at different depths relative to the camera, I create a “plane” facing the camera that fills the view frustum. I discretize this dominant plane, on which the object will move, into a 2D grid. For each grid position, I calculate a value that specifies how good that position is for placing my object (based on different criteria).

For each position on the grid, I perform various computations, one of the most expensive being finding the closest point on a mesh from that position (I am annotating a model consisting of smaller models - each mesh has a label associated with it).
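
Roughly, the scoring loop per object looks like this (a simplified sketch; overlapPenalty stands in for the various other criteria, and closestPointOnPartMeshTo is my own closest-point function):

function bestGridPosition(gridPositions, label) {
    let best = null;
    for (const pos of gridPositions) {
        // The expensive step: find the anchor point on the mesh for this cell.
        const anchor = label.closestPointOnPartMeshTo(pos);
        // Lower cost is better; these terms stand in for the real criteria.
        const cost = BABYLON.Vector3.Distance(pos, anchor) + overlapPenalty(pos, label);
        if (best === null || cost < best.cost) {
            best = { pos: pos, cost: cost };
        }
    }
    return best.pos;
}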

I was thinking of putting the computations for each object in a separate thread.

Regarding this sentence:
“it might be possible to solve the problem largely on the GPU” - I’m not sure exactly what you mean by this. I know that the rendering is run by my GPU, but other than that, I am not sure how much control Babylon.js gives with respect to the GPU.

P.S. The other computation would be the one from the other post you addressed for me - the intersection of a 3D line with a mesh in screen space.

So, to clarify, is this an accurate description of the problem you’re trying to solve?

You have an arbitrary number of objects observed by a perspective camera. The objects are free to move left, right, up, and down with respect to the camera, but they are not allowed to move closer to or further from the camera. They are also not allowed to rotate. You want to find the ideal arrangement of the objects such that they all (or as many as possible) can be seen with no overlap. Is that correct?

1 Like

Yes. In a nutshell, that is what I am trying to do.

Okay, sorry for the back-and-forth, but I think I’m starting to understand. :smiley: This is a label placement problem, right? You’re trying to generate labels to annotate certain parts of a model, but you don’t want those labels to be on top of the thing they’re labeling nor on top of other labels?

1 Like

This is an early version of the project, but you can see what I mean by labels annotating a model:

https://playground.babylonjs.com/#TL1IIB#49

I have since taken the code offline and run it locally.

Yes. Precisely. Label placement and view management.

Very neat. (And really cool Playground, by the way!) Label placement is actually a whole subfield unto itself, which you may have already looked into. I can’t guarantee anything I’ll have to say here will be particularly new, but I think it might be possible to get performance faster than 1 to 10 seconds per plane.

Since the placement you’re trying to go for is in screen-space, the entire problem can (I think) be solved in 2D, which removes a fair amount of the complexity from the computation. Also, the fact that the relevant pieces are axis-aligned rectangles makes this whole thing a lot easier, too.

The most flexible approach I can think of to do this is an optimizer, philosophically related to the “simulated annealing” mentioned on the other page I linked to, which sounds a lot fancier than it is (at least in its simplest form). You can kind of think of it as a sort of fuzzy physics system related to gravity—or in this case repulsion.

  • At the start of your computation, project everything you care about down to the image plane; this might include the labels themselves, the feature points they’re trying to label, and any other pieces of the geometry you specifically want to remain uncovered. This is the last time you’ll have to do anything in 3D until the very end of the algorithm, when you’ll unproject the results to put them back into the world.
  • For each label, initialize its position by placing it somewhere; where it starts out doesn’t actually matter that much. A good idea might be to place each label right beside the thing it’s trying to describe, not caring at this point about what else it’s overlapping with. It just needs to start somewhere.
  • From here, you conduct an iterative optimization (there’s a minimal code sketch after this list). For each moveable thing (in this case, labels), look at everything else in the world that can affect it (other labels, feature points, important geometry, etc.) and calculate a repulsion “force” generated by proximity to that thing. For example, if the current label is close enough to another label that they’re overlapping, that proximity will generate a “force” to push the current label away from the other label. Note that you only apply the “force” to the element currently being processed; other labels will calculate their own “forces” when they’re processed. Once you’ve aggregated all the “forces” applying to the label you’re considering, apply the implied motion to your current label, then move on to the next label. In aggregate, this will cause labels to “bump” away from each other and slide into better places where they can all be seen and don’t cover up other important things.
  • Repeat the process above—hereafter referred to as an optimization pass—as many times as you want to. A typical iterative optimizer will require multiple passes to find a good solution; but because each pass only does very simple operations, they should all be pretty fast. One thing that’s often done to improve performance is to multiply the applied motion per pass by an “energy factor” which grows smaller over time, reducing the impact of later passes once the algorithm is presumably approaching a good solution. This is the key mechanic behind the “simulated annealing” technique referenced by the Wikipedia page linked above.
  • Once you’ve done enough iterations that you’re happy (you can check the global error, the sum total of all “forces” computed in a single pass, if you want to, but it’s often sufficient to just pick a good number of iterations), reverse the projection math to send all your labels back to the planes from which they originally came. This will give you the optimized locations for your labels to be placed.
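
To make that concrete, here’s a minimal sketch of one way the core loop could look, treating each label as just a point (x, y) in screen space; the two constants are examples of the tuning parameters discussed below:

const REPULSION_STRENGTH = 50;  // tuning: how hard proximity pushes
const FALLOFF_DISTANCE = 100;   // tuning: force fades to zero beyond this

function optimizationPass(labels, energy) {
    for (const label of labels) {
        let fx = 0, fy = 0;
        // Aggregate repulsion "forces" from every other label.
        for (const other of labels) {
            if (other === label) continue;
            const dx = label.x - other.x;
            const dy = label.y - other.y;
            const dist = Math.max(Math.hypot(dx, dy), 1e-3); // avoid division by zero
            if (dist < FALLOFF_DISTANCE) {
                // Magnitude falls off linearly with distance.
                const strength = REPULSION_STRENGTH * (1 - dist / FALLOFF_DISTANCE);
                fx += (dx / dist) * strength;
                fy += (dy / dist) * strength;
            }
        }
        // Apply the implied motion, scaled by the shrinking "energy factor".
        label.x += fx * energy;
        label.y += fy * energy;
    }
}

function optimize(labels, passes) {
    for (let i = 0; i < passes; i++) {
        optimizationPass(labels, 1 - i / passes); // energy decays from 1 toward 0
    }
}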

The reasons I recommend an approach like this are (1) because it’s surprisingly easy to implement, (2) it can be made to run quite fast, and (3) it’s extremely flexible. Basically, the entire behavior of the algorithm is controlled by “tuning parameters” that define the amount of force generated by different kinds of proximity. If your labels are getting too close together, change the fall-off rate for force generation over distance. If you want to add new kinds of things to avoid and move away from, it’s as simple as projecting them into the 2D space and generating forces to move away from them. If you want your labels to conscientiously stay close to something instead of moving away, you can create attractor forces that work within exactly the same environment; only the direction of the force is different. You can even create forces to make a label try to stay a specific distance from something else—not too close, not too far—and the system will attempt to make that work while balancing that consideration against the other forces you’ve told it to consider. And if you find that one kind of force is too strong versus another, you can weaken the former or strengthen the latter. All by just changing tuning parameters—hard-coded floating point numbers—and without reimplementing any part of the algorithm.
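
For instance, the “not too close, not too far” behavior above can be a single spring-like force in the same framework (restDistance and gain being two more made-up tuning parameters):

function bandForce(label, target, restDistance, gain) {
    const dx = target.x - label.x;
    const dy = target.y - label.y;
    const dist = Math.max(Math.hypot(dx, dy), 1e-3);
    // Positive when too far (pulls the label in), negative when too close (pushes it away).
    const strength = gain * (dist - restDistance);
    return { fx: (dx / dist) * strength, fy: (dy / dist) * strength };
}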

Optimizers like this are extremely powerful and flexible, and as mentioned above they can be made to run extremely fast. These are actually the core algorithms underneath certain real-time computer vision systems like SLAM (think ARKit and ARCore), which run optimizations like this for every video frame they track. What I meant by “solve the problem largely on the GPU” is that you could theoretically do something like this in a shader, since each individual optimization pass is completely parallelizable. However, for a relatively constrained number of labels like 30, that’s probably way more complicated than you need.

Okay, by now I’ve rambled on for far too long, but the last thing I want to point out is that something like this can very easily be done in a WebWorker. You only need a small amount of information from the scene—the 2D projected positions of important features—in order to run the optimization, so you could very easily have a WebWorker running in the background that accepts updates about the 2D projections and the cameras, runs the optimizer, and reports back with where the labels should be. Whether or not you actually need such a WebWorker depends on how fast your optimizer ends up being; if it’s fast enough, you might not need another thread at all. :smiley:
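
As a rough sketch of what that hand-off could look like (the message shapes here are just assumptions, and optimize is the routine sketched above):

// main thread: send plain data only, so structured cloning is a non-issue
const worker = new Worker("labelOptimizer.js");
worker.postMessage({
    labels: labels.map(l => ({ x: l.x, y: l.y })),
    features: projectedFeaturePoints // plus camera info, etc.
});
worker.onmessage = (e) => {
    // e.data.labels holds the optimized 2D positions; unproject them
    // back onto each label's plane in the scene here.
};

// labelOptimizer.js: run the passes off the main thread
// (optimize is defined in this file, e.g. the sketch from earlier)
self.onmessage = (e) => {
    const labels = e.data.labels;
    optimize(labels, 100);
    self.postMessage({ labels: labels });
};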

Anyway, sorry if that ended up way too involved or unrelated; I get excited about tech like this. Hopefully some of it was helpful. If you have any questions or other involved aspects to consider, let me know!

2 Likes

Thank you for the long and detailed reply :slight_smile:

I am fully familiar with the things you are describing - the research area of label placement, simulated annealing, artificial potential fields, etc. In fact, what I am writing to you about is part of a research paper - if it gets published, I will make sure I send you a copy :slight_smile:

My placement and manipulation of the labels is done in object space. For each label, I generate a plane (facing the camera), project all other labels and meshes onto this label’s dominant plane, and then, once it’s discretized into a grid, I calculate a value for each grid position. That value depends on “fine-tuning” parameters like you said, calculated penalties, etc. Basically, every label has its own force field (https://www.cs.cmu.edu/~motionplanning/lecture/Chap4-Potential-Field_howie.pdf).

The label is moved in object space so that it always lies in the plane facing the camera, i.e. within its own field. Depending on where the label is in its dominant plane, its size can vary in screen space - one of the main selling points is that I manipulate the labels in object space.

I will take some time to digest the information in your post. I just realized from it that I can leverage web workers by sending only the essential information from the scene (like the projection matrix, for example).

I am including the following images to hopefully illustrate the idea better:

Image 1: Camera is zoomed out - spheres represent positions on the grid (ignore the last row, that’s a bug):

Image 2: Plane for each label

1 Like

This paper sounds awesome!

Sounds like you’ve got your algorithm situation all figured out - really neat approach. And yep, if you strip the WebWorker logic down to the bare math (and whatever else is minimal), that’ll make the workers easier and faster to create. The only other thing I’d mention is that you might see diminishing returns from creating lots and lots of WebWorkers. You mentioned one per object in a scene with up to 30 objects somewhere above; 30 is quite a lot of WebWorkers, and there’s a certain cost to creating each one, so at some point there’s a tradeoff between having lots of WebWorkers versus having fewer workers burning through a shared work queue (sketched below). Probably not a huge deal, but it might be worth a perf profile.
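
A rough sketch of the pooled alternative (serialize and applyResult are hypothetical stand-ins for whatever plain-data conversion you end up using):

const poolSize = Math.min(navigator.hardwareConcurrency || 4, objects.length);
const queue = objects.slice(); // pending jobs

for (let i = 0; i < poolSize; i++) {
    const worker = new Worker("labelOptimizer.js");
    const next = () => {
        const job = queue.shift();
        if (!job) { worker.terminate(); return; } // queue drained
        worker.postMessage(serialize(job)); // plain-data view of the object
    };
    worker.onmessage = (e) => {
        applyResult(e.data); // write the optimized position back to the scene
        next();
    };
    next(); // kick off this worker's first job
}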

Once again, sounds really cool; this stuff is fun to talk about. :smiley: Good luck with the paper, and definitely keep us posted about it!

I saw a significant improvement in overall runtime (5-8 times faster) by using the function getClosestFacetAtCoordinates instead of my own. For each position in the discretized grid, I find the “anchor”, i.e. the point on the mesh from which the connector line emanates, by getting the closest point on the mesh. I stumbled upon FacetData and enabled it on the meshes. I encountered an issue with it (described in the post “getClosestFacetAtCoordinates returning null most of the time”), but this looks very promising.
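
In case it helps anyone else, the replacement looks roughly like this (using the Babylon.js facet data API; the fallback to my old function is my workaround for the null issue mentioned above):

mesh.updateFacetData(); // build the facet partitioning first (and again if the mesh moves)

const projected = BABYLON.Vector3.Zero();
// Returns the index of the closest facet to the given world coordinates (or
// null if the point falls outside the partitioning) and fills `projected`
// with the closest point on that facet.
const facetIndex = mesh.getClosestFacetAtCoordinates(point.x, point.y, point.z, projected);
const anchor = (facetIndex !== null) ? projected : this.closestPointOnPartMeshTo(point); // fall back to my old function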

For comparison, here is my previous function:

/**
 * Finds the vertex on the part's mesh that is closest to the given point.
 * Note: this checks vertices only, not points on the faces between them.
 * @param {BABYLON.Vector3} point
 * @return {BABYLON.Vector3} Closest vertex on the part mesh
 */
closestPointOnPartMeshTo(point) {
    var positions = this._partMesh.getVerticesData(BABYLON.VertexBuffer.PositionKind);
    var minSquaredDistance = Number.POSITIVE_INFINITY;
    var closestVertex = BABYLON.Vector3.Zero(); // Zero is a method; the original was missing the parentheses
    var numberOfVertices = positions.length / 3;

    for (var i = 0; i < numberOfVertices; i++) {
        var vertPosition = new BABYLON.Vector3(positions[i * 3], positions[i * 3 + 1], positions[i * 3 + 2]);
        var dsq = BABYLON.Vector3.DistanceSquared(point, vertPosition);

        if (dsq < minSquaredDistance) {
            minSquaredDistance = dsq;
            closestVertex = vertPosition;
        }
    }
    return closestVertex;
}