Marker Tracking and Webpiling

Welcome to the marker tracking and webpiling discussion! This is the entry point for conversation stemming from the “Marker Tracking in Babylon.js” blog post.

Some relevant links:

Thanks for joining the discussion!


This is excellent!

Hi @syntheticmagus

Thanks for sharing. Great job!


Hi @syntheticmagus, wonderful work :muscle:
I read the blog, played a little with the Playground and decided to try it myself… aaaand it works :metal:
But I tried to change a few things.
For example, I'm using a custom script for the camera, because the CreateFromWebCam function always opens the front camera, and I wanted to be able to switch between cameras.
Then I wanted to use a Layer instead of a simple plane, but that doesn't seem to work :slight_smile:
Then I tried to make the plane fit the screen; that doesn't seem to work either :frowning:

So I'd like to ask: what's the magic behind this plane? :sunglasses:
I’ll keep playing with it.

Hi MarianG,

Thanks for trying it out, looks awesome!

I think there actually is a way to make CreateFromWebCam() work with other cameras. It looks like it might be undocumented, but CreateFromWebCam's "constraints" argument takes an optional deviceId parameter, which I believe matches the deviceId of one of the entries returned by MediaDevices.enumerateDevices(). It's definitely not very easy to use, but I believe that's how you choose which camera your video texture comes from.
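For anyone wanting to try this, here is a minimal sketch of picking a camera by index from the enumerateDevices() output and building the constraints object from it. buildWebCamConstraints is a hypothetical helper, not part of Babylon.js, and the 640x480 defaults are just the values used elsewhere in this thread; the actual CreateFromWebCam call is left in a comment because it needs a browser.

```javascript
// Hypothetical helper: pick a camera by index from enumerateDevices() output
// and build the constraints object for BABYLON.VideoTexture.CreateFromWebCam.
function buildWebCamConstraints(devices, cameraIndex, maxWidth = 640, maxHeight = 480) {
  // enumerateDevices() also lists microphones and speakers; keep only cameras.
  const videoDevices = devices.filter((device) => device.kind === "videoinput");
  if (videoDevices.length === 0) {
    throw new Error("No video input devices found");
  }
  // Clamp the index so an out-of-range choice still yields a valid camera.
  const index = Math.min(Math.max(cameraIndex, 0), videoDevices.length - 1);
  return { maxWidth, maxHeight, deviceId: videoDevices[index].deviceId };
}

// Browser usage (not run here):
// navigator.mediaDevices.enumerateDevices().then((devices) => {
//   const constraints = buildWebCamConstraints(devices, 1);
//   BABYLON.VideoTexture.CreateFromWebCam(scene, (videoTexture) => {
//     // use videoTexture, e.g. as a material's diffuseTexture
//   }, constraints);
// });
```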

The plane was just put in as a cheap way to render the texture behind the tracked object, and I do mean cheap. It’s definitely not the right way to do that for anything more than a demo, so I’m super happy to hear you’re investigating alternatives.

I hadn't seen layers before; pretty cool, but they don't appear to support video textures, sadly. That might be a worthwhile feature to add, though, especially if we want to enable more camera-based experiences powered by Babylon. :grin:

tl;dr: The next three paragraphs are all about camera intrinsics alignment for AR scenarios. I wrote them all before it occurred to me that I don’t really know why your screen-filling plane didn’t work, though I do have a theory. If you’re already familiar with AR camera concepts, you may just want to skip ahead a bit. :upside_down_face:

The trick with trying to make a plane that fits the screen is that, for the marker tracking illusion to work, your virtual camera's intrinsic parameters (most importantly field-of-view and aspect ratio) have to match those of your real-world camera. Canonically, most AR experiences have required users to "calibrate" their cameras in order to learn their intrinsics; this can yield high-quality experiences, but requires a lot of overhead for the user. To avoid that, in my Playground I simply "guessed" the intrinsics and hand-tuned the plane in the scene to look correct with the output it was receiving.
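To make "intrinsics" a bit more concrete: under a simple pinhole model, the vertical field of view follows from the image height h and the vertical focal length f_y (both in pixels) as fov = 2·atan(h / (2·f_y)). The sketch below assumes a guessed f_y of 500 px for a 640x480 stream; that number is purely illustrative, not a calibrated value. Babylon.js cameras take fov in radians, measured vertically by default.

```javascript
// Pinhole model: half the image height subtends half the vertical FOV at a
// distance of f_y pixels from the optical centre.
function verticalFovFromIntrinsics(imageHeightPx, focalLengthYPx) {
  return 2 * Math.atan(imageHeightPx / (2 * focalLengthYPx));
}

// Example with a GUESSED focal length (illustrative, not calibrated).
const videoWidth = 640;
const videoHeight = 480;
const guessedFocalLengthY = 500;

const fov = verticalFovFromIntrinsics(videoHeight, guessedFocalLengthY); // ~0.895 rad (~51.3 degrees)
const aspect = videoWidth / videoHeight; // the virtual camera must render at this aspect too

// In Babylon.js (not run here): camera.fov = fov;
```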

Guessing the field-of-view may or may not be viable depending on the experience you're going for (most cameras are pretty similar, so a good middle-of-the-road guess will probably work unless you are trying to do something very precise). Where it gets trickier is aspect ratio. Real-world cameras are much more constrained than virtual ones vis-à-vis the aspect ratios they can output, so generally you'll have to make your virtual camera account for the idiosyncrasies of your real one. Ideally, that just means rendering to an output that has the same aspect ratio as your camera's input, which will allow you to essentially "crop" your rendering to the edges of the plane you see in my original Playground. If you can't do this, the second most common approach is to render an area that fits "inside" the image from your camera. For example, if your camera returns a 4:3 image but you have to render to a 16:9 output, most approaches would have you crop out the top and bottom of the image from your camera, the aim still being to fill your rendering background with the image from your camera even if you can't fit all of it in. The downside to this second approach is that, while it does give the appearance of a "full screen AR experience," the math required to keep your camera parameters synchronized across multiple aspect ratios is not trivial.
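A minimal sketch of the simplest case of that second option, assuming an ideal pinhole camera with a centred principal point and no lens distortion (real cameras will need more than this): when the output is wider than the camera image, only the central cameraAspect / outputAspect fraction of the image height stays visible, so the virtual camera's tan(fov/2) has to shrink by that same factor. coverCrop is a hypothetical helper name.

```javascript
// Hypothetical helper: "fill the output, crop the camera image" for a
// vertical-FOV virtual camera and an ideal, centred pinhole real camera.
function coverCrop(cameraAspect, outputAspect, cameraVerticalFov) {
  if (outputAspect >= cameraAspect) {
    // Output is wider than the camera: keep full width, crop top and bottom.
    const visibleHeightFraction = cameraAspect / outputAspect;
    return {
      visibleWidthFraction: 1,
      visibleHeightFraction,
      // Showing a smaller central slice shrinks tan(fov / 2) by the same factor.
      virtualVerticalFov: 2 * Math.atan(visibleHeightFraction * Math.tan(cameraVerticalFov / 2)),
    };
  }
  // Output is narrower than the camera: keep full height, crop left and right.
  // The vertical FOV is unchanged because no rows are cropped.
  return {
    visibleWidthFraction: outputAspect / cameraAspect,
    visibleHeightFraction: 1,
    virtualVerticalFov: cameraVerticalFov,
  };
}
```

For a 4:3 camera rendered into 16:9, visibleHeightFraction comes out as 0.75, i.e. 12.5% of the image is cropped from the top and from the bottom.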

To avoid fighting with that math, and to allow the experience to work properly in the Playground where I don’t have much control over the aspect ratio, I took a third option: I rendered the video texture to a constant-size plane that was allowed to fit completely inside the virtual camera’s view frustum. In other words, for simplicity and use in the Playground, I fully decoupled the real and virtual cameras. This, like I said, is almost certainly not the best approach for any production deployment. But I really wanted to be able to simply and easily show the behavior in the Playground; so to make that happen, I chose this approach.
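This decoupled third option can be sketched as follows: compute the cross-section of the view frustum at the plane's distance, then pick the largest rectangle with the video's aspect ratio that fits inside it. containedPlaneSize is a hypothetical helper, and the MeshBuilder.CreatePlane call in the comment is just one way to use the result.

```javascript
// Hypothetical helper: largest plane with the video's aspect ratio that fits
// entirely inside a vertical-FOV camera's frustum at the given distance.
function containedPlaneSize(verticalFov, renderAspect, videoAspect, distance) {
  // Cross-section of the view frustum at `distance` in front of the camera.
  const frustumHeight = 2 * distance * Math.tan(verticalFov / 2);
  const frustumWidth = frustumHeight * renderAspect;
  // Limited by the frustum's height or by its width, whichever is tighter.
  const height = Math.min(frustumHeight, frustumWidth / videoAspect);
  return { width: height * videoAspect, height };
}

// In Babylon.js (not run here), with camera.fov = 0.8 and the plane 10 units away:
// const size = containedPlaneSize(0.8, engine.getAspectRatio(camera), 4 / 3, 10);
// const plane = BABYLON.MeshBuilder.CreatePlane("videoPlane", size, scene);
```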

You didn't specify how your screen-fitting plane doesn't work, so please correct me if my guess is wrong; but I'm guessing it's because pegging your virtual camera to the render target's aspect ratio caused the camera's intrinsics to get out of sync with the real-world camera. This problem would manifest as an apparent and varying "offset" between your video texture background and your 3D-rendered foreground: you move the marker, and the object that's supposed to track it moves in the right general direction, but it's in the wrong place and it doesn't move the right distance. The easiest way to address this problem will be to force your render window to be the same aspect ratio as your webcam's output; that would allow you to hand-tune your plane to go from edge to edge of your render target and still match up with the motions it's tracking in the video. This is the first option I described two paragraphs ago; options two and three are also viable, but they have definite downsides.
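A sketch of that first option, assuming you control the canvas size: shrink the canvas to the largest rectangle with the webcam's aspect ratio (letterboxing or pillarboxing the rest), and then, instead of hand-tuning, size the plane from the camera's vertical field of view so it spans the view exactly. Both function names are hypothetical; the canvas and engine.resize() lines are left as comments because they need a browser.

```javascript
// Hypothetical helper: largest canvas size with the webcam's aspect ratio that
// fits in the available area (the remainder is letterboxed or pillarboxed).
function fitCanvasToVideoAspect(availableWidth, availableHeight, videoWidth, videoHeight) {
  // Try the full available width first and derive the matching height.
  let width = availableWidth;
  let height = (availableWidth * videoHeight) / videoWidth;
  if (height > availableHeight) {
    // Too tall: use the full available height and derive the width instead.
    height = availableHeight;
    width = (availableHeight * videoWidth) / videoHeight;
  }
  return { width: Math.floor(width), height: Math.floor(height) };
}

// Once the aspect ratios match, a plane `distance` units in front of a
// vertical-FOV camera fills the view exactly at this size.
function edgeToEdgePlaneSize(verticalFov, videoAspect, distance) {
  const height = 2 * distance * Math.tan(verticalFov / 2);
  return { width: height * videoAspect, height };
}

// In the browser (not run here):
// const size = fitCanvasToVideoAspect(window.innerWidth, window.innerHeight, 640, 480);
// canvas.style.width = size.width + "px";
// canvas.style.height = size.height + "px";
// engine.resize();
```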

After so long a reply, this is probably the wrong point to ask, but…did any of that address your question? I think I covered all the “magic” about that plane; but if you have any more questions – about that or anything else in this arena – please ask! After ten paragraphs this is probably obvious, but I love talking about this stuff. :smile:


Well, nice explanation :sunglasses:
Thank you for taking the time to explain it to me.
And to answer your question: yes, this is exactly what I wanted to hear.
I called it the 'magic plane', but basically I wanted to know more than just why you put a plane in the scene :))… and I really appreciate that you explained the process behind it too.

Now, about the deviceId param of the CreateFromWebCam function: somehow I missed it, but it seems to be exactly what I wanted, so I'll definitely try it.

Oh, and we can use a VideoTexture on a Layer too. I prefer a layer to a plane because, once I set the layer as the background, I don't have to worry about objects passing through it.
But basically I understand: no matter what I use to render the VideoTexture, I have to synchronize the device camera's output with the scene camera.
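Following up on the Layer idea, a minimal sketch of putting a VideoTexture on a background Layer. The helper name, and the choice to pass BABYLON in as a parameter so it can be tried outside a browser, are assumptions of this sketch; in a Playground you would just use the global BABYLON. A background layer fills the whole viewport, so the aspect-ratio matching discussed above still applies.

```javascript
// Hypothetical helper: display a VideoTexture as a background Layer.
// BABYLON is injected only so the function can run outside the browser.
function createVideoBackgroundLayer(BABYLON, scene, videoTexture) {
  // null image URL because we assign the texture ourselves;
  // `true` asks for the layer to be drawn behind the scene's meshes.
  const layer = new BABYLON.Layer("videoBackground", null, scene, true);
  layer.texture = videoTexture;
  return layer;
}

// In a Playground (not run here):
// BABYLON.VideoTexture.CreateFromWebCam(scene, (videoTexture) => {
//   createVideoBackgroundLayer(BABYLON, scene, videoTexture);
// }, { maxWidth: 640, maxHeight: 480 });
```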

I have a few more paragraphs of questions in my head, but first I'll play with what I've learned so far. I'm almost sure I'll be back to ping you again. :smile:


Hi @syntheticmagus,

Can you please tell me whether Babylon works for 2D image cropping, i.e. slicing out part of an image and setting it back into the original image? Are these kinds of features available in the library?

Thanks in advance for your answer.

Hi surajprajapati,

Welcome to Babylon! As far as I know there aren’t too many existing features surrounding image editing checked into Babylon.js today. You can accomplish some things with PostProcesses and the creative application of other features for things like compositing, but the kind of texture manipulation I think you’re asking about hasn’t really been a priority for Babylon historically. (@Deltakosh can contradict me if I’ve forgotten some existing features here. :smiley:)

But! As part of our recent explorations into bringing native computer vision capabilities to the Web, we have a few experiments that bring some of OpenCV's powerful image editing capabilities (like GrabCut and Seamless Clone) to WebAssembly utilities that can then be easily used through Babylon.js. We've now proven that this is possible (it's not even particularly difficult) and all we're waiting for is a reason to move forward. So if you have a use case – if there's something you want to do that could make use of advanced image editing capabilities in Babylon – please let me know!

This is awesome! I'm looking forward to trying it, but I'm currently getting a build error and don't have time at the moment to fix it:

Question for you: with a lot of the AR packages currently using a point cloud to "mark" a position, is this something you're looking at getting into in the future, or is this package only going to be developed with markers?

Hi i73,

Good catch on the errors; it looks like some scoping behavior has changed with the new Playground that’s causing some of the constants defined at the top of the JavaScript to not be defined when they’re used later. I’m not sure why that’s happened – have to think about that – but as a temporary workaround, if you replace the uses of those constants with the values they’re defined to be, that problem should go away.

Regarding future features, the explorations originally described in this topic are now housed under the umbrella of the Babylon AR project (which we’re probably going to rename to Babylon CV). At the moment, the only tracking technology implemented there is marker-based. We definitely want to keep exploring new capabilities and adding features; we just haven’t gotten there quite yet. :smiley:

I'm extremely impressed. Just one question: I notice the model has a jittering issue even when the filter level is set to 1. Is this just my machine, or are there small bugs to work out?

Bugs for sure; that demo is definitely not at ship quality. :smiley:

To elaborate, that demo was just a proof of concept, and it almost directly outputs the raw results from OpenCV 3.2’s ArUco single-marker tracking. Those results tend to be very noisy, so for demonstration purposes I slapped a naive low-pass filter on top, which creates a trade-off between jitter and “lag” or “swim.” That is not a “real” solution to that problem; it’s just a stopgap to make the demo look nicer.
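For readers unfamiliar with that trade-off, here is a minimal sketch of a naive low-pass filter of this kind (exponential smoothing). It is illustrative only: it is not the demo's actual filter code, and alpha here is not necessarily what the demo's "filter level" controls. Rotations would need something like quaternion slerp rather than per-component averaging.

```javascript
// Naive low-pass filter (exponential smoothing) for tracked 3D positions.
// alpha in (0, 1]: 1 passes raw (jittery) samples through unchanged; smaller
// values smooth more but make the result lag ("swim") behind the marker.
function createLowPassFilter(alpha) {
  let state = null;
  return function filter(sample) {
    if (state === null) {
      // First sample: nothing to blend with yet.
      state = { x: sample.x, y: sample.y, z: sample.z };
    } else {
      // Move a fraction alpha of the way from the old estimate to the new sample.
      state = {
        x: state.x + alpha * (sample.x - state.x),
        y: state.y + alpha * (sample.y - state.y),
        z: state.z + alpha * (sample.z - state.z),
      };
    }
    return { ...state };
  };
}
```

With alpha = 0.5, a marker that jumps from x = 0 to x = 10 and stays there is reported at 0, 5, 7.5, then 8.75: the jitter is damped, but the model visibly lags behind.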

“Real” solutions to tracking problems come in a variety of forms, many of which we’re exploring in that demo’s spiritual successor, Babylon AR’s ArUcoMetaMarkerObjectTracker. That tracker’s still in alpha, but it uses things like ArUco boards (instead of single markers) and more advanced (Kalman-inspired) filtering to dramatically improve tracking quality. There are more things we want to try, too, to drive quality and performance even higher; that’s all work-in-progress. :smile:
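To give a flavour of what Kalman-style filtering adds over a fixed low-pass filter, here is a sketch of a textbook 1D constant-velocity Kalman filter for a single coordinate. It is not Babylon AR's ArUcoMetaMarkerObjectTracker implementation; the helper name and the noise parameters q and r are assumptions you would tune or replace.

```javascript
// Textbook 1D constant-velocity Kalman filter for one coordinate (hypothetical helper).
//   q:  process noise (how much unmodelled acceleration to expect)
//   r:  measurement noise variance of the raw tracker output
//   dt: time between frames
function createConstantVelocityKalman(q, r, dt, initialVelocityVariance = 100) {
  let p = null; // estimated position
  let v = 0; // estimated velocity
  let a = 0, b = 0, d = 0; // covariance matrix [[a, b], [b, d]]

  return function update(z) {
    if (p === null) {
      // Initialise from the first measurement; velocity is still unknown.
      p = z;
      a = r;
      b = 0;
      d = initialVelocityVariance;
      return p;
    }
    // Predict: x' = F x, P' = F P F^T + Q with F = [[1, dt], [0, 1]] and Q the
    // discrete white-noise-acceleration model scaled by q.
    const pPred = p + v * dt;
    const aPred = a + 2 * dt * b + dt * dt * d + (q * dt ** 4) / 4;
    const bPred = b + dt * d + (q * dt ** 3) / 2;
    const dPred = d + q * dt * dt;

    // Update with the position measurement z (H = [1, 0]).
    const s = aPred + r; // innovation variance
    const k0 = aPred / s; // position gain
    const k1 = bPred / s; // velocity gain
    const innovation = z - pPred;
    p = pPred + k0 * innovation;
    v = v + k1 * innovation;
    a = (1 - k0) * aPred;
    b = (1 - k0) * bPred;
    d = dPred - k1 * bPred;
    return p;
  };
}
```

Because it also estimates velocity, a marker moving at a steady speed is followed without the constant lag of a fixed low-pass filter, while measurement noise is still averaged down. You would run one instance per coordinate; rotation needs separate treatment.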


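Yeah, I'm thinking the way would be to list the video inputs and pass the chosen one's deviceId in the constraints:

navigator.mediaDevices.enumerateDevices().then(function (devices) {
    var videoDevices = devices.filter(device => device.kind === "videoinput");
    // scene, onReady and userChosenCamera are defined elsewhere
    BABYLON.VideoTexture.CreateFromWebCam(scene, onReady, { maxWidth: 640, maxHeight: 480, deviceId: videoDevices[userChosenCamera].deviceId });
});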

Assuming there's no better way to natively get the back camera. And please, anyone, tell me if there's a better way! :laughing:
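One possible answer, as a heuristic sketch: pick the camera whose label mentions back, rear or environment, falling back to the last listed camera. Labels are browser- and device-specific, and enumerateDevices() returns empty labels until the user has granted camera permission, so this is best-effort. The standard getUserMedia alternative is the facingMode: "environment" constraint; whether CreateFromWebCam passes such extra constraints through to getUserMedia may depend on the Babylon.js version, so treat that as an assumption to verify. findBackCameraId is a hypothetical helper.

```javascript
// Heuristic: find a back-facing camera in enumerateDevices() output by label.
// Labels are browser/device-specific and empty until camera permission is granted.
function findBackCameraId(devices) {
  const videoDevices = devices.filter((device) => device.kind === "videoinput");
  if (videoDevices.length === 0) {
    return null;
  }
  const back = videoDevices.find((device) => /back|rear|environment/i.test(device.label));
  // No label match: fall back to the last listed camera (an arbitrary choice).
  return (back || videoDevices[videoDevices.length - 1]).deviceId;
}

// Browser usage (not run here):
// navigator.mediaDevices.enumerateDevices().then((devices) => {
//   const deviceId = findBackCameraId(devices);
//   BABYLON.VideoTexture.CreateFromWebCam(scene, onReady, { maxWidth: 640, maxHeight: 480, deviceId });
// });
```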