WebXR Body Tracking

Excited to share that Babylon.js now supports full-body tracking via the WebXR Body Tracking spec. 83 joints - torso, spine, arms, hands, legs, feet - all tracked and updated every frame, ready to drive a rigged avatar in real time.

If you have a Mixamo character, getting started is pretty straightforward:

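// `xr` is the experience helper returned by scene.createDefaultXRExperienceAsync().
// `myMixamoMesh` is your loaded, Mixamo-rigged avatar mesh.
const bodyTracking = xr.baseExperience.featuresManager.enableFeature(
  BABYLON.WebXRFeatureName.BODY_TRACKING, "latest",
  { bodyMesh: myMixamoMesh, isMixamoModel: true }
);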

Custom rigs are supported too - just provide a mapping from XR joint names to your skeleton’s bone names.
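
As a rough sketch of what such a mapping could look like: the joint names below are written from memory of the WebXR Body Tracking draft and the bone names assume a Mixamo-style `mixamorig:` prefix, so check both against the joint reference tables and your own skeleton. `checkMapping` is a hypothetical helper for illustration, not a Babylon.js API, and the config option that accepts the mapping is described in the docs rather than shown here.

```javascript
// Hypothetical partial mapping: XR body joint name -> skeleton bone name.
// Both sides are assumptions; verify them against the spec and your rig.
const jointToBone = {
  "hips": "mixamorig:Hips",
  "spine-lower": "mixamorig:Spine",
  "spine-middle": "mixamorig:Spine1",
  "spine-upper": "mixamorig:Spine2",
  "neck": "mixamorig:Neck",
  "head": "mixamorig:Head",
  "left-arm-upper": "mixamorig:LeftArm",
  "left-arm-lower": "mixamorig:LeftForeArm",
  "left-hand-wrist": "mixamorig:LeftHand",
};

// Returns the mapping entries whose target bone does not exist in the rig,
// so typos in bone names show up before entering XR.
function checkMapping(mapping, boneNames) {
  const available = new Set(boneNames);
  const missing = [];
  for (const [joint, bone] of Object.entries(mapping)) {
    if (!available.has(bone)) {
      missing.push({ joint, bone });
    }
  }
  return missing;
}
```

With a loaded skeleton, the bone list could come from `skeleton.bones.map((b) => b.name)`; an empty result means every mapped bone was found.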

Currently works on: Meta Quest (2, Pro, 3, 3S) via the Meta Quest Browser.

Playground demo: https://playground.babylonjs.com/#0FOISU#2

Full docs are up on the Babylon.js doc site, with all the config options, joint reference tables, and examples. Would love to see what you build with it 🙂

Very nice, but it is sad that these things do not work over a Link cable or wireless PCVR. So many nice WebXR features are unavailable exactly where high-performance graphics with nice visuals are possible. I had a short chat with Gemini, which suggested that, since I use Tauri to host the VR application, it might be possible to add a component to the Rust wrapper that Tauri compiles around WebView2 to access the hardware, for example haptics, which also do not work through cable/Wi-Fi PCVR. Supposedly it would then have to run through SteamVR, whose OpenVR layer supposedly supports parallel access to the hardware. I will likely attempt this at some point, as I feel haptics are such an important part of VR.

I tried the Quest Browser. It was worse in every way compared to a Link cable, except that controller haptics only work there XD silly

Yes, I feel that aiming for standalone WebXR running on the device itself, like the Quest 3, is hopeless unless you are making something very low poly or a completely baked scene with little interaction. It is a shame that WebXR through a PCVR link is not better supported, as I feel the potential for nice JavaScript-based VR development is massive here. There is surely an opportunity to make a wrapper like Tauri that also has some direct support for reaching the hardware. Tauri's Rust wrapper can also run separate processes on native threads, which could give access to far more CPU, and CPU is the biggest bottleneck of Babylon.js. It is kind of funny: when I run the app render timing overlay from the Oculus Debug Tool on my VR world, my GPU is practically idling at 0.5 ms for a full scene with lots of instances and thin instances (I want to add more), but the CPU is the bottleneck and sometimes makes me miss the window for a smooth 72 Hz, even though the Babylon.js scene.render takes 5 ms or less. So there is a lot of overhead through the WebXR layer too, and it adds up quickly.

Thank you! This was perfect timing, as this is one of the things we wanted for our research project.
I recorded a quick clip from another user’s perspective as soon as I got it working. The Quest 3 body tracking AI does not seem to model crouch walking 😄
Avatar used: https://github.com/BabylonJS/Assets/blob/master/meshes/dummy3.babylon

Holy c**p, this is beautiful!! And working so well 🙂

Will you be able to share your mapping as a playground or a live demo? I would love to try it out as well.

Here is the playground version (completely vibe-coded): https://playground.babylonjs.com/#ISDTR4#10

I asked my agent to summarize how the demo was implemented, the findings, the challenges, areas for improvement, and any open questions. If you know the answers to any of these questions right away, that is great. If not, I hope this helps with further development.

A bit more context on what we built.

The original work was not a playground-first experiment. We first built this in our app as a multiplayer avatar sync path. In that code, one XR user is tracked locally, their body/hand/finger pose is serialized, and another client reconstructs that pose on a visible avatar. So when I say “remote avatar” or “remote rig”, I mean the target avatar on the receiving side of that sync pipeline.

We ported the same retargeting logic into a local demo. In the playground there is no actual networked remote user. Instead, it mirrors the same sender/receiver idea locally:

  • a hidden source avatar/body driven by local tracking
  • a visible target avatar that reproduces that pose locally

So the playground is basically a local mirror of the avatar-sync retargeting path, not a completely separate implementation.
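
To make the sender/receiver idea concrete, here is a minimal sketch of how a pose could be packed for transfer and unpacked on the receiving side. The actual wire format used in the app is not described above, so the flat `Float32Array` layout and the names `packPose` and `unpackPose` are assumptions for illustration only.

```javascript
// Assumed layout: 4 floats (x, y, z, w) per joint rotation, in a fixed joint order
// that sender and receiver agree on.
const FLOATS_PER_JOINT = 4;

// Sender side: flatten joint rotations into one typed array.
function packPose(rotations) {
  const out = new Float32Array(rotations.length * FLOATS_PER_JOINT);
  rotations.forEach((q, i) => {
    out.set([q.x, q.y, q.z, q.w], i * FLOATS_PER_JOINT);
  });
  return out;
}

// Receiver side: rebuild the list of rotations from the flat array.
function unpackPose(buffer) {
  const rotations = [];
  for (let i = 0; i < buffer.length; i += FLOATS_PER_JOINT) {
    rotations.push({
      x: buffer[i],
      y: buffer[i + 1],
      z: buffer[i + 2],
      w: buffer[i + 3],
    });
  }
  return rotations;
}
```

In the local mirror, the hidden source avatar would feed `packPose` and the visible target avatar would consume `unpackPose`, with no network in between.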

A few clarifications from the experiment:

  • The demo uses the Babylon dummy3 avatar, which is a Mixamo-style rig.
  • BODY_TRACKING is the foundation for the main body pose.
  • The issue was not that BODY_TRACKING lacks hand/finger joints. According to the WebXR body-tracking spec and Babylon docs, those joints do exist there.
  • The real difficulty was retargeting those tracked hand/finger poses cleanly onto this specific avatar rig.

What we found in practice:

  • There does not seem to be a standard off-the-shelf “WebXR avatar rig”.
  • The built-in Mixamo body path was a very good starting point for torso, head, arms, and legs.
  • The difficult part was hands and fingers.
  • One remaining issue from testing is that hand and fingertip positions are still not as accurate as in our earlier hand-tracking-only setup with separate hand models.
  • Direct quaternion transfer for fingers did not work well on this rig.
  • For the non-thumb fingers, we got much better results by computing curl/bend values and applying them relative to the rig’s bind pose.
  • For the thumb, we needed thumb-specific local axes and sign fixes.
  • For wrist/hand orientation, we had to derive a palm-basis correction from hand joint positions.
  • We also found that replaying tracked local translations onto every body bone of a fixed Mixamo rig caused distortion/twisting, so keeping bind-pose offsets for most body bones worked better.
  • We also had a root-yaw issue at one point where extra yaw was being layered on top of already tracked body yaw.
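
The curl/bend approach for the non-thumb fingers can be sketched roughly as follows. This is not the code from the demo: the helper names, the choice of curl axis, and the multiplication order (`bind * curl`) are assumptions that depend on the rig's local bone axes.

```javascript
// Small vector helpers on plain {x, y, z} objects.
const sub = (a, b) => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const dot = (a, b) => a.x * b.x + a.y * b.y + a.z * b.z;
const len = (a) => Math.sqrt(dot(a, a));

// Bend angle in radians at joint `mid`, from three tracked joint positions.
// 0 means the two finger segments are collinear (finger straight).
function bendAngle(prev, mid, next) {
  const a = sub(mid, prev);
  const b = sub(next, mid);
  const denom = len(a) * len(b);
  if (denom === 0) return 0; // degenerate input: treat as straight
  const c = Math.min(1, Math.max(-1, dot(a, b) / denom));
  return Math.acos(c);
}

// Quaternion for a rotation of `angle` radians about a unit `axis`.
function axisAngle(axis, angle) {
  const s = Math.sin(angle / 2);
  return { x: axis.x * s, y: axis.y * s, z: axis.z * s, w: Math.cos(angle / 2) };
}

// Hamilton product a * b.
function mul(a, b) {
  return {
    x: a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
    y: a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
    z: a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
    w: a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
  };
}

// Apply the curl on top of the bone's bind-pose rotation instead of
// copying the tracked quaternion directly. Axis and order are rig-specific.
function curledRotation(bindRotation, curlAxis, angle) {
  return mul(bindRotation, axisAngle(curlAxis, angle));
}
```

Because only a scalar angle crosses from the tracked hand to the rig, differences in joint axis conventions between the XR skeleton and the avatar matter less than with direct quaternion transfer.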

So the practical conclusion for us was:

  • BODY_TRACKING gives the main tracked body pose and is the right base.
  • But for a real avatar rig like dummy3/Mixamo, “tracked finger joints exist” and “there is a clean built-in finger retargeting path for this avatar” turned out to be different things.
  • In our case we had to build custom retargeting logic for fingers and hand orientation.
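
For the hand-orientation part, one way to derive a palm basis from joint positions is sketched below. Which joints to use (here a wrist plus the bases of the index and little fingers) and how the resulting basis is compared with the rig's hand bone are assumptions, not the demo's actual implementation.

```javascript
// Small vector helpers on plain {x, y, z} objects.
const add = (a, b) => ({ x: a.x + b.x, y: a.y + b.y, z: a.z + b.z });
const sub = (a, b) => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const cross = (a, b) => ({
  x: a.y * b.z - a.z * b.y,
  y: a.z * b.x - a.x * b.z,
  z: a.x * b.y - a.y * b.x,
});
const normalize = (v) => {
  const l = Math.sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
  return l === 0 ? { x: 0, y: 0, z: 0 } : { x: v.x / l, y: v.y / l, z: v.z / l };
};

// Build an orthonormal palm basis from three tracked positions.
// The sign of `normal` flips between left and right hands, so a
// per-hand sign fix is usually needed afterwards.
function palmBasis(wrist, indexBase, littleBase) {
  const toIndex = sub(indexBase, wrist);
  const toLittle = sub(littleBase, wrist);
  const forward = normalize(add(toIndex, toLittle)); // from wrist toward the fingers
  const normal = normalize(cross(toIndex, toLittle)); // perpendicular to the palm plane
  const side = cross(normal, forward); // across the palm; unit length by construction
  return { forward, normal, side };
}
```

The correction would then be the rotation that takes the rig hand bone's own basis in bind pose onto this tracked basis.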

A few questions that would help us and probably other developers too:

  1. For Mixamo-style avatars, is there a recommended Babylon path for full fingers with BODY_TRACKING, or is custom finger retargeting currently expected?
  2. Is the intended/recommended Babylon approach for avatar hands on Quest to use BODY_TRACKING only, HAND_TRACKING only, or a hybrid approach depending on the rig?
  3. Are the hand/finger poses exposed through BODY_TRACKING and HAND_TRACKING effectively derived from the same underlying runtime source on Quest, or are there practical differences in filtering/stability/intended use?
  4. Are jointScaleFactor and preserveBindPoseBonePositions the recommended way to handle automatic proportion fitting for avatars with different proportions?
  5. Can automatic proportion fitting work well even when the avatar rig is not a close one-to-one WebXR skeleton match, or is it mainly intended for rigs with much closer correspondence?
  6. Would it make sense for Babylon to provide more helpers/examples for this workflow, such as:
       • a reference full-body avatar demo with fingers on a real humanoid rig
       • helper mappings or retargeting utilities for common rigs like Mixamo
       • a debug view showing which XR joints are mapped vs unmapped
       • docs that more explicitly separate “BODY_TRACKING exposes finger joints” from “this avatar rig has no built-in full finger retargeting path”

The feature is already very useful. For us, the biggest challenge was not getting tracked data, but making a real avatar rig interpret that data correctly and consistently.
