How to scale a 3D scene to fit a video feed from MediaPipe

I am making a T-shirt try-on app using MediaPipe. Each render loop I give a frame to MediaPipe, and it returns an array of points like this:

index 0 - nose
index 1 - left eye (inner)
index 3 - left eye (outer)
...and so on 

Each point has three values:

x, y - normalized to the video dimensions (I think this might be in NDC space of the input frame)
z - depth
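
To make the coordinate convention concrete, here is a minimal sketch of what that landmark data looks like and how to map it back to pixels. The `Landmark` type and `toPixel` helper are illustrative names, not MediaPipe API; the assumption (which matches MediaPipe's documented output) is that x and y are normalized to [0, 1] relative to the frame, with y growing downward:

```typescript
interface Landmark {
  x: number; // normalized [0, 1], left → right
  y: number; // normalized [0, 1], top → bottom
  z: number; // depth estimate, roughly the same scale as x
}

// Convert a normalized landmark to pixel coordinates in the video frame.
function toPixel(lm: Landmark, videoWidth: number, videoHeight: number) {
  return { px: lm.x * videoWidth, py: lm.y * videoHeight };
}

const nose: Landmark = { x: 0.5, y: 0.25, z: -0.1 };
const { px, py } = toPixel(nose, 1280, 720);
// → { px: 640, py: 180 }: horizontally centered, upper quarter of the frame
```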

My goal is to move a rigged model in 3D using these points. How can I overlay a 3D object from a Babylon scene on top of a point in screen space? Is there a way to transform the 3D scene so that it matches the real-life video feed captured by the camera?

Hello :slight_smile:

Aaaaaahh :smiley: I did exactly this a few years ago at my previous start-up, so I know exactly what you mean :grin: First of all → not an easy task ^^ And be happy to have MediaPipe, because at the time I had nothing but Python… The landmark detection pipeline is an even worse task :see_no_evil: Ahah

A few hints :

  • If you are using a mono camera (no stereo), you must know that you can really only rely on X and Y. The depth will only be a rough estimate.
  • Your first task would be to project these (X, Y, depth) values to 3D points in 3D space.
  • Then going from this set of 3D points to the rigged mesh:
    • It highly depends on your rig. You cannot use just any rig; the rig itself should be designed to fit this “2D” estimation.
    • Most likely you will need to deal with higher-level bone targets and Inverse Kinematics (bones you directly force a position onto, and the rest of the rig follows).
    • If your app is only for the face (not the full body), there are some ways to estimate the 3D pose from 2D landmarks (then it’s easy to use on a rig, since you have Euler rotations of the head and you can directly force them onto the bone rotations).
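
The second bullet (lifting (X, Y, depth) to a 3D point) can be sketched with a simple pinhole-camera model. Everything here is an assumption you have to supply yourself: the vertical field of view should match the real webcam (calibration), and the depth is a value you choose, since MediaPipe's z is only a relative estimate, not metric:

```typescript
// Lift a normalized landmark (u, v) to a 3D point in camera space,
// assuming a pinhole camera. fovY and depth are calibration inputs,
// not values MediaPipe gives you.
function unproject(
  u: number,     // normalized [0, 1], left → right
  v: number,     // normalized [0, 1], top → bottom
  depth: number, // distance in front of the camera, in scene units
  fovY: number,  // vertical field of view of the real camera, in radians
  aspect: number // videoWidth / videoHeight
): { x: number; y: number; z: number } {
  const ndcX = u * 2 - 1; // [0, 1] → [-1, 1]
  const ndcY = 1 - v * 2; // flip: screen y grows downward
  const tanHalf = Math.tan(fovY / 2);
  return {
    x: ndcX * tanHalf * aspect * depth,
    y: ndcY * tanHalf * depth,
    z: depth, // camera looks down +Z (Babylon's left-handed default)
  };
}

// A landmark in the exact center of the frame lands on the camera axis:
const p = unproject(0.5, 0.5, 2, Math.PI / 3, 16 / 9);
// → { x: 0, y: 0, z: 2 }
```

If you then set your Babylon camera's `fov` to the same `fovY` (Babylon's `fov` is vertical by default) and render the video as a full-screen background layer, points unprojected this way should line up with the feed, up to the accuracy of your FOV estimate.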

Thanks for the help, will try this out.