So, this is a little tool I made some time ago that has been gathering dust. =)
It uses MediaPipe for face mesh capture – plus a trick of mine to convert it into a textured 3D face model on the fly =) – and Babylon.js for 3D rendering. It can export the captured face as glTF.
This is amazing! I believe this technique could be used for live conferencing calls in virtual spaces, by capturing webcam feeds like this and then mapping them onto avatar meshes. Current approaches only display a square mesh with the webcam feed beside the characters (possibly for performance reasons).
Close to that, but not quite so straightforward. MediaPipe did not provide the 3D mesh directly, so I had to extract it manually. Also, the coordinates were in screen space and the video was not directly mapped to the 3D mesh (there was no UV mapping). I needed to generate an unwrapped mesh and remap the face ROI to a UV map in order to generate the final 3D face. All of that happens in real time.
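As a rough illustration of the "face ROI to UV map" idea (a sketch only, not the actual demo code; it assumes the webcam frame itself is used as the mesh texture, so each landmark's normalized x/y can double as its UV coordinate into that frame):

// Sketch: derive UVs from MediaPipe's normalized landmark coordinates.
// `landmarks` is one entry of results.multiFaceLandmarks.
function landmarksToUVs(landmarks) {
  const uvs = [];
  for (const lm of landmarks) {
    // flip v: the video origin is top-left, UV origin is bottom-left
    uvs.push(lm.x, 1.0 - lm.y);
  }
  return uvs;
}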
The resulting face mesh is normalized. I can multiply x and y by the width and height of the image. What can I do about the z axis? How should I scale the z axis to make the mesh look right?
Regarding your code, how can I run it on my local Linux server?
Do I need to set up Node.js, and if so, how?
Not sure I understand your questions, so I’ll try to answer based on how I understood them.
For [1], the rendered face is already in 3D, so you can scale it equally in x, y and z. Are you referring to the modified code or the original one?
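On the z part of the question specifically: the MediaPipe face mesh docs note that the landmark z uses roughly the same scale as x, so a reasonable starting point (a sketch, not the code from this demo) is to scale z by the image width as well:

// Sketch: convert normalized MediaPipe landmarks to pixel-scale 3D positions.
// `landmarks` is one entry of results.multiFaceLandmarks.
function landmarksToPositions(landmarks, width, height) {
  const positions = [];
  for (const lm of landmarks) {
    // z uses roughly the same scale as x, per the MediaPipe face mesh docs
    positions.push(lm.x * width, lm.y * height, lm.z * width);
  }
  return positions;
}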
For [2], it is a standard web page (no Node.js used), so you only need a web server on your local machine. I use Apache, but you can use anything: httpd, XAMPP, or even Python’s built-in micro server.
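For example, assuming Python 3 is installed, running this from the project folder serves the page at http://localhost:8080:

python3 -m http.server 8080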
Check the JavaScript examples. What I did was convert the screen landmarks (which were essentially 2D coordinates with an unused z) into a real 3D mesh, by reprojecting them and auto-generating UVs for the mesh.
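Roughly, the Babylon.js side of that can look like the following sketch (not the demo’s actual code; it assumes you already have positions, UVs, and a triangle index list for the 468 landmarks, which MediaPipe’s JS package does not export and would need to be embedded separately):

// Sketch: build a Babylon.js mesh from landmark-derived data.
// positions: flat [x, y, z, ...], uvs: flat [u, v, ...], indices: triangle list.
function buildFaceMesh(scene, positions, uvs, indices, videoTexture) {
  const mesh = new BABYLON.Mesh("face", scene);
  const vertexData = new BABYLON.VertexData();
  vertexData.positions = positions;
  vertexData.uvs = uvs;
  vertexData.indices = indices;
  vertexData.applyToMesh(mesh, true); // updatable, so positions can be refreshed per frame
  const mat = new BABYLON.StandardMaterial("faceMat", scene);
  mat.diffuseTexture = videoTexture; // e.g. a BABYLON.VideoTexture of the webcam
  mesh.material = mat;
  return mesh;
}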
Hey, and thanks for providing this great MediaPipe example. I used it to create a new playground which doesn’t rely on a fixed camera position, and have posted it to the demo forum. There were some scale adjustments, likely introduced to align the scene camera to the content, which I removed; I also added filtering to the landmarks to reduce some of the frame jitter.
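One simple way to do that kind of filtering (a sketch only; the actual playground may use something more elaborate, such as a One Euro filter) is an exponential moving average over the landmark positions:

// Sketch: exponential moving average to damp per-frame landmark jitter.
// alpha closer to 1 tracks faster; closer to 0 smooths harder.
class LandmarkSmoother {
  constructor(alpha = 0.5) {
    this.alpha = alpha;
    this.prev = null;
  }
  filter(landmarks) {
    if (!this.prev) {
      this.prev = landmarks.map(lm => ({ x: lm.x, y: lm.y, z: lm.z }));
      return this.prev;
    }
    this.prev = landmarks.map((lm, i) => ({
      x: this.alpha * lm.x + (1 - this.alpha) * this.prev[i].x,
      y: this.alpha * lm.y + (1 - this.alpha) * this.prev[i].y,
      z: this.alpha * lm.z + (1 - this.alpha) * this.prev[i].z,
    }));
    return this.prev;
  }
}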
Hey, and after some searching I discovered that you can access the geometry and pose transform without having to transform the landmarks from screen space. Each frame produces updated vertices and UVs, which appear a little more stable than the transformed landmarks. It also seems to solve the problem of placing the camera in front of the transformed mesh.
Enable face geometry and set the camera FOV:
const facemesh = new FaceMesh({
  locateFile: file => {
    return `https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/${file}`;
  },
});
facemesh.setOptions({
  // Camera specific and should be derived
  cameraVerticalFovDegrees: 43.3,
  // Enable per-frame mesh generation
  enableFaceGeometry: true,
  maxNumFaces: 1,
  minDetectionConfidence: 0.65,
  minTrackingConfidence: 0.65,
  selfieMode: true,
});
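With that enabled, the per-frame geometry shows up in the results callback. Roughly like this (a sketch based on the protobuf-style accessors the JS solution exposes; exact names may vary between package versions):

facemesh.onResults(results => {
  if (!results.multiFaceGeometry || results.multiFaceGeometry.length === 0) return;
  const geometry = results.multiFaceGeometry[0];
  // Interleaved vertex buffer: 5 floats per vertex (x, y, z, u, v)
  const vertexBuffer = geometry.getMesh().getVertexBufferList();
  const indexBuffer = geometry.getMesh().getIndexBufferList();
  // 4x4 head pose matrix as 16 packed floats
  const poseMatrix = geometry.getPoseTransformMatrix().getPackedDataList();
  // ...split vertexBuffer into positions/uvs and update the render mesh here
});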
MediaPipe includes some tools for determining camera distance using the iris width (MediaPipe Iris), but they are not available in JS. Using the following image I have been able to convert the eye landmarks into world locations and to find their center, in much the same way.
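The underlying idea (a sketch only, not the poster’s code) is that MediaPipe Iris assumes a fixed real-world iris diameter of about 11.7 mm, so given the camera’s focal length in pixels the depth falls out of similar triangles. The sketch below assumes refineLandmarks is enabled (so the iris points occupy landmark indices 468–477) and that the vertical FOV is known; the specific edge indices used are an assumption about the layout:

// Sketch: estimate eye distance from the camera using the iris diameter.
const IRIS_DIAMETER_MM = 11.7; // average human iris size assumed by MediaPipe Iris

function estimateEyeDepthMm(landmarks, imageWidth, imageHeight, fovVerticalDeg) {
  // Focal length in pixels from the vertical field of view
  const focalPx = (imageHeight / 2) / Math.tan((fovVerticalDeg * Math.PI / 180) / 2);
  // Iris diameter in pixels: distance between two opposite iris edge landmarks (assumed layout)
  const a = landmarks[469], b = landmarks[471];
  const dx = (a.x - b.x) * imageWidth;
  const dy = (a.y - b.y) * imageHeight;
  const irisPx = Math.hypot(dx, dy);
  // Similar triangles: depth = focal_length * real_size / pixel_size
  return focalPx * IRIS_DIAMETER_MM / irisPx;
}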
I will post when I have a working solution, and have opened some questions in another thread.