So, this is a little tool I made some time ago that has been gathering dust. =)
It uses MediaPipe for face mesh capture – plus a trick of mine to convert it into a textured 3D face model on the fly =) – and Babylon.js for 3D rendering. It can export the captured face as glTF.
This is amazing! I believe this technique could be used for live conferencing calls in virtual spaces, by capturing webcam feeds like this and then mapping them onto avatar meshes. Current approaches only display a square mesh with the webcam feed beside the characters (possibly for performance reasons).
Close to that, but not quite so straightforward. MediaPipe did not provide the 3D mesh directly, so I had to extract it manually. Also, the coordinates were in screen space and the video was not directly mapped to the 3D mesh (there was no UV mapping). I needed to generate an unwrapped mesh and remap the face ROI to a UV map in order to generate the final 3D face. All of that happens in real time.
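As a rough illustration of the "face ROI to UV map" idea (a sketch only, not the actual demo code; it assumes the webcam frame itself is used as the mesh texture, so each landmark's normalized x/y can double as its UV coordinate into that frame):

// Sketch: derive UVs from MediaPipe's normalized landmark coordinates.
// `landmarks` is one entry of results.multiFaceLandmarks.
function landmarksToUVs(landmarks) {
  const uvs = [];
  for (const lm of landmarks) {
    // flip v: the video origin is top-left, UV origin is bottom-left
    uvs.push(lm.x, 1.0 - lm.y);
  }
  return uvs;
}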
The resulting face mesh is normalized. I can multiply x and y by the width and height of the image. What can I do about the z axis? How should I scale the z axis to make the mesh look right?
Regarding your code, how can I run it on my local Linux server?
Do I need to set up Node.js, and if so, how?
Not sure I understand your questions, so I’ll try to answer based on how I understood them.
For [1], the rendered face is already in 3D, so you can scale it equally in x, y and z. Are you referring to the modified code or the original one?
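On the z part of the question specifically: the MediaPipe face mesh docs note that the landmark z uses roughly the same scale as x, so a reasonable starting point (a sketch, not the code from this demo) is to scale z by the image width as well:

// Sketch: convert normalized MediaPipe landmarks to pixel-scale 3D positions.
// `landmarks` is one entry of results.multiFaceLandmarks.
function landmarksToPositions(landmarks, width, height) {
  const positions = [];
  for (const lm of landmarks) {
    // z uses roughly the same scale as x, per the MediaPipe face mesh docs
    positions.push(lm.x * width, lm.y * height, lm.z * width);
  }
  return positions;
}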
For [2], it is a standard web page (no Node.js used), so you only need a web server on your local machine. I use Apache, but you can use anything: httpd, XAMPP, or even Python’s built-in micro server.
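For example, assuming Python 3 is installed, running this from the project folder serves the page at http://localhost:8080:

python3 -m http.server 8080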
Check the JavaScript examples. What I did was convert the screen landmarks (which were essentially 2D coordinates with an unused z) into a real 3D mesh, by reprojecting them and auto-generating UVs for the mesh.
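Roughly, the Babylon.js side of that can look like the following sketch (not the demo’s actual code; it assumes you already have positions, UVs, and a triangle index list for the 468 landmarks, which MediaPipe’s JS package does not export and would need to be embedded separately):

// Sketch: build a Babylon.js mesh from landmark-derived data.
// positions: flat [x, y, z, ...], uvs: flat [u, v, ...], indices: triangle list.
function buildFaceMesh(scene, positions, uvs, indices, videoTexture) {
  const mesh = new BABYLON.Mesh("face", scene);
  const vertexData = new BABYLON.VertexData();
  vertexData.positions = positions;
  vertexData.uvs = uvs;
  vertexData.indices = indices;
  vertexData.applyToMesh(mesh, true); // updatable, so positions can be refreshed per frame
  const mat = new BABYLON.StandardMaterial("faceMat", scene);
  mat.diffuseTexture = videoTexture; // e.g. a BABYLON.VideoTexture of the webcam
  mesh.material = mat;
  return mesh;
}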
Hey, and thanks for providing this great MediaPipe example. I used it to create a new playground which doesn’t rely on a fixed camera position, and have posted it to the demo forum. There were some scale adjustments, likely introduced to align the scene camera to the content, which I removed; I also added filtering to the landmarks to reduce some of the frame jitter.
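One simple way to do that kind of filtering (a sketch only; the actual playground may use something more elaborate, such as a One Euro filter) is an exponential moving average over the landmark positions:

// Sketch: exponential moving average to damp per-frame landmark jitter.
// alpha closer to 1 tracks faster; closer to 0 smooths harder.
class LandmarkSmoother {
  constructor(alpha = 0.5) {
    this.alpha = alpha;
    this.prev = null;
  }
  filter(landmarks) {
    if (!this.prev) {
      this.prev = landmarks.map(lm => ({ x: lm.x, y: lm.y, z: lm.z }));
      return this.prev;
    }
    this.prev = landmarks.map((lm, i) => ({
      x: this.alpha * lm.x + (1 - this.alpha) * this.prev[i].x,
      y: this.alpha * lm.y + (1 - this.alpha) * this.prev[i].y,
      z: this.alpha * lm.z + (1 - this.alpha) * this.prev[i].z,
    }));
    return this.prev;
  }
}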
Hey, and after some searching I discovered that you can access the geometry and pose transform without having to transform the landmarks from screen space. Each frame produces updated vertices and UVs, which appear a little more stable than the transformed landmarks. It also seems to solve the problem of placing the camera in front of the transformed mesh.
Enable face geometry and set the camera FOV:
const facemesh = new FaceMesh({
  locateFile: file => {
    return `https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/${file}`;
  },
});
facemesh.setOptions({
  // Camera specific and should be derived
  cameraVerticalFovDegrees: 43.3,
  // Enable per-frame mesh generation
  enableFaceGeometry: true,
  maxNumFaces: 1,
  minDetectionConfidence: 0.65,
  minTrackingConfidence: 0.65,
  selfieMode: true,
});
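With that enabled, the per-frame geometry shows up in the results callback. Roughly like this (a sketch based on the protobuf-style accessors the JS solution exposes; exact names may vary between package versions):

facemesh.onResults(results => {
  if (!results.multiFaceGeometry || results.multiFaceGeometry.length === 0) return;
  const geometry = results.multiFaceGeometry[0];
  // Interleaved vertex buffer: 5 floats per vertex (x, y, z, u, v)
  const vertexBuffer = geometry.getMesh().getVertexBufferList();
  const indexBuffer = geometry.getMesh().getIndexBufferList();
  // 4x4 head pose matrix as 16 packed floats
  const poseMatrix = geometry.getPoseTransformMatrix().getPackedDataList();
  // ...split vertexBuffer into positions/uvs and update the render mesh here
});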
MediaPipe includes some tools for determining camera distance using the iris width (MediaPipe Iris), but they are not available in JS. Using the following image I have been able to convert the eye landmarks into world locations and to find their center, in much the same way.
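The underlying idea (a sketch only, not the poster’s code) is that MediaPipe Iris assumes a fixed real-world iris diameter of about 11.7 mm, so given the camera’s focal length in pixels the depth falls out of similar triangles. The sketch below assumes refineLandmarks is enabled (so the iris points occupy landmark indices 468–477) and that the vertical FOV is known; the specific edge indices used are an assumption about the layout:

// Sketch: estimate eye distance from the camera using the iris diameter.
const IRIS_DIAMETER_MM = 11.7; // average human iris size assumed by MediaPipe Iris

function estimateEyeDepthMm(landmarks, imageWidth, imageHeight, fovVerticalDeg) {
  // Focal length in pixels from the vertical field of view
  const focalPx = (imageHeight / 2) / Math.tan((fovVerticalDeg * Math.PI / 180) / 2);
  // Iris diameter in pixels: distance between two opposite iris edge landmarks (assumed layout)
  const a = landmarks[469], b = landmarks[471];
  const dx = (a.x - b.x) * imageWidth;
  const dy = (a.y - b.y) * imageHeight;
  const irisPx = Math.hypot(dx, dy);
  // Similar triangles: depth = focal_length * real_size / pixel_size
  return focalPx * IRIS_DIAMETER_MM / irisPx;
}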
I will post when I have a working solution, and have opened some questions in another thread.