Realtime 3D Face Capture

So, this is a little tool I made some time ago that was sitting here gathering dust. =)

It uses Mediapipe for face mesh capture, plus a trick of mine to convert it into a textured 3D face model on the fly =), and Babylon.js for 3D rendering. It can export the captured face as glTF.

Demo: face
Source-code: GitHub - imerso/facecap: Babylon.js + Mediapipe face capture

Requires a webcam!


this is INSANE! SO! COOL!



would be interesting if you published to github pages to make it more accessible!

Edit: :man_facepalming: click the demo link if you are as blind as me!


It is on github pages, the link is on the original post. =)


This is amazing! I believe this technique could be used for live conference calls in virtual spaces, by feeding webcam streams into it and mapping the result onto avatar meshes. Current approaches only display a flat square mesh showing the webcam feed beside the characters (possibly for performance reasons).


Great stuff!
Right, webcam or stream, it is all the same: just pass the video element (which can be hidden) to the MediaPipe constructor.


Close to that, but not so straightforward. MediaPipe did not provide the 3D mesh directly, so I had to extract it manually. Also, the coordinates were in screen space and the video was not directly mapped to the 3D mesh (there was no UV mapping). I needed to generate an unwrapped mesh and remap the face ROI to the UV map to make it possible to generate the final 3D face. All of that happens in real time.
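
To make the "remap the face ROI to the UV map" step concrete, here is a minimal sketch. It is not the actual facecap code, just one plausible way to do it: it assumes landmarks arrive as normalized screen-space coordinates, and the `Landmark` shape and the `landmarksToUVs` name are made up for illustration. Each landmark gets a UV relative to the bounding box of the face, so the cropped face region fills the whole texture.

```typescript
// Hypothetical landmark shape: normalized screen-space coordinates in [0, 1].
interface Landmark { x: number; y: number; z: number; }

// Returns a flat [u0, v0, u1, v1, ...] array, one UV pair per landmark,
// expressed relative to the bounding box (ROI) of all landmarks.
function landmarksToUVs(landmarks: Landmark[]): number[] {
  if (landmarks.length === 0) return [];

  let minX = Infinity, minY = Infinity;
  let maxX = -Infinity, maxY = -Infinity;
  for (const p of landmarks) {
    minX = Math.min(minX, p.x);
    maxX = Math.max(maxX, p.x);
    minY = Math.min(minY, p.y);
    maxY = Math.max(maxY, p.y);
  }

  // Fall back to 1 to avoid dividing by zero on degenerate input.
  const width = maxX - minX || 1;
  const height = maxY - minY || 1;

  const uvs: number[] = [];
  for (const p of landmarks) {
    uvs.push((p.x - minX) / width);
    // Image y grows downward; this flips it so v grows upward.
    // Drop the flip if your texture convention differs.
    uvs.push(1 - (p.y - minY) / height);
  }
  return uvs;
}
```

The same bounding box would then be used to crop the video frame into the texture, so that the UVs and the texture content stay in sync frame by frame.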


But now that you did all that, it is that straightforward!
Cool :slight_smile:


I’d like to run it to process images and export to models in a batch. How can I do that?

That would be perfectly possible, but it needs more development. The sample can export only a single camera image at a time.

Thanks for the reply. I’m doing this in Python.

  1. The resulting face mesh is normalized. I can multiply x and y by the width and height of the image. What can I do about the z axis? How should I scale it to make the mesh look good?

  2. For your code, how can I run it on my local Linux server?
    Do I need to configure node.js, and if so, how?


Not sure I understand your questions, so I’ll try to answer based on how I understood them.

For [1], the rendered face is in 3D already, so you can scale it equally by x, y and z. Are you referring to modified code or the original one?
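
For raw MediaPipe landmarks (rather than the mesh rendered by this tool), a sketch of the scaling could look like the one below. It relies on the MediaPipe Face Mesh documentation's note that the magnitude of z uses roughly the same scale as x, so z is multiplied by the image width. The `Landmark` shape and the `toPixelSpace` name are illustrative only, and the question was about Python, but the arithmetic is the same there.

```typescript
// Hypothetical landmark shape: x and y normalized to [0, 1], z relative depth.
interface Landmark { x: number; y: number; z: number; }

// Convert normalized landmarks to pixel units.
function toPixelSpace(landmarks: Landmark[], width: number, height: number): Landmark[] {
  return landmarks.map(p => ({
    x: p.x * width,
    y: p.y * height,
    // Assumption taken from the MediaPipe docs: z shares the scale of x,
    // so it is multiplied by the image width, not the height.
    z: p.z * width,
  }));
}
```

After this step the three axes are in the same unit, so any further resizing should be a uniform scale on x, y and z.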

For [2], it is a standard web page (no node.js used), so you only need a web server on your local machine. I use Apache, but you can use anything, like httpd, XAMPP, or even Python's built-in micro server (python3 -m http.server).

In case 2, where does the facial data come from? Is it requested from this website? If so, it may cause a privacy leak. How can I avoid this problem?

That comes from Google itself:

Check the JavaScript examples. What I did was convert the screen landmarks (which were essentially 2D coordinates with an unused z) into a real 3D mesh, by reprojecting them and auto-generating UVs for the mesh.
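
As a rough illustration of what "reprojecting" screen-space landmarks can mean, here is a simplified sketch. It is not the code used in facecap: it assumes a pinhole camera at the origin looking down +z, places the landmarks on a plane at a chosen distance, and offsets each one by its relative depth. The names `unprojectLandmark` and `planeDistance`, and the choice to scale z by the frustum width, are all assumptions made for the example.

```typescript
// Hypothetical shapes: a normalized screen-space landmark and a 3D point.
interface Landmark { x: number; y: number; z: number; }
interface Point3 { x: number; y: number; z: number; }

// Place a normalized landmark into camera space on a plane `planeDistance`
// units in front of the camera, given the vertical field of view and aspect.
function unprojectLandmark(
  p: Landmark,
  vfovDegrees: number,
  aspect: number,
  planeDistance: number,
): Point3 {
  // Half extents of the view frustum at the chosen distance.
  const halfHeight = planeDistance * Math.tan((vfovDegrees * Math.PI) / 360);
  const halfWidth = halfHeight * aspect;
  return {
    // Map [0, 1] to [-1, 1], then to world units.
    x: (p.x * 2 - 1) * halfWidth,
    // Screen y grows downward, world y grows upward.
    y: (1 - p.y * 2) * halfHeight,
    // Assumption: relative depth is in the same normalized scale as x.
    z: planeDistance + p.z * 2 * halfWidth,
  };
}
```

A landmark at the image center with z = 0 ends up at (0, 0, planeDistance), which is a quick sanity check when wiring this into a scene.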


Hey, and thanks for providing this great mediapipe example. I used it to create a new playground which doesn’t rely on a fixed camera position, and have posted it to the demo forum. There were some scale issues which were likely introduced to align the scene camera to the content, which I removed, and I added filtering to the landmarks to reduce some of the frame jitter.
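
For anyone wondering what landmark filtering can look like, below is a minimal sketch using an exponential moving average. This is only one common option and not necessarily the filter used in the playground. The `createSmoother` name and the `alpha` parameter are illustrative: values near 1 follow the raw input closely, values near 0 smooth more but add lag.

```typescript
// Hypothetical landmark shape.
interface Landmark { x: number; y: number; z: number; }

// Returns a function that smooths successive landmark frames with an
// exponential moving average: smoothed = prev + alpha * (current - prev).
function createSmoother(alpha: number): (current: Landmark[]) => Landmark[] {
  let previous: Landmark[] | null = null;

  return (current: Landmark[]): Landmark[] => {
    // First frame, or the landmark count changed: restart from the raw input.
    if (previous === null || previous.length !== current.length) {
      previous = current.map(p => ({ x: p.x, y: p.y, z: p.z }));
      return previous;
    }
    const prev = previous;
    const smoothed = current.map((p, i) => ({
      x: prev[i].x + alpha * (p.x - prev[i].x),
      y: prev[i].y + alpha * (p.y - prev[i].y),
      z: prev[i].z + alpha * (p.z - prev[i].z),
    }));
    previous = smoothed;
    return smoothed;
  };
}
```

One smoother instance would be kept per tracked face and called once per frame with that frame's landmarks before building the mesh.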

Thanks again,


Hey, after some searching I discovered that you can access the geometry and pose transform directly, without having to transform the landmarks from screen space. Each frame produces updated vertices and UVs, which appear a little more stable than the transformed landmarks. It also seems to solve the problem of placing the camera in front of the transformed mesh.

Enable geometry and set camera fov:

    const facemesh = new FaceMesh({
      // Resolves MediaPipe asset files; prefix with the URL or path where they are hosted
      locateFile: file => {
        return `${file}`;
      },
    });
    facemesh.setOptions({
      // Camera specific and should be derived
      cameraVerticalFovDegrees: 43.3,
      // Enable per frame mesh generation
      enableFaceGeometry: true,
      maxNumFaces: 1,
      minDetectionConfidence: 0.65,
      minTrackingConfidence: 0.65,
      selfieMode: true,
    });

Update the mesh using the provided geometry:

// Imports assume the ES module packages; adjust if you use the global BABYLON namespace.
import { FloatArray, Matrix, Mesh, Vector3, VertexBuffer } from "@babylonjs/core";
import { Results } from "@mediapipe/face_mesh";

const _update_mesh = (
  mesh: Mesh,
  results: Results,
) => {
  // Both arrays can be missing or empty when no face is detected
  const geometry = results.multiFaceGeometry?.[0];
  if (!geometry) return;

  const landmarks = results.multiFaceLandmarks?.[0];
  if (!landmarks) return;

  const verts: FloatArray = [];
  const uvs: FloatArray = [];

  // Interleaved buffer: 5 floats per vertex, (xyz) + (uv)
  const gverts = geometry.getMesh().getVertexBufferList();
  const pmdata = geometry.getPoseTransformMatrix().getPackedDataList();
  const matrix = Matrix.FromArray(pmdata);

  let uv_idx = 0;
  let vert_idx = 0;
  for (let i = 0; i < landmarks.length; i++) {
    const gx = gverts[i * 5];
    const gy = gverts[i * 5 + 1];
    const gz = gverts[i * 5 + 2];
    // Move the canonical face vertex into its tracked pose
    const gvec = Vector3.TransformCoordinates(new Vector3(gx, gy, gz), matrix);

    verts[vert_idx++] = gvec.x;
    verts[vert_idx++] = gvec.y;
    verts[vert_idx++] = gvec.z;

    uvs[uv_idx++] = gverts[i * 5 + 3];
    uvs[uv_idx++] = gverts[i * 5 + 4];
  }

  mesh.updateVerticesData(VertexBuffer.UV2Kind, uvs);
  mesh.updateVerticesData(VertexBuffer.PositionKind, verts);
};

I hope that helps. I will post the code for deriving camera vfov when I sort it out.
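
Until that code is posted, here is a hedged sketch of the standard pinhole-camera relations that are often used for this. It is not the derivation referred to above. It assumes you either know the focal length in pixels (for example from a calibration) or the horizontal field of view of the camera. The function names are illustrative.

```typescript
// Vertical FOV from image height and focal length, both in pixels:
// vfov = 2 * atan((height / 2) / focalLength)
function vfovFromFocalLength(imageHeightPx: number, focalLengthPx: number): number {
  const radians = 2 * Math.atan(imageHeightPx / (2 * focalLengthPx));
  return (radians * 180) / Math.PI;
}

// Vertical FOV from horizontal FOV and aspect ratio (width / height):
// vfov = 2 * atan(tan(hfov / 2) / aspect)
function vfovFromHorizontalFov(hfovDegrees: number, aspect: number): number {
  const halfHfov = (hfovDegrees * Math.PI) / 360;
  const radians = 2 * Math.atan(Math.tan(halfHfov) / aspect);
  return (radians * 180) / Math.PI;
}
```

The result could then be passed as `cameraVerticalFovDegrees` instead of a hard-coded value, keeping in mind that webcam specifications are often approximate.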


That is nice, not sure if that feature was always available or added in newer Mediapipe iterations.

Anyway, cool! Thanks for your contribution.

Thanks @rdurnin !!!

Mediapipe includes some tools for determining camera distance using iris width (mediapipe | iris), but they are not available in JS. Using the following image, I have been able to convert the eye landmarks into world locations and find their center in much the same way.
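
For context, the idea behind iris-based distance is simple similar-triangles math. The sketch below is not the MediaPipe implementation. It relies on the figure quoted in the MediaPipe Iris documentation, that the horizontal iris diameter is roughly 11.7 mm across people, and it assumes you know the camera focal length in pixels. The names are illustrative.

```typescript
// Approximate horizontal iris diameter in millimetres (per the MediaPipe Iris docs).
const IRIS_DIAMETER_MM = 11.7;

// Pinhole model: realSize / distance = pixelSize / focalLength,
// so distance = focalLength * realSize / pixelSize.
function distanceFromIris(irisDiameterPx: number, focalLengthPx: number): number {
  if (irisDiameterPx <= 0) {
    throw new Error("iris diameter in pixels must be positive");
  }
  return (focalLengthPx * IRIS_DIAMETER_MM) / irisDiameterPx;
}
```

The iris diameter in pixels would come from the distance between the left and right iris edge landmarks, scaled by the image width.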


I will post when I have a working solution, and have opened some questions in another thread.


This is incredible. Well done on getting mediapipe to work for you! They don’t have any 3D examples, so this has been very useful for me.