WebXR Immersive Session Screen Recording (Potential Solution / Optimization Help)

Hello everyone,

I have been working on getting screen recording to work in an immersive WebXR session. I've read a lot of posts on this forum (and others) from people unable to record WebXR sessions, and I think I've finally come across a solution that works without needing a readPixels call, which is the major bottleneck in most solutions: it keeps all the data on the GPU and records using MediaRecorder.

I would love some feedback on this approach, any potential issues, and optimisation ideas. Hopefully this also serves as another resource for those trying to achieve the same thing; I saw quite a few posts on the topic.

Key problems:

  • Trying to record the Babylon canvas during an immersive WebXR session with the MediaRecorder API outputs nothing, because WebXR renders into its own framebuffer

  • So we need to reconstruct the framebuffer contents manually via a render texture that composites both the WebXR camera image and the scene pixels

  • Using readPixels or readPixelsAsync on that RenderTexture to feed a 2D-context canvas for MediaRecorder introduces a GPU/CPU sync point and causes a major frame drop when done every frame; even reading only every second frame on a Pixel 6 still drops to around 20 FPS with very little content (a sketch of this path is below for reference)
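For context, the readback path being avoided looks roughly like this (a minimal sketch; compositeRenderTarget, captureCanvas, width and height are placeholder names, not from my actual code):

// Illustrative sketch of the CPU readback path this post avoids.
const pixels = await compositeRenderTarget.readPixels(); // GPU -> CPU copy: the sync point
if (pixels) {
  const ctx = captureCanvas.getContext("2d")!;
  const imageData = new ImageData(new Uint8ClampedArray(pixels.buffer), width, height);
  ctx.putImageData(imageData, 0, 0); // captureCanvas is what MediaRecorder then records
}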

Proposed Solution

While digging around I found this forum post on PlayCanvas, which discussed blitting the WebXR framebuffer directly to the existing Babylon canvas using WebGL commands; you would then be able to record the main canvas directly with the MediaRecorder API.

This is the first approach I tried. I was able to find the framebuffer using:

const framebuffer = this.xr.baseExperience.sessionManager.session.renderState.baseLayer!.framebuffer;

Then I tried blitting it to the canvas using the same code as in the post; however, I did not get any output (a sketch of the attempt is below). This approach would be the ideal one, but I wasn't able to get it working.
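The blit attempt looked roughly like this (a sketch along the lines of the PlayCanvas post, not a working implementation):

// Sketch of the blit attempt (did not produce any output for me).
const gl = this.engine._gl as WebGL2RenderingContext;
const baseLayer = this.xr.baseExperience.sessionManager.session.renderState.baseLayer!;

gl.bindFramebuffer(gl.READ_FRAMEBUFFER, baseLayer.framebuffer);
gl.bindFramebuffer(gl.DRAW_FRAMEBUFFER, null); // default framebuffer = main canvas

gl.blitFramebuffer(
  0, 0, baseLayer.framebufferWidth, baseLayer.framebufferHeight, // source rect (XR layer)
  0, 0, gl.canvas.width, gl.canvas.height,                       // destination rect (main canvas)
  gl.COLOR_BUFFER_BIT,
  gl.NEAREST
);

One thing that might explain the failure: the spec treats the XR layer's opaque framebuffer as incomplete outside the XR session's requestAnimationFrame callback, so depending on where the blit runs it may simply be rejected.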

However, during this experiment I realised that you can in fact render to the main canvas and record it. I tested this by running the following in engine.onAfterRenderObservable:

gl.bindFramebuffer(gl.FRAMEBUFFER, null);
gl.clearColor(Math.sin(performance.now() / 1000), Math.cos(performance.now() / 1000), Math.atan(performance.now() / 1000), 1);
gl.clear(gl.COLOR_BUFFER_BIT);

This is great because now we just need to:

  1. Put together a RenderTexture with the Camera Image & Scene Contents
  2. Extract the internal WebGL Texture from said RenderTexture
  3. Draw a Fullscreen Quad onto the main Canvas
  4. Apply WebGL Texture
  5. Record Main Canvas :white_check_mark:

This way we avoid ever transferring data back to the CPU. It works because the main canvas is unused during an immersive session, and WebXR shares the same GL context.

Solution

  1. Create 2 RenderTargetTextures for Scene & Camera
// Stores Scene Pixels
this.sceneRenderTarget = new RenderTargetTexture(
  "scene",
  {
	width: this.width,
	height: this.height,
  },
  this.scene,
  true
);

this.sceneRenderTarget.clearColor = new Color4(0, 0, 0, 0);

this.sceneRenderTarget.renderList = null; // Render everything
this.scene.customRenderTargets.push(this.sceneRenderTarget);

// Stores the XR Camera Pixels
this.cameraRenderTarget = new RenderTargetTexture(
  "camera",
  {
	width: this.width,
	height: this.height,
  },
  null,
  true
);
  2. On the camera texture update callback from Babylon.js, copy the WebGL texture into the camera render target using the CopyTextureToTexture class, which from quick inspection is a GPU copy and should have little overhead

this.babylonTextureCopier = new CopyTextureToTexture(this.engine!, false);

// --- 

this.cameraAccess?.onTexturesUpdatedObservable.add(
	this.onXRCameraTextureUpdated.bind(this)
);

// ---

private onXRCameraTextureUpdated(textures: BaseTexture[]) {
	if (this.babylonTextureCopier && this.cameraRenderTarget) {
	  const cameraTexture = textures[0];
	  this.babylonTextureCopier.copy(cameraTexture, this.cameraRenderTarget);
	}
}
      
  3. Create a PostProcess to composite both of these RTTs into one
Effect.ShadersStore["flipYShaderFragmentShader"] = `
    precision highp float;
    varying vec2 vUV;
    // uniform sampler2D textureSampler2;   // Second texture

    uniform sampler2D textureSampler;  // First texture

    uniform sampler2D cameraSampler;  // Second texture (XR camera image)

    void main(void) {
        // Flip the y-axis by adjusting the vUV coordinate
        // vec2 flippedUV = vec2(vUV.x, 1.0 - vUV.y);
        vec2 flippedUV = vec2(vUV.x, vUV.y);
        vec4 renderColor = texture2D(textureSampler, flippedUV);
        vec4 cameraColor = texture2D(cameraSampler, flippedUV);

        // Combine the two colors
        gl_FragColor = mix(cameraColor, renderColor, renderColor.a);
    }
`;

this.postProcessor = new PostProcess(
  "flipY", // name
  "flipYShader", // fragment shader
  null, // uniforms
  ["textureSampler", "cameraSampler"], // samplers
  1.0, // ratio
  null, // camera (null as this is output to offscreen canvas)
  Constants.TEXTURE_NEAREST_SAMPLINGMODE,
  this.engine, // engine
  false // reusable
);

this.postProcessor.onApply = (effect) => {
  if (this.cameraRenderTarget)
	effect.setTexture("cameraSampler", this.cameraRenderTarget);
};

this.sceneRenderTarget.addPostProcess(this.postProcessor);
  4. At the end of the Babylon frame, call the WebGL commands that put the output of the RTT onto the main Babylon rendering canvas

this.engine.onEndFrameObservable.add(this.boundRenderFrame);

// ----

private boundRenderFrame() {
	this.engine.setSize(this.width, this.height);

	runPostFrame(
		this.engine._gl,
		this.sceneRenderTarget!,
		this.width,
		this.height
	);
}


WebGL flow of runPostFrame():

// ----- Full Screen Quad Shaders ----

const vertexShaderSource = `
	attribute vec2 position;
	varying vec2 uv;
	void main() {
		uv = (position + 1.0) * 0.5; // Convert from clip space to UV coords
		gl_Position = vec4(position, 0.0, 1.0);
	}
`;

const fragmentShaderSource = `
  precision mediump float;
  varying vec2 uv;
  uniform sampler2D renderTexture;
  void main() {
	  vec4 renderColor = texture2D(renderTexture, uv);
	  gl_FragColor = renderColor;
  }
`;


// ------ WEBGL COMMANDS ----

const runPostFrame = (
  gl: WebGL2RenderingContext,
  renderTarget: RenderTargetTexture,
  width: number,
  height: number
) => {
  const renderTexture = renderTarget.getInternalTexture();

  if (
    !renderTexture ||
    !renderTexture._hardwareTexture ||
    !renderTexture._hardwareTexture.underlyingResource
  ) {
    console.error("RenderTargetTexture is missing its WebGL texture.");
    return;
  }

// --  Do we need this to ensure it doesn't conflict with WebXR Rendering? --
  gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0);
  gl.viewport(0, 0, width, height);

  gl.bindFramebuffer(gl.FRAMEBUFFER, null);

  gl.clearColor(Math.sin(performance.now() / 1000), Math.cos(performance.now() / 1000), Math.atan(performance.now() / 1000), 1);
  gl.clear(gl.COLOR_BUFFER_BIT);

  // ==== BIND PRE-CREATED WEBGL RESOURCES ( ARRAY BUFFER / FS PROGRAM ) ====
  // (vertBuffer, fsProgram, positionLocation and textureLocation are created once
  //  up front; see the setup sketch after this function)

// == DRAW FULL SCREEN QUAD ==
  gl.bindBuffer(gl.ARRAY_BUFFER, vertBuffer);

  // Set up attributes
  if (positionLocation === null) {
    positionLocation = gl.getAttribLocation(fsProgram, "position");
  }
  gl.enableVertexAttribArray(positionLocation);
  gl.vertexAttribPointer(positionLocation, 2, gl.FLOAT, false, 0, 0);
  gl.useProgram(fsProgram);

  if (!textureLocation) {
    textureLocation = gl.getUniformLocation(fsProgram, "renderTexture");
  }

  gl.activeTexture(gl.TEXTURE0);

// == BIND RTT ==
  gl.bindTexture(
    gl.TEXTURE_2D,
    (renderTexture._hardwareTexture as WebGLHardwareTexture).underlyingResource
  );

  gl.uniform1i(textureLocation, 0);

  gl.drawArrays(gl.TRIANGLES, 0, 6);


// -- Is calling `gl.finish()` here a good idea? --
  gl.finish();
};
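For completeness, runPostFrame() assumes vertBuffer, fsProgram, positionLocation and textureLocation already exist as module-level variables. The one-time setup looks roughly like this (a sketch; initPostFrameResources and compileShader are illustrative helper names, not part of my actual class):

let vertBuffer: WebGLBuffer;
let fsProgram: WebGLProgram;
let positionLocation: number | null = null;
let textureLocation: WebGLUniformLocation | null = null;

const compileShader = (gl: WebGL2RenderingContext, type: number, source: string) => {
  const shader = gl.createShader(type)!;
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    console.error(gl.getShaderInfoLog(shader));
  }
  return shader;
};

const initPostFrameResources = (gl: WebGL2RenderingContext) => {
  // Compile and link the fullscreen-quad program used by runPostFrame()
  fsProgram = gl.createProgram()!;
  gl.attachShader(fsProgram, compileShader(gl, gl.VERTEX_SHADER, vertexShaderSource));
  gl.attachShader(fsProgram, compileShader(gl, gl.FRAGMENT_SHADER, fragmentShaderSource));
  gl.linkProgram(fsProgram);

  // Two triangles covering clip space (6 vertices, matching drawArrays(TRIANGLES, 0, 6))
  const quad = new Float32Array([
    -1, -1,   1, -1,   -1, 1,
    -1,  1,   1, -1,    1, 1,
  ]);
  vertBuffer = gl.createBuffer()!;
  gl.bindBuffer(gl.ARRAY_BUFFER, vertBuffer);
  gl.bufferData(gl.ARRAY_BUFFER, quad, gl.STATIC_DRAW);
};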

  5. Start recording the canvas using the MediaRecorder API

const canvasStream = mainCanvas.captureStream(frameRate);

const mediaRecorder = new MediaRecorder(canvasStream, {
  mimeType: "video/webm; codecs=vp9",
});

// Collect the encoded chunks as they arrive
const recordedChunks: Blob[] = [];
mediaRecorder.ondataavailable = (event) => {
  if (event.data.size > 0) recordedChunks.push(event.data);
};

mediaRecorder.onstop = () => {
  const blob = new Blob(recordedChunks, { type: "video/webm" });
  const url = URL.createObjectURL(blob);
  const response: MediaRecorderResponse = {
    success: true,
    url,
  };
  console.log("MediaRecorder finished, returning response: ", response);
  resolve(response);
};

mediaRecorder.start();

  6. Output to a file

Issues & Questions

There are a few issues that I’ve noticed and would love some help to resolve/understand better:

  • Before I call the WebGL commands I have to run engine.setSize(width, height) every frame, even when recording at an unmodified resolution. I think this is expensive, but without this line the output randomly becomes larger/smaller

  • How can I ensure the WebGL commands run after all the work WebXR is doing and don't conflict with its render process, since they share the same context? I've tried using fenceSync for now (see the fence sketch after this list) and haven't noticed any weirdness in either the immersive session or the recording

  • For some reason I have to copy the camera image given by WebXR and cannot use that texture directly in the PostProcess; I think that since the texture is managed by WebXR it is destroyed outside of that callback. I would love to know how I can optimise out the CopyTextureToTexture step
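On the fence question above: as far as I can tell, gl.fenceSync() only creates a sync object, and nothing waits on it unless you call clientWaitSync (CPU-side) or waitSync (GPU-side), so as currently written it is effectively a no-op. Since everything shares one WebGL context, commands should already execute in submission order, so an explicit sync may not be needed at all. If you do want to check the fence, here is a sketch (note WebGL caps clientWaitSync's blocking timeout, often at 0, so in practice this is a poll rather than a wait):

// Sketch: create a fence after the XR/scene work and poll it before drawing the quad.
const gl = this.engine._gl as WebGL2RenderingContext;

const sync = gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0)!;
gl.flush(); // make sure the fence command actually reaches the GPU

// Timeout 0 = just query the fence state; nonzero timeouts are capped by
// MAX_CLIENT_WAIT_TIMEOUT_WEBGL (frequently 0), so blocking isn't reliable.
const status = gl.clientWaitSync(sync, gl.SYNC_FLUSH_COMMANDS_BIT, 0);
if (status === gl.TIMEOUT_EXPIRED) {
  // GPU hasn't reached the fence yet; one option is to skip this frame's copy and retry next frame
}
gl.deleteSync(sync);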

MediaRecorder Issues

The main issue I've noticed since getting this working is that there are random frame-rate drops in the MediaRecorder output. Initially I thought it was the WebGL commands conflicting with the XR session; however, to debug this I brought the main canvas into the DOM overlay and watched the pixels being transferred, and it was fully smooth.

So it seems the MediaRecorder is dropping frames due to some internal issue (as described in this GitHub issue: Ability to tell MediaRecorder to maximize frame rate · Issue #177 · w3c/mediacapture-record · GitHub).

However, recording capped at 720p and 25-30 FPS produces smooth output. Higher resolutions start to drop frames, but only in the recorded output; the experience itself runs at 30 FPS without any lag. The capped setup I'm using looks like this:


let canvasWidth = Math.ceil(props.engine.getRenderWidth(true) * window.devicePixelRatio);
let canvasHeight = Math.ceil(props.engine.getRenderHeight(true) * window.devicePixelRatio);

// Clamp the width to 1080 (roughly 720p-class output), keeping the aspect ratio
if (canvasWidth > 1080) {
  canvasHeight = Math.ceil((canvasHeight * 1080) / canvasWidth);
  canvasWidth = 1080;
}

const recorder = new MediaRecorderWeb(props, {
  captureDurationS: 10,
  debugCanvas: false,
  width: canvasWidth,
  height: canvasHeight,
  recordingFrameRate: 25,
});

// Start recording
const startingResponse = await recorder.start();

if (startingResponse.error) {
	console.log("Some error happened with recording");
}

// Stop
const response = await recorder.stop();

// Downloadable Url
const url = response.url;

console.log("MediaRecorder initialized.", this.impl);

Unfortunately it seems I can’t upload files since I’m a new user, but let me know if you want clarification on any part of the code :slight_smile:

Thanks!


So… if I understand correctly, you want a way to record WebXR on the web. Babylon allows you to duplicate the webxr session to a canvas. To do that, use this method - Babylon.js docs

Does this help with your request?

The docs mention that this approach is mainly for desktop VR experiences, and we are targeting mobile AR. However, it might still work, so I'll give it a shot today and post an update :slight_smile:
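For reference, what I'm planning to try is something along these lines (a sketch based on my reading of the docs, assuming the call lives on the base experience helper):

// Sketch: mirror the XR view back onto the main canvas, then record that canvas.
this.xr.baseExperience.enableSpectatorMode();

// The main rendering canvas should then be recordable with the usual approach.
const stream = this.engine.getRenderingCanvas()!.captureStream(30);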

After testing enableSpectatorMode I was able to get some output from the main canvas; however, there are a couple of issues:

  1. The framebuffer is not getting cleared
  2. There is no camera output from WebXR being included

The goal is to get a 1:1 representation of what the user is seeing on their screen. This is what I am seeing: