Memory Leak in Screenshots

Hi, this is a weird one, but here is the scenario we’re encountering. In BabylonJS 5.x (after the alpha releases) we are seeing a GPU memory leak when running CreateScreenshotUsingRenderTargetAsync repeatedly. For some reason, the issue appears most prominently when running on Linux. We do not see this issue when running any alpha version. I think the problem is with this commit here, which forces a render via CreateScreenshotUsingRenderTarget when both of the following are true:

  1. The scene’s active camera is not the camera used for the render.
  2. The original call is to CreateScreenshot

This is our scenario, except that we’re calling CreateScreenshotUsingRenderTargetAsync with a free camera that is not the scene’s active camera.
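Roughly, the call looks like this (the camera setup and size are placeholders for our actual values, and this runs inside an async function):

```js
// Placeholder sketch of our usage: a FreeCamera that is NOT scene.activeCamera
// is passed straight to the screenshot helper.
const screenshotCamera = new BABYLON.FreeCamera(
    "screenshotCam",                      // hypothetical name
    new BABYLON.Vector3(0, 5, -10),
    scene
);

const dataUrl = await BABYLON.Tools.CreateScreenshotUsingRenderTargetAsync(
    engine,
    screenshotCamera,                     // not the scene's active camera
    { width: 1920, height: 1080 }         // placeholder size
);
```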

Just looking at this code, it seems possible that calling CreateShotUsingRenderTargetAsync with a new camera, without swapping the scene’s active camera, causes the renderer to run twice, and that some object then never gets disposed on the GPU?

I’m happy to help set up a Playground next week but I wanted to see if any of this sounds plausible.

Actually, I think I lied there. It looks like that change only affects normal screenshots. We’re using this method here, which does not appear to force the render-target-texture path. We’re still trying to figure out why there’s a leak when we switch from 5.0.0-alpha.60 to 5.41.0.

Yes, a PG would definitely help.

Do you know which object is leaking?

No, but since it’s a GPU leak we tried a few things to isolate the issue. First, we just looked at the Scene graph in the Inspector and confirmed that we aren’t growing textures or materials; those counts are stable and nothing is being added there. To really confirm that theory, we skipped any Scene updates altogether and just ran the screenshot operation, and the leak was still there. If we swap that around and do all the updates but return a single pixel from the screenshot operation, there is no leak, so it seems to be somewhere in that screenshot code.

We then reverted to the Alpha version and saw the issue disappear. We tried this several times and can reproduce it every time. Switch to latest, GPU leak, switch back, no GPU leak.

This is using the latest Chromium/Puppeteer in a Linux environment, if that helps at all. I’m not sure we’d be able to see it just using a PG, but I’d guess the way to set it up is to run the async screenshot operation in a loop, one call right after the other, and see if your GPU memory climbs. I don’t see it happening on a Mac either, which just makes it even more of a mystery!
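Something like this sketch, run under Puppeteer while watching the GPU (the camera, size, and iteration count are arbitrary):

```js
// Sketch of a possible repro: take screenshots back to back with no scene
// updates in between, and watch GPU memory climb (or not) from the outside.
async function screenshotLoop(engine, camera) {
    for (let i = 0; i < 10000; i++) {
        await BABYLON.Tools.CreateScreenshotUsingRenderTargetAsync(
            engine,
            camera,
            { width: 1920, height: 1080 }
        );
        // Intentionally no scene updates here, to keep the test isolated
        // to the screenshot path.
    }
}
```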

One other note: to see the GPU leak we are watching GPU memory using watch -n1 nvidia-smi on the Linux machine, which has an NVIDIA card.

That does not ring any bells for me…

One thing that would probably help narrow down the search would be to test different 5.x versions and locate the one that introduces the problem: there are too many changes between 5.0.0-alpha.60 and 5.41.0 to do a code comparison and try to pinpoint the exact change that leads to the behavior you are experiencing.

We tried to narrow this down but ran into an issue where versions 5.0.0 through 5.12.0 would not load our models. Jumping to 5.21, we were able to see the leak. Strangely, the leak is there but not as pronounced in 5.21 as it is in 5.41: it still leaks, just more slowly.

Here is a Playground. We can reproduce this in 5.41.0.

Thanks, I’m going to have a look, but why are you calling engine.endFrame by hand? The user is not supposed to do that when the render loop is handled by the engine, so I’m not sure if there can be some side effects because of that…

Can you test by removing the 2x scene.render() and 2x engine.endFrame() calls to be sure the problem is not related to that?

It relates to this other issue we saw here:

(tl;dr) we had geometry that was missing from renders.

If you can advise us how to adjust the Playground, we’re still set up on Linux to test whether that still results in a memory leak.

I just realized that I didn’t note that in our current app, we freeze the render loop, which is why we run that manual frame advancement. So this PG is slightly more accurate.
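For context, the frame handling in the PG looks roughly like this (a sketch; stopRenderLoop stands in for however our app actually freezes the loop, and the doubled render/endFrame calls come from the missing-geometry workaround linked above):

```js
// Assumption: this is how we "freeze" the render loop in the sketch; the real
// app may do it differently. The engine no longer drives rendering itself.
engine.stopRenderLoop();

function advanceFrames(engine, scene) {
    // Doubled render/endFrame calls, per the workaround for geometry missing
    // from renders, before we take the screenshot.
    scene.render();
    engine.endFrame();
    scene.render();
    engine.endFrame();
}
```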

When you say that there is a GPU memory leak, I assume that the percentage of free memory is constantly decreasing and at the end you get an “out of GPU memory” message?

I tested your PG on my computer and used nvidia-smi (I have a 3080Ti). The %mem goes up but comes down regularly, so in the end I don’t see a memory leak on my side:

[screenshot: nvidia-smi output]

I guess at this point we can only try to roll back our changes to the screenshot method one by one (there are not that many between 5.0.0 and 5.21.0) so you can test them and see when the leak disappears…

Correct. Are you testing on Windows or Linux? For us, the problem does not exist on Mac OS, but on Linux (Amazon Optimized Linux, to be exact) GPU memory goes from 600 MB to 700, 800, 900 MB and keeps climbing until it uses up the full 24 GB. Nothing else is running on this machine.

I’m testing on Windows.

To try rolling back some of our changes, can you try this PG:


With this PG, the problem does NOT exist. The bottom process is our test after running for about 5 minutes. This seems like the likely culprit!


In the new Tools.DumpData function, we are creating a new texture each time the function is called but we are also disposing it, so I don’t understand why it would leak… Indeed, it seems it leaks only on your test machine but not on others.

Can you test this PG and see if it is leaking:

It’s basically what we are doing in the DumpData function…
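In outline, it does something like this (the exact texture type, size, and interval in the PG may differ; treat it as a sketch):

```js
// Sketch of the pattern being tested: create a texture, dispose it right
// away, repeat on an interval. Texture type, size, and delay are illustrative.
const size = 1024;
const pixels = new Uint8Array(size * size * 4);   // dummy RGBA data

setInterval(() => {
    const texture = BABYLON.RawTexture.CreateRGBATexture(pixels, size, size, scene);
    texture.dispose();   // disposal should release the GPU memory right here
}, 100);
```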

EDIT: maybe you can lower the setInterval delay


Yep, this is leaking. It started at around 75 MB and is now up to 123 MB.

That means any texture created is leaking; disposing of it does not reclaim the GPU memory…

It sounds like a bug of the driver to me, which does not reclaim GPU memory properly.

It is more apparent now with our new screenshot code because we are creating a new texture each time, but to me the problem is not with our code.

cc @sebavan in case it rings some bells for him.

Note that, if it helps, you can keep the overridden BABYLON.DumpTools.DumpData method I provided in the PG above, as long as the screenshots look ok to you with this method. Our new implementation takes care of the canvas pre-multiply setting that made data with an alpha channel not look right, but screenshots don’t have an alpha channel, so that should be fine.
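For illustration, an override along these lines draws the pixels onto a plain 2D canvas instead of creating a GPU texture (this is a simplified sketch; the parameter list follows the 5.x DumpData signature as I understand it, and the override in the PG handles more cases):

```js
// Simplified canvas-based override (assumes `data` is a typed-array view of
// RGBA bytes; ignores invertY/toArrayBuffer/quality handling).
BABYLON.DumpTools.DumpData = function (width, height, data, successCallback, mimeType = "image/png") {
    const canvas = document.createElement("canvas");
    canvas.width = width;
    canvas.height = height;

    const ctx = canvas.getContext("2d");
    const imageData = ctx.createImageData(width, height);
    imageData.data.set(new Uint8ClampedArray(data.buffer, data.byteOffset, width * height * 4));
    ctx.putImageData(imageData, 0, 0);

    // Screenshots come back as a data URL, same as before.
    if (successCallback) {
        successCallback(canvas.toDataURL(mimeType));
    }
};
```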

And thanks for your patience with all the testing!


That’s interesting. We’re working with AWS on finding an updated driver. I’ll post back when we get info there.
