Unable to create VAO on context lost with active frustum culling

I’m sorry for the tag then. I just saw his name a lot in git blame in the restoration code. :slight_smile:

The code in your screenshot does not help in the pass of the render loop during which the context is lost, though. _contextWasLost is set in the handler of the context lost event. This handler can only run after Babylon has yielded to the browser, which happens via requestAnimationFrame() at the end of the render loop.
Even if it could be set earlier, there would still be a race condition. What if the context is lost during _renderFrame()?
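To illustrate the timing problem, here is a minimal sketch, not Babylon's actual code. `FlagGuard`, `GlLike` and `fakeGl` are hypothetical stand-ins; only `isContextLost()` is a real WebGL call. It compares a guard based on an event-driven flag with a guard that queries the context synchronously:

```typescript
// Minimal stand-in for the part of the WebGL context used here (hypothetical type).
interface GlLike {
    isContextLost(): boolean;
}

// Hypothetical engine-like object: `contextWasLost` mimics a flag that is
// only updated from the "webglcontextlost" event handler.
class FlagGuard {
    public contextWasLost = false;

    constructor(private readonly gl: GlLike) {}

    // Stands in for the event handler, which can only run after the current
    // task (the animation frame callback) has returned.
    public onContextLostEvent(): void {
        this.contextWasLost = true;
    }

    // Guard based on the event-driven flag.
    public canRenderByFlag(): boolean {
        return !this.contextWasLost;
    }

    // Guard based on a synchronous query of the context itself.
    public canRenderByQuery(): boolean {
        return !this.gl.isContextLost();
    }
}

// Simulate a context that breaks in the middle of a frame.
let lost = false;
const fakeGl: GlLike = { isContextLost: () => lost };
const guard = new FlagGuard(fakeGl);

lost = true; // the context is gone, but no event has been delivered yet
const flagSaysOk = guard.canRenderByFlag();   // still true: the flag is stale
const querySaysOk = guard.canRenderByQuery(); // false: the query sees the loss

guard.onContextLostEvent(); // the browser delivers the event after the frame
const flagAfterEvent = guard.canRenderByFlag(); // now false
```

The flag only catches up once the event has been delivered, so any GL object creation in the same frame still runs against a lost context.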

I’m unsure what you are asking for. The thrown exception can be observed in the playground in post #1. If you tick “Pause on uncaught exceptions” (Chrome) and watch this._gl.isContextLost(), it will return true.
This means the guard around the render loop really does not work in the frame the context is lost.

Do you need a repro of anything else?

Fix context lost event leak by sebavan · Pull Request #15165 · BabylonJS/Babylon.js · GitHub will fix the playground from post #1.

It is still not related to the exception thrown during creation.

Would be great to have a repro for this

Thanks for the PR. :slight_smile:
This fixes a new bug that may have been introduced after this thread was opened. I haven’t encountered this one before.

I can still reproduce the exception on VAO creation by following the steps in post #1 using Chromium 125. It depends greatly on the window size and the browser used.
Unfortunately, I have no idea how to make this easier to reproduce, since I don’t know why Babylon tries to create this VAO in this exact frame in the first place.

Unfortunately, I do not see how this is possible without a repro, because VAOs are only created during render, render is synchronous (so no context lost event in between), and a lost context prevents render from being called. It would be amazing to track it down.

In case you are not able to reproduce the exception in the PG from post #1, please try this one:

Just rerun a few times and you should see the exception being thrown. I have just reproduced this on Chrome 125 and Firefox 126 on Linux. So this PG should be way more reliable than the older one. Also this seems to be completely independent from Blink now.

Here is another PG simulating a context loss in the callback of SceneLoader.ImportMesh leading to an exception during shader creation:

As I understand your last post, your hypothesis is that there can be no context loss inside a synchronous JS function. I really think that this is not true. As I understand it, a context loss (especially a natural one not triggered via the extension) can occur at any time.

No event of any kind, not just context lost, can be delivered during a synchronous JS function. That follows from the event-based nature of JavaScript engines.

This does not mean the context cannot already be broken, but the event might be triggered a tiny bit later.

The VAO one should not happen outside of pure debug code, but I agree the import code could suffer from those by having code running outside the raf.

Do you want to try creating a PR?

I totally agree with this and I think this is the problem here. The context lost event can only be delivered after the raf has finished, but the context can already be broken earlier.

Why is that? As I understood @mikeysee, they are facing it in production. Unfortunately, I do not have data for our application, since we disabled context restoration and force a page reload instead atm.

Right now, there are still too many open questions for me about how to tackle this. I see two possibilities:

  1. Allow the gl objects to be null, let the render loop finish and receive the context lost event when the raf has finished.
  2. Keep the null checks, but instead of throwing, check gl.isContextLost() and cancel the animation frame.

I agree that in both cases, keeping Babylon’s internal state intact might not be trivial.
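As a rough sketch of what option 2 could look like (this is not Babylon's code; `VaoGl`, `VaoResult` and `tryCreateVertexArray` are made-up names, while `createVertexArray()` and `isContextLost()` are the real WebGL2 calls), the null check stays, but a null result on a lost context is reported to the caller instead of thrown:

```typescript
// Hypothetical stand-in for the two WebGL2 calls used below.
interface VaoGl {
    createVertexArray(): object | null;
    isContextLost(): boolean;
}

// Either a usable VAO or a signal that the frame should be abandoned.
type VaoResult =
    | { kind: "ok"; vao: object }
    | { kind: "context-lost" };

function tryCreateVertexArray(gl: VaoGl): VaoResult {
    const vao = gl.createVertexArray();
    if (vao !== null) {
        return { kind: "ok", vao };
    }
    if (gl.isContextLost()) {
        // Expected failure: the context broke before the event was delivered.
        // The caller should stop the current frame and wait for the event.
        return { kind: "context-lost" };
    }
    // A null result on a healthy context is still a genuine error.
    throw new Error("Unable to create VAO");
}
```

The caller would have to unwind the current frame on "context-lost" without leaving half-initialized state behind, which is exactly the non-trivial part mentioned above.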

I’m not sure what the best approach would be. I lean towards 1, though, since there shouldn’t be much reason to read from gl objects, so them being null shouldn’t really affect Babylon’s internal state (that’s speculation, of course). I do not know enough about Babylon’s internals to make any decision here confidently. That’s why I asked you guys for help and also tagged Mr evgeni_popov for insights into the restoration code. :slight_smile:

Unfortunately, I do not have data for our application, since we disabled context restoration and force a page reload instead atm.

Would you mind sharing how you do this? I suspect this is something we will have to do.

Hey mikey, you could add a handler to Engine.onContextLostObservable. For the longest time, we used it to display a modal informing the user about the crash and asking for a page reload.
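A minimal sketch of such a handler follows. `LostObservable`, `LostEngine` and `reloadOnContextLoss` are hypothetical names that only mirror the shape of `onContextLostObservable.add()` and `stopRenderLoop()`; the dialog and the reload are passed in as callbacks so the logic does not depend on the DOM:

```typescript
// Structural stand-ins for the Babylon members used here (hypothetical types).
interface LostObservable {
    add(callback: (eventData: unknown) => void): unknown;
}

interface LostEngine {
    onContextLostObservable: LostObservable;
    stopRenderLoop(): void;
}

// Registers a handler that stops rendering and offers a page reload.
function reloadOnContextLoss(
    engine: LostEngine,
    askUser: () => boolean, // e.g. () => window.confirm("Graphics crashed. Reload?")
    reload: () => void      // e.g. () => window.location.reload()
): void {
    engine.onContextLostObservable.add(() => {
        // Nothing useful can be drawn anymore, so stop the render loop first.
        engine.stopRenderLoop();
        if (askUser()) {
            reload();
        }
    });
}
```

In a real application, `askUser` would open the modal (asynchronously, if needed) rather than return a boolean right away; the synchronous form just keeps the sketch short.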

After lots of testing, I’ve implemented the following workaround now:

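import type { Scene } from "@babylonjs/core"; // assumes the ES module package; adjust to your setup

// Renders one frame without frustum clipping, so that meshes outside the view
// also get their GPU resources created while the context is still healthy.
function contextLossCrashWorkaround(scene: Scene): void {
    scene.onBeforeActiveMeshesEvaluationObservable.addOnce(() => {
        scene.skipFrustumClipping = true;
        // Restore normal culling once this frame has been rendered.
        scene.onAfterRenderObservable.addOnce(() => {
            scene.skipFrustumClipping = false;
        });
    });
}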

This forces the creation of all VAOs, vertex buffers, etc. in the scene and therefore prevents Babylon from trying to create them on a lost context. It needs to be reapplied whenever you add objects to the scene and after a successful context restoration.
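One way to automate the reapplying could look like the sketch below. It is an assumption, not tested against Babylon itself: `keepWorkaroundApplied` and the `*Like` types are hypothetical, and they only mirror the shape of `scene.onNewMeshAddedObservable` and `engine.onContextRestoredObservable`. The workaround function is passed in as a parameter:

```typescript
// Structural stand-ins for the Babylon observables used here (hypothetical types).
interface ReapplyObservable {
    add(callback: () => void): unknown;
}

interface ReapplyScene {
    onNewMeshAddedObservable: ReapplyObservable;
}

interface ReapplyEngine {
    onContextRestoredObservable: ReapplyObservable;
}

// Applies `workaround` once now, then again whenever a mesh is added
// or the context has been restored.
function keepWorkaroundApplied<S extends ReapplyScene>(
    scene: S,
    engine: ReapplyEngine,
    workaround: (scene: S) => void
): void {
    const apply = (): void => workaround(scene);

    apply();
    scene.onNewMeshAddedObservable.add(apply);
    engine.onContextRestoredObservable.add(apply);
}
```

With the function above this would be called as keepWorkaroundApplied(scene, engine, contextLossCrashWorkaround). Adding many meshes in one frame schedules the workaround several times, which is redundant but should be harmless since each run only toggles skipFrustumClipping for a single frame.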

Ugly, but seems to work so far. I’d still rather have the underlying issue fixed, but I’m short on time and still have too many open questions as mentioned above. I’d be glad for any more help on this. :slight_smile:

Fantastic, thanks, I’ll see if that helps us.