Unable to create VAO on context lost with active frustum culling

Hello,

I’ve encountered a potential bug reproducible in the following playground:

Steps to reproduce:

  1. Open the playground.
  2. Resize the canvas/window so none of the Boombox objects on the left are visible.
  3. Rerun the playground.

Following these steps, I can reliably reproduce an Uncaught Error: Unable to create VAO.

Now the weird part: resizing the window so that at least one of the Boombox objects is visible on startup, or setting scene.skipFrustumClipping = true, leads to no error on context loss.

This means there is enough time to build the VAOs. But with frustum culling active, the engine tries to build VAOs for objects that have been culled for the entire lifetime of the application (and should still be culled), right after the context lost event has been fired.

Implications for my real-world use case:
Either I disable frustum culling (big performance loss), or I set alwaysSelectAsActiveMesh = true on all objects and disable it again after at least one render loop pass (which gets complicated when loading parts of the scene dynamically). Otherwise, the application crashes on context loss and cannot recover. I’d rather the engine not try to build VAOs unnecessarily after the context has been lost.
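The alwaysSelectAsActiveMesh workaround could be sketched roughly like this. It is a minimal sketch that assumes, as the workaround itself does, that a mesh's VAO gets built the first time the mesh is actually drawn. forceSelectForOneFrame, MeshLike and SceneLike are hypothetical names; the two interfaces only stand in structurally for Babylon's Scene and AbstractMesh so the snippet runs on its own.

```typescript
// Structural stand-ins for the Babylon.js types this sketch touches (assumption:
// a real Scene / AbstractMesh satisfies them), so the snippet runs on its own.
interface MeshLike {
  alwaysSelectAsActiveMesh: boolean;
}

interface SceneLike {
  meshes: MeshLike[];
  onAfterRenderObservable: { addOnce(callback: () => void): unknown };
}

// Hypothetical helper: select every current mesh as active for one frame, so it
// is drawn at least once while the context is alive, then restore each mesh's
// previous flag after that first render.
function forceSelectForOneFrame(scene: SceneLike): void {
  const meshes = scene.meshes.slice(); // snapshot: meshes added later are not touched
  const previous = meshes.map((mesh) => mesh.alwaysSelectAsActiveMesh);
  for (const mesh of meshes) {
    mesh.alwaysSelectAsActiveMesh = true;
  }
  scene.onAfterRenderObservable.addOnce(() => {
    meshes.forEach((mesh, i) => {
      mesh.alwaysSelectAsActiveMesh = previous[i];
    });
  });
}
```

Because only a snapshot of scene.meshes is touched, the helper would have to be called again after every dynamic load, which is exactly the complication mentioned above.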

Scanning through the Babylon source, I have not been able to find the reason for this so far.
I’d be glad for any help!

Gosh, it took me several minutes to repro, but I finally managed to get the error.

I truly believe this is a bug in the browser’s WebGL implementation (I know that manually forcing a context loss was initially only recommended for debugging purposes).

Do you mind opening an issue for the Chromium team to check?

I filed a bug report with Chromium:
https://issues.chromium.org/issues/331092193

Please feel free to add additional information there, in case you have any idea about what might be causing this. TBH my bug report feels like a complete shot in the dark to me.

One additional thing: I could not reproduce the problem on Firefox, neither with the playground nor with our actual application, which reliably throws the error on Chromium. So it really seems to be Chromium-related.


Thanks a lot! To be fair, I really think this is deeply internal and related to the lost context, so we cannot really do much about it.

Update:
The Chromium team thinks it’s an issue internal to BabylonJS and closed the bug report.

Haha lol, seriously? So it works on Firefox, but it’s Babylon’s fault…

I’ll dig into it

To give a bit more context, the GL error reported when the VAO creation fails is 37442: “CONTEXT_LOST_WEBGL”.

We do handle context losses, but only after our “webglcontextlost” listener has been called by the browser. The problem here is that this event has not been raised yet when the VAO creation fails, so we are not aware that the context has been lost.
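One way to stop depending on the event timing would be to ask the context directly before and after creating the VAO. This is a minimal sketch that assumes, as the 37442 error above indicates, that the lost state is already observable through gl.getError() and gl.isContextLost() before the “webglcontextlost” listener runs. tryCreateVertexArray and Gl2Like are hypothetical names, not Babylon API; the interface is just the subset of WebGL2RenderingContext the sketch needs.

```typescript
// 0x9242 is the WebGL constant CONTEXT_LOST_WEBGL (37442).
const CONTEXT_LOST_WEBGL = 0x9242;

// Minimal structural subset of WebGL2RenderingContext used by the sketch.
interface Gl2Like {
  isContextLost(): boolean;
  createVertexArray(): object | null;
  getError(): number;
}

// Hypothetical helper: returns null (instead of throwing) when the VAO could not
// be created because the context is gone, even if the "webglcontextlost" event
// has not been delivered yet. Any other failure still throws.
function tryCreateVertexArray(gl: Gl2Like): object | null {
  if (gl.isContextLost()) {
    return null; // context already lost: don't touch GL at all
  }
  const vao = gl.createVertexArray();
  if (vao !== null) {
    return vao;
  }
  // getError() reports CONTEXT_LOST_WEBGL only once, so re-check the flag too.
  if (gl.getError() === CONTEXT_LOST_WEBGL || gl.isContextLost()) {
    return null; // lost between the check above and the call
  }
  throw new Error("Unable to create VAO");
}
```

Callers would then treat null as "skip this draw and wait for the restore" rather than as a fatal error.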

Gosh! I did not realize that, and I was escalating with the Google team.

Here is the explanation from Ken at Google, who kindly looked at the issue:

This indicates to me that sometimes Babylon is trying to use the WebGL context before the webglcontextrestored event is dispatched. It’s necessary to wait for this event to be dispatched after calling the restoreContext method of the WEBGL_lose_context extension as documented in WebGL WEBGL_lose_context Khronos Ratified Extension Specification.
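Following Ken’s point, restoration could be wrapped so that GL work only resumes from the “webglcontextrestored” listener. This is a sketch under the assumption that the page’s “webglcontextlost” handler called event.preventDefault(), since otherwise the browser will not restore the context. restoreContextThen and LoseContextLike are hypothetical names; the interface stands in for the object returned by gl.getExtension("WEBGL_lose_context").

```typescript
// Minimal stand-in for the WEBGL_lose_context extension object.
interface LoseContextLike {
  restoreContext(): void;
}

// Hypothetical helper: register the "webglcontextrestored" listener *before*
// asking for restoration, and only run `onRestored` (e.g. re-create GL
// resources) once the browser has dispatched that event.
function restoreContextThen(
  canvas: EventTarget,
  ext: LoseContextLike,
  onRestored: () => void,
): void {
  canvas.addEventListener("webglcontextrestored", () => onRestored(), { once: true });
  ext.restoreContext();
}
```

Registering the listener before calling restoreContext() avoids missing the event, and { once: true } keeps the callback from firing again on a later restore.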

Thanks a lot for investigating!

Ken’s explanation makes sense, but it does not explain why Babylon is trying to create this VAO after the context loss in the first place.

The error in the playground is reproducible even with the restoreContext() call commented out. It happens immediately on context loss.

I’m still not sure whether the problem is in Chromium or in Babylon, but I don’t see an error in the playground code:

Looking at Ken’s printfs, the first line, "I think the context is lost", is printed in WebGL2RenderingContextBase::createVertexArray(), when the context has already been lost.
But "After context lost event dispatch: restore is allowed", printed within WebGLContextEvent::Create(), comes afterwards.

To me, this is either a super subtle timing error (in which case Babylon would have tried to create a VAO after the context loss), or createVertexArray() fails before the context lost event has been fully dispatched (which would be a Chromium-internal bug).

I have two questions:

  1. Why does Babylon even try to create the VAO in this scenario?
  2. Is it possible that the render loop is not stopped immediately on receiving the context lost event, but tries to finish its current run and thus accesses the lost context?
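On question 2: assuming the browser queues “webglcontextlost” as a task rather than dispatching it synchronously, its listener can only run between tasks, so it can never interrupt a frame that is already executing. A flag set by that listener alone therefore cannot stop the current frame. Below is a minimal sketch of a per-frame guard that also consults gl.isContextLost(); createGuardedFrame and GlLike are hypothetical names, and the returned function is meant to be called from whatever drives the loop (requestAnimationFrame in a browser).

```typescript
// Minimal stand-in for the part of the GL context the guard needs.
interface GlLike {
  isContextLost(): boolean;
}

// Hypothetical frame guard: skip the frame if the context is lost, checking both
// the flag maintained by the context-loss listeners and gl.isContextLost(),
// since the latter can already be true while the event is still pending.
function createGuardedFrame(
  gl: GlLike,
  canvas: EventTarget,
  renderFrame: () => void,
): () => boolean {
  let lostEventSeen = false;
  canvas.addEventListener("webglcontextlost", () => { lostEventSeen = true; });
  canvas.addEventListener("webglcontextrestored", () => { lostEventSeen = false; });
  return () => {
    if (lostEventSeen || gl.isContextLost()) {
      return false; // frame skipped
    }
    renderFrame();
    return true; // frame rendered
  };
}
```

This only helps at frame boundaries; a loss that happens in the middle of a frame still has to be caught at the individual GL call.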

I’m posting here to ask for your guys opinion first, before reopening the chromium issue too and wasting more people’s time.

If you put a breakpoint in ThinEngine._onContextLost (which is our webglcontextlost handler, set by a canvas.addEventListener("webglcontextlost", this._onContextLost, false); call) and check “Pause on uncaught exceptions” in the debugger, you will see that you hit the latter before the former.
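The breakpoint experiment can also be scripted, which makes it easier to compare browsers. Below is a hypothetical diagnostic (installContextLossProbe and GlLike are illustrative names, not Babylon API) that logs, in order, when the “webglcontextlost” listener runs and when a GL failure is caught, together with what gl.isContextLost() reported at that moment. The listener also calls event.preventDefault(), which is what tells the browser the page intends to handle restoration.

```typescript
// Minimal stand-in for the part of the GL context the probe needs.
interface GlLike {
  isContextLost(): boolean;
}

// Hypothetical diagnostic: records the order of "listener ran" vs "GL call failed".
function installContextLossProbe(canvas: EventTarget, gl: GlLike) {
  const log: string[] = [];
  canvas.addEventListener(
    "webglcontextlost",
    (event) => {
      event.preventDefault(); // signal that we want the context restored
      log.push("webglcontextlost event");
    },
    false,
  );
  return {
    log,
    // Call this from a catch block around the failing GL call.
    recordFailure(error: Error): void {
      log.push(`${error.message} (isContextLost=${gl.isContextLost()})`);
    },
  };
}
```

If the failure entry shows up first with isContextLost=true, the browser had already flagged the context as lost but had not delivered the event yet, which is the ordering the breakpoints show in Chromium.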

To me, that means it’s a problem on the browser side: since we have not been notified that the context has been lost, we run our regular code, which makes the VAO creation fail.

The strange thing is that the bug only appears when there are no meshes on the screen… So maybe we are doing something we shouldn’t that leads to this state of affairs…

Have you been able to test in other browsers, like Safari?

I did some tests on BrowserStack (Linux was tested locally on my Manjaro machine):

Not reproducible by me on:
Safari 16.5, 17.3 (Mac)
Firefox 124 (Win11, Linux, Android Samsung Galaxy S22)
Safari (iPhone 14)
Chrome (iPhone 14)

Reproducible on:
Edge 123 (Mac, Win11)
Chrome 123 (Mac, Win11, Linux, Android Samsung Galaxy S22)
Opera 109 (Mac, Win11)
Firefox 124 (Mac)

Huge correlation with the Blink engine, although Firefox on Mac confuses me a bit.