I’m sorry for the tag then. I just saw his name a lot in git blame in the restoration code.
The code in your screenshot does not help in the pass of the render loop in which the context was lost, though. _contextWasLost is set in the handler of the context lost event. This handler can only run after Babylon has yielded to the browser via requestAnimationFrame() at the end of the render loop.
Even if it could be set earlier, there would be a race condition. What if the context was lost during _renderFrame()?
I’m unsure of what you are asking for. The thrown exception can be observed in the playground in post #1. If you tick “Pause on uncaught exceptions” (Chrome) and watch this._gl.isContextLost(), it will return true.
This means the guard around the render loop really does not work in the frame in which the context is lost.
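To illustrate the timing problem described above, here is a minimal sketch. It is not Babylon's actual code: `LossAwareGL` and `LossTracker` are hypothetical names, and the tracker only models a flag that is updated by the context lost event handler, next to a check that also polls `isContextLost()`.

```typescript
// Minimal slice of a WebGL context; a real WebGL2RenderingContext satisfies it.
interface LossAwareGL {
  isContextLost(): boolean;
}

// Hypothetical stand-in for an engine that tracks context loss via the event.
class LossTracker {
  private gl: LossAwareGL;
  // Only updated by the event handler, so it lags behind the real state.
  contextWasLost = false;

  constructor(gl: LossAwareGL) {
    this.gl = gl;
  }

  // Would be registered as the "webglcontextlost" listener; the browser can
  // only run it after the current frame callback has returned.
  onContextLostEvent(): void {
    this.contextWasLost = true;
  }

  // Guard based on the event-driven flag alone.
  flagSaysLost(): boolean {
    return this.contextWasLost;
  }

  // Guard that also polls the context, which reflects a loss immediately.
  actuallyLost(): boolean {
    return this.contextWasLost || this.gl.isContextLost();
  }
}
```

In the frame in which the context dies, `flagSaysLost()` would still return false while `actuallyLost()` already returns true, which matches what the debugger shows when the exception is thrown.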
Thanks for the PR.
This fixes a new bug that may have been introduced after this thread was opened. I haven’t encountered this one before.
The exception on VAO creation is still reproducible for me by following the steps in post #1 using Chromium 125. It depends greatly on the window size and the browser used.
Unfortunately, I have no idea how to make this easier to reproduce, since I don’t know why Babylon tries to create this VAO in this exact frame in the first place.
Unfortunately, I do not see how this is possible without a repro, because VAOs are only created during render, render is synchronous (so no context loss in between), and a lost context prevents render from being called. It would be amazing to track it down.
In case you are not able to reproduce the exception in the PG from post #1, please try this one:
Just rerun a few times and you should see the exception being thrown. I have just reproduced this on Chrome 125 and Firefox 126 on Linux. So this PG should be way more reliable than the older one. Also this seems to be completely independent from Blink now.
Here is another PG simulating a context loss in the callback of SceneLoader.ImportMesh leading to an exception during shader creation:
As I understand your last post, your hypothesis is that there can be no context loss inside a synchronous JS function. I really think that this is not true. As I understand it, a context loss (especially a natural one not triggered via the extension) can occur at any time.
I totally agree with this and I think this is the problem here. The context lost event can only be delivered after the raf has finished, but the context can already be broken earlier.
Why is that? As I understood @mikeysee, they are facing it in production. Unfortunately, I do not have data for our application, since we disabled context restoration and force a page reload instead atm.
Right now, there are still too many open questions for me about how to tackle this. I see two possibilities:
1. Allow the gl objects to be null, let the render loop finish and receive the context lost event when the raf has finished.
2. Keep the null checks, but instead of throwing, check gl.isContextLost() and cancel the animation frame.
I agree that in both cases, keeping Babylon's internal state intact might not be trivial.
I’m not sure what the best approach would be. I lean towards 1., though, since there shouldn’t be much reason to read from gl objects, so their being null shouldn’t really affect Babylon's internal state (that’s speculation, of course). I do not know enough about Babylon's internals to make any decision here confidently. That’s why I asked you guys for help and also tagged Mr evgeni_popov for insights into the restoration code.
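For reference, the second possibility could look roughly like the sketch below. It is only an illustration under assumptions: `VaoCapableGL`, `VaoResult` and `tryCreateVertexArray` are made-up names, the error message is a placeholder, and how the engine would keep its internal state consistent afterwards is exactly the open question.

```typescript
// Minimal slice of WebGL2RenderingContext needed for this sketch.
interface VaoCapableGL {
  createVertexArray(): object | null;
  isContextLost(): boolean;
}

// Either a usable VAO or a signal that the frame should be abandoned.
type VaoResult =
  | { kind: "ok"; vao: object }
  | { kind: "context-lost" };

// Hypothetical helper: keeps the null check, but only throws when the
// null result cannot be explained by a lost context.
function tryCreateVertexArray(gl: VaoCapableGL): VaoResult {
  const vao = gl.createVertexArray();
  if (vao !== null) {
    return { kind: "ok", vao };
  }
  if (gl.isContextLost()) {
    // The event has not been delivered yet, but the context is already dead.
    return { kind: "context-lost" };
  }
  throw new Error("Unable to create vertex array");
}
```

A caller in the render loop would then stop the current frame on "context-lost" instead of letting the exception escape, and wait for the context lost event to arrive.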
Hey mikey, you could add a handler to Engine.onContextLostObservable. For the longest time, we displayed a modal this way, informing the user about the crash and asking for a page reload.
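A rough sketch of such a handler is shown here. `EngineLike` and `ObservableLike` only describe the small part of the engine that is used, so the snippet stays self-contained; `notifyOnContextLoss` and `showModal` are hypothetical names for your own code, and the message text is just an example.

```typescript
// Assumed shape of the parts of the engine used here; a Babylon Engine
// exposes onContextLostObservable with an add() method.
interface ObservableLike<T> {
  add(callback: (eventData: T) => void): unknown;
}

interface EngineLike {
  onContextLostObservable: ObservableLike<unknown>;
}

// Hypothetical helper: shows a message once the context lost event arrives.
function notifyOnContextLoss(
  engine: EngineLike,
  showModal: (message: string) => void
): void {
  engine.onContextLostObservable.add(() => {
    showModal("The 3D view has crashed. Please reload the page.");
  });
}
```

With a real engine, `showModal` would open whatever dialog your UI framework provides. Note that this only reacts after the event has been delivered, so it does not prevent the exception discussed in this thread.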
After lots of testing, I’ve implemented the following workaround now:
This forces the creation of all VAOs, vertex buffers, etc. in the scene and therefore prevents Babylon from trying to create them on a lost context. It needs to be reapplied when you add objects to the scene or after a successful context restoration.
Ugly, but seems to work so far. I’d still rather have the underlying issue fixed, but I’m short on time and still have too many open questions as mentioned above. I’d be glad for any more help on this.