Physics: RuntimeError: memory access out of bounds

Hi everyone,

The exception below is triggered when I blow up a wall with v2 physics. It started occurring after I added the new rigid-body-gltf-extension (glTF + Physics + Babylon) and loaded a rigid-body mesh created in Blender.

Uncaught RuntimeError: memory access out of bounds
    at HavokPhysics.wasm:0x2bb47
    at HavokPhysics.wasm:0x2c403
    at HavokPhysics.wasm:0x1f1b3
    at HavokPhysics.wasm:0x12a461
    at HavokPhysics.wasm:0x128ebf
    at HavokPhysics.wasm:0x181c71
    at HavokPhysics.wasm:0x1c697
    at HavokPhysics.wasm:0x23c04
    at Object.HP_World_Step (eval at new_ (HavokPhysics_umd.js:9:22462), <anonymous>:9:10)
    at e.executeStep (Babylon.min.js:1:4167677)

$func447	@	HavokPhysics.wasm:0x2bb47
$func457	@	HavokPhysics.wasm:0x2c403
$func157	@	HavokPhysics.wasm:0x1f1b3
$func1811	@	HavokPhysics.wasm:0x12a461
$func1805	@	HavokPhysics.wasm:0x128ebf
$func1978	@	HavokPhysics.wasm:0x181c71
$HP_World_Step	@	HavokPhysics.wasm:0x1c697
$func278	@	HavokPhysics.wasm:0x23c04
HP_World_Step	@	VM2879:9
e.executeStep	@	Babylon.min.js:1
e._step	@	Babylon.min.js:1
xr._advancePhysicsEngineStep	@	Babylon.min.js:1
t.animate	@	Babylon.min.js:1
t.render	@	Babylon.min.js:1
update	@	TacticalGame.js:1001 <--render loop

I can reproduce the exception, but not reliably: I go to the same tile, shoot at the ragdoll, and the exception occurs in maybe 70% of trials. I cannot trigger the exception when the ragdoll mesh is not loaded; I just gave it 10 more trials. Also, there was no such exception in the weeks before.

Anyway, the big problem is that when I disable a subsystem, I cannot tell whether disabling it helped or the exception simply did not occur :open_mouth: So, to be fair, the exception may not be caused by the rigid-body-gltf-extension at all!
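One way to put numbers on that worry: if the crash happens in a fraction p of trials and the trials are independent (an assumption; automated runs only approximate it), the chance of seeing zero crashes in n trials by luck alone is (1 − p)^n. A small sketch with hypothetical helper names:

```javascript
// Chance that a bug with per-trial crash probability p produces zero
// crashes in n independent trials, i.e. a "fix" that is really just luck.
function probNoCrash(p, n) {
  return Math.pow(1 - p, n);
}

// Number of clean trials needed before a lucky streak becomes less
// likely than alpha (e.g. 0.01 for 1%).
function trialsNeeded(p, alpha) {
  return Math.ceil(Math.log(alpha) / Math.log(1 - p));
}

console.log(probNoCrash(0.7, 10).toExponential(2)); // 5.90e-6
console.log(trialsNeeded(0.7, 0.01)); // 4
console.log(trialsNeeded(0.5, 0.01)); // 7
```

With the ~70% rate quoted above, 10 clean runs without the ragdoll are very unlikely to be luck. The weak spot is p itself: if the baseline crash rate drops because of some unrelated change, the number of trials needed grows quickly, so it is worth re-measuring the baseline before each comparison.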

Unfortunately, so far I have failed to reproduce the exception in a playground. I guess I now have to assemble a playground bit by bit and hope the exception occurs.

But maybe someone can make sense of the call stack above? Any hint would be very welcome.

Best wishes
Joe

First:

Good news! Yes, it does :slight_smile:

Did you share your reproduction?

Please see the edit to the first post for the time being.

cc @Cedric , @eoin , @carolhmj

What version of @babylonjs/havok are you using? That’ll help me get some extra info from your callstack.

Obviously, a repro would be great, but (making a guess) I wonder if the objects you’re exploding have PhysicsShapeType.MESH shape types? Your description of the scenario makes me think that in the plugin, we’re trying to allocate a lot of memory, which fast-moving, complex meshes using continuous collision detection can do. It might be that we’re asking for more memory than the browser is willing to give us…

Still working on the repro :frowning: I am bouncing between playground, project code and in-game testing.

@eoin I am using the wasm from https://cdn.babylonjs.com/havok/HavokPhysics.wasm (last updated 9 Sep 2023). I could not find a version number, so I uploaded a copy here: Joe-Kerr/tempBabylonHavok on GitHub

During the entire destruction process I only use PhysicsShapeType.BOX for level geometry and PhysicsShapeType.SPHERE for the projectile collider.

@eoin Upgraded to latest version. Same callstack (lines, addresses).

Some progress: I found a significant crash factor. A glTF root node slipped through and ended up in the array of cell fracture meshes (the ones that get blown to pieces).

Since I filtered out the root node, there have been no crashes during the automated reproduction steps. For comparison, under “normal” crash conditions I could reproduce the crash in fewer than 2 game starts on average; I am now at 10 game starts without a crash. The reproduction steps are automated, and I think the only source of variability should be the ragdoll, which wiggles around a bit.

Unfortunately, if I then start shooting at other walls, the crash still occurs.

I still cannot reproduce the error in a playground: https://playground.babylonjs.com/#88CB6A#134
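The fix described above amounts to filtering the imported meshes before any physics aggregate is created for them. A minimal sketch, assuming Babylon-style meshes that expose `name` and `getTotalVertices()`, and assuming the stray node is the glTF loader's root node named `__root__`; `filterFractureMeshes` is hypothetical, and plain objects stand in for meshes so the snippet runs on its own:

```javascript
// Keep only meshes that carry geometry. The glTF root node (assumed to be
// named "__root__") has no vertices and should never get a physics body.
function filterFractureMeshes(meshes) {
  return meshes.filter(
    (m) => m.name !== "__root__" && m.getTotalVertices() > 0
  );
}

// Stand-ins for imported meshes: a name plus a vertex count.
const fakeMesh = (name, vertexCount) => ({
  name,
  getTotalVertices: () => vertexCount,
});

const imported = [
  fakeMesh("__root__", 0),
  fakeMesh("Cell.001", 24),
  fakeMesh("Cell.002", 36),
  fakeMesh("Empty", 0),
];

console.log(filterFractureMeshes(imported).map((m) => m.name)); // [ 'Cell.001', 'Cell.002' ]
```

Checking the vertex count, not just the name, also catches other geometry-less nodes that might end up in the same array.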

edit: After 30 trials, the automated crash script triggered the crash. After the crash, there is no mesh of class ‘Mesh’ with empty geometry and a physicsBody set. Oh dear :frowning:
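To repeat that check after each automated run, a helper like the one below can be called on the scene's mesh list (e.g. `findSuspiciousBodies(scene.meshes)` from the console). It assumes Babylon-style meshes exposing `getClassName()`, `getTotalVertices()` and the v2 `physicsBody` property; the helper name is hypothetical, and stub objects make it runnable outside Babylon:

```javascript
// List plain "Mesh" instances that have no vertices but do have a physics
// body attached: the combination a stray root node would produce.
function findSuspiciousBodies(meshes) {
  return meshes.filter(
    (m) =>
      m.getClassName() === "Mesh" &&
      m.getTotalVertices() === 0 &&
      m.physicsBody != null
  );
}

// Stubs: name, class name, vertex count, physics body (or null).
const stub = (name, cls, verts, body) => ({
  name,
  physicsBody: body,
  getClassName: () => cls,
  getTotalVertices: () => verts,
});

const sceneMeshes = [
  stub("wall", "Mesh", 24, {}),
  stub("__root__", "Mesh", 0, {}),
  stub("marker", "Mesh", 0, null),
  stub("wallInstance", "InstancedMesh", 24, {}),
];

console.log(findSuspiciousBodies(sceneMeshes).map((m) => m.name)); // [ '__root__' ]
```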

Hey, so, I’m officially mid-vacation at the moment, so I don’t really have the time to dig in, I’m afraid. Just a thought, though: when creating the instance of HavokPlugin(), maybe you could set _useDeltaForWorldStep=false? That way, the physics engine will always use a fixed timestep, which I’d hope would make the issue easier to reproduce, since its behavior would no longer depend on the frame rate.
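The idea behind that suggestion: with delta-based stepping, each frame passes the measured frame time to the world step, so two runs with slightly different frame timings integrate differently, whereas a fixed step feeds the same values every time. (As far as I know, `_useDeltaForWorldStep` is the first argument of the `HavokPlugin` constructor.) A toy illustration with made-up frame times; `stepInputs` is hypothetical and not the plugin's code:

```javascript
// Returns the dt handed to the physics world on each frame: either the
// measured frame time or a constant fixed step.
function stepInputs(frameDeltas, useDelta, fixedStep) {
  return frameDeltas.map((dt) => (useDelta ? dt : fixedStep));
}

// Two runs of the same scene with slightly different frame timings (seconds).
const runA = [0.0166, 0.0171, 0.0334, 0.0162];
const runB = [0.0167, 0.0165, 0.0169, 0.05];
const fixedStep = 1 / 60;

const same = (a, b) => a.length === b.length && a.every((x, i) => x === b[i]);

console.log(same(stepInputs(runA, true, fixedStep), stepInputs(runB, true, fixedStep))); // false
console.log(same(stepInputs(runA, false, fixedStep), stepInputs(runB, false, fixedStep))); // true
```

This only removes the frame-rate dependency; anything else that varies between runs still does, which fits the later finding that it helps reproducibility without guaranteeing it. The trade-off in this toy model is that simulated time advances by a constant amount per rendered frame, so the simulation slows down when the frame rate drops.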


FWIW, I can cause a memory access error 100% of the time if I dispose a specific mesh/rigidbody during its own collision callback. However, several other meshes in my system do very similar things without issue. I’m trying to create a smaller repro, but will probably not be able to isolate it before the start of LD54, so I may need to shelve it until next week.

My big question is: Should I be able to dispose of a mesh/rigidbody inside its collision callback, or should I be doing something else, like disabling it and then disposing at the end of the frame? It looks like that might circumvent the issue, but like I said, I’ve been getting away with it so far. :sweat_smile:


…and my issue was caused by a Vector3 that was created with 3 function pointers instead of numbers, due to an unfortunate cast. So, that explains that. It’s amazing how well it ran until the collision though, considering.

Update: No, I spoke too soon. That was indeed a bug, but I had commented out the dispose which is why it worked. (sigh) Back to debugging…

I would still like to know whether it is considered safe to dispose of rigidbodies in the collision callback. Seems to be, but I don’t want to depend on an edge case.

I’m not familiar with the Havok plugin code or the error, but is the “collision callback” perhaps called in the middle of a loop or something?

Can’t you attach an observer to afterRender and dispose of the mesh etc. there?
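The pattern discussed here (marking a body for removal inside the collision callback and disposing of it once the step is over) can be sketched as a small queue. `DisposeQueue` is hypothetical; anything with a `dispose()` method (a mesh, a `PhysicsAggregate`) can be queued, and flushing from `scene.onAfterRenderObservable` is one possible hook, untested against this particular crash:

```javascript
// Deferred disposal: collision callbacks only schedule objects; the actual
// dispose() calls happen later, outside the physics step and event dispatch.
class DisposeQueue {
  constructor() {
    // A Set prevents disposing the same object twice when several
    // collision events report it in the same frame.
    this.pending = new Set();
  }

  // Safe to call from inside a collision callback: it only records the object.
  schedule(obj) {
    this.pending.add(obj);
  }

  // Call once per frame after the physics step, e.g. in Babylon:
  //   scene.onAfterRenderObservable.add(() => queue.flush());
  // The batch is copied first so dispose() may schedule further objects.
  flush() {
    const batch = [...this.pending];
    this.pending.clear();
    for (const obj of batch) obj.dispose();
  }
}

// Stand-in usage: two collision events hit the same object in one frame.
const queue = new DisposeQueue();
let disposeCount = 0;
const debris = { dispose: () => { disposeCount++; } };

queue.schedule(debris);
queue.schedule(debris);
console.log(disposeCount); // 0
queue.flush();
console.log(disposeCount); // 1
```

Given the later @babylonjs/havok fix in this thread for bodies disposed while events were being generated, this is the kind of situation the pattern avoids.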

Just making sure: it is the same crash with the same callstack as in the first post, right?

tl;dr

  • _useDeltaForWorldStep = false: might improve reproducibility, but does not guarantee it

  • ComicSans’s suggestion to comment out the dispose call of the PhysicsAggregate in the collision callback: does not prevent the crash

  • latest playground with the actual code I use offline (no crash): https://playground.babylonjs.com/#Z8HTUN#703

  • current status:

    • Crash steps are fully automated; no user input at all anymore.
    • I can currently reproduce the crash locally within at most 10 trials.
    • I cannot reproduce the crash online.

@eoin I am assigning this to you for after your break, as it seems there might be some dispose issues here. But please don’t answer before you are fully back :slight_smile:


So, I reverted my change and then rewrote the code piece by piece to find the problem, and now it doesn’t crash. I suspect my memory crash was typo-related rather than inherent in the configuration. In particular, I think my Sept 29 hypothesis was correct: my code passed a function pointer as a component of a Vector3, and somehow that manifested as a memory error, but only under certain conditions.
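A bug like this can surface far from its cause, deep inside the physics step, so a cheap development-time guard where vectors are built or handed to the physics API can turn it into an immediate, readable error. A minimal sketch; `assertFiniteVector` is hypothetical and works on any `{ x, y, z }` object such as a Babylon `Vector3`:

```javascript
// Throws if any component of v is not a finite number (NaN, Infinity,
// undefined, a function reference from a bad cast, ...). Returns v so the
// check can wrap an expression inline.
function assertFiniteVector(v, label = "vector") {
  for (const key of ["x", "y", "z"]) {
    const c = v[key];
    if (typeof c !== "number" || !Number.isFinite(c)) {
      const shown = typeof c === "function" ? "a function" : String(c);
      throw new TypeError(`${label}.${key} is ${shown}, expected a finite number`);
    }
  }
  return v;
}

assertFiniteVector({ x: 1, y: 2, z: 3 }, "position"); // passes

try {
  assertFiniteVector({ x: Math.sin, y: 0, z: 0 }, "impulse");
} catch (e) {
  console.log(e.message); // impulse.x is a function, expected a finite number
}
```

The checks can be stripped from release builds if their overhead ever matters.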

Sorry for polluting @Joe_Kerr’s thread with a red herring!


The @babylonjs/havok package has been updated to 1.2.1; we found a bug which could trigger a crash if bodies were disposed while generating events. Hard to say for certain without a repro, but maybe updating that package will fix the problem?


Looking good so far: 20 trials, no crashes. Going to let it run overnight. Will report back.


Sorry for the delay @eoin. Gave it another 102 trials, no crash :smiley: I think we can close this thread.

Man, what a journey, probably one of the meanest bugs. Sorry that I couldn’t be more helpful with a (not) working playground.
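For a rough sense of what 102 clean trials mean: zero crashes can never prove the rate is zero but, assuming independent trials, they bound it. The 95% upper bound on the per-trial crash rate is 1 − 0.05^(1/n), which the “rule of three” approximates as 3/n. A small sketch (the helper name is hypothetical):

```javascript
// Largest per-trial crash rate p still consistent (at confidence 1 - alpha)
// with seeing zero crashes in n independent trials, i.e. the p for which
// (1 - p)^n equals alpha.
function crashRateUpperBound(n, alpha = 0.05) {
  return 1 - Math.pow(alpha, 1 / n);
}

const n = 102;
console.log((crashRateUpperBound(n) * 100).toFixed(1) + "%"); // 2.9%
console.log(((3 / n) * 100).toFixed(1) + "%"); // 2.9% (rule of three)
```

That is far below the 50 to 70% crash rates seen earlier in the thread, which strongly suggests the package update fixed it, even though a rare residual crash cannot be ruled out.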
