Havok Physics Benchmark & Multithreading Question

First off, here is a modified version of that HTML sample for Havok physics, which runs 3000 entities at 60 fps for me in a single thread.


I was curious whether you need to mirror the whole scene in a thread for physics entities, or whether you can dedicate a thread to physics. For example, I used the scene-mirroring method to keep crowd nav mesh calculations in an asynchronous thread. That way I can run each pass simultaneously and just share position/rotation data across threads rather than backing up a single thread. Anyway, great stuff.

Raw html:

<!DOCTYPE html>
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Babylon.js Instanced Mesh Benchmark</title>
  <!-- Babylon.js -->
  <script src="https://cdn.babylonjs.com/havok/HavokPhysics_umd.js"></script>
  <script src="https://cdn.babylonjs.com/babylon.js"></script>
  <style>
    html,
    body {
      overflow: hidden;
      width: 100%;
      height: 100%;
      margin: 0;
      padding: 0;
    }

    #renderCanvas {
      width: 100%;
      height: 100%;
      touch-action: none;
    }

    #canvasZone {
      width: 100%;
      height: 100%;
    }
  </style>
</head>
<body>
  <div id="canvasZone"><canvas id="renderCanvas"></canvas></div>
  <script>
    const canvas = document.getElementById("renderCanvas");
    const engine = new BABYLON.WebGPUEngine(canvas);

    const createScene = async function (numSpheres) {
      // The WebGPU engine must finish initializing before a scene is created
      await engine.initAsync();
      // This creates a basic Babylon Scene object (non-mesh)
      const scene = new BABYLON.Scene(engine);

      // This creates and positions a free camera (non-mesh)
      const camera = new BABYLON.FreeCamera("camera1", new BABYLON.Vector3(0, 5, -10), scene);
      camera.attachControl(canvas, true);

      // This creates a light, aiming 0,1,0 - to the sky (non-mesh)
      const light = new BABYLON.HemisphericLight("light", new BABYLON.Vector3(0, 1, 0), scene);
      light.intensity = 0.7;

      // Our built-in 'sphere' shape.
      const sphere = BABYLON.MeshBuilder.CreateSphere("sphere", { diameter: 2, segments: 32 }, scene);
      sphere.isVisible = false; // Hide the initial instance template

      // Our built-in 'ground' shape.
      const ground = BABYLON.MeshBuilder.CreateGround("ground", { width: 1000, height: 1000 }, scene);

      // Initialize plugin
      const havokInstance = await HavokPhysics();
      // First argument: step the physics world using the frame delta time
      const hk = new BABYLON.HavokPlugin(true, havokInstance);
      // Enable physics in the scene with gravity
      scene.enablePhysics(new BABYLON.Vector3(0, -9.8, 0), hk);

      // Create arrays to hold instances
      const sphereInstances = [];

      // Create instances of the sphere
      for (let i = 0; i < numSpheres; i++) {
        const inst = sphere.createInstance(`sphere${i}`);
        inst.position.y = 4;
        inst.position.x = Math.random() * 90 - 45;
        inst.position.z = Math.random() * 90 - 45;
        // Create associated body for physics
        new BABYLON.PhysicsAggregate(inst, BABYLON.PhysicsShapeType.SPHERE, { mass: 1, restitution: 0.75 }, scene);
      }

      // Create a static box shape for the ground.
      const groundAggregate = new BABYLON.PhysicsAggregate(ground, BABYLON.PhysicsShapeType.BOX, { mass: 0 }, scene);

      return scene;
    };

    let scene = null;

    function runBenchmark(numSpheres) {
      // Tear down the previous run, if any
      if (scene) {
        scene.dispose();
      }

      createScene(numSpheres).then((scn) => {
        scene = scn;
        engine.runRenderLoop(function () {
          if (scene) {
            scene.render();
          }
        });
      });
    }

    // Resize
    window.addEventListener("resize", function () {
      engine.resize();
    });

    // Run benchmark with different numbers of instances
    runBenchmark(3000);
  </script>
</body>
</html>


This is a great idea and I know that we were talking about it a lot with @Cedric.

@Cedric : is it still something the Havok team is interested in?

It would be great. I’m working on the mirrored-physics-scene-in-a-thread method so I can improve on the performance I get with Rapier. For some reason Rapier only lets you interact with its world through JavaScript objects and doesn’t share the raw position buffers, so it caps out around 1000-2000 dynamic entities before my CPU explodes.

Here is a multithreaded demo I am making, currently using Rapier: GitHub - joshbrew/mazeswarm: Javascript maze game with swarm physics. Acerola Game Jam 0 last minute submission


This is Rapier’s performance alongside Babylon with 1000 dynamic objects and ~3000 static. Out of the gate, Babylon is already 3X faster than this without multithreading, so I am just going to make it so I can sync threads. The repo I shared above shows how I got crowd navigation running in a thread that talks to the physics thread, which then reports back to the render thread.

And baseline performance with just a handful of physics entities and a light in the scene:

This test is here:


Anyway it’s pretty easy to just sync the world matrix buffers or pull positions and so on while running separate threads for each part of the program that does complex math with the entities.
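As a rough illustration of the buffer sync described above, here is a minimal sketch that packs positions and rotations into one flat `Float32Array` on the physics side and unpacks them on the render side. The 7-float layout and the `packTransforms` / `applyTransforms` helpers are hypothetical names for this example, not part of Babylon.js, Havok, or Rapier, and the entities are plain objects standing in for bodies and meshes.

```javascript
// Assumed layout (hypothetical): 7 floats per entity -> px, py, pz, qx, qy, qz, qw.
const FLOATS_PER_ENTITY = 7;

// Physics side: write each entity's position and rotation quaternion into one flat buffer.
function packTransforms(entities, out = new Float32Array(entities.length * FLOATS_PER_ENTITY)) {
  for (let i = 0; i < entities.length; i++) {
    const o = i * FLOATS_PER_ENTITY;
    const p = entities[i].position;
    const q = entities[i].rotation;
    out[o] = p.x;
    out[o + 1] = p.y;
    out[o + 2] = p.z;
    out[o + 3] = q.x;
    out[o + 4] = q.y;
    out[o + 5] = q.z;
    out[o + 6] = q.w;
  }
  return out;
}

// Render side: copy the buffer back onto the mirrored objects, in the same order.
function applyTransforms(buffer, targets) {
  const count = Math.min(targets.length, Math.floor(buffer.length / FLOATS_PER_ENTITY));
  for (let i = 0; i < count; i++) {
    const o = i * FLOATS_PER_ENTITY;
    const t = targets[i];
    t.position.x = buffer[o];
    t.position.y = buffer[o + 1];
    t.position.z = buffer[o + 2];
    t.rotation.x = buffer[o + 3];
    t.rotation.y = buffer[o + 4];
    t.rotation.z = buffer[o + 5];
    t.rotation.w = buffer[o + 6];
  }
}
```

In a worker setup, the packed buffer could be handed over with `postMessage(buffer, [buffer.buffer])`, which transfers the underlying `ArrayBuffer` instead of copying it (the sender then needs a fresh buffer for the next frame). A `SharedArrayBuffer` is another option where cross-origin isolation is available.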

The caveat, of course, is that by staggering the computations across threads you are rendering the previous physics frame, but in practice it feels completely normal, especially at high frame rates. The sync is also so fast when using array buffers that, with a slightly deeper implementation, the final render pass could await the world matrix sync.
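To make the "rendering the previous physics frame" point concrete, here is a small double-buffering sketch. It is a single-threaded illustration under my own assumptions: `TransformDoubleBuffer` is a hypothetical helper, and a real cross-thread version would need transferable or shared buffers plus some signalling between workers.

```javascript
// Hypothetical helper: two buffers so the renderer never reads a half-written physics frame.
class TransformDoubleBuffer {
  constructor(floatCount) {
    this.front = new Float32Array(floatCount); // last completed physics frame, read by the renderer
    this.back = new Float32Array(floatCount); // frame currently being written by the physics step
  }

  // Call once a physics step has finished writing into `back`.
  publish() {
    const finished = this.back;
    this.back = this.front;
    this.front = finished;
  }
}

// Simulated ticks: physics writes frame N while the renderer still sees frame N - 1.
const buffers = new TransformDoubleBuffer(3);
buffers.back.set([0, 4, 0]); // physics writes the first frame
buffers.publish(); // the renderer can now read [0, 4, 0]
buffers.back.set([0, 3.5, 0]); // physics is already writing the next frame
const renderedNow = Array.from(buffers.front); // still the previous, completed frame
```

The renderer is always one published frame behind the physics step, which is the trade-off described above.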

Adding @eoin as well

This is really cool! I didn’t have any luck running the sample (I think it’s too much for my poor little laptop ;_; ) but super glad to hear we’re so fast. Out of curiosity, in the Rapier version, is the physics world step multithreaded, or is it just the workers which perform the transform sync?

It might take a minute to finish importing from the CDN, or it could be the WebGPU support if you’re not using Chrome for the Havok example.

As far as I recall, Rapier does have some SIMD and multithreaded options in its Rust environment, but I am not sure the out-of-the-box wasm build I am using from their npm package uses those settings, so it slows down a lot after a couple thousand dynamic entities. Still pretty good, but I know it can go further. Otherwise I just sync the render thread from the physics world thread, since Rapier runs synchronously on that thread. If Havok is taking care of the threads/workers under the hood to foolproof it, that would explain the big gap in performance. I will work on a hacky worker-sync version to compare with Rapier, just to see if it benefits.

previously, i have found it’s better to add the rapier rust lib as a subproject and compile it manually with your project, because you’ll get much better performance that way by tuning the compile parameters and definitely using simd, which for whatever reason i don’t think it ships with (at least when i was testing several months ago). simd is probably why you’re seeing the perf gains with havok.

on another note… i am really curious about using bullet (ammo)'s opencl backend and transforming that to webgpu compute shaders. physx can also do this partially afaik, and probably havok too, but the web version of havok is so cut down compared to what it could be, since it’s a paid product, that i’m reluctant to use it. there are probably huge gains to be had from manually compiling bullet too. also jolt has broadphase multithreading like havok native does, so that could be a good option too. there is a wip repo i saw on github PhoenixIllusion/babylonjs-jolt-physics-plugin (github.com)

rapier doesn’t have advanced broadphase multithreading like havok; it’s relatively simple. rapier/src/pipeline/collision_pipeline.rs at master · dimforge/rapier (github.com)

it does have region segmentation for the broadphase as you can see here, which can be multithreaded. but afaik havok native has autodetection features that i assume make it way better.
(thorough explanation in the comments here)
rapier/src/geometry/broad_phase_multi_sap/broad_phase_multi_sap.rs at master · dimforge/rapier (github.com)
rapier/src/geometry/broad_phase_multi_sap/sap_region.rs at master · dimforge/rapier (github.com)
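
For readers unfamiliar with the term, a sweep-and-prune (SAP) broadphase sorts bounding intervals along one axis and only tests neighbours whose intervals overlap on it. The sketch below is a generic single-axis version for spheres, written purely for illustration; it is not Rapier's or Havok's implementation, and it leaves out the region partitioning that the linked files deal with. The `sweepAndPrune` name and the `{x, y, z, r}` sphere shape are assumptions for this example.

```javascript
// Generic single-axis sweep-and-prune over spheres shaped like { x, y, z, r }.
// Returns candidate pairs of indices whose axis-aligned bounding boxes overlap.
function sweepAndPrune(spheres) {
  // Sort indices by the lower bound of each sphere on the x axis.
  const order = spheres
    .map((_, i) => i)
    .sort((a, b) => (spheres[a].x - spheres[a].r) - (spheres[b].x - spheres[b].r));

  const pairs = [];
  for (let i = 0; i < order.length; i++) {
    const a = spheres[order[i]];
    const maxX = a.x + a.r;
    for (let j = i + 1; j < order.length; j++) {
      const b = spheres[order[j]];
      // Sorted by lower bound: once one interval starts past maxX, all later ones do too.
      if (b.x - b.r > maxX) break;
      // x intervals overlap; confirm overlap on y and z as well.
      const reach = a.r + b.r;
      if (Math.abs(a.y - b.y) <= reach && Math.abs(a.z - b.z) <= reach) {
        pairs.push([Math.min(order[i], order[j]), Math.max(order[i], order[j])]);
      }
    }
  }
  return pairs;
}
```

The early `break` is what keeps the pass cheap when objects are spread out; the surviving pairs would still go through a precise narrowphase test.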

if i recall correctly, havok native separates all physics objects into individual components, then creates the subregions on the fly with a task system that allocates jobs to a variable number of threads (probably detected at runtime). is that correct?