Solving CPU bottleneck in NativeXR Hill Valley AR portal scene

In my wgpu-native fork of BabylonNative, I’ve gone deep into optimizing the infamous AR portal into Hill Valley demo. One of the ceilings on reaching 60fps on iPhone XS (A12), iPhone 12 (A14), and iPad Pro 2 (A10X) is the large amount of CPU time spent issuing WebGPU render-pass encoder calls from JavaScript.

On BabylonNative, every setPipeline, setBindGroup, setVertexBuffer, setIndexBuffer, and draw call crosses the JavaScriptCore/Node-API/native bridge. With the full scene plus SSAO2, that bridge chatter causes visible jank on iPhone 12 after AR placement, and it also draws more power than it needs to.

I’d like to propose two things for discussion:

  1. batching commands
  2. checking for Multi-Draw Indirect (and similar) WebGPU features and using them, keeping the CPU/JS path out of it when it safely maps to existing BabylonJS features and dev expectations
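To make the two items concrete, here is a minimal sketch of the general idea, not the code from the PR. Everything in it is hypothetical and named for illustration only: the `Op` table, the `SubmitStream` callback standing in for a single native entry point, the `BatchedRenderPass` class, the numeric resource IDs, and `pickDrawPath`. The multi-draw feature name is passed in as a parameter because it differs between implementations, so I'm not assuming a particular string.

```typescript
// Hypothetical opcode table for illustration; not the encoding used in the PR.
const Op = {
  SetPipeline: 1,
  SetBindGroup: 2,
  SetVertexBuffer: 3,
  SetIndexBuffer: 4,
  DrawIndexed: 5,
} as const;

// Stand-in for the one native call that consumes a whole pass.
// It receives a view into a reused buffer, so it must read it synchronously.
type SubmitStream = (stream: Uint32Array) => void;

class BatchedRenderPass {
  private words = new Uint32Array(64);
  private length = 0;
  private readonly submit: SubmitStream;

  constructor(submit: SubmitStream) {
    this.submit = submit;
  }

  // Append raw words, at least doubling the backing store when it fills up.
  private push(...values: number[]): void {
    const needed = this.length + values.length;
    if (needed > this.words.length) {
      const grown = new Uint32Array(Math.max(this.words.length * 2, needed));
      grown.set(this.words.subarray(0, this.length));
      this.words = grown;
    }
    for (const value of values) {
      this.words[this.length++] = value;
    }
  }

  // Each method only records into the JS-side buffer; nothing crosses the bridge.
  setPipeline(pipelineId: number): void {
    this.push(Op.SetPipeline, pipelineId);
  }
  setBindGroup(slot: number, bindGroupId: number): void {
    this.push(Op.SetBindGroup, slot, bindGroupId);
  }
  setVertexBuffer(slot: number, bufferId: number): void {
    this.push(Op.SetVertexBuffer, slot, bufferId);
  }
  setIndexBuffer(bufferId: number): void {
    this.push(Op.SetIndexBuffer, bufferId);
  }
  drawIndexed(indexCount: number, instanceCount: number): void {
    this.push(Op.DrawIndexed, indexCount, instanceCount);
  }

  // One bridge crossing for the whole pass, then reset so the buffer is reused.
  end(): void {
    this.submit(this.words.subarray(0, this.length));
    this.length = 0;
  }
}

// Item 2: take a GPU-driven path only when the device reports the feature,
// otherwise fall back to the batched stream.
function pickDrawPath(
  features: { has(name: string): boolean },
  multiDrawFeatureName: string,
): "multi-draw-indirect" | "batched-stream" {
  return features.has(multiDrawFeatureName) ? "multi-draw-indirect" : "batched-stream";
}
```

In this sketch, a pass that records 100 draws with five state and draw calls each reaches `submit` once instead of crossing the bridge 500 times. `pickDrawPath` accepts anything with a `has` method, so a `Set<string>` works for testing and a device's `features` set works at runtime.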

I have a sketch that is very specific to BabylonNative WebGPU, which I’ve tested on several devices: [codex] Batch NativeWebGPU render pass command streams by matthargett · Pull Request #2 · rebeckerspecialties/Babylon.js · GitHub

TL;DR: roughly 90% fewer native render-pass stream calls. On iPhone 12, this reduces peak CPU usage and helps 1% FPS lows. On iPhone XS, it goes from ~40fps up to ~45fps with lower sustained CPU usage peaks.

I’d love to hear people’s thoughts, criticisms, requests for measurement, etc. It seems like this could be a way to dip a toe into future workgraph enablement by formalizing the contracts that make more of the batching safe. Regardless, it makes running BabylonJS’ AR portal demo at 60fps more practical on reasonably modern Apple mobile GPUs.

cc @BabylonNative

I’ve done more validation and cross-checking, and opened the upstream PR here: [WebGPU] Add render-pass command stream batching for native hosts by matthargett · Pull Request #18562 · BabylonJS/Babylon.js · GitHub