I am writing a WebGPU ray tracer. Is anyone interested in this topic?

is there anyone willing to support the development of an open source WebGPU ray tracer?

(a preview online demo (no BVH yet) is available here (link), by subscription for only one Euro, for free) :smiley:

WGSL GPGPU LBVH building & traversal is underway
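For anyone curious where the LBVH part starts: below is a CPU-side JavaScript sketch of the standard 30-bit Morton encoding that LBVH builders use to quantize primitive centroids before sorting. This is a hypothetical illustration, not code from this project; a WGSL version would use the same bit tricks.

```javascript
// Expand a 10-bit integer so its bits land in every 3rd position
// (classic bit-twiddling for 3D Morton / Z-order codes).
function expandBits(v) {
    v = (v * 0x00010001) & 0xFF0000FF;
    v = (v * 0x00000101) & 0x0F00F00F;
    v = (v * 0x00000011) & 0xC30C30C3;
    v = (v * 0x00000005) & 0x49249249;
    return v >>> 0;
}

// 30-bit Morton code for a point with coordinates in [0, 1).
function morton3D(x, y, z) {
    const scale = 1024; // 2^10 buckets per axis
    const xi = Math.min(1023, Math.max(0, Math.floor(x * scale)));
    const yi = Math.min(1023, Math.max(0, Math.floor(y * scale)));
    const zi = Math.min(1023, Math.max(0, Math.floor(z * scale)));
    return ((expandBits(xi) << 2) | (expandBits(yi) << 1) | expandBits(zi)) >>> 0;
}
```

sorting primitives by this 30-bit key is exactly what the radix sort below is needed for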

P.S.:
(Babylon.js is used to access WebGPU)

7 Likes

Cool project! Could you share the GitHub repo, if available?

(The BVH hasn’t been ported yet; I’m porting it from my previously unpublished OpenCL RTRT application, but that also requires writing a WGSL version of radix sort, because the Boost.Compute library was used for this earlier)
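For context, here is a minimal CPU reference of the LSD radix sort that such a GPU port parallelizes: 2 bits (4 buckets) per pass, with an exclusive prefix sum over the bucket counts and a stable scatter. This is only an illustrative sketch, not the OpenCL or WGSL code in question.

```javascript
// Minimal CPU reference of LSD radix sort on unsigned 32-bit keys,
// processing 2 bits (4 buckets) per pass -- the digit width a GPU
// scan + scatter implementation typically uses.
function radixSortLSD(keys, bits = 32) {
    let src = Uint32Array.from(keys);
    let dst = new Uint32Array(src.length);
    for (let shift = 0; shift < bits; shift += 2) {
        const counts = [0, 0, 0, 0];
        for (const k of src) counts[(k >>> shift) & 3]++;
        // Exclusive prefix sum over the 4 buckets (the "scan" step on GPU).
        const offsets = [0, counts[0], counts[0] + counts[1], counts[0] + counts[1] + counts[2]];
        for (const k of src) dst[offsets[(k >>> shift) & 3]++] = k; // stable scatter
        [src, dst] = [dst, src]; // ping-pong buffers between passes
    }
    return Array.from(src);
}
```

on the GPU, each pass becomes a histogram kernel, a prefix-sum (scan) kernel, and a scatter kernel over the whole buffer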

4 Likes

You can find them in this WebGPU issue,

specifically the dawn-ray-tracing project:

firstly, it requires a custom Chrome build

secondly, it requires ray tracing support in the video card drivers

P.S.:

well, I wanted to implement hardware-agnostic, software GPGPU ray tracing that requires only WGSL compute shader support :smiley:

1 Like

that’s great! But the larger the height and width of the webpage, the worse the performance. When I maximize the webpage, I get 18 FPS.

I think that’s expected, as more width and height = more pixels to go over

have you seen WebRTX for WebGPU? :grin: (released 4 months ago)

an example from the description

P.S.:

it also works on compute shaders:

WebRTX is not hardware ray tracing and is a pure compute shader implementation. This means WebRTX works as long as your browser supports WebGPU (only tested on Chrome so far).

but, again, apparently without GPGPU BVH building:

The building of BVH happens on host which is then flattened into a buffer for stackless traversal on GPU.

the implementation is tricky, of course :smiley: you could call it a professional approach (in that the tracing can be written in GLSL shaders, even if ray tracing is not supported by the drivers), but given the CPU-side BVH build it is still fairly useless for real-time work …
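The "flattened buffer + stackless traversal" idea can be sketched like this in plain JavaScript. The node layout and field names are hypothetical, not WebRTX's actual format: each node stores a "miss" index saying where to jump when the ray misses it (or after a leaf is handled), with inner nodes laid out so the first child immediately follows its parent.

```javascript
// Illustrative stackless traversal over a BVH flattened into an array.
// node = { min, max, miss, first, count }; leaves have count > 0.
// Inner nodes descend to index i + 1 (depth-first layout); a miss of -1 ends traversal.
function traverseStackless(nodes, hitsAABB, onLeaf) {
    let i = 0;
    while (i !== -1) {
        const n = nodes[i];
        if (!hitsAABB(n)) { i = n.miss; continue; } // skip whole subtree
        if (n.count > 0) { onLeaf(n); i = n.miss; } // leaf: visit, then escape
        else i = i + 1;                             // inner: descend to first child
    }
}
```

on the GPU this loop needs no per-thread stack, which is why the BVH is serialized into exactly this shape before upload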

### Code structure
  • /bvh - Rust code for building the BVH and serializing it to a format suitable for stackless traversal on the GPU.
  • /glsl - Rust code for parsing and manipulating user-provided shaders.
  • /naga - WASM binding for naga, based on wasm-naga.
  • /src - All other TypeScript library code.

if anyone is interested, here is an implementation of radix sort for WebGPU (native):

(link to github)

good performance, but after porting it to the Babylon.js WebGPU layer, performance decreases by 5-7 times

therefore, you will probably have to implement the ray tracer against the native WebGPU API, without Babylon.js - at least until the latter exposes the low-level WebGPU API (it seems there is no such option right now?)

P.S.:

radix sort is part of the LBVH implementation

Would you have a link to your port? I don’t really see why the port to Babylon would be slower because the layer is very thin, as we basically dispatch the compute shader calls to the browser…

2 Likes

I know I’m late to the thread, but paging @erichlof !

1 Like

WebGPU via Babylon.js

settings below (in the `test()` function):

```javascript
let count = 64*64*64 *4*2; // default (64*64*64)
let max_value = 1073741824-1; // default (10000)
let bits = 30;  // default (14)
```

with these settings:
(WebGPU via Babylon.js): 5-7 ms (my code above)
(native WebGPU API): 1 ms (Fei Yang’s code)

I think there’s a mistake here:

bGroup1_t and bGroup2_t don’t exist and are never assigned, so you end up creating two buffers every time Update_PipelineRadixScan2 is called.

What’s more, the original code creates all bind groups in advance and simply uses them in the main loop. In Babylon.js, when you update a compute shader input, the bind group must be recreated. If you want better performance, you should create as many compute shaders as you have variations of the inputs. That way, the behavior will be closer to the original code, where everything is created once and you only have to dispatch the compute shaders (and update the uniform buffers).
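The "one compute shader per input variation" idea boils down to caching by input combination. Here is a plain-JS sketch; `createShaderFor` is a hypothetical stand-in for constructing a `BABYLON.ComputeShader` and binding its buffers once, not a Babylon.js API:

```javascript
// Build each shader variant once, keyed by its bound inputs,
// then only dispatch in the hot loop.
function makeShaderCache(createShaderFor) {
    const cache = new Map();
    return function getShader(...inputNames) {
        const key = inputNames.join('|');
        let shader = cache.get(key);
        if (!shader) {                       // first use: create and bind once
            shader = createShaderFor(inputNames);
            cache.set(key, shader);
        }
        return shader;                       // later uses: no rebinding, no new bind group
    };
}
```

in the main loop, `getShader('keys_a', 'hist')` then returns the same prebuilt object every iteration, so nothing is recreated at dispatch time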

fixed, and now the time is 4 ms on average :smiley: still a lot …

so, creating a lot of `new BABYLON.ComputeShader` instances, one for each variation of the inputs, right?

that is not particularly convenient; manipulating only the bind groups is much easier and faster

Yes.

We can’t expose the bind groups, it’s too low level. But a ComputeShader is fairly light, so creating a number of them should not be a problem.

fynv’s native WebGPU API “radix sort” (I optimized it slightly and added a benchmark) - 0.8 ms on average

my optimized “radix sort” code for Babylon.js - 2 ms on average

my optimized2 (close to the original fynv) “radix sort” code for Babylon.js - 2 ms on average

as you can see, the difference in performance is 2.5x (radix_sort_native vs radix_sort_opt)

additionally, there are sometimes performance drops (15 ms or more) when running the radix sort with Babylon.js

Any other ideas for optimization? :smiley: (for Babylon.js)

Bind groups and other GPU resources are created when ComputeShader.dispatch is called. So, in your tests with Babylon.js, you incur this recreation on each test, whereas the creation of the GPU resources is not included in the timing of the native test.

To improve things in Babylon.js, you should either first run the whole loop, to make sure the GPU resources are created before timing the real test, or reuse the same compute shaders for all tests.
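That warm-up advice is the usual benchmark shape: run the workload once so lazy resource creation happens outside the measurement, then time the steady state. A generic plain-JS sketch, where `work` stands in for the dispatch loop (this measures CPU-side submission time only, not actual GPU execution, which needs its own synchronization):

```javascript
// Warm up, then time: one untimed call triggers one-time setup,
// then the average over the timed iterations reflects steady state.
function benchmark(work, iterations = 10) {
    work();                                   // warm-up: lazy creation happens here
    const t0 = performance.now();
    for (let i = 0; i < iterations; i++) work();
    return (performance.now() - t0) / iterations; // average ms per iteration
}
```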

I also created a PR which adds a fastMode property to the ComputeShader class:

When true, isReady is no longer called by dispatch, and it no longer checks whether the underlying GPU resources should be recreated (because of changes in the inputs). So, in your case, you could pre-initialize by running the loop this way:

const promises = [];

for (let i = 0; i < bits; i++) {
    let j = i % 2;

    promises.push(pipeline_radix_scan1_buf[j].dispatchWhenReady(0, 0, 0).then(() => pipeline_radix_scan1_buf[j].fastMode = true));

    for (let k = 1; k < buffers_scan1.length; k++) {
        promises.push(pipeline_radix_scan2_buf[k - 1].dispatchWhenReady(0, 0, 0).then(() => pipeline_radix_scan2_buf[k - 1].fastMode = true));
    }

    for (let k = buffers_scan1.length - 2; k >= 0; k--) {
        promises.push(pipeline_radix_scan3_buf[k].dispatchWhenReady(0, 0, 0).then(() => pipeline_radix_scan3_buf[k].fastMode = true));
    }

    promises.push(pipeline_radix_scatter_buf[j].dispatchWhenReady(0, 0, 0).then(() => pipeline_radix_scatter_buf[j].fastMode = true));
}

await Promise.all(promises);

The PR also lets you pass (0,0,0) to dispatch for the workgroup counts. That way, all GPU resources are created but the compute shader is not executed.

Here’s a PG with these changes (will work as expected only when the PR is merged):

3 Likes

I updated the benchmarks on GitHub, and now the code reuses the same compute shaders for all tests, but there’s no performance change:

radix_sort_native - 0.8 ms [webgpu native API]
radix_sort (no bind groups optimization) - 4 ms [Babylon.js used]
radix_sort_opt - 2 ms [Babylon.js used]
radix_sort_opt2 - 2 ms [Babylon.js used]

What about the PG I linked above?

Even without the PR in, it should already improve things.

no difference on the PG (3 ms avg in all cases):

PG [run first the whole loop with Promises] - 3 ms avg

PG [reuse the same compute shaders for all tests] - 3 ms avg

PG (default) - 3 ms avg