WebGPU CSM autoCalcDepthBounds

repro: Babylon.js Playground

Rotate the camera or zoom out and the shadow will flicker. Comment out line 18 and the shadow is stable. If you reload the PG and change to the WebGL2 engine, there are no such artifacts. Feels like something's off in the WebGPU implementation. Is this working as intended?

Tested with Chrome Incognito Version 136.0.7103.114 (Official Build) (64-bit)

Using Chrome Version 136.0.7103.114 (Official Build) (64-bit): it happens rarely, I need to rotate a bit. Flickering means a small part of the shadow disappears for about a frame. No, wait: I can reproduce it reliably like so:

That's something I don't know how to fix at this time…

The problem is that when autoCalcDepthBounds == true, we generate some additional rendering to compute the min/max depth values of the scene. In the end, we must read the result back from the GPU to the CPU, and this is asynchronous in WebGPU, there’s no way around that…

It means we get the result with a delay of (at least) a frame, so the current frame doesn’t use the min/max values for this frame, but for the previous frame. Depending on the scene geometry and the camera position/view direction, you may not see artifacts, but it’s clearly visible in the PG because, depending on the camera view direction, the min/max values can be quite different from frame to frame.
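To make the timing concrete, here is a minimal sketch of the pattern (renderFrame, readMinMaxFromGpu and splitFrustum are placeholder names; this is not the actual Babylon.js code):

let cachedMinMax = { min: 0, max: 1 }; // last values that actually came back from the GPU

function renderFrame() {
    // Kick off the depth reduction + readback for this frame. In WebGPU the promise
    // resolves only after the frame has finished, so the result always lands "late".
    readMinMaxFromGpu().then((minmax) => {
        cachedMinMax = minmax;
    });

    // Split the cascades now, with whatever values we already have,
    // i.e. the result of a previous frame's readback.
    splitFrustum(cachedMinMax.min, cachedMinMax.max);
}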

Why is this so? If the depth reducer is activated, shouldn't CSM wait until the min/max values are ready before proceeding with the frustum split? Or shouldn't the depth reducer force a frustum split + shadow map update once the min/max values are done?

It feels like the shadow computation sequence did not account for the depth reducer, rather than this being a flaw in CSM itself.

CSM cannot wait; calculations must be performed each frame. The frustum is divided each frame based on the camera position/orientation and the current min/max values. If the min/max values have not yet been updated due to the asynchronous delay, the values from the previous frame are used instead, hence the problem.

Yes, I understand that the async read causes CSM to use outdated values in the current frame. From experience, this one-frame lag should not cause a visible flicker. A flicker artifact usually points to something more than one frame. I will dig in to verify.

Just in case, can this be resolved with the frame graph?

Test setup: https://playground.babylonjs.com/#HCVIDA#5 (repro + camera auto rotation)
Browser: Chrome Incognito
Version: @babylonjs/core@8.8.5

What I did: I inserted log points to track function sequencing.

In minMaxReducer

scene.getEngine()._readTexturePixels(reduction.inputTexture.texture, w, h, -1, 0, buffer, false);
minmax.min = buffer[0];
minmax.max = buffer[1];
console.log("after _readTexturePixels");
this.onAfterReductionPerformed.notifyObservers(minmax);

In cascadedShadowGenerator

this._depthReducer.onAfterReductionPerformed.add((minmax) => {
    console.log("onAfterReductionPerformed");
    let min = minmax.min, max = minmax.max;
    if (min >= max) {
        min = 0;
        max = 1;
    }
    if (min != this._minDistance || max != this._maxDistance) {
        console.log("b4 setMinMaxDistance");
        this.setMinMaxDistance(min, max);
    }
});
this._shadowMap.onBeforeBindObservable.add(() => {
    this._currentSceneUBO = this._scene.getSceneUniformBuffer();
    engine._debugPushGroup?.(`cascaded shadow map generation for pass id ${engine.currentRenderPassId}`, 1);
    if (this._breaksAreDirty) {
        console.log("before splitfrustum");
        this._splitFrustum();
    }
    this._computeMatrices();
});

In both WebGL2 and WebGPU, all logs indicate the same function firing sequence. I did not observe the aforementioned one-frame difference, even when furiously jiggling the camera.

Then I turned to logging the cascade values in _splitFrustum. I saw two consistent differences between WebGL2 and WebGPU.
a) WebGPU consistently captured less data than WebGL2.
b) WebGPU consistently had incorrect _frustumLengths for the last cascade at the start of the data capture, cf. the image below.


The blue plot shows the WebGL2 engine's _frustumLengths for cascades 1 and 2 respectively: the 1st point is the length for cascade 1, followed by cascade 2, and so on. The red plot is from WebGPU. If we use WebGL2 as the reference, the 1st and 2nd data points for WebGPU, i.e. the frustum lengths for cascades 1 and 2, are shorter than they should be.
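For reference, the log point in _splitFrustum was roughly of this form (a simplified sketch of the instrumentation, not the exact code):

// added at the end of _splitFrustum in a local copy of cascadedShadowGenerator
console.log("splitFrustum", this._minDistance, this._maxDistance, Array.from(this._frustumLengths));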

This explains why we see the shadow flicker:
a) the last cascade cuts off earlier than it should.
b) it happens only when we move the camera at the start.
c) it happens when zooming out but not when zooming in.

My question now is: are the WebGPU engine and the minMaxReducer working correctly at the start of the SDSM execution? Or is there a pre-warm state/prerequisite that was missed?


It’s not easy to identify the problem with the logs you’ve set up, because we used a shortcut to make the WebGL implementation work as expected. The code for reading the min/max values should normally be:

await scene.getEngine()._readTexturePixels(reduction.inputTexture.texture!, w, h, -1, 0, buffer, false);
minmax.min = buffer[0];
minmax.max = buffer[1];
this.onAfterReductionPerformed.notifyObservers(minmax);

_readTexturePixels returns a promise, so we normally have to wait for it to be fulfilled. If you add the await, you will see that we get the updated min/max values only after the current frame has finished (the RAF has already returned), so they will be used for the next frame.

However, since WebGL reads synchronously internally, we know that we will get an updated buffer even without waiting for the promise. I added some documentation to the min/max reducer to explain this:

Note that if you add await to the code, you will encounter the same artifact issues in WebGL as in WebGPU.
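In other words, the difference boils down to this (just a sketch; engine.isWebGPU is assumed to be available and useMinMax is a placeholder, this is not the shipped minMaxReducer code):

const engine = scene.getEngine();
const promise = engine._readTexturePixels(reduction.inputTexture.texture, w, h, -1, 0, buffer, false);

if (!engine.isWebGPU) {
    // WebGL: the read happened synchronously under the hood, so buffer[0]/buffer[1]
    // are already valid here and the current frame can use them (the shortcut used today).
    useMinMax(buffer[0], buffer[1]);
} else {
    // WebGPU: the buffer is only filled once the promise resolves, which happens after the
    // current frame has returned, so the values can only be consumed by a later frame.
    promise.then(() => useMinMax(buffer[0], buffer[1]));
}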

Yes, I encountered the same when I logged the depth map for verification.

this._depthReducer.onAfterReductionPerformed.add(async (minmax) => {
    // check the depth map!
    const dm = this._depthReducer._depthRenderer.getDepthMap();
    const data = await dm.readPixels();
    console.log(data[0]+" "+data[1]+" "+data[2]+" "+data[3]);

    let min = minmax.min, max = minmax.max;
    if (min >= max) {
        min = 0;
        max = 1;
    }
    if (min != this._minDistance || max != this._maxDistance) {
        console.log(min+" "+max);
        this.setMinMaxDistance(min, max);
    }
});

Kinda off-topic, but the depth map for WebGPU is needlessly populating all channels; might wanna fix that.

I have been testing out simple fixes and have a couple that work. But I feel it's best to discuss and test further since there are impacts. On my local copy:

a) I fixed minMaxReducer to use a proper async/await at _readTexturePixels (if we go to a PR, it would need a check for WebGPU vs WebGL).
b) I added a tolerance for max in setMinMaxDistance. During testing, 10% was sufficient to eliminate all flicker for mouse rotations. For mouse zoom out, I had to raise it to 40% ~ 50% tolerance. Essentially, I'm making the last cascade slightly longer to account for the difference that a single frame can cause when the pointer moves. Ideally, this value could be user set, e.g. csm.autoCalcBoundsMaxRangeTol = 0.1.

setMinMaxDistance(min, max) {
    if (this._minDistance === min && this._maxDistance === max) {
        return;
    }
    if (min > max) {
        min = 0;
        max = 1;
    }
    if (min < 0) {
        min = 0;
    }
    // add a 10% tolerance so the last cascade extends a bit further than the measured max
    max += 0.1 * max;
    if (max > 1) {
        max = 1;
    }
    this._minDistance = min;
    this._maxDistance = max;
    this._breaksAreDirty = true;
}

What I don't like is that the artificially adjusted max range can cause additional shadows where they're not meant to be. There is also very minor shadow quality degradation, but I don't think it's a big deal. We could use a falloff tolerance instead of a constant value.

My other solution is to add a flag that tracks the async delay from the _depthReducer. _splitFrustum then artificially extends the maxDistance by a tolerance if the _depthReducer hasn't returned with new values. And since this happens for only 1~2 frames, it's not a hard tolerance shift. The con is that it's a lot more code for what is essentially a hack, and it still doesn't address abrupt zoom-out flickers unless the tolerance is high.
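Roughly, the idea looks like this (member names and the tolerance value are placeholders; untested):

// mark the values as potentially stale at the start of every frame
this._scene.onBeforeRenderObservable.add(() => {
    this._minMaxStale = true;
});

// clear the flag when the reducer actually reports back
this._depthReducer.onAfterReductionPerformed.add((minmax) => {
    this._minMaxStale = false;
    this.setMinMaxDistance(minmax.min, minmax.max);
});

// inside _splitFrustum, stretch the far end only while the values are stale
const max = this._minMaxStale ? Math.min(1, this._maxDistance * (1 + tolerance)) : this._maxDistance;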

I guess the point is that mitigation methods exist; it's not an unfixable problem. Then again, for the GPU, CSM is not my preferred choice. I think we need to start developing alternatives for shadow mapping, so users can stick with CSM in WebGL and use other GPU-efficient methods when migrating to WebGPU. The downside is that other methods also need to await texture reads on the GPU, which goes back to the root of the issue: BJS really needs to rework the shadow computation to account for async texture reads, cf. why I said the GPU and CPU are different beasts, not all code is compatible as-is…

Thoughts?

What do you mean? By default, a red texture is used for the depth map (5th parameter of the constructor), so a single channel texture is used.

As you go on to say, all solutions are somewhat hackish, as they rely on adding a hard-coded value (even if provided by the user) to the max value. I don’t think it’s worth spending too much effort on this, but providing users a simple function that allows them to pass either a fixed offset or a percentage of the max distance value should probably be ok (something like setMaxDistanceTolerance(tolerance: number, isPercentage: boolean)), if you want to try it.
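Something along these lines, for instance (just a sketch; the private field names are placeholders):

setMaxDistanceTolerance(tolerance: number, isPercentage: boolean): void {
    this._maxDistanceTolerance = tolerance;
    this._maxDistanceToleranceIsPercentage = isPercentage;
    this._breaksAreDirty = true;
}

// and in setMinMaxDistance, after the existing clamping of min/max:
// max += this._maxDistanceToleranceIsPercentage ? max * this._maxDistanceTolerance : this._maxDistanceTolerance;
// max = Math.min(max, 1);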

However, we shouldn’t add the async/await construct, so that WebGL is not affected.

For CSM, we should do most of the work on the GPU side to avoid reading back data on the CPU (see the section “GPU-Driven Cascade Setup and Scene Submission” in A Sampling of Shadow Techniques).

More generally, we should move to more GPU-driven rendering pipelines in the future to enable more advanced rendering techniques (see https://advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf for example), but we first need to get MultiDrawIndirect support in WebGPU (which is not yet available at this time).


I did not look too deeply, it's likely I'm wrong. readPixels() was giving something like [0.00283123564, 0, 0, 1] in WebGL2 but [0.00283123564, 0.00283123564, 0.00283123564, 0.00283123564] in WebGPU for the first pixel?
If WebGPU is storing only the R channel in the depth map, then it's working as intended.

Well, this was caught before I tested frame graphs, which led me to think that CSM users were screwed trying to port to WebGPU, hence a mitigation was needed. With a frame graph, CSM in WebGPU works fine! We just need to document this so users know what to do. At this point, I do not think adding a hack is a good idea. A hack should really be considered if and only if there are no alternatives.

Agreed and thumbs up!

Yes, I'm aware. If we are on the GPU, my first choice is still ray-traced shadows. There is nothing wrong with CSM except that the shadows are flat. With PBR, you really want better lighting and more natural penumbras to match the realism. For games, JFA seems very promising with good perf as well for soft shadows. But it still doesn't beat real-time ray tracing. Now I'm just rambling and we could go on and on; this really needs its own thread… tsk tsk.

Marking as solved, closing. Thanks for all the help! Cheers!

Yes, WebGPU also uses a red texture. Reading a texture is done using an RGBA buffer, though, and WebGPU copies the same value to each channel.
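So when inspecting the readback, only one value per pixel is meaningful; for example (depthMap stands for the texture returned by getDepthMap()):

const data = await depthMap.readPixels(); // RGBA layout in both engines
const firstDepth = data[0]; // WebGL2 gives [d, 0, 0, 1], WebGPU gives [d, d, d, d] for a red texture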

I’m quite surprised, because CSM didn’t change with frame graphs, there’s still the “async” delay! Would you have a PG so that I can have a look?

lol, my bad. I did not toggle autoCalcBounds when I was testing csm with frame graphs. You can find the bug here.