PIX but for MacOS

I made some measurements using PIX on my Windows machine to determine how long various read access methods take in a compute shader. But I don’t know how to do the same on a Mac, which is equally important for me.

Anyone ever do this before?

I’m gonna try following along with this tutorial:

After reading some more I realized that babylon.js already does this in some way, though I’m not exactly sure how I would access it other than through the inspector tool. For some reason, though, on Windows, even if I launch Chrome with the command-line flag --disable-dawn-features=disallow_unsafe_apis, I still get an error in the console when I try to use GPU timings:

device = await adapter.requestDevice({
  requiredFeatures: ["timestamp-query"],
});
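
One thing worth ruling out with a request like this is whether the adapter exposes the feature at all. A minimal sketch, assuming a standard WebGPU adapter object; pickTimingFeatures and requestTimingDevice are hypothetical helper names, not part of any library:

```javascript
// Hypothetical helper: returns the optional features to request, based on
// what the adapter reports. In WebGPU, `adapter.features` is a set-like object.
function pickTimingFeatures(adapter) {
  return adapter.features.has("timestamp-query") ? ["timestamp-query"] : [];
}

// Hypothetical helper: requests a device, asking for timestamp queries only
// when the adapter lists them, so requestDevice() is not rejected for that reason.
async function requestTimingDevice(adapter) {
  const requiredFeatures = pickTimingFeatures(adapter);
  if (requiredFeatures.length === 0) {
    console.warn("timestamp-query is not exposed by this adapter");
  }
  return adapter.requestDevice({ requiredFeatures });
}
```

If the warning shows up, the browser flags are the likely culprit rather than the request itself.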

Testing this on mac OS now.

Doesn’t work for me on mac OS either.

I then tried using chrome beta and chrome canary on both windows and mac OS and ran into a new error:

Tint WGSL reader failure: :497:13 error: "foobar" must only be called from uniform control flow
  foo = bar(x_731);
            ^^^^^^^^^^^
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:1294:3 note: control flow depends on possibly non-uniform value
  if ((x_1124 & (x_1125 < 32.0f))) {

Note: This is my own code (nothing from bjs).

This doesn’t happen in the current stable version of Chrome on either Windows or macOS, but it does happen in Canary and Beta on both.

@Evgeni_Popov do you have any pointers on what I can do to fix my code for control flow errors like this?

Are the same flags enabled in all browser versions?

Yeah, I’ve tried variations such as:

open /Applications/Google\ Chrome\ Canary.app --args  --disable-dawn-features="disallow_unsafe_apis"

open /Applications/Google\ Chrome\ Canary.app --args --enable-unsafe-webgpu --disable-dawn-features="disallow_unsafe_apis"

Does GPU time querying work for you in any browser version?

I’ve never tested :slight_smile:
My reply was based on my debugging experience - if something should work in the same conditions and it doesn’t work, check if the conditions are really the same.

They have changed the flag some time ago, it is now --enable-dawn-features=allow_unsafe_apis. With this parameter, GPU timing does work for me in the inspector. Here’s how it is done in the inspector, if you want access to the counter in your own code:
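
A minimal sketch of reading that counter, built around the two EngineInstrumentation lines quoted later in this thread. The helper names are hypothetical, and the nanosecond unit of gpuFrameTimeCounter is an assumption taken from the Babylon.js documentation:

```javascript
// Hypothetical helper: turns on GPU frame-time capture for a scene.
// The EngineInstrumentation class is passed in so the sketch does not assume
// how Babylon.js is loaded (for example BABYLON.EngineInstrumentation).
function enableGpuFrameTime(EngineInstrumentation, scene) {
  const instrumentation = new EngineInstrumentation(scene.getEngine());
  instrumentation.captureGPUFrameTime = true;
  return instrumentation;
}

// Hypothetical helper: reads the latest GPU frame time.
// Assumption: the counter reports nanoseconds, so scale to milliseconds.
function gpuFrameTimeMs(instrumentation) {
  return instrumentation.gpuFrameTimeCounter.current * 1e-6;
}
```

gpuFrameTimeMs(instrumentation) could then be read once per frame, for example from scene.onAfterRenderObservable.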

Uniformity analysis has been added to WGSL to ensure correct and portable behavior.

Most of the time, the problem is that a texture is sampled inside an if block whose condition is not static, meaning the condition may evaluate differently depending on which GPU thread is executing the instruction.

One way to fix it is to sample the texture without filtering (either using texelFetch or texture2DLodEXT / textureCubeLodEXT), if it’s acceptable for you to disable filtering.

You can also try to refactor your code, for example by removing the if block and making sure that what was inside it has no impact on the output when the condition is false.

Lastly, you can simply add /* disable_uniformity_analysis */ in your shader code, to disable uniformity analysis.

They have changed the flag some time ago, it is now --enable-dawn-features=allow_unsafe_apis

Wow, okay!

Might I suggest an update to the docs here:

I made the suggestion here:

this._engineInstrumentation = new EngineInstrumentation(scene.getEngine());
this._engineInstrumentation.captureGPUFrameTime = true;

Awesome! I’ll try this out!

Lastly, you can simply add /* disable_uniformity_analysis */ in your shader code, to disable uniformity analysis.

Yes!! Without understanding what you mean by

One way to fix it is to sample the texture without filtering (either using texelFetch or texture2DLodEXT / textureCubeLodEXT), if it’s acceptable for you to disable filtering.

Disabling the uniformity check is the ideal solution for me as I would rather avoid a texture sampling if it can be helped. To be sure that I understand the uniformity check:

IF my_function(global_thread_id: vec3) :
THEN sample a texture

Any code that does this will trigger the uniformity analysis - right?

Lastly, I’ll read about texelFetch and texture2DLodEXT / textureCubeLodEXT - it sounds like I should know about these functions.

Thank you @Evgeni_Popov :smiley:

Actually, uniformity analysis is enabled by default, so it will always apply (unless you add /* disable_uniformity_analysis */ to disable it). It will generate an error if the code is “not uniform”. The difficulty is that, to fully understand what that means, you need to know a bit about how a GPU works under the hood.

As 95% of the time an error from the uniformity analysis is caused by a texture sampling, I preferred to explain it that way (and also I’m not an expert in GPU hardware, so it’s better I don’t try to go too deep because I would also make mistakes!).

To rewrite your example more accurately:

IF (mycondition_not_uniform) {
    ...
    sample a mipmapped texture with filtering
    ...
}

This will generate a uniformity analysis error. The condition is not uniform if its evaluation could differ depending on which thread of a warp/wavefront executes it. For example, if the condition only involves parameters passed to the shader through uniforms (like uniform float myFloatParam;), then it will be OK because all threads of the warp/wavefront will see the same value for myFloatParam; this value cannot change on a per-thread basis. However, if your condition is something like if (vUV.x < 0.5), then you will get an error because vUV has a different value for each fragment shader invocation (and each invocation is processed by a different warp/wavefront thread).

The problem with sampling a texture inside a non-uniform block of code is that when filtering with mipmaps is enabled (trilinear filtering; bilinear filtering without mipmaps will not trigger the error), the GPU needs the screen-space derivatives of the coordinates you pass to the texture sampling function (texture2D for 2D textures in Babylon) to select the right mipmap. To calculate them, all GPUs (I think) launch threads in a 2x2 pattern (so 4 threads) and simply compute the derivatives as the differences between the values computed by those 4 threads. But if at least one thread of the 2x2 block does not run the code inside the if because the condition is false, then it’s not possible to calculate these derivatives. That’s what the uniformity analysis attempts to check beforehand, through a static analysis of your code.
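
To make the 2x2 idea concrete, here is a small conceptual model in JavaScript. It is only an illustration of the reasoning above, not a description of how any particular GPU is implemented, and quadDerivatives is a hypothetical name:

```javascript
// Conceptual model only: a 2x2 quad of invocations laid out as
// [topLeft, topRight, bottomLeft, bottomRight].
// `values` holds the coordinate each invocation computed;
// `active` says whether that invocation actually ran the sampling code.
function quadDerivatives(values, active) {
  // Every invocation in the quad must contribute, otherwise the differences
  // would use values that were never computed.
  if (!active.every(Boolean)) {
    return null; // the derivative is undefined in this model
  }
  const [topLeft, topRight, bottomLeft] = values;
  return {
    ddx: topRight - topLeft, // horizontal neighbour difference
    ddy: bottomLeft - topLeft, // vertical neighbour difference
  };
}
```

With all four invocations active the differences are well defined. As soon as one of them skips the branch, the model has nothing meaningful to return, which is the situation the uniformity analysis tries to rule out statically.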

Using texelFetch or a texture variant with “LOD” in the name won’t trigger the error because they don’t need the coordinate derivatives as they don’t interpolate between mipmaps.
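
For shaders written directly in WGSL, the corresponding built-ins are textureLoad and textureSampleLevel. A minimal sketch of the explicit-LOD variant, with the WGSL kept in a JavaScript string; the binding layout and the names tex, samp and vUV are placeholders chosen for illustration:

```javascript
// WGSL source held in a JavaScript string. All names are illustrative.
const explicitLodShader = /* wgsl */ `
@group(0) @binding(0) var tex: texture_2d<f32>;
@group(0) @binding(1) var samp: sampler;

@fragment
fn main(@location(0) vUV: vec2<f32>) -> @location(0) vec4<f32> {
  var color = vec4<f32>(0.0, 0.0, 0.0, 1.0);
  // vUV differs per invocation, so this branch is non-uniform.
  if (vUV.x < 0.5) {
    // Explicit mip level 0: no derivatives are needed to pick a mipmap.
    color = textureSampleLevel(tex, samp, vUV, 0.0);
  }
  return color;
}
`;
```

The trade-off is the one mentioned above: the mip level is fixed by hand, so automatic mipmap selection is lost.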

Note that a problem reported by the uniformity analysis process means that your program may not produce the same result depending on the GPU it runs on! In the case explained above, if you disable uniformity analysis, your program will run, but some derivatives may not be calculated correctly, so you could see some rendering artifacts. But it’s also possible that it does not happen / it’s not visible…

Note also that the block of code inside an if whose condition evaluates to false may still be executed by the GPU, with the result simply thrown away (I think it depends on the GPU architecture?). In that case, the texture sampling will be done anyway and you won’t gain anything by trying to avoid it. Even if the block of code is not executed, texture sampling is cached, so there’s a high probability that the 2nd sampling will come from the cache and won’t cost much. So, as always, it’s a trade-off between writing an accurate program without uniformity problems and performance. But measuring performance is really difficult because a GPU does not generally behave the way you think it does. You should always benchmark your code, which is quite difficult too: you would have to benchmark it on different GPUs, and you can’t be sure your optimized code won’t break with the next generation of GPUs.

All in all, in Babylon we had only a few uniformity analysis problems (I think we had a single one, after we removed the filtered texture sampling that should have been non-filtered from the start because the textures involved did not have mipmaps), so we decided to perform the texture sampling in all cases, outside the if, to avoid disabling the uniformity analysis process.
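
A minimal sketch of that "sample outside the if" pattern, written as WGSL in a JavaScript string. This is an illustration with placeholder names (tex, samp, vUV), not Babylon’s actual shader code:

```javascript
// WGSL source held in a JavaScript string. All names are illustrative.
const hoistedSampleShader = /* wgsl */ `
@group(0) @binding(0) var tex: texture_2d<f32>;
@group(0) @binding(1) var samp: sampler;

@fragment
fn main(@location(0) vUV: vec2<f32>) -> @location(0) vec4<f32> {
  // Sampled unconditionally, so the call sits in uniform control flow.
  let sampled = textureSample(tex, samp, vUV);
  var color = vec4<f32>(0.0, 0.0, 0.0, 1.0);
  // Only the use of the result depends on the non-uniform condition.
  if (vUV.x < 0.5) {
    color = sampled;
  }
  return color;
}
`;
```

Filtering and mipmap selection stay intact here, at the cost of sampling even when the result ends up unused.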

Note that a problem reported by the uniformity analysis process means that your program may not produce the same result depending on the GPU it runs on!

Wow. This is really an important caveat to be aware of. If I’m understanding everything you’ve said so far about uniformity, would it be safe to consider code that passes the uniformity check to always run the same on different GPU hardware? Or are there other layers to the stack such as drivers and OS system architecture that can introduce bugs and or deviant behavior as well? I guess I’m wondering what sort of performance consistency assurance does uniformity analysis offer?

Yes, but…

That :slight_smile:
But these are bugs, so they don’t really count.

Uniformity analysis is not about performance but consistency.
Excerpt from the spec:

To ensure correct and portable behavior, a WGSL implementation will perform a static uniformity analysis, attempting to prove that each collective operation executes in uniform control flow. Subsequent subsections describe the analysis.

“correct” and “portable” are the key words here.

Or are there other layers to the stack such as drivers and OS system architecture that can introduce bugs and or deviant behavior as well?

That :slight_smile:
But these are bugs, so they don’t really count.

That’s fair :laughing:

To summarize:
Uniformity analysis at least ensures that an exact set of operations is happening… But HOW those operations are carried out on the OS and hardware executing them is another layer of variability. In terms of performance, ensuring uniformity at least narrows the variability down to the latter.

Phew, what a lesson. Thanks for always taking the time to explain these concepts, I really appreciate it :slight_smile:

https://google.github.io/tour-of-wgsl/uniformity-analysis/invocations/

Popov:1
Google:0
