Should performance (considerably) improve if vertex shader rejects all vertices?

Joe_Kerr · May 30, 2025, 1:16pm

Hi everyone,

Please see this brief playground: Babylon.js Playground

The heart of it is this line which is part of a vertex shader of a ShaderMaterial:

gl_Position = vec4(1.0, 1.0, 1.0, 0.0);

By the looks of it, none of the vertices are rendered anymore. However, I get

without ShaderMaterial: 30fps
with ShaderMaterial: 40fps

Ok, granted, there is an improvement. But if using the ShaderMaterial, there is nothing left to render. So I would expect 60fps.

Do I somehow interfere with the early z-test? Is this just normal and I am out of luck? Is there anything else I can do to signal the GPU to skip/discard/ignore certain vertices?

FYI use case: dynamically enable/disable known vertex ranges (like if you wanted to toggle on/off particular (former) meshes in a merged mesh).

Best wishes
Joe

HiGreg · May 30, 2025, 1:51pm

I looked into this a little bit, but others will certainly know better. Here’s my take.

There isn’t a standard way to “drop” a vertex in the vertex shader in WebGL2. That means the vertex shader always processes every vertex it is given: each invocation takes one input vertex and produces one output vertex. There might be some savings if conditional code in the vertex shader lets it “bypass” part of the work.

The fragment shader can execute “discard”, but it still starts processing each fragment (up until the discard).

Conceptually this kind of makes sense in that a vertex is an element of a primitive (e.g. most commonly a triangle) and dropping a single vertex from a triangle doesn’t make sense.

A set of shaders (vertex & fragment) COULD be created to coordinate and save some time, but they would need to be specially programmed to do so.

That said, it seems to me that clipping could be used to implement “primitive dropping” by forcing each vertex of a primitive outside the visible region.

Edit to add: in my research on the subject, it seems to be unlikely that time can be saved within a vertex shader. This is due to the way that vertex shaders are executed in parallel and that all vertex shaders in the same “wave” or “warp” execute in lock step.

Performance Implications: This is where the “wave” terminology becomes relevant. All vertex shader invocations within a wave execute the same instruction at the same time. This allows for high throughput, but it also means that if the threads in a wave take different branches, the wave runs through every path that was taken while the threads not on that path sit idle. So ideally you want uniform execution.

Optimized for Parallelism: Because of wave execution, try to write vertex shaders with as few branches as possible and with consistent execution paths for all vertices for optimal performance.
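
To make the lock-step point concrete, here is a toy cost model, not a description of any real GPU. It assumes a wave of 32 lanes and treats the cost of a wave as the cost of its slowest lane, which is a simplification of how divergence behaves. The names `totalWaveCost`, `FULL` and `EARLY_OUT` and the cost numbers are made up for illustration.

```typescript
// Toy model: vertices are grouped into waves that run in lock step,
// so each wave is charged the cost of its most expensive lane.
function totalWaveCost(vertexCosts: number[], waveSize: number): number {
  let total = 0;
  for (let i = 0; i < vertexCosts.length; i += waveSize) {
    total += Math.max(...vertexCosts.slice(i, i + waveSize));
  }
  return total;
}

const WAVE_SIZE = 32; // assumption; real wave sizes vary by GPU
const FULL = 10;      // hypothetical cost of the full vertex shader path
const EARLY_OUT = 1;  // hypothetical cost of an early "reject" path

// 64 vertices, none rejected.
const noneRejected = new Array<number>(64).fill(FULL);
// Every other vertex rejected: each wave still contains full-cost lanes.
const interleaved = noneRejected.map((c, i) => (i % 2 === 0 ? EARLY_OUT : c));
// The first 32 vertices rejected: one whole wave takes the cheap path.
const contiguous = noneRejected.map((c, i) => (i < 32 ? EARLY_OUT : c));
// All vertices rejected: every wave is cheap, but none is free.
const allRejected = new Array<number>(64).fill(EARLY_OUT);

const costNone = totalWaveCost(noneRejected, WAVE_SIZE);
const costInterleaved = totalWaveCost(interleaved, WAVE_SIZE);
const costContiguous = totalWaveCost(contiguous, WAVE_SIZE);
const costAll = totalWaveCost(allRejected, WAVE_SIZE);
```

In this model a branch only pays off when whole waves take the cheap path, and even then every vertex is still fetched and run through the shader.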

Clipping happens before the fragment shader: Clipping is a process that determines which primitives (like triangles) are visible within the view frustum (the region of 3D space visible to the camera). This process takes place during the Vertex Post-Processing stage of the graphics pipeline, before rasterization and fragment shading.

Purpose of clipping: Clipping is essential for performance. By removing primitives or portions of primitives that are outside the view frustum, the GPU avoids unnecessary computations in later stages, like rasterization and fragment shading, for pixels that would not be visible anyway.

Fragment shader operates on rasterized fragments: The fragment shader only processes fragments that are generated by the rasterizer. The rasterizer, in turn, only creates fragments for the visible portions of primitives that have passed the clipping stage.
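
The clipping test described above can be sketched on the CPU. This is only an illustration of the usual clip-volume rule (-w <= x, y, z <= w), not what any particular driver does internally. The helper names `clipOutcode` and `isTriviallyRejected` are hypothetical. It shows why a triangle whose vertices all output vec4(1.0, 1.0, 1.0, 0.0) never reaches the rasterizer.

```typescript
type Vec4 = [number, number, number, number];

// One bit per clip plane; a bit is set when the vertex lies outside that plane.
function clipOutcode([x, y, z, w]: Vec4): number {
  let code = 0;
  if (x < -w) code |= 1;
  if (x > w) code |= 2;
  if (y < -w) code |= 4;
  if (y > w) code |= 8;
  if (z < -w) code |= 16;
  if (z > w) code |= 32;
  return code;
}

// A triangle can be thrown away without further work when all three
// vertices are outside the same clip plane.
function isTriviallyRejected(a: Vec4, b: Vec4, c: Vec4): boolean {
  return (clipOutcode(a) & clipOutcode(b) & clipOutcode(c)) !== 0;
}

// The position written by the shader in the original post.
const rejectedPos: Vec4 = [1.0, 1.0, 1.0, 0.0];
// An ordinary position in the middle of the view volume, for comparison.
const visiblePos: Vec4 = [0.0, 0.0, 0.5, 1.0];

const allRejectedTriangle = isTriviallyRejected(rejectedPos, rejectedPos, rejectedPos);
const visibleTriangle = isTriviallyRejected(visiblePos, visiblePos, visiblePos);
```

With w = 0, any non-zero x, y or z fails the test, so such triangles produce no fragments. The vertex shader still had to run for each vertex to find that out.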

Deltakosh · May 30, 2025, 3:15pm

This is correct.

The only thing you will save with that is the fillrate (running the fragment shaders). But on modern hardware you will not save a lot, because they are damn fast

Each face will still run vertex shader on all vertices. And in that example you have a shit ton of vertices to process

TLDR: Your test is probably not the right one as we still have a ton to process on the GPU (but at vertex level)

Deltakosh · May 30, 2025, 3:17pm

I think what you need to test is something where the fragment shader is UTTERLY complex, like reading 64 times from a texture or something, and having a single fullscreen quad

Joe_Kerr · May 31, 2025, 11:56am

Awww I was so hoping… Thanks for the explanations guys!

So really the only means we have as a sort of “Vertex Pre-Processing stage” is the vertex buffer. Back to the drawing board.
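
For the toggle use case, one possible direction is to leave the vertex data alone and rebuild the index buffer so that disabled ranges are never submitted. Below is a minimal sketch, assuming the merged mesh is drawn with indices and that the index range of each former mesh is known. `IndexRange` and `buildActiveIndices` are hypothetical names, not Babylon.js API.

```typescript
// Hypothetical bookkeeping: where each former mesh lives in the merged index buffer.
interface IndexRange {
  start: number;    // first index of this former mesh in the merged index buffer
  count: number;    // number of indices (3 per triangle)
  enabled: boolean; // whether this part should currently be drawn
}

// Build a compact index buffer containing only the enabled ranges.
function buildActiveIndices(indices: Uint32Array, ranges: IndexRange[]): Uint32Array {
  const active = ranges.filter((r) => r.enabled);
  const total = active.reduce((n, r) => n + r.count, 0);
  const out = new Uint32Array(total);
  let offset = 0;
  for (const r of active) {
    out.set(indices.subarray(r.start, r.start + r.count), offset);
    offset += r.count;
  }
  return out;
}

// Three former meshes of one triangle each; the middle one is switched off.
const mergedIndices = new Uint32Array([0, 1, 2, 3, 4, 5, 6, 7, 8]);
const meshRanges: IndexRange[] = [
  { start: 0, count: 3, enabled: true },
  { start: 3, count: 3, enabled: false },
  { start: 6, count: 3, enabled: true },
];
const activeIndices = buildActiveIndices(mergedIndices, meshRanges);
// Assumption: the result would then be handed to the engine,
// e.g. via something like mesh.setIndices(activeIndices) in Babylon.js.
```

With an indexed draw, the vertex shader only runs for vertices that the indices reference, so the disabled triangles cost nothing on the GPU. The price is a CPU-side rebuild and an index buffer upload on every toggle.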
