Faster convolution or filter PostProcess?

I have worked out the equations for converting a convolution kernel to require fewer texture samples, taking advantage of (and requiring) bilinear sampling. For a linear kernel (single row or column), it reduces the texture sample count from kernel_length to (kernel_length + 1) / 2.
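For two adjacent same-signed taps, the merge works out like this (a minimal sketch; `mergeTaps` and `BilinearTap` are my own illustrative names, not proposed API):

```typescript
interface BilinearTap {
  offset: number; // sample position in texels, relative to the kernel center
  weight: number; // coefficient applied to the fetched value
}

// Combine two adjacent taps (weight w0 at texel x0, weight w1 at texel x0 + 1)
// into one bilinear sample. The hardware returns (1 - t) * texel0 + t * texel1
// for a sample at x0 + t, so choosing t = w1 / (w0 + w1) and scaling the
// result by (w0 + w1) reproduces both taps exactly.
function mergeTaps(x0: number, w0: number, w1: number): BilinearTap {
  const weight = w0 + w1;
  return { offset: x0 + w1 / weight, weight };
}
```

Pairing the taps of an odd-length kernel this way leaves one unpaired element, which is where the (kernel_length + 1) / 2 count comes from.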

I have also worked out the equations for a 3x3 convolution filter to require 5 texture samples instead of 9.

To minimize the number of repeated identical calculations within the shader, the raw input is a set of sample offsets (vec2 array) and a set of coefficients (float array). The offsets and coefficients would be calculated a single time, in TypeScript/JavaScript, during filter construction.
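For a 1D (single-row) kernel, that construction-time step might look roughly like this (a sketch under my own assumptions; `computeSamples` is a hypothetical name, and offsets are in texels, to be scaled by 1 / textureWidth before upload):

```typescript
// Precompute bilinear sample offsets and coefficients for a 1D kernel.
// Adjacent same-signed elements merge into one sample; oppositely-signed
// neighbors fall back to two samples; all-zero pairs are skipped entirely.
function computeSamples(kernel: number[]): { offsets: number[]; weights: number[] } {
  const offsets: number[] = [];
  const weights: number[] = [];
  const center = (kernel.length - 1) / 2;
  for (let i = 0; i < kernel.length; i += 2) {
    const w0 = kernel[i];
    const w1 = i + 1 < kernel.length ? kernel[i + 1] : 0;
    if (w0 === 0 && w1 === 0) continue; // nothing to fetch for this pair
    if (w0 === 0 || w1 === 0 || Math.sign(w0) === Math.sign(w1)) {
      // One bilinear sample placed between the two texels reproduces both.
      const w = w0 + w1;
      offsets.push(i - center + w1 / w);
      weights.push(w);
    } else {
      // Bilinear interpolation weights are non-negative, so oppositely-signed
      // neighbors need two separate texel-centered samples.
      offsets.push(i - center);
      weights.push(w0);
      offsets.push(i + 1 - center);
      weights.push(w1);
    }
  }
  return { offsets, weights };
}
```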

Each filter needing a different number of samples would be created with its own DEFINEs. I think a 5-sample 1x9 kernel could use the same DEFINEs as a 3x3 filter.
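As a sketch of the DEFINEs idea (the shader body and uniform names below are illustrative, not existing Babylon.js code), the sample count would be baked in as the loop bound, so any two kernels that reduce to the same count could share the compiled shader:

```typescript
// Build a fragment shader whose loop count is a compile-time constant.
const buildFragmentSource = (sampleCount: number): string => `
#define SAMPLE_COUNT ${sampleCount}
uniform sampler2D textureSampler;
uniform vec2 sampleOffsets[SAMPLE_COUNT];
uniform float sampleWeights[SAMPLE_COUNT];
varying vec2 vUV;
void main(void) {
  vec4 result = vec4(0.0);
  for (int i = 0; i < SAMPLE_COUNT; i++) {
    result += texture2D(textureSampler, vUV + sampleOffsets[i]) * sampleWeights[i];
  }
  gl_FragColor = result;
}
`;
```

A 5-sample 1x9 kernel and a 5-sample 3x3 kernel would both compile with `#define SAMPLE_COUNT 5`.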

I’m not sure how it would interact with trilinear sampling, mipmaps, lod, etc.

I think it would speed up any filter using convolution.

Is there any interest in testing a custom postprocessor (not yet written)?

IIRC there is already a reduced number of taps for blur in the engine. I’m sure @sebavan remembers.

This is what we use for our shadows for instance :slight_smile: Babylon.js/packages/dev/core/src/Shaders/ShadersInclude/shadowsFragmentFunctions.fx at master · BabylonJS/Babylon.js · GitHub

That’s awesome! I am not improving shadowsFragment. I saw the webpage referenced in the source code, but I am using a different technique. In the generalized case, kernel elements have no pre-determined relationship. Applying bilinear sampling in both directions relies on the 4 surrounding elements participating in each bilinear sample having proportional ratios: the left-to-right ratios are the same (or close enough) on both the top and bottom rows, and the top-to-bottom ratios are the same in the left and right columns. The blur algorithm also ignores the outermost kernel elements because they have minimal impact on the output. And because of the symmetry of the algorithm, it doesn’t need a specific singular center sample.
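That proportionality condition can be stated compactly: a 2x2 block of weights [[w00, w01], [w10, w11]] collapses to one bilinear sample only when w00·w11 = w01·w10 (and all four weights share a sign). A sketch of that check, with my own names:

```typescript
// True when a 2x2 block of same-signed kernel weights can be fetched with a
// single bilinear sample: equal left-to-right ratios on the top and bottom
// rows imply equal top-to-bottom ratios in both columns, i.e. the cross
// products w00*w11 and w01*w10 must match.
function canMergeQuad(w00: number, w01: number, w10: number, w11: number): boolean {
  return Math.abs(w00 * w11 - w01 * w10) < 1e-6; // "close enough", per above
}
```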

My algorithm is different in that:

  • It handles a general kernel with any coefficients.
  • It pre-calculates the samples required based on the specified kernel values.
  • It only groups elements pairwise (not in groups of 4).
  • It can reproduce the requested kernel to within numeric precision.
  • Because of limitations of bilinear sampling, adjacent oppositely-signed elements convert to 2 samples instead of 1.
  • Because the loop count must be a compile-time constant, changing a kernel may require a shader recompile.
  • Kernel element pairs that are both zero are not sampled at all.
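The opposite-sign limitation follows from the pairwise merge: the interpolation factor t = w1 / (w0 + w1) must lie in [0, 1] for the hardware to blend only the two intended texels, and that fails when the signs differ. A quick check (hypothetical helper name):

```typescript
// Two adjacent kernel elements can share one bilinear sample only if the
// interpolation factor w1 / (w0 + w1) stays within [0, 1], which holds
// exactly when w0 and w1 do not have opposite signs.
function canMergePair(w0: number, w1: number): boolean {
  const w = w0 + w1;
  if (w === 0) return false; // degenerate case, e.g. w0 === -w1
  const t = w1 / w;
  return t >= 0 && t <= 1;
}
```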

I should probably create and test a post processor implementation soon. I only discovered the “opposite sign” issue in the last week. I hope there aren’t more errors in my assumptions or calculations.