I have worked out the equations for converting a convolution kernel so that it requires fewer texture samples, taking advantage of (and requiring) bilinear sampling. For a linear kernel (a single row or column), it reduces the number of texture samples from kernel_length to (kernel_length + 1) / 2.
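For reference, the core identity for merging two adjacent taps into one bilinear sample can be sketched as follows. The notation is an assumption made for illustration: T(x) is the texel value at integer position x, w_a and w_b are the kernel weights of two neighboring texels, and S(x_a + t) is a bilinear sample taken a fraction t of the way from x_a to x_a + 1.

```latex
S(x_a + t) = (1 - t)\,T(x_a) + t\,T(x_a + 1), \qquad 0 \le t \le 1

\text{Choose } w = w_a + w_b, \quad t = \frac{w_b}{w}:

w\,S(x_a + t) = (w - w_b)\,T(x_a) + w_b\,T(x_a + 1) = w_a\,T(x_a) + w_b\,T(x_a + 1)
```

This only holds when w_a and w_b have the same sign (so that t stays between 0 and 1) and w is nonzero. Pairing neighbors this way and leaving the last tap on its own is one way to arrive at (kernel_length + 1) / 2 samples for an odd kernel_length.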
I have also worked out the equations for a 3x3 convolution filter to require 5 texture samples instead of 9.
To minimize the number of repeated identical calculations within the shader, the shader input would be a set of sample offsets (a vec2 array) and a set of coefficients (a float array). The offsets and coefficients would be calculated a single time, in TypeScript/JavaScript, during filter construction.
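A minimal sketch of what that one-time calculation might look like for a linear kernel is shown here. The names `PackedKernel` and `packLinearKernel` are hypothetical, and the pairing order (neighbors taken two at a time from the first tap, with any leftover tap kept on its own) is an assumption, not necessarily the final scheme. Offsets are returned in texels along the kernel axis, so they would still need to be scaled by the texel size and expanded to vec2 before upload.

```typescript
// Hypothetical result type: one entry per texture sample.
interface PackedKernel {
  offsets: number[];      // in texels, relative to the kernel center
  coefficients: number[]; // weight applied to each bilinear sample
}

// Merge adjacent taps of a 1D kernel into single bilinear samples.
function packLinearKernel(weights: number[]): PackedKernel {
  const center = (weights.length - 1) / 2;
  const offsets: number[] = [];
  const coefficients: number[] = [];

  for (let i = 0; i < weights.length; i += 2) {
    const wa = weights[i];
    // A leftover last tap is treated as a pair whose second weight is 0.
    const wb = i + 1 < weights.length ? weights[i + 1] : 0;

    // Opposite signs would put the sample point outside the two texels,
    // so this sketch refuses to merge them.
    if (wa * wb < 0) {
      throw new Error(`taps ${i} and ${i + 1} have opposite signs`);
    }

    const sum = wa + wb;
    // Both weights zero: keep a placeholder sample with no contribution.
    const t = sum === 0 ? 0 : wb / sum;

    offsets.push(i - center + t);
    coefficients.push(sum);
  }
  return { offsets, coefficients };
}

// Usage: a 5-tap binomial kernel collapses to 3 samples.
const packed = packLinearKernel([1, 4, 6, 4, 1].map((w) => w / 16));
console.log(packed.offsets.length, packed.coefficients);
```

Throwing on mixed-sign neighbors keeps the sketch short; a fuller version could fall back to two separate samples for such pairs, at the cost of a higher sample count.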
Each filter needing a different number of samples would be created with DEFINEs. I think a 5-sample 1x9 kernel could use the same DEFINEs as a 3x3 filter.
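As a rough illustration of the DEFINE idea, the fragment source could be generated per sample count. Everything named here is hypothetical (`buildConvolutionFragmentSource`, `SAMPLE_COUNT`, `uTexture`, `uOffsets`, `uCoefficients`, `vUV`), and the shader is written against WebGL1-style GLSL as an assumption. Since only the sample count appears in the source, a 5-sample 1x9 kernel and a 5-sample 3x3 filter would produce identical shader text and differ only in the uploaded arrays.

```typescript
// Build a fragment shader whose loop length is fixed by a #define.
function buildConvolutionFragmentSource(sampleCount: number): string {
  if (!Number.isInteger(sampleCount) || sampleCount < 1) {
    throw new Error("sampleCount must be a positive integer");
  }
  return [
    `#define SAMPLE_COUNT ${sampleCount}`,
    "precision mediump float;",
    "uniform sampler2D uTexture;",
    "uniform vec2 uOffsets[SAMPLE_COUNT];       // precomputed, in UV units",
    "uniform float uCoefficients[SAMPLE_COUNT]; // precomputed weights",
    "varying vec2 vUV;",
    "void main() {",
    "  vec4 sum = vec4(0.0);",
    "  for (int i = 0; i < SAMPLE_COUNT; i++) {",
    "    sum += uCoefficients[i] * texture2D(uTexture, vUV + uOffsets[i]);",
    "  }",
    "  gl_FragColor = sum;",
    "}",
  ].join("\n");
}

// Usage: the same source serves any 5-sample filter.
const fiveSampleSource = buildConvolutionFragmentSource(5);
console.log(fiveSampleSource.split("\n")[0]);
```

The texture must be sampled with linear filtering for the merged offsets to produce the intended weights.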
I’m not sure how this would interact with trilinear sampling, mipmaps, LOD, etc.
I think it would speed up any filter that uses convolution, since each one would need fewer texture samples.
Is there any interest in testing a custom postprocessor (not yet written)?