Currently instances and thin instances use 4 vec4 buffers for instanced matrices.
But because these matrices are composed from TRS, or are products of other TRS matrices, their last row should always be 0, 0, 0, 1, and this constant row is still transferred to the GPU.
If this last row were skipped and reconstructed on the GPU side, about 25% of the VRAM used by instance matrices could be saved (12 floats, or 48 bytes, per instance instead of 16 floats, or 64 bytes).
As for performance: since OpenGL matrices are column-major, removing the last row could reduce the chance of auto-vectorization by JS engines, but I cannot tell the exact impact until some benchmarks are made.
For public APIs like thinInstanceSetBuffer, an extra copy could be needed to repack every 4 vec4 into 4 vec3 (stored as 3 vec4).
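That extra copy could look roughly like the sketch below. It assumes the input is a Float32Array holding 16 floats per instance, in the same order as the current four vec4 attributes, and that the fourth component of each vec4 is the constant 0, 0, 0, 1. The name `packInstanceMatrices` is hypothetical and is not an existing Babylon.js API.

```typescript
// Hypothetical helper, not part of Babylon.js: packs N instance matrices
// (16 floats each) into N * 12 floats by dropping the fourth component of
// every vec4, which is assumed to be the constant 0, 0, 0, 1.
function packInstanceMatrices(full: Float32Array): Float32Array {
    if (full.length % 16 !== 0) {
        throw new Error("expected 16 floats per instance");
    }
    const count = full.length / 16;
    const packed = new Float32Array(count * 12);
    for (let i = 0; i < count; i++) {
        const src = i * 16;
        const dst = i * 12;
        for (let v = 0; v < 4; v++) {
            // Copy x, y, z of each vec4 and skip its w component.
            packed[dst + v * 3] = full[src + v * 4];
            packed[dst + v * 3 + 1] = full[src + v * 4 + 1];
            packed[dst + v * 3 + 2] = full[src + v * 4 + 2];
        }
    }
    return packed;
}

// One instance translated by (5, 6, 7): 16 floats in, 12 floats out.
const identityWithTranslation = new Float32Array([
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 1, 0,
    5, 6, 7, 1,
]);
const packedExample = packInstanceMatrices(identityWithTranslation);
```

The cost is one linear pass over the buffer per upload, so whether it is acceptable would depend on how often the instance data changes.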
I know that this could break custom shaders with customized handling for instances, and I do not expect this to land right now. I am just leaving it here as an open discussion, so feel free to move it to the “correct” category if I missed something.
// Three vec4 attributes carry the 12 packed floats of one instance matrix.
attribute vec4 world0;
attribute vec4 world1;
attribute vec4 world2;
void main(void) {
    // Rebuild the four columns; the dropped last row is always 0, 0, 0, 1.
    vec4 instance0 = vec4(world0.xyz, 0.0);
    vec4 instance1 = vec4(world0.w, world1.xy, 0.0);
    vec4 instance2 = vec4(world1.zw, world2.x, 0.0);
    vec4 instance3 = vec4(world2.yzw, 1.0);
    mat4 instanceWorld = mat4(instance0, instance1, instance2, instance3);
}
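
To sanity-check the attribute layout the shader assumes, the same unpacking can be mirrored on the CPU. This is only a sketch in TypeScript: `unpackInstanceMatrix` is a hypothetical name, and it assumes 12 packed floats per instance in the order world0, world1, world2.

```typescript
// Hypothetical CPU-side mirror of the shader's unpacking, for testing only.
// Reads 12 packed floats for the given instance and rebuilds the 16-float
// matrix, filling the dropped components with 0, 0, 0, 1.
function unpackInstanceMatrix(packed: Float32Array, instance: number): Float32Array {
    const src = instance * 12;
    if (instance < 0 || src + 12 > packed.length) {
        throw new RangeError("instance index out of range");
    }
    const full = new Float32Array(16);
    for (let v = 0; v < 4; v++) {
        full[v * 4] = packed[src + v * 3];
        full[v * 4 + 1] = packed[src + v * 3 + 1];
        full[v * 4 + 2] = packed[src + v * 3 + 2];
        // Only the last vec4 gets w = 1; the others get w = 0.
        full[v * 4 + 3] = v === 3 ? 1 : 0;
    }
    return full;
}

// Packed form of an identity rotation with translation (5, 6, 7).
const packedTranslation = new Float32Array([1, 0, 0, 0, 1, 0, 0, 0, 1, 5, 6, 7]);
const rebuilt = unpackInstanceMatrix(packedTranslation, 0);
```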
