Frustum check in simd

For bounding boxes aligned to 4 floats (16 bytes)
Depends on cglm

AVX Impl (_mm_blendv_ps needs SSE4.1 or later)
CGLM_INLINE
bool
avx_aabb_frustum(vec4 min, vec4 max, vec4 planes[6]) {
  float dp;
  int    i;
  glmm_128 minv, maxv, plane, zero, sign, tmp;
  minv = glmm_load(min);
  maxv = glmm_load(max);
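  /* set w = 1 so the dot product adds the plane's d term;
     lane indexing like minv[3] only compiles with GCC/Clang vector extensions */
  minv = _mm_blend_ps(minv, _mm_set1_ps(1.f), 0x8);
  maxv = _mm_blend_ps(maxv, _mm_set1_ps(1.f), 0x8);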
  zero = glmm_set1(0.f);

  for (i = 0; i < 6; i++) {
    plane = glmm_load(planes[i]);
    /* per-lane mask: all ones where the plane component is positive */
    sign = _mm_cmpgt_ps(plane, zero);
    tmp = _mm_blendv_ps(minv, maxv, sign);
    dp = glmm_dot(plane, tmp);

    if (dp < 0)
      return false;
  }

  return true;
}
Wasm Impl
bool
aabb_frustum(vec4 min, vec4 max, vec4 planes[6]) {
  float dp;
  int    i;
  glmm_128 minv, maxv, plane, zero, sign, tmp;
  minv = glmm_load(min);
  maxv = glmm_load(max);
  minv = wasm_f32x4_replace_lane(minv, 3, 1.0f);
  maxv = wasm_f32x4_replace_lane(maxv, 3, 1.0f);
  zero = glmm_set1(0.f);

  for (i = 0; i < 6; i++) {
    plane = glmm_load(planes[i]);
    sign = wasm_f32x4_gt(plane, zero);
    tmp = wasm_v128_bitselect(maxv, minv, sign);
    dp = glmm_dot(plane, tmp);

    if (dp < 0)
      return false;
  }

  return true;
}

Wasm Impl with deps (loop unrolled):

The same algo in scalar js

three.js/src/math/Frustum.js at 05dbc5d9f24d290a80173b218b7b8535015674df · mrdoob/three.js · GitHub


To make it worthwhile, you’d have to check all the bounding boxes at once in a loop, as I think there are hidden costs to calling into wasm code.

Also, I’m not sure we currently have a frustum check bottleneck, but that may depend on your scene.

Yes, if wasm is called once per mesh, the algorithmic improvement might not cover the cost of the JS-to-wasm calls, such as context switching and call conversions. But it would be worthwhile when done in batches, e.g. one call per 1k or 2k meshes.
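
A rough sketch of what such a batched entry point could look like, in plain scalar C so it stays self-contained. The function names and the memory layout (8 floats per box, 24 floats for the planes) are assumptions made for illustration, not an existing API; in practice the inner check would be the SIMD version above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: each box is 8 floats, min.xyzw followed by max.xyzw
   (w is padding), so every vec4 keeps its 16-byte stride. */
#define FLOATS_PER_BOX 8

/* Scalar stand-in for the SIMD check: for each plane, test the box corner
   that lies farthest along the plane normal. */
static int aabb_in_frustum(const float *mn, const float *mx,
                           const float *planes /* 6 planes x 4 floats */) {
  for (int i = 0; i < 6; i++) {
    const float *p = planes + 4 * i;
    float x = p[0] > 0.0f ? mx[0] : mn[0];
    float y = p[1] > 0.0f ? mx[1] : mn[1];
    float z = p[2] > 0.0f ? mx[2] : mn[2];
    if (p[0] * x + p[1] * y + p[2] * z + p[3] < 0.0f)
      return 0; /* fully behind this plane */
  }
  return 1;
}

/* One call handles `count` boxes and writes one flag per box.
   Returns the number of boxes that passed the test. */
size_t cull_aabbs(const float *boxes, size_t count,
                  const float *planes, uint8_t *visible) {
  size_t n = 0;
  for (size_t b = 0; b < count; b++) {
    const float *mn = boxes + b * FLOATS_PER_BOX;
    visible[b] = (uint8_t)aabb_in_frustum(mn, mn + 4, planes);
    n += visible[b];
  }
  return n;
}
```

The JS side would then write all boxes into wasm memory once, make a single call, and read the flags back, so the call overhead is paid once per batch rather than once per mesh.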
Also, since it only needs min/max in world space instead of all 8 corners, the transformation of the bounding boxes can be simplified, like cglm’s impl:
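
For readers without the link, here is a minimal scalar sketch of that idea. It is not cglm's actual code. The function name is hypothetical, and it assumes a column-major 4x4 matrix as in cglm/OpenGL. Each output axis starts from the translation and adds the smaller (for min) or larger (for max) of the two products per input axis, so the 8 corners never need to be transformed.

```c
/* Transform an AABB by an affine matrix using only its min/max.
   m is assumed column-major: element (row r, column c) is m[c * 4 + r]. */
void aabb_transform(const float m[16],
                    const float mn[3], const float mx[3],
                    float out_mn[3], float out_mx[3]) {
  for (int r = 0; r < 3; r++) {
    /* start from the translation column */
    float lo = m[12 + r];
    float hi = m[12 + r];
    for (int c = 0; c < 3; c++) {
      float a = m[c * 4 + r] * mn[c];
      float b = m[c * 4 + r] * mx[c];
      /* the smaller product extends the min, the larger one the max */
      lo += a < b ? a : b;
      hi += a < b ? b : a;
    }
    out_mn[r] = lo;
    out_mx[r] = hi;
  }
}
```

This costs 18 multiplies per box instead of 8 full point transforms, and the result is the same world-space box that transforming all corners and taking their min/max would give.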

It’s straightforward to convert this impl to SIMD by replacing every vec3 with vec4 (and aligning the pointers), so both the transformation and the check can run in SIMD, which should give a further performance boost.

This can at least make thinInstanceRefreshBoundingInfo faster with a lot of thin instances.
