Support half-float (float16) vertex attribute type (float16x2/float16x4 on WebGPU)

Hi team,

I have some prebuilt game asset files that use half-float vertex formats. E.g., they use 3 float16 values for position, 3 bytes for tangent, etc. I want to upload those interleaved vertex buffers to WebGPU directly (without doing a CPU-side float conversion), but some patches are needed.

  1. Extend GetTypeByteLength(type), and add VertexBuffer.HALF_FLOAT and Constants.HALF_FLOAT.

  2. Ideally, we would extend buffer.align.pure.js to support Float16Array so that VertexBufferGetDataType works (inferring the vertex format from the underlying data structure). But that requires toolchain support. The best we can do right now is to let the user know: “HALF_FLOAT must be passed explicitly; it is never inferred from the data.”

  3. Modify _computeHashCode(). The only reason to modify it is that I chose to use the established value Constants.HALF_FLOAT=5131, which is the enum value of GL_HALF_FLOAT, while the hash algorithm only leaves 3 bits for the vertex type. In WebGL that happens to be OK because the type values range from 5120 to 5126. WebGL2 then added half float with an odd value, 5131, which falls outside that range.

  4. Patch buffer.align.pure.js _alignBuffer

  5. Patch checkNonFloatVertexBuffers so that half float doesn’t trigger a shader recompile.

  6. Patch a few functions in bufferUtils. One thing to note is that the original GetTypeArrayData promotes values to float and then relies on implicit conversion back to the underlying TypedArray. That works for all previous cases but not for half float because, due to the toolchain limits, half-float data is represented by an underlying Uint16Array. So I changed this part to do a byte-to-byte copy without implicit conversion.

  7. Patch webgpuCacheRenderPipeline.ts _GetVertexInputDescriptorFormat
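
For items 1 and 7, the lookups could be sketched roughly as below. The helper names are hypothetical, and mapping 1 or 3 components to the next wider format (WebGPU has no float16x3) is an assumption for illustration, not necessarily what the PR does:

```typescript
// GL_HALF_FLOAT enum value, as quoted in this post.
const HALF_FLOAT = 5131;

// Hypothetical helper: byte length of one component of the given type.
// Only the half-float case is sketched here.
function componentByteLength(type: number): number {
    if (type === HALF_FLOAT) {
        return 2; // one float16 occupies 2 bytes
    }
    throw new Error(`Type ${type} is not covered by this sketch`);
}

type HalfVertexFormat = "float16x2" | "float16x4";

// Hypothetical helper: choose a WebGPU vertex format for a half-float attribute.
// Assumption: odd component counts are rounded up to the next available format.
function halfFloatVertexFormat(size: number): HalfVertexFormat {
    switch (size) {
        case 1:
        case 2:
            return "float16x2";
        case 3:
        case 4:
            return "float16x4";
        default:
            throw new Error(`Unsupported component count: ${size}`);
    }
}
```

With a rounded-up format, the attribute reads a few bytes past its own components, so the interleaved stride has to leave those bytes inside the buffer.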

PR link: Add HALF_FLOAT vertex buffer type support by wiiskii · Pull Request #18524 · BabylonJS/Babylon.js · GitHub
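
The byte-to-byte copy from item 6 can be sketched like this. `copyStridedBytes` and its parameters are made up for illustration and are not the actual bufferUtils signature; the point is only that the raw bytes are moved without ever being read as numbers:

```typescript
// Copy `count` elements of `elementByteLength` bytes each out of an interleaved
// buffer into a tightly packed one, without interpreting the bytes as numbers.
function copyStridedBytes(
    src: Uint8Array,
    byteOffset: number,
    byteStride: number,
    elementByteLength: number,
    count: number
): Uint8Array {
    const out = new Uint8Array(count * elementByteLength);
    for (let i = 0; i < count; i++) {
        const start = byteOffset + i * byteStride;
        out.set(src.subarray(start, start + elementByteLength), i * elementByteLength);
    }
    return out;
}

// Usage: two vertices, 8-byte stride, a 3 x float16 attribute (6 bytes) at offset 2.
const interleaved = Uint8Array.from({ length: 16 }, (_, i) => i);
const packed = copyStridedBytes(interleaved, 2, 8, 6, 2);
// The packed bytes can then be viewed as the Uint16Array that stands in for float16 data.
const asHalfBits = new Uint16Array(packed.buffer);
```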

I also have a question about the original _computeHashCode:

    private _computeHashCode(): void {
        // note: cast to any because the property is declared readonly
        (this.hashCode as any) =
            (VertexBuffer._GetTypeHashIndex(this.type) << 0) +
            ((this.normalized ? 1 : 0) << 3) +
            (this._size << 4) +
            ((this._instanced ? 1 : 0) << 6) +
            /* keep 5 bits free */
            (this.byteStride << 12);
    }

this._size is the number of components. It ranges from 1 to 4, and when it is 4, (4 << 4) = 64 carries into bit 6 and interferes with the instanced bit. If the buffer is also instanced, the carry keeps propagating into bit 7. Is that intended or a bug?
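
To make the carry concrete, here is a standalone sketch of the same packing. `packHash` is a hypothetical stand-in, and the type index is assumed to fit in 3 bits (0 to 7):

```typescript
// Standalone copy of the packing arithmetic from _computeHashCode (illustration only).
function packHash(typeIndex: number, normalized: boolean, size: number, instanced: boolean, byteStride: number): number {
    return (
        (typeIndex << 0) +
        ((normalized ? 1 : 0) << 3) +
        (size << 4) +
        ((instanced ? 1 : 0) << 6) +
        (byteStride << 12)
    );
}

// size = 4 alone already sets bit 6, the bit meant for "instanced".
const sizeFourOnly = packHash(0, false, 4, false, 0); // 64
// size = 4 plus instanced carries into bit 7.
const sizeFourInstanced = packHash(0, false, 4, true, 0); // 128

// Count distinct values over every combination with size in 1..4.
const seen = new Set<number>();
for (let t = 0; t < 8; t++) {
    for (const n of [false, true]) {
        for (let s = 1; s <= 4; s++) {
            for (const inst of [false, true]) {
                seen.add(packHash(t, n, s, inst, 0));
            }
        }
    }
}
const distinctCount = seen.size; // 128 combinations were tried
```

Under these assumptions all 128 combinations still produce distinct values below 1 << 12, because size 0 never occurs and the carry lands in the bits that are kept free. So the layout is not the one the shifts suggest, but the sketch does not show a collision.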


This is more complicated than I was expecting.

Existing situations:

  1. The conversion algorithm in textureTools.ts (if-based, roundTiesToAway) has a bug.
  2. The conversion algorithm in exrLoader.core.ts (table-based, roundTowardZero) throws on values greater than 65504, which is too strict and makes the clamp that follows it useless.

Looking at the three.js implementation, they use the same algorithm as the one in exrLoader.core.ts, but warn instead of throwing. The algorithm is from http://www.fox-toolkit.org/ftp/fasthalffloatconversion.pdf

There are two subtle issues in both implementations:

  1. The original fox-toolkit algorithm flushes any fp32 value with (exponent-127 < -24) to zero. Both implementations instead use the condition (exponent-127 < -27). Another repo, petamoriken/float16 (ES2025 float16 (IEEE 754 half-precision floating-point) ponyfill/proposal since 2017, on GitHub; perhaps this is the source?), fixed this in "Fix algorithm typo for float16 conversion table calculations" by almic (Pull Request #1188, petamoriken/float16), but neither three.js nor exrLoader picked up the fix. Surprisingly, the typo does no harm to the result, but it is clearly incorrect.

  2. The fox-toolkit algorithm can convert an sNaN bit pattern to Inf in some cases, but in JS this is masked because every time we read a number from a Float32Array the signalling bit is quieted. So the algorithm never sees an sNaN.
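
Why the -27 threshold is harmless can be checked directly. The sketch below assumes the subnormal branch of the table generation is the one from the fox-toolkit paper (base = 0x0400 >> (-e - 14), shift = -e - 1); the function name is made up for illustration:

```typescript
// Result of the table lookup for an unbiased exponent e in the subnormal branch,
// assuming the base/shift formulas from the fox-toolkit paper.
function subnormalBranchBits(e: number, mantissa: number): number {
    const base = 0x0400 >> (-e - 14);
    const shift = -e - 1;
    return base + ((mantissa & 0x007fffff) >> shift);
}

// For e in [-27, -25], even the largest 23-bit mantissa produces 0.
const results: number[] = [];
for (let e = -27; e <= -25; e++) {
    results.push(subnormalBranchBits(e, 0x007fffff));
}
```

For those three exponents the base entry is already 0 and the shift is at least 24, so the whole 23-bit mantissa is shifted out. The extra entries therefore encode zero anyway, which matches the flush-to-zero behaviour of the original threshold.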

So which rounding do we choose?

Ideally, round-to-nearest-even (RNE). It is the IEEE 754 default and makes the most sense because everything else uses this rounding mode. But adding RNE to the table-based method is not trivial and is (another surprise) slow.
A test on my M2, converting 2^32 fp32 numbers to fp16, shows that petamoriken/float16, which implements correct RNE on top of the fox-toolkit table-based algorithm, is 13 times slower than native conversion and 11 times slower than the raw fox-toolkit algorithm.

I believe that a roundTowardZero, table-based algorithm is the best we can have at the moment. Once Node 24 is introduced in this repo, we can move entirely to native Float16Array, with little (I hope invisible) friction from the change in rounding method.
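
For reference, this is roughly what truncating (round toward zero) conversion means at the bit level. It is a branch-based sketch written for this post, not the table-based code in exrLoader.core.ts or three.js, and `toHalfBitsTruncate` is a made-up name. Like the fox-toolkit algorithm, it maps finite values above the half range to Inf:

```typescript
const f32 = new Float32Array(1);
const u32 = new Uint32Array(f32.buffer);

// Convert a number to float16 bits by truncating the mantissa (no rounding up).
function toHalfBitsTruncate(value: number): number {
    f32[0] = value;
    const x = u32[0];
    const sign = (x >>> 16) & 0x8000;
    const exp = ((x >>> 23) & 0xff) - 127; // unbiased fp32 exponent
    const mant = x & 0x007fffff;

    if (exp === 128) {
        // Inf stays Inf; NaN keeps a non-zero mantissa.
        return sign | 0x7c00 | (mant !== 0 ? 0x0200 : 0);
    }
    if (exp > 15) {
        return sign | 0x7c00; // too large for float16: Inf
    }
    if (exp >= -14) {
        return sign | ((exp + 15) << 10) | (mant >> 13); // normal half
    }
    if (exp >= -24) {
        return sign | ((mant | 0x00800000) >> (-exp - 1)); // subnormal half
    }
    return sign; // too small: signed zero
}

// 1 + 2^-11 + 2^-12 lies closer to the next half value above 1.0,
// but truncation still yields the bits of 1.0.
const truncated = toHalfBitsTruncate(1.000732421875); // 0x3c00
```

RNE would return 0x3c01 for that input, which is the kind of one-ULP difference to expect when switching to native Float16Array later.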

cc @Evgeni_Popov to get his thoughts (be patient, he is on a well-deserved vacation :))

Thanks Deltakosh! PR Merged.


Thank YOU!
