Has anyone experienced this bug in real code?
I’ve been looking through code related to partial updates to VertexBuffer.
In particular, I’ve read both the MDN docs and the WebGPU specification, and found that the MDN page for GPUQueue.writeBuffer is wrong with respect to the alignment requirements.
This post is just a reminder for me to delve deeper when I get some time later.
The code in question is around line 85 of webgpuBufferManager.ts.
It guards against a misaligned byteLength, and copies the data twice when byteLength is not a multiple of 4.
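The first copy appears to be a mistake: it assigns a TypedArray slice to a variable named tempView, but slice does not create a view, it copies the data into a new array. What was probably intended was a real view over the same memory. Note that the third constructor argument is a length, not an end offset (this assumes chunkStart is already a byte offset into src.buffer):
const tempView = new Uint8Array(src.buffer, chunkStart, chunkEnd - chunkStart);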
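To make the copy-versus-view difference concrete, here is a small standalone sketch. The array contents and the chunkStart/chunkEnd values are made up for illustration and are not taken from webgpuBufferManager.ts.

```typescript
// Eight bytes of made-up source data.
const src = new Uint8Array([0, 1, 2, 3, 4, 5, 6, 7]);
const chunkStart = 2; // hypothetical byte offsets into src
const chunkEnd = 6;

// slice() allocates a new array and copies the bytes into it.
const copied = src.slice(chunkStart, chunkEnd);

// The constructor takes (buffer, byteOffset, length) and shares memory with src.
const viewed = new Uint8Array(src.buffer, src.byteOffset + chunkStart, chunkEnd - chunkStart);

// subarray() takes (begin, end) and also returns a view on the same memory.
const sub = src.subarray(chunkStart, chunkEnd);

// Mutate the source after all three were created.
src[2] = 99;

console.log(copied[0]); // 2  (the copy does not see the change)
console.log(viewed[0]); // 99 (the view does)
console.log(sub[0]);    // 99 (so does the subarray)
```

If src is already a Uint8Array, subarray(chunkStart, chunkEnd) is probably the least error-prone way to get the view, since it uses the same (begin, end) convention as slice.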
The code checks for alignment of the data length and copies extra zeros if needed to pad the copy to a 4-byte boundary.
If you have a copy of the GPUBuffer in memory and want to update 2 bytes, the current code will zero out two additional bytes in the GPUBuffer. This seems bad. Also, if you try to update the second pair of bytes in the buffer, I expect the current code to fail because, according to the WebGPU specification, the destination offset (bufferOffset in the spec) must also be a multiple of 4.
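Here is a rough simulation of both problems without a GPU. fakeWriteBuffer is a hypothetical stand-in I made up that writes into a plain Uint8Array and enforces the 4-byte rules on offset and size. It is not the real API, and the way the real call reports a misaligned write should be checked against the specification.

```typescript
// Hypothetical stand-in for queue.writeBuffer, operating on a plain Uint8Array.
function fakeWriteBuffer(
    dst: Uint8Array,
    dstOffset: number,
    data: Uint8Array,
    dataOffset: number,
    size: number
): void {
    // Mimic the alignment rules: destination offset and size must be multiples of 4.
    if (dstOffset % 4 !== 0 || size % 4 !== 0) {
        throw new RangeError("offset and size must be multiples of 4");
    }
    dst.set(data.subarray(dataOffset, dataOffset + size), dstOffset);
}

const gpuCopy = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]); // pretend GPU contents
const cpuCopy = new Uint8Array([9, 9, 3, 4, 5, 6, 7, 8]); // CPU mirror, first 2 bytes changed

// Zero-padding approach: copy the 2 changed bytes into a 4-byte temp, then write 4 bytes.
const padded = new Uint8Array(4);
padded.set(cpuCopy.subarray(0, 2));
fakeWriteBuffer(gpuCopy, 0, padded, 0, 4);
// gpuCopy is now [9, 9, 0, 0, 5, 6, 7, 8]: bytes 2 and 3 were overwritten with zeros.

// Updating bytes 2..3 the same way needs dstOffset = 2, which the stand-in rejects.
let rejected = false;
try {
    fakeWriteBuffer(gpuCopy, 2, padded, 0, 4);
} catch (_err) {
    rejected = true;
}
```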
Better, I think, would be to extend both the beginning and the end of the range to copy additional data from the source to the destination. If the source is just a full copy of the GPUBuffer, then the result would at least not overwrite the GPUBuffer with zeros.
I haven’t worked out the full replacement code, but the last bit does something like this:
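// We might copy more than requested to make sure the write is aligned.
// Assumes src mirrors the GPUBuffer and that offsets into src are in bytes (ArrayBuffer or Uint8Array).
const startPre = dstByteOffset & 3; // use 3 for 4-byte alignment; use 7 (and ~7 below) for 8-byte alignment
srcByteOffset -= startPre; // must not drop below 0
dstByteOffset -= startPre;
byteLength = (byteLength + startPre + 3) & ~3; // must not run past the end of src
this._device.queue.writeBuffer(buffer, dstByteOffset, src, srcByteOffset, byteLength); // "offset" was removed because it's 0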
This avoids multiple CPU-side copies of the data.
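Pulled out as a pure function, the range widening might look like the sketch below. alignWriteRange, WriteRange, and the srcByteLength parameter are names I made up for illustration. The bounds checks are my own addition, and the approach only gives correct results if the source really is a byte-for-byte mirror of the GPUBuffer, so that the extra bytes on either side are valid data.

```typescript
interface WriteRange {
    srcByteOffset: number;
    dstByteOffset: number;
    byteLength: number;
}

// Widen a requested write so that the destination offset and the size are multiples of 4.
function alignWriteRange(
    srcByteOffset: number,
    dstByteOffset: number,
    byteLength: number,
    srcByteLength: number // total number of bytes available in the source
): WriteRange {
    // Bytes to add in front so the destination offset lands on a 4-byte boundary.
    const startPre = dstByteOffset & 3;
    if (startPre > srcByteOffset) {
        throw new RangeError("cannot extend the source range before its start");
    }
    const alignedSrc = srcByteOffset - startPre;
    const alignedDst = dstByteOffset - startPre;
    // Round the widened length up to the next multiple of 4.
    const alignedLength = (byteLength + startPre + 3) & ~3;
    if (alignedSrc + alignedLength > srcByteLength) {
        throw new RangeError("aligned range runs past the end of the source");
    }
    return { srcByteOffset: alignedSrc, dstByteOffset: alignedDst, byteLength: alignedLength };
}

// Updating bytes 6..7 of a 16-byte mirror widens to bytes 4..7.
const range = alignWriteRange(6, 6, 2, 16);
console.log(range); // { srcByteOffset: 4, dstByteOffset: 4, byteLength: 4 }
```

The result would then be passed to writeBuffer as (buffer, range.dstByteOffset, src, range.srcByteOffset, range.byteLength), again assuming src is an ArrayBuffer or Uint8Array so the offsets are counted in bytes.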
If writeBuffer is limited to 1024 * 1024 * 15 bytes, then the loop needs to be modified as well.
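For the chunking part, a sketch like the following might be enough, assuming the range has already been aligned as above. chunkRanges and MAX_CHUNK are hypothetical names. The 15 MiB figure is the one mentioned in this post, and I have not verified where that limit comes from. Because the chunk size is itself a multiple of 4, every chunk offset and size stays aligned as long as the total length is.

```typescript
// Chunk size taken from the figure in this post; a multiple of 4 by construction.
const MAX_CHUNK = 1024 * 1024 * 15;

// Split an aligned byte range into (offset, size) pieces no larger than maxChunk.
function chunkRanges(
    byteLength: number,
    maxChunk: number = MAX_CHUNK
): Array<{ offset: number; size: number }> {
    if (byteLength % 4 !== 0 || maxChunk % 4 !== 0 || maxChunk <= 0) {
        throw new RangeError("byteLength and maxChunk must be multiples of 4");
    }
    const chunks: Array<{ offset: number; size: number }> = [];
    for (let offset = 0; offset < byteLength; offset += maxChunk) {
        chunks.push({ offset, size: Math.min(maxChunk, byteLength - offset) });
    }
    return chunks;
}

// Small numbers to make the result easy to read.
console.log(chunkRanges(20, 8));
// [ { offset: 0, size: 8 }, { offset: 8, size: 8 }, { offset: 16, size: 4 } ]
```

Each piece would then be written with dstByteOffset + offset and srcByteOffset + offset, so no temporary copy of the source is needed.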