Hi guys!
There is quite a difference in the time it takes to read the result from a storage buffer with babylon.js and w/o it. Does anybody know what adds the overhead in babylon.js compared to the ‘native’ approach? Could it be because of the running renderLoop in babylon.js?
Another question: if I stop the renderLoop, I can’t read the buffer. Is there a way to run a compute shader and read its output w/o the renderLoop running?
babylon.js: 6.1830078125 ms
w/o babylon.js: 3.323974609375 ms
Running on a MacBook M1 Max.
This babylon.js code:
// Build the input: 20000 floats, 0..19999
const valueCount = 20000;
const inputDataArray = [];
for (let i = 0; i < valueCount; i++) {
    inputDataArray.push(i);
}
const inputData = Float32Array.from(inputDataArray);

const cs = new ComputeShader(
    "doubleValuesCompute",
    engine,
    { computeSource: computeShaderSource },
    {
        bindingsMapping: {
            values: { group: 0, binding: 0 },
        },
    }
);

// Upload the input to a storage buffer and bind it to the shader
const bufferValues = new StorageBuffer(engine, inputData.byteLength);
bufferValues.update(inputData);
cs.setStorageBuffer("values", bufferValues);

// dispatchWhenReady waits until the shader is ready, then dispatches it
await cs.dispatchWhenReady(inputData.length);
// Second dispatch on the same buffer: the values get doubled again
await cs.dispatch(inputData.length);

// Time only the readback
console.time("read the buffer");
const bufferValuesView = await bufferValues.read();
const results = new Float32Array(bufferValuesView.buffer);
console.timeEnd("read the buffer");
Compute shader:
@binding(0) @group(0) var<storage, read_write> values : array<f32>;
@compute @workgroup_size(1)
fn main(@builtin(global_invocation_id) gId : vec3<u32>) {
    let index : u32 = gId.x;
    values[index] = values[index] * 2.0;
}
The babylon.js version reads the output in:
read the buffer: 5.5830078125 ms
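Before comparing timings, it may be worth checking that both versions return the same values. A minimal sketch that mirrors the shader on the CPU; `doubleOnCpu` and `firstMismatch` are hypothetical helper names, not part of babylon.js or WebGPU. The `passes` parameter is an assumption on my side: the babylon.js snippet calls both dispatchWhenReady and dispatch, so its values may end up doubled twice.

```javascript
// Hypothetical helpers for illustration; not part of babylon.js or WebGPU.

// CPU mirror of the shader: multiplies every value by 2, `passes` times.
// Returns a new Float32Array and leaves the input untouched.
function doubleOnCpu(input, passes = 1) {
  const out = Float32Array.from(input);
  for (let p = 0; p < passes; p++) {
    for (let i = 0; i < out.length; i++) {
      out[i] = out[i] * 2.0;
    }
  }
  return out;
}

// Returns the index of the first differing element, or -1 if the arrays match.
// A length difference counts as a mismatch at the first missing index.
function firstMismatch(actual, expected) {
  const length = Math.max(actual.length, expected.length);
  for (let i = 0; i < length; i++) {
    if (actual[i] !== expected[i]) {
      return i;
    }
  }
  return -1;
}
```

With the snippets in this post, something like `firstMismatch(results, doubleOnCpu(inputData, 2))` for the babylon.js version and `firstMismatch(result, doubleOnCpu(input, 1))` for the plain WebGPU version should return -1 if the assumption about the number of passes holds.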
This code w/o babylon.js:
const adapter = await navigator.gpu?.requestAdapter();
const device = await adapter?.requestDevice();
if (!device) {
    fail('need a browser that supports WebGPU');
    return;
}

const module = device.createShaderModule({
    label: 'doubling compute module',
    code: `
        @group(0) @binding(0) var<storage, read_write> data: array<f32>;
        @compute @workgroup_size(1) fn computeSomething(
            @builtin(global_invocation_id) id: vec3<u32>
        ) {
            let i = id.x;
            data[i] = data[i] * 2.0;
        }
    `,
});

const pipeline = device.createComputePipeline({
    label: 'doubling compute pipeline',
    layout: 'auto',
    compute: {
        module,
        entryPoint: 'computeSomething',
    },
});

const valueCount = 20000;
const inputDataArray = [];
for (let i = 0; i < valueCount; i++) {
    inputDataArray.push(i);
}
const input = Float32Array.from(inputDataArray);

// create a buffer on the GPU to hold our computation
// input and output
const workBuffer = device.createBuffer({
    label: 'work buffer',
    size: input.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,
});
// Copy our input data to that buffer
device.queue.writeBuffer(workBuffer, 0, input);

// create a buffer on the GPU to get a copy of the results
const resultBuffer = device.createBuffer({
    label: 'result buffer',
    size: input.byteLength,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
});

// Setup a bindGroup to tell the shader which
// buffer to use for the computation
const bindGroup = device.createBindGroup({
    label: 'bindGroup for work buffer',
    layout: pipeline.getBindGroupLayout(0),
    entries: [
        { binding: 0, resource: { buffer: workBuffer } },
    ],
});

// Encode commands to do the computation
const encoder = device.createCommandEncoder({
    label: 'doubling encoder',
});
const pass = encoder.beginComputePass({
    label: 'doubling compute pass',
});
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(input.length);
pass.end();

// Encode a command to copy the results to a mappable buffer.
encoder.copyBufferToBuffer(workBuffer, 0, resultBuffer, 0, resultBuffer.size);

// Finish encoding and submit the commands
const commandBuffer = encoder.finish();
device.queue.submit([commandBuffer]);

// Read the results
console.time('read the buffer');
await resultBuffer.mapAsync(GPUMapMode.READ);
const result = new Float32Array(resultBuffer.getMappedRange().slice());
resultBuffer.unmap();
console.timeEnd('read the buffer');
The version w/o babylon.js reads the buffer nearly twice as fast:
read the buffer: 3.323974609375 ms
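Both numbers look like single console.time runs, so part of the gap could be noise. One way to make the comparison more robust is to repeat each read several times and compare medians. A minimal sketch, assuming `performance.now()` is available; `median` and `timeRuns` are hypothetical helper names, not part of babylon.js or WebGPU.

```javascript
// Hypothetical helpers for illustration; not part of babylon.js or WebGPU.

// Median of a list of numbers (does not modify the input).
function median(samples) {
  if (samples.length === 0) {
    throw new Error("median of an empty list is undefined");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Runs an async function `runs` times, one after another,
// and returns the duration of each run in milliseconds.
async function timeRuns(fn, runs) {
  const durations = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await fn();
    durations.push(performance.now() - start);
  }
  return durations;
}
```

For the babylon.js version this could be used as `median(await timeRuns(() => bufferValues.read(), 20))`. For the version w/o babylon.js, each run would need its own mapAsync, getMappedRange and unmap on resultBuffer.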
Thank you!