Hi everyone! Sorry but I couldn’t post this latest part in the original pathtracing topic because it says I had exceeded my new user limit of 3 posts to the same topic. So I opened up this new topic in hopes of continuing my renderer implementation details. For earlier parts, please see the older ‘path tracing in Babylon.js’ question.
Part 2 - Implementation
So remember when I said that this would be a pure ray/path tracer without the need for traditional rasterization of triangles? Well, that may not have been an entirely true statement.
You see, we need a way to quickly address and calculate every pixel on a webpage, being that this project is meant to run inside the browser. So how do I draw a single colored pixel inside the browser, let alone an entire screen of them?
One option is the HTML Canvas 2D API, which does let you set individual pixels directly. However, if speed is an important consideration (and it is!), then we must go the WebGL route. Unfortunately, there is no easy way to directly paint individual pixels using pure screen memory addressing inside WebGL (that’s why I’m glad there is at least Canvas, to allow me to feel like I’m back in the 1980s, ha!).
If we want to have access to any GPU acceleration, then we have to use the WebGL system, on which modern libraries like Babylon.js and Three.js are based. And in order to use it, the WebGL system requires us to supply vertices to a vertex shader and to supply a fragment shader to color all the pixels that are inside the triangles that get rasterized. So, the only way to write pixels to the screen fast with GPU acceleration is to first supply triangles to work with; then we can directly manipulate the pixels that are inside those triangles. Well, if I want access to every pixel on the screen, what if I just stretched a couple of huge back-to-back triangles to cover the entire screen? That is exactly what some smart person thought of doing long ago, and it works perfectly!
So basically our task is to make the most simple screen-shaped mesh possible - a full screen quad consisting of no more than 2 large triangles. And since the 2 triangles will completely cover the viewport, when they get rasterized in the traditional manner, the WebGL system will have to end up giving us access (through the fragment shader) to every pixel on the screen!
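To make the 2-triangle idea concrete, here is a minimal sketch of the raw data behind such a quad, assuming we describe it directly in clip space (where -1 to +1 spans the whole viewport on each axis). The names `quadPositions`, `quadIndices`, `triangleArea` and `quadCoveredArea` are made up for illustration and are not part of Babylon.js or Three.js; those libraries build equivalent buffers for you when you create a plane mesh.

```typescript
// Hypothetical names throughout; this is plain data, not a library API.

// Four corners of the viewport in clip space (x, y pairs).
const quadPositions = new Float32Array([
  -1, -1, // 0: bottom-left
   1, -1, // 1: bottom-right
   1,  1, // 2: top-right
  -1,  1, // 3: top-left
]);

// Two triangles that share the diagonal running from corner 0 to corner 2.
const quadIndices = new Uint16Array([0, 1, 2, 0, 2, 3]);

// Area of one triangle, computed from the 2D cross product of two of its edges.
function triangleArea(positions: Float32Array, a: number, b: number, c: number): number {
  const ax = positions[2 * a], ay = positions[2 * a + 1];
  const bx = positions[2 * b], by = positions[2 * b + 1];
  const cx = positions[2 * c], cy = positions[2 * c + 1];
  return Math.abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2;
}

// Sum of both triangle areas. The clip-space viewport is 2 x 2 = 4 units,
// so a result of 4 means the two triangles together cover it completely.
function quadCoveredArea(): number {
  let total = 0;
  for (let i = 0; i < quadIndices.length; i += 3) {
    total += triangleArea(quadPositions, quadIndices[i], quadIndices[i + 1], quadIndices[i + 2]);
  }
  return total;
}
```

Since the two triangles together add up to the full viewport area, every pixel lands inside one of them and therefore gets a fragment shader invocation.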
In Three.js, and I’m almost positive in Babylon.js also, we accomplish this by creating the simplest possible Plane Mesh (2 triangles max) and giving it a Shader Material. Then what I did was create an orthographic camera and position the plane (rotating it if necessary) to face the viewer like a movie screen. Once the vertex shader has processed the plane’s 4 vertices (one in each screen corner) and the quad has been rasterized, the fragment shader can use the GLSL built-in gl_FragCoord to get the coordinates of the exact pixel that the GPU is working on. What’s neat is, since the GPU is parallel, we can write one ray tracer and the GPU will run that same ray tracer for every pixel (possibly millions) at essentially the same instant in time!
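As a rough CPU-side illustration of that "one function, run once per pixel" model, here is a small sketch. It is not shader code: `shadePixel` is a hypothetical stand-in for the fragment shader's main function, and the loop in `renderAllPixels` does serially what the GPU would do in parallel. The one GLSL fact it leans on is that gl_FragCoord reports pixel centers, so the bottom-left pixel is at (0.5, 0.5).

```typescript
type Rgb = [number, number, number];

// Stand-in for the fragment shader: receives the pixel-center coordinate
// (the role gl_FragCoord.xy plays in GLSL) and returns a color.
function shadePixel(fragX: number, fragY: number, viewW: number, viewH: number): Rgb {
  // Normalize to the 0..1 range across the viewport.
  const u = fragX / viewW;
  const v = fragY / viewH;
  // A real path tracer would build a camera ray from (u, v) and trace it here.
  // This placeholder just returns a color gradient.
  return [u, v, 0];
}

// Serial emulation of what the GPU does in parallel: one call per pixel.
function renderAllPixels(viewW: number, viewH: number): Float32Array {
  const image = new Float32Array(viewW * viewH * 3);
  for (let y = 0; y < viewH; y++) {
    for (let x = 0; x < viewW; x++) {
      // Pixel centers sit at half-integer coordinates.
      const [r, g, b] = shadePixel(x + 0.5, y + 0.5, viewW, viewH);
      const base = (y * viewW + x) * 3;
      image[base] = r;
      image[base + 1] = g;
      image[base + 2] = b;
    }
  }
  return image;
}

// A tiny 4 x 2 "viewport" keeps the example easy to inspect.
const tinyImage = renderAllPixels(4, 2);
```

The point of the design is that `shadePixel` knows nothing about its neighbors; it only needs its own coordinate, which is why the work parallelizes so well.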
I find it sort of amusing that our big movie quad that renders our 3D raytraced scene is almost like a facade or trick - like the old cowboy western shows where they use a flat set facade for all the buildings that line a street of an old-fashioned western ghost town, ha! But this is no more a trick than a 3D scene projected in the traditional way as triangles on a 2D camera viewport.
There is one catch - WebGL shaders do not retain any pixel memory from one animation frame to the next. So if you wanted to do blending/noise reduction, you cannot ask what the color was at that same pixel location on the previous animation frame. All that info is thrown away 60 times a second. And due to the randomness inherent in path tracing (I will go into more detail in a future post), each image comes back noisy, with slightly different noise from frame to frame. If you just render with no pixel memory/pixel history like that, the raw, fast-moving, never-converging noise will be very distracting at 60 fps. So in order to smooth out the rendering over time (aka Progressive Rendering), and smooth out moving objects and a possible moving camera, another separate full screen quad is necessary. This second quad’s only job is to quickly copy whatever is on the screen from the first animation frame. Its full screen copy of the image is fed back into the original path tracing fragment shader as a large GPU texture uniform, and the path tracer now has a starting point to blend with before it displays the 2nd frame to the screen. In other words, the path tracing fragment shader now suddenly ‘remembers’ what it rendered just before. This extra quad’s shader is sometimes referred to as a copy shader - that’s its only purpose. And this saving and feeding back into the original shader is sometimes called a ping-pong buffer.
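The feedback loop can be sketched on the CPU with two plain arrays standing in for GPU render targets. Note the assumptions: `PingPongBuffers`, `runFrame` and `latest` are hypothetical names, and this version swaps the roles of the two buffers each frame rather than running a separate copy pass, which is a common variant of the same idea. Either way, the pass for the current frame gets to read what the previous frame wrote.

```typescript
// Two same-sized buffers stand in for two GPU render targets (hypothetical sketch).
class PingPongBuffers {
  private readTarget: Float32Array;  // holds the previous frame's result
  private writeTarget: Float32Array; // receives the current frame's result

  constructor(pixelCount: number) {
    this.readTarget = new Float32Array(pixelCount);
    this.writeTarget = new Float32Array(pixelCount);
  }

  // Run one frame: the pass reads the previous result and fills in the new one.
  runFrame(pass: (previous: Float32Array, next: Float32Array) => void): void {
    pass(this.readTarget, this.writeTarget);
    // Swap roles so the next frame reads what was just written.
    const justWritten = this.writeTarget;
    this.writeTarget = this.readTarget;
    this.readTarget = justWritten;
  }

  // The most recently completed frame.
  latest(): Float32Array {
    return this.readTarget;
  }
}

// Toy "shader" pass that adds 1 to whatever the previous frame stored,
// which only works because the previous frame is remembered.
const pixelMemory = new PingPongBuffers(4);
for (let frame = 0; frame < 3; frame++) {
  pixelMemory.runFrame((previous, next) => {
    for (let i = 0; i < next.length; i++) {
      next[i] = previous[i] + 1;
    }
  });
}
```

After three frames every entry of `pixelMemory.latest()` holds 3, showing that each pass really did see the output of the one before it.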
If you keep feeding the image back into the path tracer every frame, and average the results (divide the accumulated total by how many frames you’ve rendered), over time the image will magically converge before your eyes! And if you need a dynamic camera and don’t care too much about obtaining a perfect ‘beauty render’, you can still use this pixel history buffer to slightly blur/smooth out the camera’s changing view as well as any dynamically moving scene objects. In the latter case, the effect is sometimes given the derogatory name ‘ghosting’, but I find that if used responsibly, it really helps a dynamic scene and the end user won’t mind too much, or may not even care. Check out my outdoor environment demos or Cornell box filled with moving water demos to see this technique in action.
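Both blending modes come down to a line of arithmetic per color channel. The sketch below assumes an incremental form of the average (mathematically the same as summing all samples and dividing by the frame count) and a fixed blend weight for the dynamic case; the function names and the 0.75 weight are illustrative choices, not values taken from the demos.

```typescript
// Progressive rendering for a static scene: keep a running mean of all samples.
// frameNumber starts at 1 for the first frame.
function accumulateAverage(previousMean: number, newSample: number, frameNumber: number): number {
  return previousMean + (newSample - previousMean) / frameNumber;
}

// Dynamic scene: blend the new sample with the history using a fixed weight.
// A higher historyWeight gives smoother results but more visible ghosting.
function blendWithHistory(previousColor: number, newSample: number, historyWeight: number): number {
  return historyWeight * previousColor + (1 - historyWeight) * newSample;
}

// One pixel channel receiving noisy samples whose true average is 0.5.
const noisySamples = [1, 0, 1, 0];
let convergedValue = 0;
for (let i = 0; i < noisySamples.length; i++) {
  convergedValue = accumulateAverage(convergedValue, noisySamples[i], i + 1);
}

// One step of the fixed-weight blend: keep 75% of the old color, take 25% of the new one.
const ghostedValue = blendWithHistory(0, 1, 0.75);
```

The running mean gives every frame equal weight, so it only makes sense while nothing moves; the fixed-weight blend lets old frames fade out, which is why it tolerates a moving camera at the cost of some trailing.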
Well, this post is already long enough, so I’ll be back with further implementation details in the next post. See you then!