Path Tracing with Babylon, Background and Implementation

Hello everyone! My name is Erich and I’m the creator of the threejs-PathTracing-renderer. A couple of weeks ago @PichouPichou reached out to me and wanted to know if I was interested in helping him and others port my renderer over to Babylon.js. I replied, “I’m in!”.

@PichouPichou began a topic in the question forum - here is the link to the original discussion where I joined in progress: Path-tracing in BabylonJS. Since some dedicated Babylon coders were already beginning work on the port, I wanted to use this new thread to share my project’s background, as well as give an overview of how I got it to work using a WebGL library (three.js) and the browser. It wasn’t so much for the benefit of the people who were already contributing to the port’s source, but for those individuals who might have an interest or curiosity about ray tracing / path tracing and who would like to learn more about it. It was my hope that some of those folks could benefit from my details (and trials and errors, ha), and maybe want to try their own hand at ray/path tracing inside Babylon.js (when the port to Babylon is done). My only warning though is: once you start coding ray tracers, it’s hard to stop! It really is addictive. I should know, I’ve been at it on my project for over 5 years and counting! :smiley:

In these first 3 posts, you might see some duplication between that original questions thread and this new one. But I wanted to have everything under this same roof, so to speak. What follows is some history and background of the ray tracing world, and shortly after that, I start getting into the overview and details of how I accomplished it (and how you can too!) using WebGL for any device that has a modern browser, even smartphones.

As always, if anyone has any questions/comments about what I’m writing, or maybe if they would like to see me go in a direction that they are interested in (i.e. Monte Carlo rendering, or Bi-Directional path tracing), please feel free! I’ll be the first to admit I don’t know everything about ray and path tracing (the subject is so large, it would take a lifetime!), but I’ll do my best to explain things in a clear and consistent manner, and will maybe offer tips on how I learned more about the subject, or came to be able to understand certain aspects of rendering. Full disclosure: I am a hobbyist game/graphics coder, so I have difficulty all the time understanding tracing algorithms, code, and sometimes even the main concepts at first! :smiley:

But once I come to grips with understanding aspects of rendering, I feel like I can share my experience and newfound knowledge with others. So I hope you enjoy the following posts about the world of ray/path tracing and implementing it in the browser (and hopefully soon with Babylon!).


Part 1 - Background

I’ll begin with some of the differences between traditional rendering and ray tracing. Traditional 3d polygon rendering makes up probably 99.9% of real-time games and graphics, no matter what system or platform you’re on. So if you’re using DirectX, OpenGL, WebGL2, a software rasterizer, etc., you probably will do it/see it done this way. In a nutshell, vertices of 3d triangles are created on the host side (Javascript, C++, etc.) and sent to the graphics card, where a vertex shader processes them. The graphics card (from here on referred to as ‘GPU’) takes the input triangle vertex info and usually performs some kind of matrix transformation (perspective camera, etc.) on them to project the 3d vertices to 2d vertices on your screen. These are then fed to the fragment shader, which rasterizes (fills in or draws) the pixels contained inside the flattened triangles.

The major benefit of doing it this way for several decades now is that as GPUs became more widespread, they were built for this very technique and are now mind-numbingly fast at rendering millions of triangles on the screen at 60fps!

One of the major drawbacks to this seemingly no-brainer fast approach is that in the process of projecting the 3d scene onto the flat screen as small triangles, you lose global scene information that might be just outside the camera’s view. Most importantly, you lose lighting and shadow info from surrounding triangles that didn’t make the cut to get projected to the user’s final view, and there is no way of retrieving that info once the GPU is done with its work. You would have to do an exhaustive per-triangle look-up of the entire scene, which would make the game/graphics screech to a halt.

Therefore, many techniques have been employed and much research has been done to make it seem to the user that none of the global illumination and global scene info has been lost. Shadow maps, screen-space reflections, screen-space ambient occlusion, reflection probes, light maps, spherical harmonics, cube maps, etc. have all been used to try and capture the global scene info that is lost if you choose to render this way. Keep in mind that all these techniques are ultimately ‘fake’ and are sort of ‘hacks’ that are subject to error and require a lot of tweaking to look good for even one scene/game - then everything might not look right for the next title and the process begins again. Optical phenomena such as true global reflections, refractions through water and glass, caustics, and transparency become very difficult and even impossible with this approach. BTW this is known as a scene-centric or geometry-centric approach. In other words, vertex info and scene geometry take the front seat, while global illumination and per-pixel effects take a back seat.

Now another completely different rendering approach has been around the CG world for nearly as long - ray tracing. If the rasterizing approach was geometry-centric, then this ray tracing approach can be classified as pixel-centric. In this scenario on a traditional CPU, the renderer looks at 1 pixel at a time and deals with that pixel and only that pixel, then when it is done, it moves over to the next pixel, and the next, etc. When it is done with the bottom right pixel on your screen, the rendering is complete. Taking the first pixel, rather than giving it a vertex to possibly fill in or rasterize, the renderer asks the pixel, “What color are you supposed to be?”. So in this pixel-centric approach, the pixel color/lighting takes a front seat, and then the scene geometry is queried once it is required. The pixel answers, “Well let me shoot out a straight ray originating from the camera through me (that pixel of the screen viewing plane) out into the world and I’ll let you know!” Let’s say it runs into a bright light or the sun, great - a bright white or yellowish white color is recorded and we’re done! - well with that one pixel. Say the next pixel’s view ray hits a red ball. Ok, we have a choice - we can either record full red and move on to the next pixel, and thus have a boring non-lit scene, or we could keep querying further for more info: like, “now that we’re on the ball, can you see the sun/light source from this part of the red ball?” If so, we’ll make you bright red and move on; if your line of sight to the light is blocked, we’ll make you a darker red. How do we find this answer? We must spawn another ray out of the red ball’s surface, aim it at the light, and again see what the ray hits. If we turn the red ball into a mirror ball, a ray must be sent out according to the simple optical reflection formula, and then see what the reflected ray hits.
If the ball was instead made out of glass, at least 2 more rays would need to be sent, one moving into the glass sphere, and one emerging from the back of the sphere out into the world behind it, whatever might be there. Although more complex, there are well known optical refraction formulas to aid us. But the main point is that with this rendering approach, the pixels are asked what they see along their rays, which might involve a complex path of many rays for one pixel, until either a ray escapes the scene, hits a light source, or a maximum number of reflections is reached (think of a room of mirrors - it would crash any computer if not stopped at some point). From the very first ray that comes from the camera, it has to query the entire scene to find out what it runs into and ‘sees’. Then, if it must continue on a reflected/refracted/shadow-query path, it must at each juncture query the entire scene again! No hacks or assumptions can be made. The whole global scene info must be made available at every single turn in the ray’s life.
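For the curious, the two formulas mentioned above are simple enough to sketch in a few lines of plain Javascript. These mirror the GLSL built-ins reflect() and refract(); this is an illustrative sketch of the standard optics formulas, not the renderer's actual code:

```javascript
// Vector dot product helper
function dot(a, b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }

// Mirror reflection: I - 2 * dot(N, I) * N
// I = incident ray direction (unit length), N = surface normal (unit length)
function reflect(I, N) {
  const d = 2 * dot(N, I);
  return [I[0] - d*N[0], I[1] - d*N[1], I[2] - d*N[2]];
}

// Snell's-law refraction, same convention as the GLSL built-in.
// eta = ratio of indices of refraction (index we're leaving / index we're entering).
// Returns null on total internal reflection (no refracted ray exists).
function refract(I, N, eta) {
  const cosi = dot(N, I);                     // negative when entering the surface
  const k = 1 - eta * eta * (1 - cosi * cosi);
  if (k < 0) return null;                     // total internal reflection
  const t = eta * cosi + Math.sqrt(k);
  return [eta*I[0] - t*N[0], eta*I[1] - t*N[1], eta*I[2] - t*N[2]];
}
```

A ray hitting the mirror ball would continue along reflect(rayDir, surfaceNormal); a ray entering glass (index of refraction about 1.5) would continue along refract(rayDir, surfaceNormal, 1.0/1.5), falling back to reflect when refract returns null.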

As far as we know, this is how light behaves in the real world. Although a major difference is that everything in the real world happens in reverse: the rays spawn from the light source, out into the scene, reflecting/refracting, and only a tiny fraction is able to successfully ‘find’ and enter that first pixel in your camera’s sensor. However, this is a computational nightmare because the vast majority of light rays are ‘wasted’ - they don’t happen to enter that exact location of your camera’s single pixel. So we do it in reverse: loop through every viewing plane pixel (which is infinitely more efficient computationally) and instead send a ray out into the world hoping that it’ll eventually hit objects and ultimately a light source. Therefore the field of ray tracing is actually ‘backwards’ ray tracing.
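In code form, the outer loop of ‘backwards’ ray tracing might look something like this plain-Javascript sketch of a simple pinhole camera (the function names and camera model here are my own illustration, not the renderer's actual code):

```javascript
// Build a ray direction from the camera through the center of pixel (x, y)
// on the viewing plane. Camera sits at the origin looking down -z.
function primaryRayDirection(x, y, width, height, fovDegrees) {
  const aspect = width / height;
  const scale = Math.tan(0.5 * fovDegrees * Math.PI / 180);
  // Map the pixel center (+0.5) onto the [-1, 1] viewing plane, y pointing up
  const px = (2 * (x + 0.5) / width - 1) * aspect * scale;
  const py = (1 - 2 * (y + 0.5) / height) * scale;
  const len = Math.hypot(px, py, 1);
  return [px / len, py / len, -1 / len];
}

// The classic CPU-style render loop: one pixel at a time, top-left to
// bottom-right, each pixel's ray path fully independent of its neighbors.
function render(width, height, traceFn) {
  const image = [];
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const dir = primaryRayDirection(x, y, width, height, 60);
      image.push(traceFn([0, 0, 0], dir)); // "What color are you supposed to be?"
    }
  }
  return image;
}
```

traceFn here stands in for the whole ray tracer: it would follow the ray through the scene, spawning reflection, refraction, and shadow rays as described above, and return the final color.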

The benefits of rendering this way are that, since we are simulating the physics of light rays and optics, images can be generated that are photo-realistic. Sometimes it is impossible to tell if an image is from a camera or a ray/path tracer. Since at each juncture of the ray’s path the entire ‘global’ scene must be made available and interacted with, this technique is often referred to as global illumination. All hacks and approximations disappear, and once difficult or impossible effects such as true reflections/refractions, area-light soft shadows, transparency, etc., become easy, efficient, and nearly automatic. And these effects remain photo-realistic - no errors or tweaks on a per-scene basis necessary.

This all sounds great, but the biggest drawback to rendering this way is speed. Since we must query each pixel, and info cannot be shared between pixels (it is a truly parallel operation with individual ray paths for each individual pixel), we must wait for every pixel to run through the entire scene, making all the calculations, just to return a single color for 1 of the roughly 2 million pixels on a 1080p monitor. This is CPU style, by the way - the GPU hasn’t entered the picture yet. Many famous renderers and movies have been made using this approach, coded in C or C++, and run traditionally on the CPU only.

Then fairly recently, maybe 15 years ago, when programmable shaders for GPUs became a more widespread option, graphics programmers started to use GPUs to do some of the heavy lifting in ray tracing. Since it is an inherently parallel operation, and GPUs are built with parallelism in mind, the speed of rendering has gone way up. Renderings for movies that once took days now take hours or even minutes. And in the cutting-edge graphics world, ray-traced reflections and shadows (and to a lesser extent, path tracing with true global illumination) have even become real time at 30-60 fps!

My inspiration for starting the path tracing renderer was being fascinated by older Brigade 1 and Brigade 2 YouTube videos. Search for the Sam Lapere channel, who has 173 insane historic videos. He and his team (which included Jacco Bikker at one point) were able to make 10-15 year old computers approach real time path tracing. I was also inspired by Kevin Beason’s smallpt - the fact that he could fit a non-real-time CPU renderer, complete with global illumination, into just 99 lines of C++! I used his simple but effective code to start with and moved it to the GPU with the help of WebGL and the browser. In the next part, I’ll get into details of how I had to set up everything in WebGL (with the help of a Javascript library) in order to get my first image on the screen inside of a browser.

So in conclusion, this was a really long way of saying that when you choose to go down this road, you are going against the grain, because GPUs were meant to rasterize as many triangles as possible in a traditional projection scheme. However, with a lot of help from some brilliant shader coders / ray tracing legends of yesteryear out there, and a decent acceleration structure, the dream of real time path tracing can be made into reality. And if you’re willing to work around the speed obstacles, you can have photo-realistic images (and even real time renderings) inside any commodity device with a common GPU and a browser! Part 2 will be coming soon! :slight_smile:


Part 1A - Hybrid Rendering Overview

Before I go into the implementation details, I forgot to mention in the last post that if you don’t want to do either pure traditional rasterization, or pure ray tracing (like the path I took), you can actually use elements of both seemingly incompatible rendering approaches and do a hybrid renderer. In fact, if you’ve been keeping up to date with all the latest Nvidia RTX ray tracing demos and games that use these new graphics cards’ ray tracing abilities, then you’ve already seen this in action!

What the current 2020 RTX-enabled ray tracing apps are doing is a blend of rasterization and ray tracing. Remember I mentioned in the previous post that if you do pure ray tracing from scratch like we are doing, you have to sort of go against the grain because older graphics cards weren’t really designed with that in mind. Their main purpose is to rasterize polygons to the screen as fast as possible. Knowing this, Nvidia chose to roll out ray tracing to the masses using the already well-developed triangle rasterization of its own cards, and blast the scene to the screen on the first pass at 60 fps (which is totally reasonable nowadays, no matter the complexity).

After the triangles are traditionally projected (mostly unlit) to the screen, the specialized ray tracing shaders kick in to provide true reflections, refractions, and pixel-perfect shadows without the need for shadow maps. The advantages of doing things this way are that on the first pass, you are working in line with the graphics card at what it does best, and only those triangles that made the cut into the camera’s view need to be dealt with later when ray tracing. In other words, blank pixels that will ultimately be the sky or background, or triangles that developers choose not to have any ray tracing effects applied to them, can be skipped, and thus the expensive ray tracing calculations for those pixels can be avoided entirely.

This almost sounds like the best of both worlds when you first hear about it / see it. But in actuality, you are “kicking the can down the road” and will eventually have to deal with ray tracing’s speed bottlenecks at some point. Once the polys are blasted to the screen, great - but now you must make the entire scene geometry with all its lights and millions of triangles, some of which may be just offscreen, available to each of the millions of incoherent rays at every twist and turn of their various paths into their surrounding environment. This part is just as expensive as our pure ray tracer. Unless you have some kind of acceleration structure like a good BVH, the game/app will grind to a halt while waiting on the ray traced color and lighting info to come back for the pixels that cover the scene’s rasterized triangles. Another disadvantage is that in order to use the GPU this way, all your scene geometry must be defined in the traditional way: as triangles. You can’t really do pure pixel-perfect mathematical shapes like my geometry demos, or realistic camera depth of field, and you can’t do ray marching like clouds and water and other outdoor fractal shapes, unless they are all triangulated, which could add stress to the already busy BVH.

Nvidia has done a good job at providing tools and hardware inside their new GPUs to make the bottlenecked ray tracing part go faster. They have dedicated ray tracing processors (RT cores) for the repetitive intersection math, including traversing a dynamic BVH every animation frame. The scene and characters can even move inside the BVH - that part is pretty amazing. Also they have AI-trained noise reduction (running on their tensor cores) to aid in smoothing out the soft shadows and glossy reflections. That part is a whole other level - I don’t even know how they do that magic.

So with all that background, I chose to do pure ray/path tracing. I chose to give up the quick rasterization pass that Nvidia and the like enjoy, but with the advantage that I can manipulate the very first rays cast out into the scene from the camera. True depth of field, pixel accurate pure math shapes, no need for switching shaders half way through the animation frame, and the ability to do outdoor fractal non-triangular shapes, all become possible, and in some cases, even trivial (like our DoF in 6 lines!).

One last note before moving on to our implementation overview in the next post: I have been using the term ‘ray tracing’ and not ‘path tracing’ when referring to Nvidia’s initial RTX demos and the AAA games that have started to use their technology. Most of those real time renderings are doing specular ray tracing only, as kind of a bonus feature: like mirror, metal, glass, and puddle reflections and shadows. But that is not true full global illumination like we are attempting. The closest demos to reaching this goal while using RTX that I can think of are the recent Minecraft RTX demos and the slightly older path-traced Quake demos. Minecraft was a good test bed and starting point for real time GI because ray-box intersection routines are very efficient, and that is what makes up nearly all of Minecraft’s geometry, which exactly matches the BVH bounding boxes. Quake was another good first test of game path tracing because the polycount is low and the environments are less detailed than more modern titles. Still, both are very popular on YouTube and both show promise for the future, when we will have movie quality lighting and GI moving at 30 to 60 fps!


Part 2 - Implementation

So remember when I said that this would be a pure ray/path tracer without the need for traditional rasterization of triangles? Well, that may not have been an entirely true statement. :wink:
You see, we need a way to quickly address and calculate every pixel on a webpage, being that this project is meant to run inside the browser. So how do I draw a single colored pixel inside the browser, let alone an entire screen of them?

There are 2 main options for drawing/addressing individual pixels inside the browser: the Canvas API and WebGL. A while back, just for fun, I coded up a complete ray/path tracer with progressive rendering using the Canvas API only - no libraries, just pure Javascript and a simple HTML shell. I recently uploaded it as a new Github repo - check it out: CanvasTracer. I created all the math and tracing routines from scratch in Javascript.

This might sound cool, but there’s only one little problem with it… it runs at only 1 fps!! Ha ha. This is because I am using raw Canvas without GPU acceleration and doing all the routines in Javascript rather than GLSL. But I just wanted to show that it is possible to create a software CPU renderer in a webpage. Also, it matches very closely the algorithms and ray tracing utility functions that are in place in my current Three.js pathtracing renderer. So if anyone is interested in seeing how the ray tracing side works without any other extraneous code, I thought maybe some folks would enjoy looking at the source. :slight_smile: If anyone wants, in the future I could do a post just about pure ray tracing - the kind that makes this fun little software renderer work.

However, if speed is an important consideration (and it is!), then we must go the WebGL route. Unfortunately, there is not an easy way to directly paint individual pixels using pure screen memory addressing inside WebGL (that’s why I’m glad there is at least Canvas, to allow me to feel like I’m back in the 1980’s, ha!).

If we want to have access to any GPU acceleration, then we have to use the WebGL system, on which modern libraries like Babylon.js and Three.js are based. And in order to use it, the WebGL system requires us to supply vertices to a vertex shader and to supply a fragment shader to color all the pixels that are inside the triangles that get rasterized. So, the only way to write pixels to the screen fast with GPU acceleration is to first supply triangles to work with, then we can directly manipulate the pixels that are inside those triangles. Well, if I want access to every pixel on the screen, what if I just stretched a couple of huge back to back triangles to cover the entire screen? And that is exactly what some smart person long ago thought of doing and it works perfectly!

So basically our task is to make the most simple screen-shaped mesh possible - a full screen quad consisting of no more than 2 large triangles. And since the 2 triangles will completely cover the viewport, when they get rasterized in the traditional manner, the WebGL system will have to end up giving us access (through the fragment shader) to every pixel on the screen!

In Three.js, and I’m almost positive in Babylon.js also, we accomplish this by creating the simplest possible Plane Mesh (2 triangles) and giving it a Shader Material. Then what I did is create an orthographic camera and position (rotating up if necessary) the plane to face the viewer like a movie screen. When it gets processed by the vertex shader (4 vertices, one in each screen corner), we can use the GLSL built-in gl_FragCoord to access the exact pixel that the GPU is working on during that frame of animation. What’s neat is, since the GPU is parallel, we can write one ray tracer and the fragment shader will call that same ray tracer for every pixel (possibly millions) at essentially the same instant in time!
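To make this concrete, here is a minimal sketch of what such a full-screen-quad fragment shader looks like (uResolution and CalculateRadiance are placeholder names of my own, not the project's actual uniforms/functions):

```glsl
// Illustrative fragment shader skeleton for the full-screen quad.
// The GPU runs main() in parallel for every pixel the 2 big triangles
// cover - i.e. every pixel on the screen.
precision highp float;

uniform vec2 uResolution; // screen size in pixels (placeholder name)

void main() {
  // gl_FragCoord.xy is the exact pixel this shader invocation is coloring
  vec2 pixel = gl_FragCoord.xy;

  // In the real path tracer, this is where the pixel's camera ray would be
  // built and traced, e.g.:  vec3 color = CalculateRadiance(rayForPixel(pixel));
  // Here, a simple UV gradient stands in for the traced color:
  vec2 uv = pixel / uResolution;
  gl_FragColor = vec4(uv, 0.5, 1.0);
}
```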

I find it sort of amusing that our big movie quad that renders our 3D raytraced scene is almost like a facade or trick - like the old cowboy western shows where they use a flat set facade for all the buildings that line a street of an old fashioned western ghost town, ha! But this is no more a trick than a 3D scene projected in the traditional way as triangles on a 2d camera viewport. :wink:

There is one catch - WebGL shaders do not retain any pixel memory from animation frame to frame. So if you wanted to do blending/noise reduction, you cannot ask what the color was for that same pixel location on the previous animation frame. All that info is lost, 60 times a second. And due to the randomness inherent in path tracing (I will go into more detail in a future post), each image comes back noisy and slightly different noise-wise from frame to frame. If you just render with no pixel memory/pixel history like that, the raw, fast-moving, never-converging noise will be very distracting at 60 fps. So in order to smooth out the rendering over time (aka progressive rendering), and to smooth out moving objects and a possibly moving camera, another separate full screen quad is necessary. This second quad’s only job is to quickly copy whatever is on the screen from the first animation frame. Its full-screen copy image is fed back into the original path tracing fragment shader as a large GPU texture uniform, and the path tracer now has a starting point to blend with before it displays the 2nd frame to the screen. In other words, the path tracing fragment shader now suddenly ‘remembers’ what it rendered just before. This extra quad is sometimes referred to as a copy shader - that’s its only purpose. And this saving and feeding back into the original shader is sometimes called a ping pong buffer.

If you keep feeding the image back into the path tracer every frame, and average/divide the results by how many frames you’ve rendered, over time the image will magically converge before your eyes! And if you need a dynamic camera and don’t care too much about obtaining a perfect ‘beauty render’, you can still use this pixel history buffer to slightly blur/smooth out the camera’s changing view as well as any dynamically moving scene objects. In the latter case, this is sometimes given the derogatory name ‘ghosting’, but I find that if used responsibly, it really helps a dynamic scene and the end user won’t mind too much, or may not even care. Check out my outdoor environment demos or my Cornell box filled with moving water demos to see this technique in action.
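Both blending strategies boil down to very simple per-pixel math. Here is a sketch in plain Javascript (my own illustration - in the real shaders this runs per color channel on the GPU):

```javascript
// Progressive rendering (camera held still): keep a running sum of every
// noisy per-frame sample and divide by the frame count - the estimate
// converges toward the true pixel value over time.
function progressiveAverage(samples) {
  let sum = 0;
  for (const s of samples) sum += s;
  return sum / samples.length;
}

// Dynamic camera: exponentially blend each new frame with the history
// buffer instead. A heavier history weight gives a smoother image but
// more 'ghosting' when things move.
function temporalBlend(history, current, historyWeight) {
  return history * historyWeight + current * (1 - historyWeight);
}
```

With temporalBlend the old image never fully converges, but stale history fades away exponentially, which is why the ghost trails behind moving objects dissolve after a moment.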

Well, this post is already long enough, so I’ll be back with further implementation details in the next post. See you then!


Part 3 - Implementation (cont.)

Ok we have a plan for how to create a feedback (ping pong) buffer so that the noise impact can be slightly softened if the scene/camera is dynamic, and if the camera is held still and the scene is not dynamic, the image will quickly refine and improve over a short time (progressive rendering).

Now you might be wondering, can you just set all this up, buffers and everything, with raw WebGL without any libraries? The answer is yes, and if you’ve seen the older Made by Evan Wallace path tracing demos, there is the proof. I think the only library he used was the glMatrix math library. Everything else was raw WebGL, GLSL, and Javascript.

When I began my project, I briefly considered this route, and although it would have been a fun and interesting challenge to entirely roll my own code, I ultimately decided to build my renderer on top of an existing WebGL library (three.js at the time).

If you take a closer look, Evan’s path tracer, although really cool, is more limited than what can be done by using a base library to help us out. The most notable limitation is the type of scenes you can create. His only had spheres and boxes. Ours (when the Babylon port is complete) can do any kind of scene with complex triangular geometry. Also, just to be able to load in various model formats and read/convert geometry and data from a file requires extensive knowledge and great effort.

I chose to use a pre-existing library because it really helps get all the tedious WebGL boilerplate out of the way and provides many useful resources and routines for setting up 3D environments in the browser. When we start adding in 3D camera movements, render targets, data textures, 3D transforms, physics (for games and interactive apps), and mobile controls, that is when the underlying library really starts to shine.

Having said that, I must admit that I have mainly been involved with three.js over the years, but I have always admired and respected Babylon, its creators, and its community for making such a great engine that is cutting edge, open source, and that has a warm, inviting atmosphere here on the forums and its Github.

So even though I will mainly be sharing how I set up and implemented this project in three.js, I have the utmost confidence that everything can and will also be done with Babylon.js in the near future. I hope that if you are familiar with Babylon, you will be able to translate my three.js actions and thinking over to the equivalent Babylon.js routines and way of doing things.

For instance, all apps/demos in this path tracing project require 3 connected scenes. The first is the most obvious - the actual scene that will be rendered. I called this the pathTracingScene and it will have the usual 1st person perspective camera and will contain all the scene objects to be rendered.

The second is called the screenTextureScene (though I realize nowadays that I should have called it screenCopyScene) because its only purpose is to copy what has been rendered in the pathTracingScene that we just created. It will consist of an orthographic camera and have only 2 large triangles stretched across the screen like we mentioned in the previous post. Together with the 1st scene, it will create a ping pong buffer that will keep feeding back on itself so that successive images get more blended and the missing information gets filled in frame by frame, thus improving the image quality.

The 3rd and final scene I called screenOutputScene, and one of its tasks is to take the output of the ping-pong buffer and present it to the screen. Not only is it charged with that; more importantly, it must bring the unbounded light/color intensity values from the path tracing into the range of the monitor display (aka tone mapping) and also must gamma-correct the final image. I suppose there is a way I could have had one of the first scenes display the final image while also acting as the ping or pong buffer, but it gets complicated because you can’t send any tone-mapped or gamma-corrected colors back through the feedback loop - it’ll end up losing precision and the colors will become washed-out looking. Therefore, I chose to separate the 3 stages of rendering and image processing into 3 separate scenes with their own geometry, materials, and shaders.

The 1st pathTracingScene will be where all the action happens. Its material shader’s uniforms will allow us to send real time Javascript data about the scene and object transforms, screen resolution, and moving/changing camera, so that the GLSL path tracer will render the updated and correct view instantaneously, hopefully at 30-60 times a second.

As mentioned, this pathTracingScene does not actually render to the screen, but rather it outputs to a render target. This render target needs to be in RGBA format, must be large enough to cover the full screen dimensions, and most importantly, must be of type Float. When I was starting out with my project, I spent weeks wondering why my renderings were looking awful with severe color banding, pixelation, and reduced output range. I finally figured out that all my render targets still had the usual 0-255 unsigned byte format and the color data was getting clipped and clamped and had poor precision. When I changed it to Float format (0.0 to 1.0), suddenly the image was perfect! When you’re dealing with ray and path tracing, you need that almost continuous range of color values. BTW the tone mapper keeps all the raw unbounded tracing color values inside this 0.0-1.0 range, which might sound limiting, but there are a LOT of possible floating point values between those 2 numbers!
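For reference, here is roughly what that render target setup looks like in three.js (a hedged sketch - the option choices besides FloatType are my own sensible defaults, and Babylon.js will have its own equivalent API):

```javascript
// The critical part is type: THREE.FloatType - the default unsigned-byte
// target quantizes colors into 0-255 steps and clamps them to [0, 1],
// which causes exactly the banding, pixelation, and reduced range
// described above.
const pathTracingRenderTarget = new THREE.WebGLRenderTarget(screenWidth, screenHeight, {
  format: THREE.RGBAFormat,
  type: THREE.FloatType,          // full floating-point precision per channel
  minFilter: THREE.NearestFilter, // no filtering - we want exact pixel history back
  magFilter: THREE.NearestFilter,
  depthBuffer: false,             // the path tracer doesn't need a depth buffer
  stencilBuffer: false
});
```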

The pathTracingScene’s render target is then fed into the screenTextureScene (screenCopyScene) as a texture uniform. All the screenTextureScene does is take this texture, copy all the pixel colors, and spit them back out to its own render target. This render target is then fed back into the original pathTracingScene’s material as a texture uniform, in the exact same manner. This back and forth process continues indefinitely. Let’s say that the pathTracingScene does all its ray tracing calculations and outputs a white pixel, 1.0 in all channels. Now the screenTextureScene copies that value and sends the 1.0 value back to the pathTracingScene. The pathTracingScene does all its calculations again and since the camera moved, now that same pixel has a value of 0.0, or black. But, before it can send that out to its own render target, it must sample the previous pixel value, which was white (1.0, remember?), and then add them together. This number, which will grow unbounded after a while (think of full 1.0s getting constantly added and fed back onto themselves), gets sent to the final 3rd scene, the screenOutputScene, again as a texture uniform. This scene’s material shader then divides whatever numbers it receives by the number of animation frames so far. So in our case, (1.0 or white, plus 0.0 or black), divided by 2 animation frames = 0.5, or a smooth gray pixel. Taking this idea further, if you were to have a random scene with random values constantly changing, like old-fashioned black-and-white television-set noise, then the rendering will eventually converge to a smooth, uniform gray color everywhere, or 0.5. This concept is also the basis of Monte Carlo rendering, which I could go into further maybe in a future post.
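The feedback-and-divide bookkeeping just described can be simulated in a few lines of plain Javascript for a single pixel (the scene names match the post, but the code itself is just my own illustrative sketch):

```javascript
// Simulate the three-scene feedback loop for one pixel channel.
// samplesPerFrame: one new noisy path-traced sample per animation frame.
function simulate(samplesPerFrame) {
  let pingPongBuffer = 0;  // what the screenTextureScene copied last frame
  let frameCount = 0;
  const displayed = [];

  for (const sample of samplesPerFrame) {
    // pathTracingScene: add the new sample to the fed-back accumulated
    // value - an unbounded running sum
    const accumulated = pingPongBuffer + sample;
    frameCount++;

    // screenTextureScene: copy the accumulated image so it can be fed
    // back into the path tracer on the next frame
    pingPongBuffer = accumulated;

    // screenOutputScene: divide by the number of frames so far to display
    displayed.push(accumulated / frameCount);
  }
  return displayed;
}
```

Running simulate([1, 0]) reproduces the white-then-black example above: the first frame displays 1.0 (white), and after the second frame the displayed value is (1.0 + 0.0) / 2 = 0.5, the smooth gray.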

Finally, the screenOutputScene takes its blended, averaged pixel color values and brings them into a viewable range with a tone mapper, and then gamma-corrects the image to match the human eye’s nonlinear brightness response, which is roughly exponential (much like hearing sensitivity and the decibel scale). We then see the final, blended, averaged, tone-mapped, gamma-corrected rendering in all its glory! :slight_smile:
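As a sketch of that output stage (Reinhard is just one common tone-mapping curve; the actual shader may use a different one):

```javascript
// Sketch of the final output stage, per color channel.
function toneMapReinhard(c) {
  return c / (1.0 + c); // maps unbounded [0, inf) into [0, 1)
}
function gammaCorrect(c) {
  return Math.pow(c, 1.0 / 2.2); // linear light -> display brightness
}

const raw = 4.0;                           // an unbounded accumulated value
const mapped = toneMapReinhard(raw);       // 0.8 : now in viewable range
const displayValue = gammaCorrect(mapped); // ~0.90 : gamma-corrected
```

Note the order: tone map first to get into 0.0-1.0, then gamma-correct for the display.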

Hopefully I have made clear the functions and responsibilities of each of the 3 scenes and their geometry and materials/shaders. Armed with this knowledge, we can move on to setting up traditional scene objects like scene geometry, object transforms, cameras, lights, materials, etc.

The next part is coming soon!


This is so cool, I love this thread !!!


I am following along with you here. Love it, this is right down my alley. Keep up the good work!


Hi, I registered to post this. I’ve got a project in mind perfectly suited for this, so I’m really excited!

You invited suggestions and I do have one for you: adding more samples per frame. By sample I mean a full render pass. I was able to hack this into your demos by using the ping pong code, and alternatively by looping the relevant bits directly in the shader (but only in the shaders that have a main() - I don’t know GLSL and was unable to figure out the other ones).
I would prefer the shader method but the looping seems to drastically increase the shader compilation time. I hope that’s just me doing it wrong.

I recorded 2 short videos to show the results, you may want to download them rather than view the Google encoded version because hey, there’s a lot of noise.
In the following boring descriptions of these videos (feel free to skip) SPF is Samples Per Frame.

The first one is the bi-directional demo, I start off with 3 SPF for a really ugly picture, then I turn on frame blending and see how it resolves. Then I turn frame blending off, and I start increasing the SPF to 77! Even without frame blending the image is pretty nice so I look around a bit, and then I turn on frame blending again. The speed at which it resolves is awesome.

The second video is in the transforming geometry demo, where the motion of the geometry shows rather severe ghosting. I start off with 1 SPF, no frame blending, then 15 SPF with no frame blending to show reduced noise that may make frame blending unnecessary for some applications (certainly mine; I want some noise and I want it to be consistent at all times… saves me adding fake noise in post!). Then I switch back to 1 SPF and turn on frame blending to show the ghosting. Then I set SPF back to 15 and the ghosting is basically gone.
Then, again with 1 SPF, I set it to always blend every frame even when the camera moves. The whole image blurs horribly (although I can see some use for the effect). SPF back to 15 and the blending is not really a problem anymore. My VA monitor has about this level of ghosting even if I don’t blend!

The image resolutions I used here were obviously low and this will be necessary in many cases. But not all! Even fullscreen at 3440x1440, 0.5 pixel ratio, I can run the bi-directional demo at 10x the speed. Yes please!
I hope I have convinced you that this is a desirable feature. Either way, thanks for everything.



Wow! The multi-sample version parts of the videos are awesome! I have a couple of questions about your system specs and on how you hacked the extra samples loop.

What graphics card are you using? Also, I thought that WebGL and requestAnimationFrame() and all browser implementations were capped at 60 FPS, but you were clearly breaking that speed limit with 75 FPS!

Also, could you please elaborate on how you hacked the ping pong buffer to get more render passes per frame? Currently, inside my GLSL main() function, I call CalculateRadiance() which does all the ray/path tracing and returns a color for that particular pixel. Did you put CalculateRadiance() inside some kind of loop, like for (int i = 0; i < 77; i++) { CalculateRadiance(); }?

Now, the reason I only had 1 sample per frame is that I wanted this path tracer to be able to run on a myriad of devices with a browser installed, even tablets and smartphones. Also, a big factor in why I never thought to increase the render pass count to 15 or 77 samples per frame is that I started this project 5+ years ago on my humble laptop with integrated graphics, and I am still using it to this day. If a beefy routine in any of the demos made the shader compiler crash and lose the WebGL context on my machine, or the framerate fell below 50-60 FPS for simple scenes, or 20-30 FPS for ray-marched outdoor scenes, then I would throttle back the complexity, iterations, or number of instructions per frame.

But you have made me a believer in multi-samples! If you wouldn’t mind sharing how you looped the rendering so many times a frame, maybe I could add a new user-controlled option in JavaScript that would be sent to the GLSL tracer via uniform, which would be the number of loop iterations. With WebGL 1, this was not a possibility because shader loops could not have a variable number for the iterations. But my project switched over to WebGL 2 as soon as I could, and I do believe you could have a uniform variable inside the loop declaration, for example for (int i = 0; i < uIterations; i++)… That way, if a user had a more modern powerful system, they could put whatever number they want until it started crashing on compile. They would get the maximum, almost noise-free benefit. Meanwhile, guys like me on low-end hardware or mobile, could put a ‘1’ in for that iterations uniform and it would hopefully not have an impact - in other words, it would be like nothing changed.

About the ghosting due to blending, yes that was a tough call on almost a per-demo basis. I kind of need some frame blending if the system specs are low and we’re only able to get 1 sample per frame. Now I for one am used to the noise, in fact I’ve come to prefer it over biasing a render, but even I, with a large tolerance for noise, could not stand no blending and 1 sample - it’s just too distracting! But similar to the iterations uniform, a simple boolean uniform could be sent to the tracing shader in order to turn on frame blending (and thus have a little ghosting on dynamic moving scenes), or in a more high-end system, the user could turn off frame blending (and remove any hint of ghosting), and still have really good path traced animations with a lot more convergence to mitigate the distracting noise, as you have so aptly demonstrated!

Looking forward to hearing more. Thank you for sharing the videos and for registering in order to bring this to my attention!


I use a 2080 Super. Yeah… that’s a big one. And for the foreseeable future it probably is going to take a discrete GPU to make use of multisampling. But I don’t think it has to be one quite that fast. My laptop struggles too, but based on how close it gets and how slow Intel graphics are, I think that anyone with a discrete GPU that isn’t super old should be able to run, say, 3-5 samples at least, and see a large benefit already. IMO most scenes look nice at 10+ and after that the diminishing returns are so heavy it takes a ton more to make a difference. Especially if the scene is well lit. As always, I’m assuming a low resolution here. Part of the trade-off.

The fps is not a big surprise, vsync caps framerate at whatever the monitor refresh rate is. And that is 60 hz with most monitors, but 75 hz with mine. If it was uncapped it would have gone well beyond 75 fps… In that bi-directional lighting video, at that resolution and 1 sample, it could probably do like 5000 fps heh. Fluctuating wildly of course, depending on what I’d look at.

The ping pong implementation is a complete alternative to the shader implementation. I don’t change the shader at all here. At the bottom of commonFunctions.js you have those 3 steps for rendering. I put a loop around steps 1 and 2, and for the additional renders I set cameraIsMoving to false so the blending happens. Then I had to fix the noise, because the random seed in the shader is tied to a frame counter. Making sure to update that value, it just worked right away.
I also added a few controls to manipulate the blending in the ways I show in the videos. Sometimes consistency is more important so I want to be able to turn off the progressive effect.
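For anyone wanting to try the same thing, the hack has roughly this shape (plain JavaScript with stub functions; the names here are illustrative, not the actual API in commonFunctions.js):

```javascript
// Stubs stand in for the three render steps at the bottom of
// commonFunctions.js (counters let us see how often each step runs).
let pathTraceCalls = 0, copyCalls = 0, outputCalls = 0;
const renderPathTracing = (moving, frame) => { pathTraceCalls++; };
const renderScreenCopy  = () => { copyCalls++; };
const renderScreenOutput = () => { outputCalls++; };

let frameCounter = 0;

function renderFrame(samplesPerFrame, cameraIsMoving) {
  for (let i = 0; i < samplesPerFrame; i++) {
    // extra passes pretend the camera is still, so blending happens
    const moving = (i === 0) ? cameraIsMoving : false;
    frameCounter++; // keep the shader's random seed fresh every pass
    renderPathTracing(moving, frameCounter); // step 1: trace + accumulate
    renderScreenCopy();                      // step 2: copy to feedback target
  }
  renderScreenOutput(); // step 3: present once per frame
}

renderFrame(4, true); // 4 samples: steps 1 and 2 run 4x, step 3 runs once
```

The key points are that only the first pass sees the real camera-motion flag, and the frame counter advances on every pass so the shader never reuses a seed.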

With the shader implementation (I did it in GameEngine_PathTracer_Fragment.glsl) I don’t do any extra ping-ponging; it’s just a loop in main() around CalculateRadiance() and the preceding code that sets up the ray, because it has to use a new seed (again, remember to make sure it’s a new seed every iteration). Of course it’s now doing pixelColor += CalculateRadiance(...) instead of assigning it, with pixelColor first being set to vec3(0) at the start of main(). And don’t forget, when pixelColor is used, divide it by the sample count.
Here I was hoping to also add dynamic control of sample amount in the shader, like I did with the ping pong loop. From your post I guess it doesn’t work that way? Since you’re talking about recompiling for a given number?
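For readers following along, here is a plain-JavaScript analogue of that in-shader loop (calculateRadiance here is a stand-in returning values in [0.25, 0.75], not real path tracing):

```javascript
// Plain-JS analogue of looping main() around CalculateRadiance().
const N_SAMPLES = 4;
const frameCount = 1;

// stand-in for the shader's CalculateRadiance(); varies with the seed
function calculateRadiance(seed) {
  return 0.5 + 0.25 * Math.sin(seed);
}

let pixelColor = 0.0;                      // vec3(0) at the start of main()
for (let i = 0; i < N_SAMPLES; i++) {
  const seed = frameCount * N_SAMPLES + i; // a fresh seed every iteration
  pixelColor += calculateRadiance(seed);   // += instead of =
}
pixelColor /= N_SAMPLES;                   // divide by the sample count
```

The seed arithmetic (frame count times sample count, plus the sample index) is one way to guarantee each frame and each iteration gets a seed no earlier pass has used.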

The only thing that gave me trouble was the randomness. I’ll skip to the end: the cameraIsMoving logic was resetting the framecounter, and this caused the same seed to be used all the time when I used that logic to turn off progressive rendering. It had the very confusing result of noise that somehow still looked random every frame, but clearly wasn’t random when looking at certain effects like caustics or dark shadows. I don’t quite understand it.
Anyway this pattern did go away every time I enabled frame blending, so that led me to the cause. I fixed it by using a new framecounter in commonFunctions.js and had it override the existing one for all demos (it’s kind of inconvenient that they all have their own copy of this bit of code). This new counter always goes up and just resets when it gets bigger than some high number. And with that, every pixel everywhere is properly random.
In the shader version I fell into a much sillier trap. If you’re not paying attention you might do something like framecount 1, framecount 2, framecount 3… and in the shader loop use framecount+i for the seed, but this will of course reuse the same seeds in the next frame because 1+2 and 2+1 are the same thing. So do something like incrementing the framecount by the amount of samples, make the loop jump past all previous seeds to use entirely new ones, and this is fixed.
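The seed bookkeeping just described can be sketched in a few lines (plain JavaScript, illustrative): advance the frame counter by the sample count so frame F, sample i never reuses a seed from an earlier frame, avoiding the 1+2 == 2+1 trap.

```javascript
// Jump the frame counter past all seeds used each frame.
const N_SAMPLES = 3;
const seen = new Set();
let frameCount = 0;

for (let frame = 0; frame < 100; frame++) {
  for (let i = 0; i < N_SAMPLES; i++) {
    seen.add(frameCount + i); // seed for this frame + sample
  }
  frameCount += N_SAMPLES;    // skip past all seeds used this frame
}
// 100 frames x 3 samples = 300 seeds, all distinct
```

With the naive frameCount + i scheme and a counter that only increments by 1, frame 1/sample 2 and frame 2/sample 1 would collide; incrementing by N_SAMPLES makes every (frame, sample) pair unique.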

I think that about covers it!
I have to say, I usually have a hard time figuring out what other people do in their code, but I found my way around pretty fast in yours. Well done!



Thank you so much for your detailed and helpful reply! I think I understand the general picture and methods you used. Also I understand the complications that arise from trying to get good randomness in a shader that doesn’t have a native random function! I find this so frustrating. My hope is that one day, graphics card manufacturers will set aside a tiny part of the hardware for a fairly good random number generator (no noticeable repeating, but doesn’t have to be scientific research quality), that quickly creates a random float between 0.0-1.0 - the kind that we currently have to manually hack into the shader ourselves. Is that so much to ask, NVIDIA and AMD? :wink:

Just so I understand, are you looping the ping pong part (the first 2 steps of rendering at the bottom of my commonFunctions.js file), as well as looping the CalculateRadiance() part in the shader main() function?

Are you on GitHub? It would really help if I could see your changes. Could you maybe create a Gist on GitHub (you can make it public for all to see, or private with link-only access, it’s up to you) and just copy and paste the edited commonFunctions.js file, and maybe another separate Gist containing the edited GLSL file with the looped CalculateRadiance() inside of the main() function, with the random number edits? That would really help me get your great suggestions into my code base, and then everyone could benefit. :slight_smile:

Thanks again!


No, it’s either the ping pong change or the shader. Although I suppose you can use both methods together just fine, I don’t see a reason to.

This Gist thing is new to me. It’s just a single file? I’ll put up the shader version as the changes are concentrated in one place.
I’ve improved this since my previous post. The reduction in ghosting was only happening with the ping pong hack, the way I explained the shader hack it would not have better ghosting at all. The code here fixes that.
You can also ignore what I said earlier about the frame counter. I checked this shader with the original and found the noise to be perfectly random. It must be that I had messed it up myself and then later didn’t realize it was my own mess I was fixing.

It’s set to loop 4 times and this runs at mostly 60 fps on my macbook if I make the browser window as small as possible.
The macbook is showing a weird pattern on the red cylinder that doesn’t appear on my PC. For now I’ll go with that being Intel’s fault.
Finally, the for loop in the shader causes a ton of division by zero warnings at compilation time and I don’t know why. It is the loop itself, if I just create a float i = 0.0 instead of doing the for loop, then there’s no warnings. If I do use a for loop but it only runs one iteration with i = 0.0, the warnings appear. This should be the exact same thing so it makes no sense to me.

I don’t mind sharing the ping pong method as well but I’d have to take some time to clean it up. I messed around and broke some things.


Thanks so much for the Gist! Yes, as far as I know Gists are just 1 file each. It’s good for sharing a file without having to create an entire repository or branch for the benefit of the person you’re sharing with. Sometimes I make private Gists so I can save a snapshot of a file that is working so far, then I can go mess with it in the Visual Studio Code editor and do whatever I please without worrying about getting back to the working version if I totally screw things up. In my case, I’m always messing around with single path tracing GLSL files, and if I try to do it the old fashioned way (version control), it’s just a lot more maintenance and headaches for only 1 file that I’m constantly messing with.

I suppose dropbox could do the same thing, but it doesn’t have syntax formatting and highlighting like Gist(GitHub) does - which is handy if you were to try and quickly view it without downloading it and bringing it into your editor.

I got similar results with my Toshiba laptop (integrated graphics) - I also had to reduce the window size to a minimum to get 40 FPS with 4 samples. Just for kicks I tried 16 samples but after reducing to smallest window size, I only got 15 FPS. But it looks so good! Ha

It greatly helped me to see how you set up the loop inside main(). I understand what you did with the seed calculations - great job! The numbers it produces would be 0 and 1, 1 and 2, 2 and 3, 3 and 4, etc… which might sound bad because it repeats a number on every iteration, but the nifty rand() function by the brilliant iq (of ShaderToy) takes 2 numbers (as a uvec2), and if those 2 numbers are in a different argument order inside rand(), like rand(1,2) vs. rand(2,1) vs. rand(2,3), you should get a valid random number from the RNG regardless. I don’t know how his little 4-line bit-manipulating random function does it, but it is truly magic!
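I can’t reproduce iq’s exact function from memory, but just to illustrate the idea for readers, here is a generic 2-input integer hash in the same spirit (plain JavaScript with unsigned 32-bit operations; the constants are common hashing constants, not iq’s - treat it as a sketch):

```javascript
// Mix two integers into a pseudo-random float in [0, 1).
// Because x and y are mixed with different constants, the argument
// order matters: hash2(1, 2) and hash2(2, 1) take different paths.
function hash2(x, y) {
  let h = (Math.imul(x, 73856093) ^ Math.imul(y, 19349663)) >>> 0;
  h = Math.imul(h ^ (h >>> 16), 2246822519) >>> 0;
  h = Math.imul(h ^ (h >>> 13), 3266489917) >>> 0;
  h = (h ^ (h >>> 16)) >>> 0;
  return h / 4294967296; // scale the 32-bit result to [0, 1)
}

console.log(hash2(1, 2)); // deterministic value in [0, 1)
```

The multiply-xor-shift rounds scramble the bits so nearby inputs land far apart, which is why seed pairs like (0,1) and (1,2) still produce independent-looking values.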

The only part I’m confused about is your calculations of previousColor vs. pixelColor. Just so we’re on the same page: pixelColor is the current ray traced color being calculated at that very moment. The more of pixelColor you have, the noisier the image, but there is minimal to no ghosting. previousColor is the pixel color from the previous animation frame. The more of previousColor you have, the more blended or ghosted the image is, but also it is a little less noisy and distracting.

So it would seem that if N_SAMPLES was low, you would want less current pixelColor (noisy) and more previousColor to blend with, which, although it would result in a more blurry or ghosted image, would be less noisy. On the other hand, if N_SAMPLES was ridiculously high, like 77, we could rely just on pixelColor and not even use previousColor to blend with. This is because pixelColor (although noisy if the sample number is low) would in this high-sample case come back almost converged and ‘blended’ with itself - so I would think that you could get rid of previousColor altogether.

But looking at how you weighted those values and the branches that you take, the exact opposite occurs. If this is not what you intended, it might have been masked by the fact that you were reading all the good multiple-sample stuff from previousColor (the previous frame). The only issue with that is that you are essentially looking back in time, so everything is 1 frame late to the final screen presentation that the user sees. I hope my assessment of your calculations is somewhat correct, but if I’m wrong, apologies for muddying the waters.
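To pin down the trade-off being discussed, the basic blend is just a weighted average (plain JavaScript sketch; the weights in the actual shader branches are more involved than this):

```javascript
// Blend the fresh traced color with the previous frame's color.
// High previous-frame weight = smooth but ghosty; low = noisy but crisp.
function blend(previousColor, pixelColor, newWeight) {
  return previousColor * (1.0 - newWeight) + pixelColor * newWeight;
}

// A pixel that was white last frame but traces black this frame:
const ghosty = blend(1.0, 0.0, 0.1); // 0.9 -> the old image lingers
const crisp  = blend(1.0, 0.0, 0.9); // ~0.1 -> responsive, but noisier
```

The whole debate above is about which direction to push newWeight as N_SAMPLES grows.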

Oh btw, you asked a couple posts ago about dynamic loop iterations. Now in WebGL1 it’s a no go, you must re-compile every time like you are currently doing with #define N_SAMPLES 4.0 or #define N_SAMPLES 8.0, for instance. However, one of the great things about WebGL2 (which we use) is that you can dynamically update the conditional variable inside the loop declaration in real time! So for instance, float nSamples = u_UserMultiSampleCount; Here u_UserMultiSampleCount is a user specified number that can change willy nilly while the path tracer is running. Then in the loop declaration, you just do for (float i = 0.0; i < nSamples; i++) and this loop magically updates as fast as you can read keyboard input to increase or decrease the sample count in real time. So in one frame you could have 2.0 samples and the very next frame you could have 7.0 samples. Just for fun I hooked a uniform up to my mouse, and I was able to dial up and down the sample count (and thus framerate vs. quality) all with my mousewheel as the app was running! :smiley:

If you have a high enough sample count then yeah you can rely on that exclusively. And in my case I want to do that even with a low sample count. The option to do this just isn’t implemented in the shader, it was easier to stay in javascript and just try stuff with the ping pong thing.

Not everyone is going to like that level of noise though and it just can’t be brute forced. Even I can’t do that. So this shader keeps the progressive rendering, and I’ve tried to get the noise level under motion to be up to par with the noise level when static.

At first I wasn’t doing that, the multiple samples were just making a fresh frame a bit less noisy than 1 sample, and the resolve was faster. But I noticed the ghosting on moving objects was cleared up a lot, so I figured if it works for that motion, it should work for camera motion as well. So I put in the extra logic that the more samples you have, the more the blending values for camera motion move toward the values for no motion. Which is indeed the opposite direction of what you expected. It works as every iteration the influence of the old color is reduced more and more.
I think the final influence of the previous frame is something like 0.96^N_SAMPLES? It is probably too much but I didn’t tweak these numbers at all. And there is probably lots of tweaking to be done, especially if the use case is different. For example you won’t ever get a long term resolve out of these values as they give the new color too much influence.
In the end though this plain blending isn’t really the way to go about it… have you looked into TAA?

BTW I hope I didn’t distract you from the posts you were making, I’m looking forward to the next one. It’s good stuff!



No worries about talking through this sample count suggestion in my implementation thread! It is actually quite relevant to the final implementation because I will try to add in a sample count option for various user’s systems out in the wild. I’m glad you brought it to my attention, because I wouldn’t have known to even attempt something like what you’re doing, due to my limited system specs.

Yes I agree that people have different tolerances for noise vs. frame rate, especially at the 1 to 4 sample count range. I guess I have gotten used to noise on diffuse surfaces because I’ve been staring at it for so many years, ha!
Of all the established algorithms and exact calculations used for path tracing, I feel that weighting the fresh pixelColor vs. slightly stale previousColor is to me the most finicky, most gray area, and most subjective calculation that we have to deal with. As you probably have, I have tried hundreds of weighting and blending combinations. And unless they are near opposite sides of the spectrum (progressive full blend vs. no ghosting at all), they all start to look similar after a while, and I start second guessing my weighting decisions, ha.

If you do indeed get the js ping pong code cleaned up, I would really like to see that in a Gist file as well. If anything, I can add the uniform sample count that can be specified by the user. I might even make it update real time without the need for re-compilation (as I mentioned in the previous post) so that maybe people could immediately see the results of their weighting and blending choices. Sort of like a chef tasting his/her creation before it is served! :smiley:

Looking forward to seeing more and discussing further. Thank you!

To all, sorry I kind of went off on a tangent discussion with @shimmy, but it is a very relevant topic to the final implementation, plus it’s a suggestion that I think will benefit everyone in the long run. Anyway, here’s the next installment!

Part 4 - Implementation (cont.)

So when we last left off, we had a plan for how to create a ping pong buffer for progressive rendering. If you would like to see the actual code for setting up the render targets inside Babylon, please check out the ongoing port by @PichouPichou (main port) and @JohnK (BVH port) in the original Babylon path tracing thread.
I ask that you refer to their port code instead of me directly copying and pasting all my old code here and discussing each line because, firstly, it would take too much room and time; secondly, it is changing on a daily basis as they work through the port; and lastly, I am not as familiar with Babylon as I am with three.js. So I would like to defer to the above coders’ code if you would like to see a line-by-line implementation.

Now we can start talking about the various cameras necessary for path tracing in this system. As previously mentioned, since we have 3 separate scenes - pathTracingScene, screenTextureScene, and screenOutputScene, we must now have 3 cameras, 1 for each scene.

I’ll talk about the screenTextureScene and screenOutputScene cameras first, because they are identical in terms of camera needs; in fact, with three.js, I just created a single camera that was linked up to both scenes. Then you just call render on each scene sequentially. Since the rendering occurs in 3 separate scenes and 3 separate stages, we can reuse the same camera for 2 of them.

This camera I’m referring to is an orthographic camera. The view frustum looks like a regular box, so there is no perspective. We don’t need perspective in these 2 scenes because their total geometry is 2 large triangles that cover the screen. This orthographic camera (or cameras, if you don’t reuse one) will not move at all, nor change any camera settings during the entire app. The 2 large triangles sit right up front in the view of the orthographic camera, and they do not move either. Think of the whole setup as a big movie-theater screen. The big screen doesn’t move, but the path traced image it displays will appear to have 3d depth and will definitely move and constantly update, hopefully at around 30-60 fps.

I saved the trickier camera for last - the pathTracingScene camera. It requires multiple features for everything to work right. Similar to the other 2 scenes we just covered, it needs the 2 large triangles so that we have access to every single pixel on the screen and so we can send an individual personalized ray through each of those pixels all at the same instant, in hopes of doing the ray tracing in parallel on the GPU. However, the pathTracingScene also needs a dynamic camera that can move, rotate, change field of view (zoom in and out), and a camera that has some sense of depth with objects that are farther away getting smaller. A free perspective camera will fit the bill perfectly!

If you look at the list of uniforms sent to the pathTracingFragmentShader, you’ll see a 4x4 matrix uniform with the name uCameraMatrix. This is how we’ll get the updated free camera info in Babylon’s js engine sent over to the path tracer. This matrix holds the camera position and rotation and will update every animation frame. Also, in setting up the free look camera, we will provide the usual camera attributes, like its field of view and aperture, both of which will get sent via float uniforms to the ray creation stage inside the shader’s path tracer.

If the field of view is low or narrow, the rays cast out from the camera will be more parallel, and if the field of view is high or wide, the camera rays will diverge away from the line of sight more, giving a more distorted view. If the aperture is 0.0 (a theoretical pinhole camera with an infinitesimal aperture opening), the rays will all start exactly at the camera origin point and travel along their assigned ray directions, producing a perfectly sharp image, no matter what the distance from the camera. As the aperture increases in diameter (more like real-world cameras), the rays will first start at a randomized aperture location (based on the requested diameter), and their directions will be toward their own focal point out in the scene, but offset slightly by the random aperture position. The scene objects that happen to be close to the focusDistance point will be rendered perfectly sharp. As objects get farther and farther away from this point out in the scene, they get more blurry. As the aperture size increases, this effect becomes more pronounced.
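That thin-lens behavior can be sketched in plain JavaScript (vectors as {x, y, z} objects; for simplicity the aperture disk here lies on the camera’s x/y plane, whereas a real implementation would use the camera’s basis vectors):

```javascript
function normalize(v) {
  const len = Math.hypot(v.x, v.y, v.z);
  return { x: v.x / len, y: v.y / len, z: v.z / len };
}

// pinholeDir: the ideal (unit-length) ray direction through this pixel
function lensRay(camPos, pinholeDir, apertureRadius, focusDistance, rng) {
  // the focal point this ray must pass through, along the ideal direction
  const focal = {
    x: camPos.x + pinholeDir.x * focusDistance,
    y: camPos.y + pinholeDir.y * focusDistance,
    z: camPos.z + pinholeDir.z * focusDistance,
  };
  // random point on the aperture disk (sqrt gives a uniform disk sample)
  const r = apertureRadius * Math.sqrt(rng());
  const phi = 2.0 * Math.PI * rng();
  const origin = {
    x: camPos.x + r * Math.cos(phi),
    y: camPos.y + r * Math.sin(phi),
    z: camPos.z,
  };
  // aim the ray from the aperture point toward the shared focal point
  const direction = normalize({
    x: focal.x - origin.x,
    y: focal.y - origin.y,
    z: focal.z - origin.z,
  });
  return { origin, direction };
}

// aperture 0.0 degenerates to a pinhole: origin is exactly the camera
const ray = lensRay({ x: 0, y: 0, z: 0 }, { x: 0, y: 0, z: 1 }, 0.0, 10.0, Math.random);
```

Objects at focusDistance are hit consistently by every jittered ray, so they resolve sharp; everything else gets hit from slightly different angles each sample, which averages into blur.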

Now here is the tricky part: the path tracing camera must be a perspective camera so we can talk about field of view, but it also has to render to the 2 large triangles that cover its viewing area, and if you recall, when we were dealing with the 2 triangles in the other scenes, we used orthographic cameras. This has the unwanted consequence that if the perspective camera turns or moves, the movie-screen 2 triangle plane gets moved around and eventually gets clipped when it falls out of view! You’re left with a black background and your beautiful path traced scene is being rendered on a quad that has disappeared, like a flat-screen TV floating away in the darkness of space, ha ha.

After much trial and error, I finally found the solution: make the 2 triangle quad a child of the perspective camera. Or another way to look at it, have the camera parent the quad. This way, the quad gets ‘stuck’ on the screen and no matter what crazy motions you make with your camera, the image quad (2 triangles) sticks right to it, kind of like in a funny movie where a large newspaper flies onto a windshield of a speeding car and sticks there no matter what the driver does, ha ha! :smiley:

A couple of extra camera parameters are needed in path tracing: Focus distance as a uniform float, and a boolean uniform of whether the camera is moving or not. I will go into more detail down the line when we get into more of the actual path tracing and sampling theory aspect of rendering.

I won’t cover user input like mouse and keyboard, because it is handled with the underlying js engine (Babylon in this case) in the usual manner. These inputs are typically linked to the camera variables and their uniforms that we just discussed. That way the path tracer can immediately respond to user input, such as increasing the camera’s field of view with a mouse scroll action, for instance.

In the next post, I’ll cover the scene objects and geometry that we want the path tracer to actually render and interact with.

Till next time!


Hey, I got that ping pong multisampling code here.

The relevant code changes are at the bottom.

This puts multisampling in all demos. Controls are 1-4 for setting pixel ratio, - and = for setting sample count, and p for switching between no frame blending / blending when static / always blending. The last mode is only good in demos with enough bias for the new frame.

The issues I had with this method are entirely down to the update function that is in each demo script. I wanted to avoid looping that so the animation isn’t sped up, and I could deal with the frame counter (I had to override this either way), but the sample counter frustrated me. Running my own counter would be fine in some demos but too bright or dark in others; I couldn’t get something that worked for everything.
So I had to loop the call because that does work, but animations get sped up and this needs to be fixed in the demo scripts.


Part 5 - Implementation (cont.)

In this part we’ll take a look at the path traced scene itself: how objects are either loaded from disk or server, or created from scratch inside the path tracer, how to transform these scene objects, and how we can use the underlying library (Babylon.js) to help us with all of these tasks.

Now we hopefully have most of the bootstrap setup code out of the way (3 scenes with 3 render targets and 3 cameras (or 2 if you can reuse the ortho camera)). We can now decide what type of scene environment we wish to render - small or large room?, indoors or outdoors?, mathematical shapes or triangle models?, etc.

I’ll briefly go over all of the above decisions and how we sometimes have to change our approach to filling in scenes with arbitrary objects when we’re dealing with ray/path tracing and the GPU.

Let’s take the simplest scene objects and work up to the most complex. First, let’s talk about spheres. There’s a reason why 99 percent of the ray and path tracers out there begin with (and some end with) just spheres. Also, if you take a graphics course in college and your professor tells you that you have to write a ray tracer by the end of the course, chances are the first and possibly only scene geometry requirement is sphere (and maybe plane) rendering.

There are many reasons for this first choice for rendering, but perhaps the more important ones are that the ray-sphere intersection code is historically well-known and optimized, the sphere object is easier to grasp geometrically and algebraically, finding a sphere surface normal is a simple matter of 1 vector subtraction, and when the sphere is lit and different materials are experimented with, you can immediately and easily see the changes.

A sphere (which is really a special case of an ellipsoid) has a surface that can be defined implicitly: x² + y² + z² - r² = 0, where r is the sphere radius. Any 3-dimensional point in space that fulfills this implicit equation is a point located exactly on the sphere surface. On the other hand, the ray itself is usually defined explicitly, with a parametric equation: ray = ray.origin + ray.direction * t, where t is the parameter that defines how far we are along the ray, assuming that the ray.direction vector is always normalized to a unit length of 1.

Without going too deep into the math, some smart person long ago figured out that if you substitute the explicit ray equation into the implicit sphere equation, then after all the algebraic reductions are performed, it ends up becoming a simple quadratic equation. And methods to solve quadratic equations have been around for many centuries - it basically boils down to root finding. There are usually 2 roots - a smaller root and a larger root, often referred to as t0 and t1 in ray tracing. Once we find t0 and t1 with the quadratic formula, we take the smaller of these 2 roots (but only if it is positive!), stick this t value back into the simple ray parametric equation mentioned above, and we have our intersection point in 3d space!

Since spheres can be defined with implicit equations and solved with quadratic equations, they are part of a class called quadrics. Other quadrics include ellipsoids, cylinders, cones, paraboloids, hyperboloids, and hyperbolic paraboloids (whew, that last one is a mouthful, just to describe a saddle or Pringle potato chip, lol). Ray tracers can handle all of these quadric shapes quite well, and they are pixel perfect when rendered - no need for extensive tessellation into triangles or other polygons.
You might think a torus (donut) fits into this class, but unfortunately it is a quartic shape, which requires a massive amount of calculation to find the closest of 4 possible roots (hence the name quartic). There are some cubic (3-root) shapes as well, including Bézier patches and the like, but those are also prohibitively expensive to ray trace analytically. In the end, cubic and quartic shapes are best represented as triangle meshes, or handled by ray marching with distance fields rather than analytically trying to find multiple roots.

All the shapes described so far have some kind of mathematical curvature, but at the other end of the spectrum are all of the shapes that have straight edges and flat faces: planes, boxes, triangles, and quads. Luckily, just as with quadric shapes, there are many well-known and efficient routines to handle a ray's intersection with these shapes.
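As one example of these flat-faced routines, here is a sketch of the well-known "slab" method for ray vs. axis-aligned box intersection, again in plain JavaScript with illustrative names. Each axis of the box contributes a pair of parallel planes (a "slab"), and the ray hits the box only where all three slab intervals overlap:

```javascript
function intersectBox(rayOrigin, rayDir, boxMin, boxMax) {
  let tNear = -Infinity;
  let tFar = Infinity;
  for (const axis of ["x", "y", "z"]) {
    const invD = 1 / rayDir[axis]; // becomes +/-Infinity if ray is parallel to this slab
    let t0 = (boxMin[axis] - rayOrigin[axis]) * invD;
    let t1 = (boxMax[axis] - rayOrigin[axis]) * invD;
    if (t0 > t1) [t0, t1] = [t1, t0]; // order the pair so t0 is the near plane
    tNear = Math.max(tNear, t0);      // latest entry into any slab
    tFar = Math.min(tFar, t1);        // earliest exit from any slab
    if (tNear > tFar) return Infinity; // intervals no longer overlap: miss
  }
  // Prefer the entry point; fall back to the exit if we start inside the box
  return tNear > 0 ? tNear : (tFar > 0 ? tFar : Infinity);
}

// Usage: unit box centered at the origin, ray approaching from -z
const tBox = intersectBox(
  {x: 0, y: 0, z: -5}, {x: 0, y: 0, z: 1},
  {x: -1, y: -1, z: -1}, {x: 1, y: 1, z: 1}
);
// tBox === 4: the ray enters the box at z = -1
```

This same routine is also the workhorse inside BVH traversal, since BVH nodes are axis-aligned bounding boxes.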

When choosing which end of the shape spectrum to use for modeling scene geometry, it is important to note that even though mathematical implicit quadric shapes are fun to ray trace and cool to look at in abstract scenes, it is very difficult to model complex geometry like a human, animal, car, spacecraft, architectural interior, etc. using just those shapes. Eventually everyone realizes that to render arbitrarily complex objects, the objects need to be made of either triangles or voxels. Triangles are a natural choice for representing complex geometry: a triangle has the fewest vertices and edges of all polygons, and by definition, a triangle is embedded in a plane with a well-defined surface normal vector. A quad is a convenient shape also, and one frequently used in modeling software, but great care must be taken to prevent one of the quad's vertices from slightly veering off its plane, thus making a degenerate, untraceable shape. Therefore, over the years triangles have become the dominant construction primitive in professional renderers. The one big issue with using triangles to model everything is that after about 100 triangles, a naive ray tracer starts to bog down, and after 1000 or so, it slows to a crawl. In order to reduce this bottleneck, much research has been done to create efficient acceleration structures that create hierarchies of triangles and use a divide-and-conquer approach when dealing with possibly millions of triangles in a modern scene.
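For reference, here is one widely used ray-triangle routine, the Möller-Trumbore algorithm, sketched in plain JavaScript with illustrative names. It solves for the ray parameter t and the triangle's barycentric coordinates (u, v) in a single pass, rejecting the hit as soon as it falls outside the triangle:

```javascript
// Small vector helpers
const sub = (a, b) => ({x: a.x - b.x, y: a.y - b.y, z: a.z - b.z});
const cross = (a, b) => ({
  x: a.y * b.z - a.z * b.y,
  y: a.z * b.x - a.x * b.z,
  z: a.x * b.y - a.y * b.x,
});
const dot = (a, b) => a.x * b.x + a.y * b.y + a.z * b.z;

function intersectTriangle(orig, dir, v0, v1, v2) {
  const edge1 = sub(v1, v0);
  const edge2 = sub(v2, v0);
  const pvec = cross(dir, edge2);
  const det = dot(edge1, pvec);
  if (Math.abs(det) < 1e-8) return Infinity; // ray parallel to triangle plane
  const invDet = 1 / det;
  const tvec = sub(orig, v0);
  const u = dot(tvec, pvec) * invDet;
  if (u < 0 || u > 1) return Infinity;       // hit lies outside the triangle
  const qvec = cross(tvec, edge1);
  const v = dot(dir, qvec) * invDet;
  if (v < 0 || u + v > 1) return Infinity;   // hit lies outside the triangle
  const t = dot(edge2, qvec) * invDet;
  return t > 0 ? t : Infinity;               // hit must be in front of the ray
}

// Usage: ray straight down +z at a triangle lying in the z = 2 plane
const tTri = intersectTriangle(
  {x: 0.2, y: 0.2, z: 0}, {x: 0, y: 0, z: 1},
  {x: 0, y: 0, z: 2}, {x: 1, y: 0, z: 2}, {x: 0, y: 1, z: 2}
);
// tTri === 2
```

A BVH doesn't replace this test - it just drastically reduces how many triangles have to be run through it per ray.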

Likewise, much research has gone into voxel rendering. If you’ve ever played Minecraft, then you know what voxels look like! :smiley: Now Minecraft is great and all, but when we say voxel rendering, what it usually refers to is SVO, or Sparse Voxel Octrees. In this rendering scheme, everything is a cube of varying sizes that are organized hierarchically, all the way from the entire scene universe bounding box, down to a grain of sand and smaller, even possibly at the molecular scale. Since there are very fast ray-box routines, these SVO renderers can be real-time also, while containing practically limitless resolution. Rather than the speed bottleneck, one of the major hurdles to overcome with voxel scene representation is storage. It takes many more voxels to create larger curved shapes that could easily have been handled by a couple of mathematical shapes, or by a handful of large, arbitrarily rotated triangles. Another major hurdle is the streaming of such large voxel data sets into GPU memory on demand as a viewer navigates the scene. As of yet, I have not attempted ray/path tracing voxels and octrees, but I am currently studying them and will hopefully one day offer a side branch to my renderer that allows for Minecraft-type scenes as well as true SVO scenes with almost infinite detail and some kind of data streaming in the background.

Another popular alternative to analytical shape and triangle rendering is using distance approximations and instead moving the rays along in a step-wise fashion through the scene, being careful not to overstep and miss any object. This technique is called ray marching. Instead of tracing triangles or math shapes with analytic t-value finding, we can march rays against any kind of shape, as long as it has a distance equation giving the distance from any point in space (such as the ray's current position) to its surface. Now one might be tempted to trace all shapes with this approach, but there is a big problem - ray marching is much slower than ray tracing analytically. However, if you don’t have the analytical implicit equation for an arbitrary shape, you must resort to either tessellating it with lots of triangles and using a BVH, or simply ray marching it. Shapes that are great for ray marching include fractal shapes found in nature: mountains, water waves, clouds, etc. as well as purely mathematical fractal 3d shapes like a Mandelbulb. Check out my Terrain_Rendering demo where I do only ray marching for the entire scene. I even use ray marching to trace a torus in my geometry demos, because as previously mentioned, it is prohibitively expensive to analytically trace and find the closest of 4 possible roots of a quartic equation. In contrast, ray marching a torus can be done in about 5 lines of code!
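Here is what that looks like: a minimal ray-marching (sphere-tracing) sketch against a torus in plain JavaScript, using the standard torus distance function. Names, step counts, and epsilon values are illustrative, not taken from my renderer:

```javascript
// Torus lying in the xz-plane: major radius R (the ring), minor radius r (the tube)
function torusDistance(p, R, r) {
  const qx = Math.hypot(p.x, p.z) - R; // in-plane distance from the ring's circle
  return Math.hypot(qx, p.y) - r;      // then distance from the tube's surface
}

function marchTorus(origin, dir, R, r) {
  let t = 0;
  for (let i = 0; i < 128; i++) {
    const p = {x: origin.x + dir.x * t,
               y: origin.y + dir.y * t,
               z: origin.z + dir.z * t};
    const d = torusDistance(p, R, r);
    if (d < 0.0001) return t;  // close enough to the surface: call it a hit
    t += d;                    // safe step: we can advance at least d without
                               // passing through the surface
    if (t > 100) break;        // give up past some maximum distance
  }
  return Infinity; // no hit
}

// Usage: ray along -x toward a torus with R = 2, r = 0.5; the tube's outer
// edge is at x = 2.5, so the hit distance from x = 10 is about 7.5
const tHit = marchTorus({x: 10, y: 0, z: 0}, {x: -1, y: 0, z: 0}, 2, 0.5);
// tHit is approximately 7.5
```

Notice that the marching loop never needed to know the torus's quartic equation - only its distance function, which is exactly why this approach generalizes to terrains, clouds, and fractals.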

So the next topic in this post that I wanted to cover is how to manipulate the objects once you get them into your scene and are ray tracing them. Let’s say you want to make a simple rotating box scene in which the box has been stretched and scaled. Well, the hard way to do it would be to find a complex ray vs. oriented-scaled-box intersection routine. Likewise, if a triangular mesh in your scene needs to be rotated, positioned, or scaled, you would have to go back and change the entire geometry vertex list and resulting BVH in order to trace an arbitrarily transformed model like that. That sounds like a lot of unnecessary work, and it turns out, it is!

The solution that follows is one of my favorite things about ray tracing. Instead of transforming objects and using complex ray intersection routines on them, what would happen if we instead transformed the ray itself by the object's inverse matrix? It totally works! What’s happening is that the ray is transformed into the object space of the shape we wish to have translated, rotated, and scaled, and the end result looks like you had used the complex, expensive ray vs. transformed-shape routine all along! I won’t go deep into the math, but we can just use the object’s matrix inverse to multiply against our ray and we get instant, perfect transformation. To see this effect in action, please take a look at my Transforming_Quadric_Geometry demo. In the eyes of the path tracer, none of those shapes are moving at all. Instead, the rays are inverse-transformed, but the result is the same - only way more efficient!
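Here is a tiny sketch of the "transform the ray, not the object" trick in plain JavaScript. To keep it readable, the object's transform is just a translation plus a uniform scale rather than a full 4x4 matrix (a real renderer would multiply the ray by the object's inverse world matrix); all names are illustrative:

```javascript
// Ray vs. unit sphere at the origin (dir need not be normalized here)
function intersectUnitSphere(o, d) {
  const a = d.x * d.x + d.y * d.y + d.z * d.z;
  const b = 2 * (o.x * d.x + o.y * d.y + o.z * d.z);
  const c = o.x * o.x + o.y * o.y + o.z * o.z - 1;
  const disc = b * b - 4 * a * c;
  if (disc < 0) return Infinity;
  const t0 = (-b - Math.sqrt(disc)) / (2 * a);
  return t0 > 0 ? t0 : Infinity;
}

// Object transform: translate by T, then scale uniformly by s.
// Its inverse, applied to the ray, is: subtract T, then divide by s.
function intersectTransformedSphere(rayO, rayD, T, s) {
  const o = {x: (rayO.x - T.x) / s, y: (rayO.y - T.y) / s, z: (rayO.z - T.z) / s};
  const d = {x: rayD.x / s, y: rayD.y / s, z: rayD.z / s};
  // Because the direction was scaled too (and NOT renormalized),
  // the t we get back is already valid in world space.
  return intersectUnitSphere(o, d);
}

// Usage: a sphere of radius 2 sitting at (10, 0, 0), hit by a ray from the
// origin marching along +x: its near surface is at x = 8, so t should be 8
const tWorld = intersectTransformedSphere(
  {x: 0, y: 0, z: 0}, {x: 1, y: 0, z: 0}, {x: 10, y: 0, z: 0}, 2
);
// tWorld === 8
```

The sphere routine itself never changed - only the ray did - which is the whole point of the trick.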

Finally, how can a library/engine like Babylon.js help us when we want to add shapes and triangular meshes to a ray traced scene? When loading triangle data, we can rely on the loaders of Babylon to correctly and efficiently load an entire triangular mesh from a server or from disk. Writing such a loader is no easy task. When it comes to creating objects on the fly, as previously mentioned, it is not as straightforward as calling new Sphere() or new Cylinder() and expecting to see anything. However, the underlying library can assist us greatly, because we can use its notion of matrix manipulation and matrix math routines like matrix.rotate() and matrix.inverse() to apply to all the rays that are intersecting arbitrarily transformed objects, even entire rigid triangular meshes.

Doing matrix math like rotation and inversion in the shader is too expensive where WebGL is concerned. Therefore, it is better to do all of those manipulations every animation frame on the JavaScript side, or in other words, with Babylon’s routines, and then send the results over to the ray tracing shader via matrix uniforms. The really nice thing is that once you have hooked everything up, you can call the familiar Babylon.js translate, rotate, scale, lookAt, etc. on your triangle mesh or math shape (just an empty transform in that instance), every animation frame if you want, and things will just work!

In fact, as far as the path tracer can tell, all it receives from our .js app is a bunch of transforms - think 3d gizmos in a scene editor - they don’t really tell you what the object looks like, but they contain all the info you need about how you want the object to be placed in the scene: its position, rotation, and scale.

The short of it is, if you want traditional math shapes in your scene, you can create them and trace them in the shader directly. Then use an empty transform (gizmo) inside the .js engine to manipulate those shapes, and finally their inverse transform to ray trace them. If you want arbitrarily complex triangle meshes, it’s best to use the Babylon.js loaders to first load in the data, then also use the engine to give a transform (gizmo again) to every model that you want to be moved or rotated, then finally you have to use some kind of data structure like a BVH to be able to efficiently trace all those triangles.
Lastly, if you need shapes like those found in nature, and you can’t triangulate them, or you don’t have an easy math equation for their surfaces, it is best to ray march those objects with a distance approximation equation, knowingly paying the speed penalty but gaining pixel-perfect tracing of those arbitrarily complex, organic, possibly fractal objects.

I hope I have sufficiently demonstrated the differences between the traditional graphics geometry pipeline and ray tracing geometry, and the different mindset and approaches necessary for scene geometry rendering. In the next post, we’ll take a look at lighting the scene, and the differences between the library’s notion of lights and the path tracer’s definition and handling of lights.



Hi, thank you so much for posting your multi-sample .js code edits! Now I can see much better how you introduced the sample loop and handled the frameCounter and other variables. I’m away from my computer at the moment, but I will drop your file into my personal editor and play around with everything tonight on my machine. I hope to have feedback for you either late tonight or tomorrow. Just wanted to let you know that I received the Gist sample.

Thanks again for posting! Talk to you soon. :slight_smile:

Hello again, I had a chance to try out the .js ping pong code that you recently shared. Everything is working on that end and I think I understand the changes you made. The one remaining thing I’m confused about is what the GameEngine_PathTracer_Fragment.glsl shader file should look like in its main() section. Is the glsl you earlier shared with Gist meant to be used in conjunction with the recent .js ping pong multi-sample code that you also shared? Or can you possibly re-share that updated shader file as well, in its latest form, in order to correctly gel with the .js code?

I tried a couple of game_engine fragment shader main() versions on my own, but I’m not sure that the end result is exactly what you intended us to see.

Thanks again for sharing the files. Hope to hear from you soon.
