Ray Tracer
Overview
This project implements a physically-based renderer using the well-known technique of ray tracing. More specifically, it uses path tracing to implement global illumination for more realistic lighting in 3D renderings. Furthermore, this ray tracer is optimized for speed and quality by using a Bounding Volume Hierarchy (BVH) as an acceleration structure and by using adaptive sampling as a noise reduction technique. Overall, it can produce quality renderings of 3D scenes in reasonable amounts of time. Some sample renderings are presented below, as well as explanations of the techniques and optimizations to make it all work.
Part 1: Ray Generation and Intersection
Ray tracing works by emulating the way light travels in the real world: photons originate at light sources and bounce off various objects until they reach your eyes. In practice, this is done in reverse, tracing the path light takes by starting from the eye and looking along some direction toward an object that reflects light from a light source. Thus, the ray tracer starts from a given camera position and view direction, generates a viewing ray for each pixel in the image, and figures out the pixel's color based on the object the ray hit and its position/orientation relative to the scene's lighting.
Let us first approach the ray generation. We start with an image that we want to create in screen space coordinates, e.g. 800 x 600 pixels from a particular viewpoint. Samples are generated for each pixel by casting rays from the camera (or our eyes) through that pixel on the image plane, giving us a parametric equation for the ray,

r(t) = o + td,

where o is the ray's origin, d is the direction the ray travels, and t is a parameter that varies to make r(t) represent any position along the ray.
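The ray-generation step can be sketched as follows. This is a minimal illustrative setup, not the project's actual camera code: it assumes a pinhole camera at the origin looking down -z, with hypothetical field-of-view parameters.

```python
import math

def generate_ray(px, py, width, height, hfov_deg=90.0, vfov_deg=60.0):
    """Generate a camera-space viewing ray through pixel (px, py).

    Assumes a pinhole camera at the origin looking down -z, with the
    image plane at z = -1 (a common convention; details vary by renderer).
    """
    # Map the pixel center to [0, 1]^2 image coordinates.
    u = (px + 0.5) / width
    v = (py + 0.5) / height
    # Map to the image-plane extents determined by the field of view.
    x = (2.0 * u - 1.0) * math.tan(math.radians(hfov_deg) / 2.0)
    y = (2.0 * v - 1.0) * math.tan(math.radians(vfov_deg) / 2.0)
    o = (0.0, 0.0, 0.0)   # ray origin: the camera position
    d = (x, y, -1.0)      # ray direction toward the image-plane pixel
    # Normalize d so that t measures distance along the ray.
    n = math.sqrt(d[0]**2 + d[1]**2 + d[2]**2)
    d = (d[0] / n, d[1] / n, d[2] / n)
    return o, d
```

A real renderer would also transform these camera-space rays into world space using the camera's position and orientation.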
Next, we want to figure out which object the ray hits. We'll explore intersections with two types of geometries: triangles and spheres.
We can figure out if the ray hits a triangle by calculating the barycentric coordinates of the intersection point with the plane containing the triangle. This can be done with the equation

o + td = (1 - u - v)p1 + u*p2 + v*p3,

where p1, p2, and p3 are the triangle's vertices and (1 - u - v), u, v are the barycentric coordinates. This equation can be rearranged to isolate the unknowns (t, u, v) for the intersection:

td + u(p1 - p2) + v(p1 - p3) = p1 - o,

or in matrix-vector form,

[d, (p1 - p2), (p1 - p3)] * [t; u; v] = [p1 - o].

This can be solved with basic linear algebra techniques, but a particularly efficient approach is the Möller–Trumbore intersection algorithm. If the system is solvable with a valid t > 0 and barycentric coordinates satisfying u >= 0, v >= 0, and (1 - u - v) >= 0, the intersection point lands inside the triangle, and we are done!
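The Möller–Trumbore algorithm can be sketched like this; it is an illustrative standalone version, not necessarily the project's implementation:

```python
def ray_triangle_intersect(o, d, p1, p2, p3, eps=1e-9):
    """Möller–Trumbore ray/triangle intersection.

    Returns (t, u, v) if the ray o + t*d hits the triangle (p1, p2, p3)
    with t > 0, otherwise None.
    """
    def sub(a, b):   return (a[0]-b[0], a[1]-b[1], a[2]-b[2])
    def dot(a, b):   return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
    def cross(a, b): return (a[1]*b[2]-a[2]*b[1],
                             a[2]*b[0]-a[0]*b[2],
                             a[0]*b[1]-a[1]*b[0])

    e1, e2 = sub(p2, p1), sub(p3, p1)     # triangle edge vectors
    pvec = cross(d, e2)
    det = dot(e1, pvec)
    if abs(det) < eps:                    # ray parallel to triangle plane
        return None
    inv_det = 1.0 / det
    tvec = sub(o, p1)
    u = dot(tvec, pvec) * inv_det         # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, e1)
    v = dot(d, qvec) * inv_det            # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, qvec) * inv_det           # distance along the ray
    return (t, u, v) if t > eps else None
```

The barycentric tests reject the intersection as soon as one coordinate falls outside [0, 1], which is what makes this formulation efficient.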
Now for spheres. To determine intersection here, we can substitute the ray position into the sphere equation and solve for t:

|o + td - c|^2 - R^2 = 0,

where c is the sphere center, R is the sphere radius, and o + td is a point on the sphere when the equation holds. Expanding this out, we get a quadratic equation in t (where "." denotes the dot product):

(d . d)t^2 + (2(o - c) . d)t + ((o - c) . (o - c) - R^2) = 0.

If we can solve for a valid t > 0, the ray intersects the sphere.
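The quadratic can be solved directly with the standard formula; here is one possible sketch:

```python
import math

def ray_sphere_intersect(o, d, c, R):
    """Solve the quadratic |o + t*d - c|^2 = R^2 for the nearest t > 0.

    Returns the smallest positive root, or None if the ray misses.
    """
    oc = (o[0]-c[0], o[1]-c[1], o[2]-c[2])               # o - c
    a = d[0]**2 + d[1]**2 + d[2]**2                      # d . d
    b = 2.0 * (oc[0]*d[0] + oc[1]*d[1] + oc[2]*d[2])     # 2 (o - c) . d
    cc = oc[0]**2 + oc[1]**2 + oc[2]**2 - R*R            # (o-c).(o-c) - R^2
    disc = b*b - 4.0*a*cc
    if disc < 0.0:
        return None                                      # no real roots: miss
    s = math.sqrt(disc)
    # Prefer the nearer root; fall back to the far one if we start inside.
    for t in ((-b - s) / (2.0*a), (-b + s) / (2.0*a)):
        if t > 0.0:
            return t
    return None
```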
Once we have the object intersection information, we can do a lighting and color calculation for each pixel to accurately represent what our eyes should see. For now, we can use a debug color based on the surface normal at the intersection to verify that our implementations of ray generation and object intersection work as expected.
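One common way to turn a unit surface normal into such a debug color is to remap its components from [-1, 1] to [0, 1]:

```python
def normal_debug_color(n):
    """Map a unit surface normal in [-1, 1]^3 to an RGB color in [0, 1]^3.

    A common debug-shading trick: (n + 1) / 2 turns each normal
    component into a visible color channel.
    """
    return tuple((n[k] + 1.0) * 0.5 for k in range(3))
```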
Part 2: Bounding Volume Hierarchy
Though we now have the methods to compute ray-object intersections, it would still take O(n) time to loop through all n objects and test them against the ray. This can be improved significantly, to roughly O(log n) time per ray, which in practice can mean a speedup of thousands of times, depending on the scene. To do this, we can use what are typically called ray tracing acceleration structures: spatial data structures that help optimize ray traversal performance.
Here, we will use a bounding volume hierarchy (BVH), which is a tree structure that partitions all the shapes in the scene into disjoint groups of objects, whose bounding boxes may overlap. The ray then only needs to traverse the subset of objects within the bounding boxes that it intersects.
There are several heuristics one can choose for the actual partitioning and construction of the BVH. The approach taken here is to split a parent bounding box into two children at the midpoint along the box's longest side. This tends to partition space more evenly, so that the ray can skip the traversal of more objects on average.
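The midpoint-split construction described above might look like the following sketch, assuming each object exposes hypothetical `centroid` (a 3-tuple) and `bbox` ((min, max) corner pair) attributes:

```python
class BVHNode:
    """A BVH node: a bounding box plus either two children or a leaf list."""
    def __init__(self, bbox, left=None, right=None, objects=None):
        self.bbox, self.left, self.right, self.objects = bbox, left, right, objects

def build_bvh(objects, leaf_size=4):
    """Build a BVH by splitting at the midpoint of the box's longest axis."""
    # Compute the bounding box of all objects in this node.
    lo = tuple(min(ob.bbox[0][k] for ob in objects) for k in range(3))
    hi = tuple(max(ob.bbox[1][k] for ob in objects) for k in range(3))
    if len(objects) <= leaf_size:
        return BVHNode((lo, hi), objects=objects)
    # Pick the longest axis of the bounding box and split at its midpoint.
    axis = max(range(3), key=lambda k: hi[k] - lo[k])
    mid = 0.5 * (lo[axis] + hi[axis])
    left = [ob for ob in objects if ob.centroid[axis] < mid]
    right = [ob for ob in objects if ob.centroid[axis] >= mid]
    if not left or not right:            # degenerate split: fall back to halves
        half = len(objects) // 2
        left, right = objects[:half], objects[half:]
    return BVHNode((lo, hi),
                   left=build_bvh(left, leaf_size),
                   right=build_bvh(right, leaf_size))
```

During traversal, a ray first tests a node's bounding box and only descends into children whose boxes it actually hits, which is where the logarithmic behavior comes from.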
The time to ray trace a 3D scene with 1 ray (sample) per pixel is measured in the table below, using both the naive approach (testing every object against the ray) and the BVH-accelerated approach. As we can see, the BVH helps render scenes of even hundreds of thousands of triangles within a fraction of a second. Without the BVH, the same scenes can take several minutes for the same result. The last measurement in the table below is already a nearly 10,000x speedup! This difference would be felt even more as we increase the size, quality, and complexity of the scene (i.e. resolution, samples per pixel, and number of triangles).
Scene | Number of Triangles | Naive Approach (seconds) | BVH Accelerated (seconds) |
---|---|---|---|
Teapot | 2,464 | 9.5083 | 0.0417 |
Cow | 5,856 | 20.0684 | 0.0480 |
Bunny | 33,696 | 99.6719 | 0.0415 |
Bench | 67,808 | 130.8857 | 0.0198 |
Dragon | 105,120 | 328.4802 | 0.0478 |
Wall-E | 240,326 | (over 10 minutes) | 0.0705 |
Here are a few of the moderately complex scenes that can now be ray traced with BVH acceleration, using simple surface normal (debug) coloring and 1 ray sample per pixel. Without using an acceleration structure, we would not be able to ray trace some 3D scenes with decent quality in a reasonable amount of time.
Part 3: Direct and Global Illumination
To complete the rendering, we must calculate the light reflected off the surface in the direction of the viewing ray. This depends on the properties of the material and its position/orientation relative to light sources. We can also make a distinction between direct illumination, which arrives straight from light sources, and indirect illumination, which accounts for light bouncing off other surfaces. We can perform all of these calculations to make a physically-based renderer by implementing a method called path tracing.
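The recursive structure of a diffuse path tracer can be sketched as below. The scene interface (a `scene_intersect` callback returning a hit dict) and the uniform hemisphere sampler are assumptions for illustration only; a real renderer would add cosine-weighted importance sampling, direct light sampling, and Russian roulette termination.

```python
import math
import random

def sample_hemisphere(n):
    """Sample a uniform random direction on the hemisphere around normal n."""
    while True:
        d = tuple(random.uniform(-1.0, 1.0) for _ in range(3))
        length = math.sqrt(sum(x * x for x in d))
        if 0.0 < length <= 1.0:                     # rejection-sample the ball
            d = tuple(x / length for x in d)
            cos = sum(d[k] * n[k] for k in range(3))
            return d if cos > 0.0 else tuple(-x for x in d)

def trace(origin, direction, scene_intersect, depth=0, max_depth=3):
    """Minimal diffuse path-tracing sketch.

    `scene_intersect(origin, direction)` is an assumed callback returning
    None on a miss, or a dict with 'point', 'normal', 'emission', 'albedo'.
    """
    hit = scene_intersect(origin, direction)
    if hit is None or depth >= max_depth:
        return (0.0, 0.0, 0.0)                      # black background
    # Indirect term: bounce a random ray and recurse (global illumination).
    bounce_dir = sample_hemisphere(hit['normal'])
    indirect = trace(hit['point'], bounce_dir, scene_intersect,
                     depth + 1, max_depth)
    # Emitted light plus reflected incoming light.
    return tuple(hit['emission'][k] + hit['albedo'][k] * indirect[k]
                 for k in range(3))
```

Averaging many such traced samples per pixel converges toward the physically correct result, which is why the renderings below need many samples per pixel.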
Here are some renderings of 3D scenes path traced for global illumination! This is the Stanford Bunny in the Cornell Box.
Here is another example.