Explain how Inigo Quilez calculates SDF box normals

Question

Inigo Quilez's website has a page of 3D ray-surface intersectors, one of which is for a basic 3D box:

    // axis aligned box centered at the origin, with size boxSize
    vec2 boxIntersection( in vec3 ro, in vec3 rd, vec3 boxSize, out vec3 outNormal )
    {
        vec3 m = 1.0/rd; // can precompute if traversing a set of aligned boxes
        vec3 n = m*ro;   // can precompute if traversing a set of aligned boxes
        vec3 k = abs(m)*boxSize;
        vec3 t1 = -n - k;
        vec3 t2 = -n + k;
        float tN = max( max( t1.x, t1.y ), t1.z );
        float tF = min( min( t2.x, t2.y ), t2.z );
        if( tN>tF || tF<0.0) return vec2(-1.0); // no intersection
        outNormal = (tN>0.0) ? step(vec3(tN),t1)) : // ro ouside the box
                               step(t2,vec3(tF)));  // ro inside the box
        outNormal *= -sign(rd);
        return vec2( tN, tF );
    }

In addition to calculating the intersection, it also calculates the surface normal at the point of intersection. Calculating the normal is the part I'm interested in. I would like to understand how it does this, but the page doesn't break down the math at all, and I haven't found any other sources that do (also, not understanding the process limits my understanding of what to search for).

Some parts make sense conceptually, like the calculation of near/far intersection points for the ray. But the calculation of constants that make up that calculation are lost on me. Why calculate 1.0/ray_direction? How is ray_origin/ray_direction useful? Etc.

I'd like to understand how this works, rather than treat it like a black box.

Edit: I found another version, modified to only return normals, which shares many of the same calculations. It may be easier to explain; I just wish I knew how it worked.

    vec3 boxNormal( in vec3 ro, in vec3 rd, vec3 boxSize)
    {
        vec3 m = 1.0/rd;
        vec3 n = m*ro;
        vec3 k = abs(m)*boxSize;
        vec3 t1 = -n - k;
        return -sign(rd)*step(t1.yzx,t1.xyz)*step(t1.zxy,t1.xyz);
    }

To be honest, it's very very weird. Variable names are confusing, also it has a "rad" as the box input. Mr. Quilez is very smart, but not a good teacher.
– darkgaze, Commented Dec 26, 2024 at 9:44

Theraot · Accepted Answer · 2023-02-25 07:03:17Z

We are looking at the function boxIntersection.

I'll reiterate the information from the comment:

// axis aligned box centered at the origin, with size boxSize

We are working with a box.
Centered at the origin.
And it is axis aligned.

And let us read the parameter names:

ro is ray origin.
rd is ray direction.
boxSize is the size of the box.
outNormal is the computed normal.

We will pretend for an instant that we don't have the code and see how we can approach this… By the way, I'll be writing pseudo-code.

As you probably know, the points along the ray can be described like this:

pt = ro + rd * t

However, let us step back a bit, and let us work on individual axes:

    pt.x = ro.x + rd.x * t
    pt.y = ro.y + rd.y * t
    pt.z = ro.z + rd.z * t

And, of course, we want to compute the collision of that ray with the box. Let us start with just one of its planes... Since the box is axis aligned, and boxSize has its dimensions, there should be a plane of the box at boxSize.x * 0.5... Except that does not seem to match the shader. It appears there is a plane at boxSize.x instead (in other words, boxSize holds half-extents). Which leads us to this equation:

ro.x + rd.x * t = boxSize.x

If we solve for t, we get this:

t = (boxSize.x - ro.x) / rd.x

If we do the same for all planes of the box we have:

    ta = (-boxSize.x - ro.x) / rd.x
    tb = (-boxSize.y - ro.y) / rd.y
    tc = (-boxSize.z - ro.z) / rd.z
    td = (boxSize.x - ro.x) / rd.x
    te = (boxSize.y - ro.y) / rd.y
    tf = (boxSize.z - ro.z) / rd.z

Which we can do in two lines:

    t1 = (-boxSize - ro) / rd
    t2 = (boxSize - ro) / rd
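To make the two-line form concrete, here is a minimal sketch in plain Python (rather than GLSL, so it can be run outside a shader). The helper name plane_times and the sample ray are assumptions for illustration, and it assumes no component of rd is zero.

```python
def plane_times(ro, rd, box_size):
    """Per-axis ray parameters t for the planes at -box_size and +box_size."""
    t1 = tuple((-b - o) / d for o, d, b in zip(ro, rd, box_size))
    t2 = tuple((b - o) / d for o, d, b in zip(ro, rd, box_size))
    return t1, t2


# Hypothetical sample ray and box, chosen so the arithmetic is exact.
ro = (-3.0, 0.5, 0.25)
rd = (1.0, 0.125, 0.25)
box_size = (1.0, 1.0, 1.0)
t1, t2 = plane_times(ro, rd, box_size)

# Plugging each t back into pt = ro + rd * t lands on the matching plane.
for axis in range(3):
    assert ro[axis] + rd[axis] * t1[axis] == -box_size[axis]
    assert ro[axis] + rd[axis] * t2[axis] == box_size[axis]
```

Note that at this stage t1 is just "the plane at -boxSize" and t2 "the plane at +boxSize"; nothing yet says which of the two the ray reaches first.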

We can further approach the code we are trying to understand one step at a time.

First, in both lines we divide by rd, which is the same as multiplying by 1.0/rd, we can pre-compute that:

    m = 1.0/rd
    t1 = (-boxSize - ro) * m
    t2 = (boxSize - ro) * m

We can distribute those products:

    m = 1.0/rd
    t1 = m * -boxSize - m * ro
    t2 = m * boxSize - m * ro

And m * ro is something else we can also pre-compute:

    m = 1.0/rd
    n = m * ro
    t1 = m * -boxSize - n
    t2 = m * boxSize - n

And we pre-compute m * boxSize:

    m = 1.0/rd
    n = m * ro
    k = m * boxSize
    t1 = -k - n
    t2 = k - n

Ok, ok, let us write n first so it looks like the code we want:

    m = 1.0/rd
    n = m * ro
    k = m * boxSize
    t1 = - n - k
    t2 = - n + k

We are missing an absolute value, we will get to that.

At this point it might help to have a geometric interpretation of these variables...

It might help to look at only one axis again, since these t1 and t2 should not be interpreted as position or vectors in space:

    m.x = 1.0/rd.x
    n.x = m.x * ro.x
    k.x = m.x * boxSize.x
    t1.x = - n.x - k.x
    t2.x = - n.x + k.x

Remember that the t values (t1.x and t2.x for the x axis) are parameters to find the points along the ray. In this formula:

pt = ro + rd * t

One way of thinking of that is this: t tells you how many rds the ray needs to advance to reach the point.

Thus, we are working in rd units. Hence, it makes sense that we divide by rd.

As for the other variables - once we stop worrying about the division by rd - we can see that k is the distance from the center of the box to each of its planes, in rd units. And since the box is centered at the origin, but the ray starts at ro, we also need to offset that by ro... but also in rd units, i.e. n.

With that understanding, we can see that adding the absolute value to k, as shown below, gives us the same t values; it just changes the order in which we get them. On an axis where rd is negative, k changes sign, which swaps that axis's t1 and t2.

    m = 1.0/rd
    n = m * ro
    k = abs(m) * boxSize
    t1 = - n - k
    t2 = - n + k
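The effect of the absolute value can be checked numerically. The following is a rough sketch in plain Python; the function slab_times, its use_abs flag, and the sample ray (which travels in -x) are assumptions made for illustration, and rd is assumed to have no zero component.

```python
def slab_times(ro, rd, box_size, use_abs):
    """Compute t1 and t2 per axis, with or without abs() on m."""
    m = tuple(1.0 / d for d in rd)
    n = tuple(mi * o for mi, o in zip(m, ro))
    k = tuple((abs(mi) if use_abs else mi) * b for mi, b in zip(m, box_size))
    t1 = tuple(-ni - ki for ni, ki in zip(n, k))
    t2 = tuple(-ni + ki for ni, ki in zip(n, k))
    return t1, t2


ro = (3.0, 0.0, 0.0)
rd = (-1.0, 0.5, 0.25)  # negative x component
box_size = (1.0, 1.0, 1.0)

plain_t1, plain_t2 = slab_times(ro, rd, box_size, use_abs=False)
abs_t1, abs_t2 = slab_times(ro, rd, box_size, use_abs=True)

for axis in range(3):
    # Same pair of values on every axis, only the order can differ ...
    assert sorted([plain_t1[axis], plain_t2[axis]]) == [abs_t1[axis], abs_t2[axis]]
    # ... and with abs() the smaller value always ends up in t1.
    assert abs_t1[axis] <= abs_t2[axis]
```

Without abs, the x axis here has t1.x = 4 and t2.x = 2 (far plane listed first); with abs they come out as 2 and 4.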

In what order do we get the t values? Well - assuming that boxSize is positive, which is not checked - k should be positive.

And thus t1 would have smaller distances than t2. We can think of t2 as the back face of the box, and t1 as the front face.

We would not have gotten the back and front faces separated like this without the absolute value.

Let us also reason about the sign of n. In order to do so, it might help to change from this:

    m = 1.0/rd
    n = m * ro

To this:

n = ro / rd

And, again, let us look at only one axis:

n.x = ro.x / rd.x

So a component of n is positive if the corresponding components of ro and rd have the same sign. In other words, if the ray is going away from the origin along that axis.

Ergo, the components of - n are positive when the ray is looking towards the origin.

This makes sense given that when the box (which is centered at the origin) is behind the ray origin, the ray would have to travel backwards to intersect it. So when the ray is looking away from the origin (the center of the box) we get a negative - n. In that situation the ray might still collide with the box, if it is large enough. That would be a case of t2 = - n + k (the back face), never t1 = - n - k (because, remember, k should be positive unless the size of the box is inverted, and the intersection should be in front of the ray, and thus have a positive t).

Back to the code:

    m = 1.0/rd            // rd unit conversion
    n = m * ro            // origin in rd steps
    k = abs(m) * boxSize  // box size in rd steps
    t1 = - n - k          // front face ts
    t2 = - n + k          // back face ts

Let us narrow our search for the correct t…

I want you to imagine the box, and you are looking at one of the corners. The planes of the front faces extend to infinity.

If we get the minimum (the first plane that the ray intersects) we would see these planes that cover the whole picture partitioning it in three and meeting at the corner of the box. The corner of the box is the point farthest from the camera that we see.

But behind the front faces of the box, the box does not have any other front faces. In fact if we get the maximum, we get the planes coming out of the corner of the box and escaping to infinity (they won't cover the whole picture, at least not in perspective projection). The corner of the box is the point closest to the camera that we see.

So we want… The maximum:

    m = 1.0/rd            // rd unit conversion
    n = m * ro            // origin in rd steps
    k = abs(m) * boxSize  // box size in rd steps
    t1 = - n - k          // front face ts
    t2 = - n + k          // back face ts
    front_t = max(t1.x, max(t1.y, t1.z))

And for the back faces we want the opposite:

    m = 1.0/rd            // rd unit conversion
    n = m * ro            // origin in rd steps
    k = abs(m) * boxSize  // box size in rd steps
    t1 = - n - k          // front face ts
    t2 = - n + k          // back face ts
    front_t = max(t1.x, t1.y, t1.z)
    back_t = min(t2.x, t2.y, t2.z)

Well, except the code does it a bit differently:

    m = 1.0/rd            // rd unit conversion
    n = m * ro            // origin in rd steps
    k = abs(m) * boxSize  // box size in rd steps
    t1 = - n - k          // front face ts
    t2 = - n + k          // back face ts
    tN = max(max(t1.x, t1.y), t1.z)
    tF = min(min(t2.x, t2.y), t2.z)

Now we understand that the N in tN stands for NEAR, and the F in tF stands for FAR.
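One way to convince yourself that the maximum of the near times and the minimum of the far times are the right picks is to sample points along the ray and test them against the box directly. This is a sketch in plain Python under assumed names (near_far, inside) and an assumed sample ray with no zero component in rd:

```python
def near_far(ro, rd, box_size):
    """Return (tN, tF): latest near-plane time and earliest far-plane time."""
    m = [1.0 / d for d in rd]
    n = [mi * o for mi, o in zip(m, ro)]
    k = [abs(mi) * b for mi, b in zip(m, box_size)]
    t1 = [-ni - ki for ni, ki in zip(n, k)]
    t2 = [-ni + ki for ni, ki in zip(n, k)]
    return max(t1), min(t2)


def inside(p, box_size):
    """True if point p lies in the box (boundary included)."""
    return all(abs(c) <= b for c, b in zip(p, box_size))


ro = (-3.0, 0.5, 0.25)
rd = (1.0, 0.125, 0.25)
box_size = (1.0, 1.0, 1.0)
tN, tF = near_far(ro, rd, box_size)

# Walk along the ray: a sample is inside the box exactly when tN <= t <= tF.
for i in range(41):
    t = i * 0.125
    p = [o + d * t for o, d in zip(ro, rd)]
    assert inside(p, box_size) == (tN <= t <= tF)
```

For this ray the interval is tN = 2 to tF = 3; outside that interval at least one axis is still outside its pair of planes.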

The next line I want to read from the code:

if( tN>tF || tF<0.0) return vec2(-1.0); // no intersection

This is checking two things:

- If the near face of the box is further away from the ray origin than the far face of the box (tN>tF), there is no intersection.

  How can tN>tF happen? Well, there are two situations:
  - Remember that we are considering these to be near and far (front and back) faces of the box based on an assumption of the sign of k, which is based on an assumption on the sign of boxSize. So tN>tF happens if the box is inverted (inside out).
  - The near planes also extend away from the camera, so you will find intersections with them that are behind the far planes. Similarly, the far planes also extend towards the camera… Thus, tN>tF happens out of the bounds of the box.
- If the far face of the box is behind the ray origin (tF<0.0), there is no intersection.
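The two rejection conditions can be tried on a few rays. The sketch below is plain Python with an assumed helper name (hit_range) and made-up sample rays; it returns None where the shader returns vec2(-1.0), and assumes rd has no zero component.

```python
def hit_range(ro, rd, box_size):
    """Return (tN, tF), or None when the ray misses the box."""
    m = [1.0 / d for d in rd]
    n = [mi * o for mi, o in zip(m, ro)]
    k = [abs(mi) * b for mi, b in zip(m, box_size)]
    tN = max(-ni - ki for ni, ki in zip(n, k))
    tF = min(-ni + ki for ni, ki in zip(n, k))
    if tN > tF or tF < 0.0:
        return None  # no intersection
    return tN, tF


box = (1.0, 1.0, 1.0)

# Ray that enters through the x = -1 face.
hit = hit_range((-3.0, 0.5, 0.25), (1.0, 0.125, 0.25), box)

# Same origin, steeper direction: it leaves the z slab before entering the
# x slab, so tN > tF.
passes_by = hit_range((-3.0, 0.5, 0.25), (1.0, 0.25, 0.5), box)

# Box entirely behind the ray origin: tF < 0.
behind = hit_range((3.0, 0.5, 0.25), (1.0, 0.125, 0.25), box)

# Origin inside the box: still a hit, but tN is negative.
started_inside = hit_range((0.0, 0.0, 0.0), (1.0, 0.125, 0.25), box)
```

Only the first and last rays produce a range; the last one shows why a negative tN alone is not treated as a miss.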

First version of the normal computation:

    outNormal = (tN>0.0) ? step(vec3(tN),t1)) : // ro ouside the box
                           step(t2,vec3(tF)));  // ro inside the box

I don't think that is right. The parentheses do not seem to match.

Anyway, let us start by making sense of the comments. When tN>0.0 it means that the near face is in front of the ray, and thus the ray origin is outside of the box.

Now, the step function will return 0.0 or 1.0 depending on the comparison of the arguments. In particular, step(edge, x) returns 0.0 if x < edge and 1.0 otherwise.

Let us expand step(vec3(tN),t1) to its components:

    vec3(
        // 1.0 if tN <= t1.x otherwise 0.0
        step(tN,t1.x),
        // 1.0 if tN <= t1.y otherwise 0.0
        step(tN,t1.y),
        // 1.0 if tN <= t1.z otherwise 0.0
        step(tN,t1.z)
    )

Instead of writing 1.0 if blah otherwise 0.0, I'll write blah, and take that true is 1.0 and false is 0.0:

    vec3(
        // tN <= t1.x
        step(tN,t1.x),
        // tN <= t1.y
        step(tN,t1.y),
        // tN <= t1.z
        step(tN,t1.z)
    )

So we get a 1.0 for any component such that tN is less than or equal to it. In other words, we get a 1.0 for any component greater or equal to tN.

As we know, tN is the maximum of t1.x, t1.y, and t1.z. So we get a 1.0 for any component greater or equal to the maximum of the components, that is, for exactly the components that are equal to the maximum.

This means you get (1.0, 1.0, 1.0) for the corners. For the faces of the cube you get a 1.0 along the axis normal to the face. And for the edges between two faces you get a 1.0 along the axis normal to each of the faces that meet.
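Here is a small plain-Python sketch of that mask. The step helper mirrors the GLSL built-in as described above; near_mask and the sample t1 values are made up for illustration.

```python
def step(edge, x):
    """GLSL-style step: 0.0 if x < edge, otherwise 1.0."""
    return 0.0 if x < edge else 1.0


def near_mask(t1):
    """1.0 on every axis whose near-plane time equals the maximum, tN."""
    tN = max(t1)
    return tuple(step(tN, c) for c in t1)


face_mask = near_mask((2.0, -12.0, -5.0))  # x plane entered last
edge_mask = near_mask((2.0, 2.0, -5.0))    # x and y planes entered together
corner_mask = near_mask((2.0, 2.0, 2.0))   # all three entered together
```

The face case yields a single 1.0, the edge case two, and the corner case three.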

For the other case we do similarly with tF, except the order is flipped because tF holds the minimum, not the maximum.

Thus, this (parentheses fixed by me):

    outNormal = (tN>0.0) ? step(vec3(tN),t1) : // ro ouside the box
                           step(t2,vec3(tF));  // ro inside the box

Gives you a vector with a 1.0 along the axis normal to each face the ray hits.

And the next line:

outNormal *= -sign(rd);

Gives it the opposite sign to the direction of the ray.
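Putting the pieces together, this is a plain-Python port of the whole function as I read it, intended only as a sketch for checking the arithmetic outside a shader. The names box_intersection, step and sign are mine, a miss is reported as None instead of vec2(-1.0), and rd is assumed to have no zero component.

```python
def step(edge, x):
    """GLSL-style step: 0.0 if x < edge, otherwise 1.0."""
    return 0.0 if x < edge else 1.0


def sign(x):
    """GLSL-style sign: -1.0, 0.0 or 1.0."""
    return float((x > 0.0) - (x < 0.0))


def box_intersection(ro, rd, box_size):
    """Return (tN, tF, normal), or None when the ray misses the box."""
    m = [1.0 / d for d in rd]
    n = [mi * o for mi, o in zip(m, ro)]
    k = [abs(mi) * b for mi, b in zip(m, box_size)]
    t1 = [-ni - ki for ni, ki in zip(n, k)]
    t2 = [-ni + ki for ni, ki in zip(n, k)]
    tN, tF = max(t1), min(t2)
    if tN > tF or tF < 0.0:
        return None
    if tN > 0.0:
        mask = [step(tN, c) for c in t1]  # ro outside the box
    else:
        mask = [step(c, tF) for c in t2]  # ro inside the box
    # Flip the mask so the normal opposes the ray direction.
    normal = tuple(-sign(d) * s for d, s in zip(rd, mask))
    return tN, tF, normal


box = (1.0, 1.0, 1.0)
# Travelling in +x from outside: hits the face at x = -1.
from_outside = box_intersection((-3.0, 0.5, 0.25), (1.0, 0.125, 0.25), box)
# Starting at the center: reports the face at x = +1 that the ray exits through.
from_inside = box_intersection((0.0, 0.0, 0.0), (1.0, 0.125, 0.25), box)
```

Both calls give a normal equal to (-1, 0, 0). From outside that is the outward normal of the face that was hit; from inside it is the exit face's normal flipped to point back against the ray.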

Second version of the normal computation:

return -sign(rd)*step(t1.yzx,t1.xyz)*step(t1.zxy,t1.xyz);

Again, the factor -sign(rd) gives us the opposite sign to the direction of the ray.

The rest is swizzling trickery, let us break it down. We have:

return -sign(rd) * step(t1.yzx,t1.xyz) * step(t1.zxy,t1.xyz);

Which is the same as:

    return -sign(rd) * vec3(
        // t1.y <= t1.x
        step(t1.y,t1.x),
        // t1.z <= t1.y
        step(t1.z,t1.y),
        // t1.x <= t1.z
        step(t1.x,t1.z)
    ) * vec3(
        // t1.z <= t1.x
        step(t1.z,t1.x),
        // t1.x <= t1.y
        step(t1.x,t1.y),
        // t1.y <= t1.z
        step(t1.y,t1.z)
    );

I remind you that when the expression I wrote is true we have a 1.0, otherwise it is a 0.0. Therefore, when we multiply them, it is equivalent to doing an AND on the expressions.

So, the above, is the same as:

    return -sign(rd) * vec3(
        // t1.y <= t1.x AND t1.z <= t1.x
        step(t1.y,t1.x) * step(t1.z,t1.x),
        // t1.z <= t1.y AND t1.x <= t1.y
        step(t1.z,t1.y) * step(t1.x,t1.y),
        // t1.x <= t1.z AND t1.y <= t1.z
        step(t1.x,t1.z) * step(t1.y,t1.z)
    );

Which means this is what we are doing:

    return -sign(rd) * vec3(
        // t1.x is the maximum
        step(t1.y,t1.x) * step(t1.z,t1.x),
        // t1.y is the maximum
        step(t1.z,t1.y) * step(t1.x,t1.y),
        // t1.z is the maximum
        step(t1.x,t1.z) * step(t1.y,t1.z)
    );

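Notice that this code only considers the front faces. Thus it will work correctly as long as the origin of the ray is outside the box.

Although the two opposite faces on an axis get the same computed normal (-sign(rd) along that axis), when the ray origin is inside the box this code can select the wrong axis: it picks the face the ray entered through "in the past", not the one it exits through.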
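The difference shows up on a ray that starts inside the box. Below is a plain-Python sketch comparing a port of boxNormal (entry_normal) with the "ro inside the box" branch of the first version (exit_normal). All names and the sample ray are assumptions for illustration, and rd is assumed to have no zero component.

```python
def step(edge, x):
    """GLSL-style step: 0.0 if x < edge, otherwise 1.0."""
    return 0.0 if x < edge else 1.0


def sign(x):
    return float((x > 0.0) - (x < 0.0))


def times(ro, rd, box_size):
    m = [1.0 / d for d in rd]
    n = [mi * o for mi, o in zip(m, ro)]
    k = [abs(mi) * b for mi, b in zip(m, box_size)]
    t1 = [-ni - ki for ni, ki in zip(n, k)]
    t2 = [-ni + ki for ni, ki in zip(n, k)]
    return t1, t2


def entry_normal(ro, rd, box_size):
    """Port of boxNormal: picks the axis where t1 is the maximum."""
    (x, y, z), _ = times(ro, rd, box_size)
    mask = (step(y, x) * step(z, x),
            step(z, y) * step(x, y),
            step(x, z) * step(y, z))
    return tuple(-sign(d) * s for d, s in zip(rd, mask))


def exit_normal(ro, rd, box_size):
    """The inside branch of boxIntersection: axis where t2 is the minimum."""
    _, t2 = times(ro, rd, box_size)
    tF = min(t2)
    return tuple(-sign(d) * step(c, tF) for d, c in zip(rd, t2))


box = (1.0, 1.0, 1.0)
ro, rd = (-0.5, 0.75, 0.0), (1.0, 0.5, 0.25)  # ro is inside the box

entered_behind = entry_normal(ro, rd, box)   # x face, crossed at t = -0.5
exits_through = exit_normal(ro, rd, box)     # y face, reached at t = 0.5
```

Here the ray leaves through the y = +1 face, so the inside branch reports a normal along y, while the boxNormal port reports one along x.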

I found a third version of the code in Box - intersection. This version also comments on checking both near and far faces:

    #if 1
        // this works as long as the ray origin is not inside the box
        vec4 res = vec4(tN, step(tN,t1) );
    #else
        // use this instead if your rays origin can be inside the box
        vec4 res = (tN>0.0) ? vec4( tN, step(vec3(tN),t1)) :
                              vec4( tF, step(t2,vec3(tF)));
    #endif

That version of the code also takes two matrices which represent the transformations from and to box space. They are used to encode the rotation and translation of the box. The code converts the ray into box space, proceeds as usual, then converts the result back to ray space.
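As a rough illustration of that box-space round trip (not the code from that page), here is a plain-Python sketch that uses a rotation matrix plus a center point instead of two 4x4 matrices. Every name here (ray_box_world, mat_vec, rot, center) is an assumption, and the box-space rd is assumed to have no zero component.

```python
def step(edge, x):
    return 0.0 if x < edge else 1.0


def sign(x):
    return float((x > 0.0) - (x < 0.0))


def box_intersection(ro, rd, box_size):
    """Axis-aligned, origin-centered test; None on a miss."""
    m = [1.0 / d for d in rd]
    n = [mi * o for mi, o in zip(m, ro)]
    k = [abs(mi) * b for mi, b in zip(m, box_size)]
    t1 = [-ni - ki for ni, ki in zip(n, k)]
    t2 = [-ni + ki for ni, ki in zip(n, k)]
    tN, tF = max(t1), min(t2)
    if tN > tF or tF < 0.0:
        return None
    mask = [step(tN, c) for c in t1] if tN > 0.0 else [step(c, tF) for c in t2]
    return tN, tF, tuple(-sign(d) * s for d, s in zip(rd, mask))


def mat_vec(mat, v):
    return tuple(sum(a * b for a, b in zip(row, v)) for row in mat)


def ray_box_world(ro, rd, center, rot, box_size):
    """rot maps box space to world space; its transpose maps back."""
    rot_inv = tuple(zip(*rot))  # inverse of a pure rotation
    ro_box = mat_vec(rot_inv, tuple(o - c for o, c in zip(ro, center)))
    rd_box = mat_vec(rot_inv, rd)  # directions ignore the translation
    hit = box_intersection(ro_box, rd_box, box_size)
    if hit is None:
        return None
    tN, tF, normal_box = hit
    return tN, tF, mat_vec(rot, normal_box)  # normal back to world space


# Box centered at (5, 0, 0), rotated 90 degrees about the z axis.
rot = ((0.0, -1.0, 0.0),
       (1.0, 0.0, 0.0),
       (0.0, 0.0, 1.0))
result = ray_box_world((2.0, 0.5, 0.25), (1.0, 0.125, 0.25),
                       (5.0, 0.0, 0.0), rot, (1.0, 1.0, 1.0))
```

The t values need no conversion because a rotation and translation do not change distances along the ray; only the normal has to be rotated back.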

"If the near face of the box is further away from the ray origin than the far face of the box tN>tF, there is no intersection." — user122973
– user122973, Commented Feb 25, 2023 at 7:26
@Strom I'll start by reiterating that we are solving the intersection of a ray with a box. We have already figured out where the ray intersects each plane of the box; we have the solutions as parameters for the ray equation. We have grouped them in two sets: near planes (each component of t1), and far planes (each component of t2). And from each of those sets we picked the most appropriate intersection (tN for near, and tF for far). To be continued…
– Theraot, Commented Feb 25, 2023 at 7:33
@Strom Now, if, somehow, the intersection with the near faces is further away than the intersection with the far faces (tN > tF), it is either because the ray is not pointing towards the faces of the box (e.g. it passed by a side of the box; remember that these are planes, and they extend to infinity), or because the size of the box has negatives. Those aren't intersections with the box.
– Theraot, Commented Feb 25, 2023 at 7:36
@Strom It might be worth mentioning that this is for ray marching. So we define a ray for each pixel of the viewport, intersect it with the geometry, and shade it according to what it collided with. Some of those rays might not intersect the geometry. The rays are processed in parallel on the GPU, so the code we write only has to worry about one ray. So, yes, we only solved one point because we are only working with one ray, but collectively the code for all the rays will lead to a picture. Addendum: ray tracing and ray casting are close relatives of ray marching.
– Theraot, Commented Feb 25, 2023 at 7:39

score 1 · Accepted Answer · 2023-02-25 07:12:47Z

Why calculate 1.0/ray_direction?

m = 1.0/rd;

This allows the slower division operation to be treated as a multiply later. (I do not fully understand the correctness of the zero handling, but if it works...)

vec3 n = m*ro; (this is n = ro/rd)

Scale the ray's origin by the raycast (the ray direction); this scales the normalized ray per box axis.

vec3 k = abs(m)*boxSize;

This line scales the size of the box to the "rd" space. The abs is needed since the size is direction independent.

    vec3 t1 = -n - k;
    vec3 t2 = -n + k;

Translate the size about the normalized axis.

This set gives the bounds (top left and bottom right, or top right and bottom left, plus the ±z expansion, due to the abs) of the original box in terms of the ray's "normalized" vector. The -n term is arbitrary, but used later in the collision check.

    float tN = max( max( t1.x, t1.y ), t1.z );
    float tF = min( min( t2.x, t2.y ), t2.z );

This code gives the axis independent 2D bounds, from the transformed size of the box.

    if( tN>tF || tF<0.0) return vec2(-1.0); // no intersection

Read the previous line as: if the minimum of any element is greater than the maximum of any element, or the maximum does not cross the new origin (+n), a collision has not occurred. This seems odd, but remember the -n from before.

The tN and tF are 2 projections of the cube into 2D space, which either contain it completely (tN>tF) or partially cross it (tF<0.0).

"Scale the rays origin by the raycast, this "Normalizes" the ray per axis." Can you clarify this? Assuming the ray direction is already normalized, why does the ray origin need to be normalized?
– Nairou, Commented Feb 20, 2023 at 5:06
The quoted "Normalize" means to extend the length of the ray to the bounds of the box. Sorry, poor terminology.
– user122973, Commented Feb 23, 2023 at 3:54

DMGregory · Accepted Answer · 2023-02-26 23:30:32Z

The other answers do a great job of showing the in-depth mathematical derivation, so what I want to try here is to build some intuition for how this works:

It's parameterizing the problem in terms of time.

You can think of the ray as a particle that starts at position ro ("ray origin") relative to the center of the box and moves with constant velocity rd ("ray direction") over time. We can imagine this velocity being measured in metres per second, though the units are arbitrary since we'll fast forward along the ray's entire future in one instantaneous pass. To find the intersection (and the normal), we want to know at what time t that particle hits the box.

To do this, we need a way to convert between distances and time, which is what this line does:

vec3 m = 1.0/rd;

m is a component-wise reciprocal of the velocity, so m.x answers the question "how many seconds does it take the particle to cross +1 unit along the x axis?" (this time can be negative if rd.x is negative, meaning you'd have to go back into the past / backwards along the ray to move right along the x axis, if the ray is pointing left). This gives us our exchange rate between distance and time.

vec3 n = m*ro;

This converts the ray position into a time, for each component: -n.x answers the question "How long will it take the particle to reach the x=0 plane?" (negative because ro is the vector from the box center to the ray origin, so it's measuring the distance backwards. That's why n always appears with a minus sign after this.)

vec3 k = abs(m)*boxSize;

This applies a similar space-to-time conversion to the box dimensions, answering "how long does it take to get from the center of the box to one side?" along each axis. Here, boxSize is the box's half-extents, the distance from each side to the center.

From here it helps to think of the box as being bordered by six planes: three pairs of half-spaces like: -boxSize.x <= x <= boxSize.x, for each axis. From a given ray start point, three of those planes are on the "near" side of the box to us, and three of those planes are on the "far" side. To enter the box, we need to pass into all three near planes before we exit any of the far planes, otherwise we miss the box.

The next two variables give us the entry and exit times for each plane/half-space:

    vec3 t1 = -n - k;
    vec3 t2 = -n + k;

t1 is the time it takes to reach the center plane minus the time it takes to travel between the center and one side, so that's the time when the particle enters the plane defining the near side of the box, on each axis.

Similarly, t2 is the time when the particle exits the plane defining the far side of the box, on each axis.

But we don't need to be inside just one of these planes/half-spaces, we need to be inside all six at once to actually be in the box. So now we need to combine our per-axis calculations into an assessment for the box as a whole:

    float tN = max( max( t1.x, t1.y ), t1.z );
    float tF = min( min( t2.x, t2.y ), t2.z );

tN ("time - near") is the first moment when the particle is on the "inside" of all three near planes, i.e. the latest time-of-entry for any one of the three near half-spaces.

tF ("time - far") is the earliest moment the particle exits any of the three far planes.

if( tN>tF || tF<0.0) return vec2(-1.0); // no intersection

If tF already happened in the past, then our ray started outside the box pointing away from it. If tN is greater than tF, then we pass by the box completely on one axis before we start to overlap it on another. Either way, we miss.

Otherwise, the particle does hit the box at time tN, which means we have an intersection point at position ro + rd * tN. But that might be in the past if tN is negative, meaning the ray origin was already inside the box.
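To see the "time" picture with numbers, here is a plain-Python sketch that computes the entry time and plugs it back into ro + rd * tN. The name entry_point, the non-cubic half-extents, and the sample ray are assumptions for illustration; rd is assumed to have no zero component.

```python
def entry_point(ro, rd, box_size):
    """Return (tN, point where the ray enters the box), or None on a miss."""
    m = [1.0 / d for d in rd]                        # seconds per unit distance
    n = [mi * o for mi, o in zip(m, ro)]             # -n: time to reach the center planes
    k = [abs(mi) * b for mi, b in zip(m, box_size)]  # time from center to a side
    tN = max(-ni - ki for ni, ki in zip(n, k))       # last near plane entered
    tF = min(-ni + ki for ni, ki in zip(n, k))       # first far plane exited
    if tN > tF or tF < 0.0:
        return None
    return tN, tuple(o + d * tN for o, d in zip(ro, rd))


box_size = (1.0, 2.0, 0.5)  # half-extents, as described above
tN, hit_point = entry_point((-3.0, 0.0, 0.0), (1.0, 0.5, 0.125), box_size)

# The entry point sits on the x = -1 face and within the other two slabs.
assert hit_point[0] == -box_size[0]
assert abs(hit_point[1]) <= box_size[1] and abs(hit_point[2]) <= box_size[2]
```

The particle needs 2 time units to reach the near x plane, by which point it has drifted 1 unit in y and 0.25 in z, still within the box's extents on those axes.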

In the case where tN is positive, we have a conventional intersection with whichever of the near planes we entered last:

outNormal = (tN>0.0) ? step(vec3(tN),t1)) : // ro ouside the box

This compares the last-entry time tN against the three entry times we used to compute it. This step function will return 1 on any axis that matches tN (or exceeds it, but that can't happen here since tN is the maximum of t1's three components), and zero on the other axes. So if you hit the box from a side perpendicular to the x axis, you get a 1 in the first component: (1, 0, 0).

We can only hit a face that's pointing opposite our direction of motion though, so the outNormal *= -sign(rd); line fixes this into (-1, 0, 0) if we were traveling right (and so hit the side of the box facing left).

Note that you could get a ±1 on two or even all three axes if you cross two or three of the planes at the exact same time, i.e. you hit the box exactly edge-on or corner-on. But that will be vanishingly rare.
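Such a tie is easy to construct by hand. The following plain-Python sketch (names outside_normal, step and sign are assumptions; it only covers the ray-origin-outside branch and assumes no zero component in rd) aims a ray exactly at a corner and at an edge:

```python
def step(edge, x):
    """GLSL-style step: 0.0 if x < edge, otherwise 1.0."""
    return 0.0 if x < edge else 1.0


def sign(x):
    return float((x > 0.0) - (x < 0.0))


def outside_normal(ro, rd, box_size):
    """Normal for a ray starting outside the box, or None on a miss."""
    m = [1.0 / d for d in rd]
    n = [mi * o for mi, o in zip(m, ro)]
    k = [abs(mi) * b for mi, b in zip(m, box_size)]
    t1 = [-ni - ki for ni, ki in zip(n, k)]
    t2 = [-ni + ki for ni, ki in zip(n, k)]
    tN, tF = max(t1), min(t2)
    if tN > tF or tF < 0.0:
        return None
    return tuple(-sign(d) * step(tN, c) for d, c in zip(rd, t1))


box = (1.0, 1.0, 1.0)
# Along the main diagonal: all three near planes are crossed at t = 2.
corner_normal = outside_normal((-3.0, -3.0, -3.0), (1.0, 1.0, 1.0), box)
# Diagonal in x and y only: two near planes are crossed at t = 2.
edge_normal = outside_normal((-3.0, -3.0, 0.0), (1.0, 1.0, 0.25), box)
```

In these tie cases the returned vector is not unit length, so code that relies on a normalized normal would have to normalize it.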

In the event we start inside the box, we still need a value for outNormal, even though we don't hit a face as usual. In the first code you show, the other branch of the ternary expression does basically the same thing with the exit times, to give you the normal of the "inside" face of the box that the ray will exit through: step(t2,vec3(tF)))

The second code you show doesn't have a special case for starting inside the box, and always gives you the normal of the face you entered through, even if that entry happened "in the past" / "behind" the start of the ray.

The space/time conversion perspective really nailed it, took my understanding of those early variables from fuzzy to solid. Thank you!
– Nairou, Commented Feb 26, 2023 at 23:24
