Is the "View transform" just a change of basis matrix + a translation applied to each point? (to bring them into camera space)

Question

I am learning about the graphics pipeline. My understanding of the view transformation is that you:

First create a normalized, separate coordinate system for the camera, based on its position and orientation in world space.
Using those basis vectors, you can transform the world coordinate system into the camera's (by applying the view matrix to each point).
I believe the change of basis part of the view matrix is derived from the computed vectors of the camera's coordinate system.
I am confused about the translation part. I think this involves moving each vertex in world space by the offset of the camera, effectively making the camera the origin of the vector space.

I wish I could actually make a simulation of a grid transforming with the change of basis vector, with the 2 coordinate grids eventually aligning. And then in my head I'm imagining that the objects in the scene are then translated towards the camera (effectively putting the camera at the origin)

DMGregory · Accepted Answer · 2025-06-11 13:01:12Z

Yes.

In games, we'll typically represent the coordinate transformation of an object (including the camera) as a 4x4 matrix - its "model matrix" or "object / local to world matrix". We can build this matrix out of columns representing the object's basis vectors and position in the world:

\$\vec r = \begin{bmatrix}r_x \\ r_y\\ r_z \end{bmatrix}\$ the object's "right" vector, in world space.
\$\vec u = \begin{bmatrix}u_x \\ u_y\\ u_z \end{bmatrix}\$ the object's "up" vector, in world space.
\$\vec f = \begin{bmatrix}f_x \\ f_y\\ f_z \end{bmatrix}\$ the object's "forward" vector, in world space.
\$\vec t = \begin{bmatrix}t_x \\ t_y\\ t_z \end{bmatrix}\$ the object's translation - the position of the object's origin / pivot, in world space.

We combine these with an extra row like so:

$$ M = \begin{bmatrix} \vec r & \vec u & \vec f & \vec t\\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} r_x & u_x & f_x & t_x\\ r_y & u_y & f_y & t_y\\ r_z & u_z & f_z & t_z\\ 0 & 0 & 0 & 1 \end{bmatrix} $$

(Here I'm showing one popular, left-handed coordinate convention, assuming a column vector will be multiplied on the right. You'll see other conventions in use too.)
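
As a minimal sketch of that construction, here is the same column layout in plain Python. The helper name `model_matrix` and the example camera (standing at (2, 0, 5), turned to face world +X) are made up for illustration; a real engine would use its own matrix type.

```python
def model_matrix(r, u, f, t):
    """Return a 4x4 local-to-world matrix as nested row lists.

    The columns are the right, up, and forward basis vectors and the
    translation, all expressed in world space.
    """
    return [
        [r[0], u[0], f[0], t[0]],
        [r[1], u[1], f[1], t[1]],
        [r[2], u[2], f[2], t[2]],
        [0, 0, 0, 1],
    ]


# Hypothetical camera: positioned at (2, 0, 5), facing world +X.
# In this left-handed convention its right vector then points along world -Z.
right = (0, 0, -1)
up = (0, 1, 0)
forward = (1, 0, 0)
position = (2, 0, 5)

M_camera = model_matrix(right, up, forward, position)
for row in M_camera:
    print(row)
# [0, 0, 1, 2]
# [0, 1, 0, 0]
# [-1, 0, 0, 5]
# [0, 0, 0, 1]
```

Reading down each column gives back the right, up, and forward vectors and the position.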

With this matrix in hand, transforming a point from an object's local space into world space is just a matrix-vector multiplication: \$\vec v_\text{world} = M \vec v_\text{local}\$

Here \$\vec v = \begin{bmatrix}x\\y\\z\\1\end{bmatrix}\$ for a point (translation applies),

...or \$ \vec v = \begin{bmatrix}x\\y\\z\\0\end{bmatrix}\$ for a direction/displacement (no translation)
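
To illustrate the role of the fourth component, this sketch multiplies a point and a direction by the same matrix. The `transform` helper and the example matrix (a camera at (2, 0, 5) facing world +X) are assumptions chosen for illustration.

```python
def transform(m, v):
    """Multiply a 4x4 matrix (nested row lists) by a 4-component column vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]


# Hypothetical local-to-world matrix: columns are right, up, forward, translation.
M = [
    [0, 0, 1, 2],
    [0, 1, 0, 0],
    [-1, 0, 0, 5],
    [0, 0, 0, 1],
]

local_point = [1, 0, 0, 1]      # w = 1: translation applies
local_direction = [1, 0, 0, 0]  # w = 0: translation is ignored

world_point = transform(M, local_point)          # [2, 0, 4, 1]
world_direction = transform(M, local_direction)  # [0, 0, -1, 0]
print(world_point, world_direction)
```

The point ends up at the object's position plus its right vector, while the direction is only rotated.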

The view transform, mapping points from world space into view space (the camera's local coordinate system), is just the inverse of the camera's local to world matrix:

$$ V = \left(M_\text{camera}\right)^{-1} = \begin{bmatrix} r_x & r_y & r_z & -\vec r \cdot \vec t \\ u_x & u_y & u_z & -\vec u \cdot \vec t \\ f_x & f_y & f_z & -\vec f \cdot \vec t \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

This matrix \$V\$ is the view matrix. You can verify that multiplying this by a point \$\vec v_\text{world}\$ gives the view-space origin \$(0, 0, 0, 1)\$ (zero in x, y, and z) if that point was at the camera's position (\$\vec t\$), that its x coordinate increases as the point moves parallel to the camera's right vector (\$\vec r\$), etc.
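
Those checks can be run numerically. The sketch below assumes an unscaled camera with an orthonormal basis; the helper names (`dot`, `view_matrix`, `transform`) and the example camera at (2, 0, 5) facing world +X are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3-component vectors."""
    return sum(x * y for x, y in zip(a, b))


def view_matrix(r, u, f, t):
    """World-to-view matrix for an orthonormal camera basis r, u, f at position t."""
    return [
        [r[0], r[1], r[2], -dot(r, t)],
        [u[0], u[1], u[2], -dot(u, t)],
        [f[0], f[1], f[2], -dot(f, t)],
        [0, 0, 0, 1],
    ]


def transform(m, v):
    """Multiply a 4x4 matrix (nested row lists) by a 4-component column vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]


# Hypothetical camera at (2, 0, 5), facing world +X.
r, u, f, t = (0, 0, -1), (0, 1, 0), (1, 0, 0), (2, 0, 5)
V = view_matrix(r, u, f, t)

at_camera = transform(V, [2, 0, 5, 1])    # the camera's own position
one_right = transform(V, [2, 0, 4, 1])    # position + right vector
three_ahead = transform(V, [5, 0, 5, 1])  # position + 3 * forward vector
print(at_camera, one_right, three_ahead)
# [0, 0, 0, 1] [1, 0, 0, 1] [0, 0, 3, 1]
```

The camera's position maps to the view-space origin, a step along its right vector shows up as +1 in x, and points in front of it get a positive z.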

Here I'm assuming we haven't applied any scale or shear transformations to the camera, so \$\{\vec r, \vec u, \vec f\}\$ form a left-handed orthonormal basis (a pure rotation matrix, in this context), and we can just transpose the upper-left 3x3 block of the camera's local to world matrix to get its inverse.
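
One way to see where the formula for \$V\$ comes from is to write the camera's matrix in block form. The symbol \$R = \begin{bmatrix}\vec r & \vec u & \vec f\end{bmatrix}\$ is introduced here only as shorthand for the 3x3 rotation block, and the sketch relies on the same no-scale, no-shear assumption, so \$R R^T = I\$:

$$ M_\text{camera} = \begin{bmatrix} R & \vec t \\ 0 & 1 \end{bmatrix}, \qquad V = \begin{bmatrix} R^T & -R^T \vec t \\ 0 & 1 \end{bmatrix}, \qquad M_\text{camera} V = \begin{bmatrix} R R^T & -R R^T \vec t + \vec t \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} I & \vec 0 \\ 0 & 1 \end{bmatrix} $$

The rows of \$R^T\$ are \$\vec r\$, \$\vec u\$, and \$\vec f\$, so \$-R^T \vec t\$ is exactly the column of negated dot products in \$V\$. The same matrix also factors into the two steps described in the question, applied right to left: first translate by \$-\vec t\$ so the camera sits at the origin, then change basis with \$R^T\$:

$$ V = \begin{bmatrix} R^T & \vec 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} I & -\vec t \\ 0 & 1 \end{bmatrix} $$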

See this answer for more on how transformation matrices in games tend to be structured.

Thanks, dude. So, is it right to say that each basis vector is represented by a column in the transformation matrix for the purpose of mutating them (and thus transforming the overall vector)? like, the column vectors represent each basis vector, got it, but they don't really represent their actual values... The linear transform for a 2D square would look like [1, 0, 0, 1]. But this would only maintain its scale (identity matrix), it doesn't include the actual coordinates, let's say the vector it's transforming is (5, 7).
– Jared Kosiba, Commented Jun 12 at 18:25
Can you clarify what you mean by "actual values" / "actual coordinates"? If you mean the translation, that's the rightmost column.
– DMGregory ♦, Commented Jun 12 at 19:12
All i was saying, i think, is that the transformation matrix just represents a change to the original vector. The original vector is separate. I think the "change of basis" part of the transformation could be.. scaling, rotation, shearing, etc. The homogeneous coordinate allows you to include a translation, too. All just operate on the original vector, though.
– Jared Kosiba, Commented Jun 12 at 20:28
Something about the View Matrix just trips me up, hardcore. I'm not sure what.... If it really is an inverse model matrix, then theoretically the model matrix also has a "change of basis" matrix in the first 3 columns. And you are just inverting that (changing it back). So, you get the inverse / opposite of the.. scaling, rotation, etc, applied to the camera.
– Jared Kosiba, Commented Jun 12 at 20:29
sorry. i think i'm misunderstanding something. The change of basis transform is fundamentally different from a scaling, shearing, or rotation transform.
– Jared Kosiba, Commented Jun 12 at 21:29
