Elaborating on matrix description.
Source Link
DMGregory

Yes.

In games, we'll typically represent the coordinate transformation of an object (including the camera) as a 4x4 matrix - its "model matrix" or "object / local to world matrix". We can build this matrix out of columns representing the object's basis vectors and position in the world:

  • \$\vec r = \begin{bmatrix}r_x \\ r_y\\ r_z \end{bmatrix}\$ the object's "right" vector, in world space.

  • \$\vec u = \begin{bmatrix}u_x \\ u_y\\ u_z \end{bmatrix}\$ the object's "up" vector, in world space.

  • \$\vec f = \begin{bmatrix}f_x \\ f_y\\ f_z \end{bmatrix}\$ the object's "forward" vector, in world space.

  • \$\vec t = \begin{bmatrix}t_x \\ t_y\\ t_z \end{bmatrix}\$ the object's translation - the position of the object's origin / pivot, in world space.

We combine these with an extra row like so:

$$ M = \begin{bmatrix} \vec r & \vec u & \vec f & \vec t\\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} r_x & u_x & f_x & t_x\\ r_y & u_y & f_y & t_y\\ r_z & u_z & f_z & t_z\\ 0 & 0 & 0 & 1 \end{bmatrix} $$

(Here I'm showing one popular, left-handed coordinate convention, assuming a column vector will be multiplied on the right. You'll see other conventions in use too.)
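As a minimal sketch of that construction in plain Python (the function name `model_matrix` and the example values are hypothetical, chosen only for illustration), the four vectors become the columns of a nested-list matrix:

```python
def model_matrix(right, up, forward, translation):
    """Build a 4x4 local-to-world matrix whose columns are r, u, f, t.

    Assumes the column-vector convention used above: each basis vector
    fills one column, and the bottom row is (0, 0, 0, 1).
    """
    r, u, f, t = right, up, forward, translation
    return [
        [r[0], u[0], f[0], t[0]],
        [r[1], u[1], f[1], t[1]],
        [r[2], u[2], f[2], t[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]

# Hypothetical example: an object turned 90 degrees about the world y axis,
# so its forward vector points along world +x, with its pivot at (5, 0, 2).
M = model_matrix(
    right=(0.0, 0.0, -1.0),
    up=(0.0, 1.0, 0.0),
    forward=(1.0, 0.0, 0.0),
    translation=(5.0, 0.0, 2.0),
)

for row in M:
    print(row)
```

Storing the matrix as a list of rows keeps the code close to the way the matrix is written out above. A real engine would normally use its own matrix type, possibly with a different memory layout.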

With this matrix in hand, transforming a point from an object's local space into world space is just a matrix-vector multiplication: \$\vec v_\text{world} = M \vec v_\text{local}\$

Here \$\vec v = \begin{bmatrix}x\\y\\z\\1\end{bmatrix}\$ for a point (translation applies),

...or \$ \vec v = \begin{bmatrix}x\\y\\z\\0\end{bmatrix}\$ for a direction/displacement (no translation)
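To make the role of the fourth component concrete, here is a small sketch in plain Python. The helper `transform` and the example matrix are assumptions for illustration: the matrix is an object rotated 90 degrees about the y axis and placed at (5, 0, 2).

```python
def transform(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

# Hypothetical local-to-world matrix: columns are r, u, f, t.
M = [
    [0.0, 0.0, 1.0, 5.0],
    [0.0, 1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 1.0],
]

point = [1.0, 2.0, 3.0, 1.0]      # w = 1: rotation and translation both apply
direction = [1.0, 2.0, 3.0, 0.0]  # w = 0: only the rotation applies

world_point = transform(M, point)          # [8.0, 2.0, 1.0, 1.0]
world_direction = transform(M, direction)  # [3.0, 2.0, -1.0, 0.0]

print(world_point)
print(world_direction)
```

The two results differ by exactly the translation (5, 0, 2), because the last column of the matrix is multiplied by w.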

The view transform, mapping points from world space into view space (the camera's local coordinate system), is just the inverse of the camera's local to world matrix:

$$ V = \left(M_\text{camera}\right)^{-1} = \begin{bmatrix} r_x & r_y & r_z & -\vec r \cdot \vec t \\ u_x & u_y & u_z & -\vec u \cdot \vec t \\ f_x & f_y & f_z & -\vec f \cdot \vec t \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

This matrix \$V\$ is the view matrix. You can verify that multiplying this by a point \$\vec v_\text{world}\$ gives the view-space origin \$(0, 0, 0, 1)\$ if that point is at the camera's position (\$\vec t\$), that its x coordinate increases as the point moves parallel to the camera's right vector (\$\vec r\$), etc.
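Those checks can be run numerically. The following is a sketch in plain Python with hypothetical names (`view_matrix`, `transform`, `dot3`) and a made-up camera pose; it assumes the camera basis is orthonormal:

```python
def dot3(a, b):
    """Dot product of two 3-component vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def view_matrix(right, up, forward, position):
    """World-to-view matrix for a camera with an orthonormal basis.

    Rows are the basis vectors (the transposed rotation); the last column
    holds the negated dot products with the camera position.
    """
    r, u, f, t = right, up, forward, position
    return [
        [r[0], r[1], r[2], -dot3(r, t)],
        [u[0], u[1], u[2], -dot3(u, t)],
        [f[0], f[1], f[2], -dot3(f, t)],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

# Hypothetical camera: looking along world +x, positioned at (5, 0, 2).
r, u, f, t = (0.0, 0.0, -1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 2.0)
V = view_matrix(r, u, f, t)

at_camera = transform(V, [t[0], t[1], t[2], 1.0])
one_right = transform(V, [t[0] + r[0], t[1] + r[1], t[2] + r[2], 1.0])

print(at_camera)  # the camera's own position lands on the view-space origin
print(one_right)  # one unit along the camera's right vector gives x = 1
```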

Here I'm assuming we haven't applied any scale or shear transformations to the camera, so \$\{\vec r, \vec u, \vec f\}\$ form a left-handed orthonormal basis (a pure rotation matrix, in this context), and we can just transpose the upper-left 3x3 block of the camera's local to world matrix to get its inverse.
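One way to see where the dot products in the last column come from, as a sketch under the same no-scale, no-shear assumption: write the camera matrix in block form, with \$R = \begin{bmatrix}\vec r & \vec u & \vec f\end{bmatrix}\$ as shorthand (my notation, not used elsewhere in this answer) for the upper-left 3x3 block, so that \$R^T R = I\$:

$$ M_\text{camera} = \begin{bmatrix} R & \vec t \\ \vec 0^T & 1 \end{bmatrix} \quad \Rightarrow \quad \left(M_\text{camera}\right)^{-1} = \begin{bmatrix} R^T & -R^T \vec t \\ \vec 0^T & 1 \end{bmatrix}, \qquad R^T \vec t = \begin{bmatrix} \vec r \cdot \vec t \\ \vec u \cdot \vec t \\ \vec f \cdot \vec t \end{bmatrix} $$

Multiplying the two block matrices gives \$R^T R = I\$ in the upper-left block and \$R^T \vec t - R^T \vec t = \vec 0\$ in the last column, which confirms the inverse.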

See this answer for more on how transformation matrices in games tend to be structured.
