I am working on a renderer of sorts and I have some fundamental questions. Let's say I have a vector in a 3D virtual world space which is unitless, and I want to project it onto a 2D screen, so I need its pixel position.
My background is in computer vision, so this is how I would model the problem.
p = K * [R | t] * v
where v is my 3D vector in homogeneous coordinates, R is a 3x3 rotation matrix, t is a 3x1 translation vector, and K is a 3x3 "camera matrix". The camera matrix takes the following form:
    f  0  cx
    0  f  cy
    0  0  1
f is the "focal length" of the virtual camera and (cx, cy) is the "principal point" in pixels (usually the center of the screen).
So, assuming my camera is at the origin (and static), [R | t] moves the 3D vector v somewhere in front of the camera and I get some v'. Now, I want to project 3D vector v' onto the screen.
v' = [x' y' z']^T
Kv' = [f*x'+cx*z' f*y'+cy*z' z']^T
Now, dividing by the depth z' (the perspective divide), I get p.
p = [f*x'/z'+cx f*y'/z'+cy 1]^T
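For concreteness, here is a minimal sketch of that pipeline in plain Python. The helper names (mat_vec, project) and the numbers (a 640x480 screen, f = 500, a translation of 5 units along +z) are made up for illustration and are not taken from my actual renderer.

```python
# Minimal pinhole projection sketch; all names and values are illustrative.

def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def project(f, cx, cy, R, t, v):
    """Project a 3D point v = [x, y, z] to pixel coordinates."""
    # [R | t] applied to the homogeneous point is v' = R v + t.
    rotated = mat_vec(R, v)
    x, y, z = (r + ti for r, ti in zip(rotated, t))
    if z <= 0:
        raise ValueError("point is not in front of the camera")
    # K v' = [f*x + cx*z, f*y + cy*z, z]; dividing by z gives the pixel position.
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    hx, hy, hz = mat_vec(K, [x, y, z])
    return hx / hz, hy / hz

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Assumed setup: 640x480 screen, f = 500, point pushed 5 units along +z.
u, v_px = project(500.0, 320.0, 240.0, identity, [0.0, 0.0, 5.0], [1.0, -0.5, 0.0])
print(u, v_px)  # 420.0 190.0
```

Here v' = [1, -0.5, 5], so the point lands at (500*1/5 + 320, 500*(-0.5)/5 + 240) = (420, 190).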
This certainly works in code and I'm getting favorable results in my renders.
But I fundamentally don't understand what it means to express focal length in terms of pixels. In a real camera it's the physical distance between the imaging plane and the focal point of the lens, but there's no such analog in 3D graphics.
Furthermore, I don't really see why the dimensional analysis works out, and why p is indeed in units of pixels.
Thanks for reading!
