You are missing a few key points.
After you apply the projection matrix, you have a 4-component vector in clip space (not screen space). Clip space is a homogeneous coordinate system, and it is where clipping is performed, after your vertex shader has run.
After clipping, the surviving coordinates are divided by the w component to get normalized device coordinates (NDC), each in the range [-1, 1]. The viewport transformation then maps NDC to window coordinates: X and Y are scaled and offset to the viewport you gave OpenGL, and Z is mapped to the depth range. That last step is what ultimately gives you the [0, 1] range for depth (unless you use glDepthRange to set a different range).
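Written out, the two steps look roughly like this. The symbols are just labels for this sketch: the viewport is assumed to be set with glViewport(x, y, width, height) and the depth range with glDepthRange(n, f), which defaults to n = 0, f = 1.

$$
\begin{aligned}
(x_{ndc},\, y_{ndc},\, z_{ndc}) &= \left(\frac{x_{clip}}{w_{clip}},\ \frac{y_{clip}}{w_{clip}},\ \frac{z_{clip}}{w_{clip}}\right) \\
x_{win} &= \frac{width}{2}\, x_{ndc} + \left(x + \frac{width}{2}\right) \\
y_{win} &= \frac{height}{2}\, y_{ndc} + \left(y + \frac{height}{2}\right) \\
z_{win} &= \frac{f - n}{2}\, z_{ndc} + \frac{f + n}{2}
\end{aligned}
$$

With the default depth range, the last line reduces to $z_{win} = (z_{ndc} + 1)/2$, so $z_{ndc} = -1$ maps to 0 and $z_{ndc} = 1$ maps to 1.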
If you want this window-space Z value in your vertex shader, you have to compute it yourself in the shader: divide the clip-space z by w, then apply the depth-range mapping described above.
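As a minimal sketch of that manual computation, here is the arithmetic written in C++ so it can be checked on the CPU. `Vec4` and `windowDepth` are hypothetical names made up for this example, not OpenGL API; the `nearVal` and `farVal` parameters are assumed to mirror whatever was passed to glDepthRange.

```cpp
#include <cassert>

// Hypothetical 4-component vector standing in for a clip-space position
// (what the vertex shader writes to gl_Position).
struct Vec4 {
    float x, y, z, w;
};

// Hypothetical helper: computes the window-space depth that the fixed-function
// pipeline would derive from a clip-space position.
// nearVal and farVal are assumed to match the glDepthRange arguments
// (OpenGL's defaults are 0 and 1).
float windowDepth(const Vec4& clip, float nearVal = 0.0f, float farVal = 1.0f) {
    assert(clip.w != 0.0f);  // the perspective divide is undefined for w == 0

    // Perspective divide: clip space -> normalized device coordinates.
    float ndcZ = clip.z / clip.w;

    // Depth-range mapping: [-1, 1] in NDC -> [nearVal, farVal] in window space.
    return 0.5f * (farVal - nearVal) * ndcZ + 0.5f * (farVal + nearVal);
}
```

The same two lines of arithmetic should carry over to GLSL more or less directly, applied to the value you write to gl_Position; there the current depth range is available through the built-in gl_DepthRange uniform (near, far, diff) rather than as function parameters. Keep in mind that a value computed per vertex describes only that vertex, and only makes sense for positions that survive clipping.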