There is an advanced technique based on compute shaders that analyzes the depth buffer of the main camera and returns the minimum and maximum depth with one-pixel precision (a reduce operation). This information can be used to build a very tight frustum for the shadow camera. Shadow casters outside the view then only matter if they lie in the volume that extends from the visible region toward the light (an oblique volume along the light direction).
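
To make the reduce step concrete, here is a minimal CPU sketch in Python of a two-pass min/max reduction, shaped like what a compute shader would do with one workgroup per tile. The function name, the tile size, and the convention that a depth of 1.0 means "nothing was drawn here" are all assumptions for illustration; a real implementation would run on the GPU and follow your engine's depth convention.

```python
def reduce_depth_min_max(depth, tile=8, clear_depth=1.0):
    """Return (min, max) over a 2D depth buffer, or None if nothing was drawn.

    depth is a list of rows of floats. Pixels at or beyond clear_depth are
    treated as empty (assumed convention) and skipped.
    """
    height = len(depth)
    width = len(depth[0]) if height else 0

    # Pass 1: one partial (min, max) per tile, like one workgroup per tile.
    partials = []
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            lo, hi = float("inf"), float("-inf")
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    d = depth[y][x]
                    if d >= clear_depth:
                        continue  # sky / cleared pixel, ignore it
                    lo = min(lo, d)
                    hi = max(hi, d)
            partials.append((lo, hi))

    # Pass 2: reduce the per-tile partials to a single pair.
    lo = min((p[0] for p in partials), default=float("inf"))
    hi = max((p[1] for p in partials), default=float("-inf"))
    if lo > hi:
        return None  # every pixel was empty
    return lo, hi
```

The resulting pair would then replace the camera's fixed near and far planes when fitting the shadow frustum to the visible region.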

One issue with this is that you don't know the min/max before rendering the world, and you need the shadow map to render the world. To work around that, you can simply use the min/max pair from the previous frame. To mitigate small artifacts when the player moves quickly, you can inflate the volume slightly by some epsilon.
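
A rough sketch of that one-frame-late scheme, in Python for readability. The class and method names are hypothetical, and it assumes the depth range is expressed in a normalized [0, 1] space; the epsilon value is something you would tune for your scene and camera speed.

```python
class DepthRangeHistory:
    """Keeps the min/max measured last frame and hands out a padded copy."""

    def __init__(self, epsilon=0.01, fallback=(0.0, 1.0)):
        self.epsilon = epsilon
        # Before any measurement exists, fall back to the full depth range.
        self.previous = fallback

    def range_for_this_frame(self):
        """Last frame's range, inflated by epsilon and clamped to [0, 1]."""
        lo, hi = self.previous
        return max(0.0, lo - self.epsilon), min(1.0, hi + self.epsilon)

    def store_readback(self, lo, hi):
        """Call once this frame's reduction result is available."""
        self.previous = (lo, hi)
```

Each frame you would call `range_for_this_frame()` before rendering the shadow map, then call `store_readback()` after the depth reduction finishes, so the measured range is always consumed one frame later.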