I would consider review valuable if it produced a reasonably accurate estimate / prediction of how a post is going to be received in the future.
To keep it practically scoped, I would limit "future" to one month after review completion (maybe 2-3 months if the recently proposed longer-term auto-deletion gets implemented).
In this case, the usefulness of a review could be analyzed by checking the post's status (positive, negative, neutral, deleted, edited) a month after review and comparing it against the review outcome.
- A somewhat more ambitious goal for review would be to estimate how a *poster* is going to be received; the analysis would then involve an "aggregate" of their first, 2nd, 3rd, etc. posts compared against review outcomes. For the sake of simplicity I'll keep that out of my answer.
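To make the comparison concrete, here is a minimal sketch of how the system could score a review against the post's state a month later. All names (`ReviewedPost`, the outcome and status labels) are hypothetical; the real data model would of course differ.

```python
from dataclasses import dataclass

# Hypothetical status buckets; actual labels depend on the platform's data model.
POSITIVE_STATUSES = {"positive", "edited"}
NEGATIVE_STATUSES = {"negative", "deleted"}

@dataclass
class ReviewedPost:
    review_outcome: str      # e.g. "looks_ok" or "needs_action" (illustrative)
    status_after_month: str  # "positive", "negative", "neutral", "deleted", "edited"

def review_was_accurate(post: ReviewedPost) -> bool:
    """Did the review outcome match the post's state one month later?"""
    if post.status_after_month in POSITIVE_STATUSES:
        return post.review_outcome == "looks_ok"
    if post.status_after_month in NEGATIVE_STATUSES:
        return post.review_outcome == "needs_action"
    return True  # a neutral post doesn't contradict either outcome

def accuracy(posts: list[ReviewedPost]) -> float:
    """Fraction of reviews that matched the later observed status."""
    return sum(review_was_accurate(p) for p in posts) / len(posts)
```

Run over a month's worth of completed reviews, this gives a single accuracy number per queue (or, later, per reviewer) to compare against whatever threshold is deemed acceptable.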
The results of that retrospective analysis can further be used to adjust reviewers' behavior if it deviates too much from the observed data. The system can plant more "known bad" audits into the queue if reviewers tend to be too optimistic, and more "known good" ones in the opposite case.
Note that a similar approach can be used to analyze and adjust the behavior of individual reviewers. The system can check the later state of the posts each of them reviewed and present them with an individually tuned mix of audits. It could even report particularly gross individual deviations to moderators for further manual analysis.
The difference from the "aggregate" case is that for "individual" analysis and corrections to be timely, a shorter (and thus less accurate) retrospective time frame is probably needed: a week or two instead of a month.
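The audit-mix adjustment, whether applied to a queue as a whole or to one reviewer, could be sketched as follows. The bias measure and the base 50/50 split are illustrative assumptions, not real system parameters:

```python
def optimism_bias(reviews: list[tuple[str, str]]) -> float:
    """
    reviews: (review_outcome, status_after) pairs, e.g. ("looks_ok", "deleted").
    Labels are hypothetical. A positive result means reviewers are too
    optimistic (passing posts that later went bad); negative, too pessimistic.
    """
    too_optimistic = sum(1 for outcome, status in reviews
                         if outcome == "looks_ok" and status in ("negative", "deleted"))
    too_pessimistic = sum(1 for outcome, status in reviews
                          if outcome == "needs_action" and status in ("positive", "edited"))
    return (too_optimistic - too_pessimistic) / len(reviews)

def audit_mix(bias: float, base_bad_fraction: float = 0.5) -> tuple[float, float]:
    """Shift the (known_bad, known_good) audit split toward the observed bias."""
    known_bad = min(max(base_bad_fraction + bias, 0.0), 1.0)
    return known_bad, 1.0 - known_bad
```

For the individual case, the same functions would simply be fed one reviewer's recent (week-or-two) history instead of the whole queue's monthly history.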
Speaking of review audits, their primary goal is said to be to help hone moderation skills, and per my observations of the FP queue, audits there are too infrequent to accomplish that goal. You simply don't show folks enough "known good" / "known bad" examples to learn from.
- Note, by the way, that more "known bad" audits would help reduce the problem you seem to be worried about, the one with folks who "don't do anything at all", simply because such audits are failed by choosing No Action Needed.
Teaching reviewers to appropriately use the Skip action would also help produce better quality, more relevant reviews.