I've been looking into Energy Based Models recently which Yann LeCun has been strongly advocating for. One problem that he lists with probabilistic based models is that in the case when there are multiple possible outputs for one given input, the probabilistic model will return the Expected Value of the possible outputs. An example is if a model is given the task of completing a video of soccer ball being kicked, the possible output videos could have the ball going left, straight, or right. All our possible outputs. However many models will just return the expected value which means the output will be a really noisy blurred image which makes sense.
My Question is how do Energy Based Models solve this problem. What example architecture is there that would solve this and why is this so.