According to the paper *Deblurring Dynamic Scenes via Spatially Varying Recurrent Neural Networks*, given a 2D sharp image $x(m, n)$ and a blur kernel $h(k, l)$, the blurred image is obtained as
$$ y(m, n) = (x*h)(m, n) + e(m, n) = \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} h(k,l)x(m-k, n-l) + e(m,n), $$
where $*$ is the convolution operator, $m$ and $n$ are the row and column indices of the pixels, and $e(m, n)$ denotes additive noise. The image is assumed to extend infinitely in both the positive and negative directions, so boundary effects are ignored.
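The forward model can be sketched numerically. This is a minimal NumPy sketch under assumptions that are not stated in the paper: the kernel is finite and stored as an odd-sized array whose centre element is $h(0, 0)$, the infinite-extent assumption is replaced by treating pixels outside the image as zero, and $e$ is i.i.d. Gaussian noise. The names `shift` and `blur` are hypothetical.

```python
import numpy as np

def shift(x, k, l):
    """Return s with s[m, n] = x[m - k, n - l]; zero where that index leaves the image."""
    M, N = x.shape
    s = np.zeros_like(x, dtype=float)
    if abs(k) >= M or abs(l) >= N:
        return s  # the whole image is shifted out of the frame
    s[max(k, 0):M + min(k, 0), max(l, 0):N + min(l, 0)] = \
        x[max(-k, 0):M - max(k, 0), max(-l, 0):N - max(l, 0)]
    return s

def blur(x, h, noise_std=0.0, rng=None):
    """y(m, n) = sum_{k,l} h(k, l) x(m - k, n - l) + e(m, n).

    Assumption: h is odd-sized with h(0, 0) at its centre, and
    e is i.i.d. Gaussian noise with standard deviation noise_std.
    """
    K, L = h.shape[0] // 2, h.shape[1] // 2
    y = np.zeros_like(x, dtype=float)
    for k in range(-K, K + 1):
        for l in range(-L, L + 1):
            y += h[k + K, l + L] * shift(x, k, l)
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        y = y + rng.normal(0.0, noise_std, size=y.shape)
    return y
```

Blurring a single bright pixel reproduces the kernel around that pixel, which is a quick way to check that the indexing follows $x(m-k, n-l)$.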
The deblurred image $\hat{x}$ can be obtained by minimizing a loss that includes quadratic penalties for the restoration error and for excessive values in the restored image:
$$ \hat{x} = \arg \min_x \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} \| y(k,l) - (H* x)(k,l)\|^2 + R(x), $$
where $R(x)$ is a regularization function and $H$ is the two-dimensional discrete transfer function of $h(k,l)$:
$$ H(\lambda_1, \lambda_2)=\sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}h(k,l) \lambda_{1}^{k} \lambda_{2}^{l}, $$
where $\lambda_1$ and $\lambda_2$ denote spatial one-pixel shift operators.
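To make the shift-operator definition concrete, here is a small NumPy sketch. The text does not fix the direction of the shifts, so this assumes that $(\lambda_1 x)(m, n) = x(m-1, n)$, $(\lambda_2 x)(m, n) = x(m, n-1)$, that negative powers shift the other way, and that pixels shifted in from outside the image are zero. `lam`, `lam_pow`, `apply_H` and `convolve_direct` are hypothetical helper names. The sketch applies the polynomial $H(\lambda_1, \lambda_2)$ to $x$ term by term and compares the result with a direct evaluation of the convolution sum.

```python
import numpy as np

def lam(x, axis, inverse=False):
    """Assumed one-pixel shift along `axis` with zero fill.
    forward: out[m] = x[m - 1]; inverse: out[m] = x[m + 1]."""
    out = np.zeros_like(x)
    dst = [slice(None)] * x.ndim
    src = [slice(None)] * x.ndim
    if inverse:
        dst[axis], src[axis] = slice(None, -1), slice(1, None)
    else:
        dst[axis], src[axis] = slice(1, None), slice(None, -1)
    out[tuple(dst)] = x[tuple(src)]
    return out

def lam_pow(x, axis, k):
    """lambda^k x: apply the one-pixel shift |k| times (the inverse shift if k < 0)."""
    for _ in range(abs(k)):
        x = lam(x, axis, inverse=k < 0)
    return x

def apply_H(h, x):
    """H(lambda1, lambda2) x = sum_{k,l} h(k, l) lambda1^k lambda2^l x.
    h is an odd-sized array whose centre element is h(0, 0)."""
    K, L = h.shape[0] // 2, h.shape[1] // 2
    y = np.zeros(x.shape)
    for k in range(-K, K + 1):
        for l in range(-L, L + 1):
            term = lam_pow(lam_pow(x, 0, k), 1, l)  # lambda1^k, then lambda2^l
            y += h[k + K, l + L] * term
    return y

def convolve_direct(h, x):
    """Direct evaluation of sum_{k,l} h(k, l) x(m - k, n - l), zero outside the image."""
    K, L = h.shape[0] // 2, h.shape[1] // 2
    M, N = x.shape
    y = np.zeros((M, N))
    for m in range(M):
        for n in range(N):
            for k in range(-K, K + 1):
                for l in range(-L, L + 1):
                    if 0 <= m - k < M and 0 <= n - l < N:
                        y[m, n] += h[k + K, l + L] * x[m - k, n - l]
    return y

rng = np.random.default_rng(0)
x = rng.random((6, 7))
h = rng.random((3, 3))
print(np.allclose(apply_H(h, x), convolve_direct(h, x)))  # True
```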
Some papers (e.g. *Fast Image Deconvolution using Hyper-Laplacian Priors*) do not use the $H$ operator in the deconvolution process; instead, they use $h$ directly in the formula:
$$ \hat{x} = \arg \min_x \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} \| y(k,l) - (h* x)(k,l)\|^2 + R(x). $$
My question is: what exactly does the $H$ operator do, and why is it needed?
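As a simplified, concrete instance of the $h$-based objective, the sketch below makes two assumptions of its own: it uses the quadratic regularizer $R(x) = \mu \sum_{k,l} x(k,l)^2$ (a quadratic penalty on excessive values, rather than the hyper-Laplacian gradient prior of the cited paper), and it replaces the infinite-extent assumption with periodic boundaries. Under those assumptions the minimizer has a closed form at each 2D DFT frequency, $\mathcal{F}\{\hat{x}\} = \overline{\mathcal{F}\{h\}}\,\mathcal{F}\{y\} / (|\mathcal{F}\{h\}|^2 + \mu)$. The DFT of $h$ at frequency $(u, v)$ is the polynomial $\sum_{k,l} h(k,l)\lambda_1^k\lambda_2^l$ evaluated at $\lambda_1 = e^{-j2\pi u/M}$, $\lambda_2 = e^{-j2\pi v/N}$. `kernel_dft`, `circular_blur` and `tikhonov_deconvolve` are hypothetical names.

```python
import numpy as np

def kernel_dft(h, shape):
    """2D DFT of a centred, odd-sized kernel h, zero-padded to `shape`.
    After np.roll, padded[k % M, l % N] = h(k, l), so the DFT at (u, v) is
    sum_{k,l} h(k, l) exp(-2j*pi*u*k/M) exp(-2j*pi*v*l/N)."""
    K, L = h.shape[0] // 2, h.shape[1] // 2
    padded = np.zeros(shape)
    padded[:h.shape[0], :h.shape[1]] = h
    padded = np.roll(padded, shift=(-K, -L), axis=(0, 1))
    return np.fft.fft2(padded)

def circular_blur(x, h):
    """(h * x) with periodic boundaries, computed as a product of DFTs."""
    return np.real(np.fft.ifft2(kernel_dft(h, x.shape) * np.fft.fft2(x)))

def tikhonov_deconvolve(y, h, mu):
    """argmin_x sum |y - h*x|^2 + mu * sum x^2, assuming periodic boundaries."""
    Hf = kernel_dft(h, y.shape)
    Xf = np.conj(Hf) * np.fft.fft2(y) / (np.abs(Hf) ** 2 + mu)
    return np.real(np.fft.ifft2(Xf))

rng = np.random.default_rng(1)
x = rng.random((8, 8))
h = np.array([[0.0, 0.1, 0.0], [0.1, 0.6, 0.1], [0.0, 0.1, 0.0]])
y = circular_blur(x, h) + 0.01 * rng.standard_normal(x.shape)
x_hat = tikhonov_deconvolve(y, h, mu=1e-2)
```

Larger values of `mu` suppress large pixel values more strongly at the cost of a less sharp estimate.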