I'm trying to perform very simple (LUT-like) operations on a 16-bit gray-scale OpenCV Mat in a way that is efficient and doesn't slow down the debugger.
While there is a very detailed page in the documentation addressing exactly this issue, it fails to point out that most of those methods are only available on 8-bit images (including the perfect, optimized LUT function).
I tried the following methods:
```cpp
uchar* p = mat_depth.data;
for (unsigned int i = 0; i < depth_width * depth_height * sizeof(unsigned short); ++i)
{
    *p = ...;
    ++p;
}
```

Really fast, but unfortunately it only supports uchar (just like LUT).
```cpp
int i = 0;
for (int row = 0; row < depth_height; ++row)
{
    for (int col = 0; col < depth_width; ++col)
    {
        i = mat_depth.at<short>(row, col);
        i = ...;
        mat_depth.at<short>(row, col) = i;
    }
}
```

Adapted from this answer: https://stackoverflow.com/a/27225293/518169. It didn't work for me, and it was very slow.
```cpp
cv::MatIterator_<ushort> it, end;
for (it = mat_depth.begin<ushort>(), end = mat_depth.end<ushort>(); it != end; ++it)
{
    *it = ...;
}
```

Works well, however it uses a lot of CPU and makes the debugger super slow.
This answer https://stackoverflow.com/a/27099697/518169 points to the source code of the built-in LUT function, but it only covers advanced optimization techniques like IPP and OpenCL.
What I'm looking for is a very simple loop like the first snippet, but for ushorts.
What method do you recommend for solving this problem? I'm not looking for extreme optimization, just something on par with the performance of the single for loop over .data.
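For reference, here is a minimal sketch of the kind of loop I have in mind, with the pointer stepped in 16-bit units instead of bytes. A plain buffer stands in for the Mat here; with OpenCV the pointer would presumably come from something like `mat_depth.ptr<uint16_t>(0)` (valid when the Mat is continuous), and `clamp_depth` is just a hypothetical example of the per-pixel mapping:

```cpp
#include <cstdint>
#include <cstddef>

// Walk the pixel buffer as uint16_t rather than uchar, applying a
// per-pixel operation. clamp_depth is a made-up example mapping;
// substitute whatever LUT-like transform you need.
inline void clamp_depth(uint16_t* data, std::size_t count, uint16_t max_mm)
{
    uint16_t* p = data;
    for (std::size_t i = 0; i < count; ++i, ++p)
    {
        if (*p > max_mm)
            *p = max_mm;  // cap depth values at max_mm
    }
}
```

The open question is whether this pattern (or something equivalent) is the idiomatic way to do it with a cv::Mat.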