I'm trying to perform very simple (LUT-like) operations on a 16-bit gray-scale OpenCV Mat in a way that is efficient and doesn't slow down the debugger.
While there is a very detailed page in the documentation addressing exactly this issue, it fails to point out that most of those methods are only available on 8-bit images (including the perfect, optimized LUT function).
I tried the following methods:
```cpp
uchar* p = mat_depth.data;
for (unsigned int i = 0; i < depth_width * depth_height * sizeof(unsigned short); ++i)
{
    *p = ...;
    ++p;
}
```

Really fast, but unfortunately it only supports uchar (just like LUT).
```cpp
int i = 0;
for (int row = 0; row < depth_height; ++row)
{
    for (int col = 0; col < depth_width; ++col)
    {
        i = mat_depth.at<short>(row, col);
        i = ...;
        mat_depth.at<short>(row, col) = i;
    }
}
```

Adapted from this answer: https://stackoverflow.com/a/27225293/518169. It didn't work for me, and it was very slow.
```cpp
cv::MatIterator_<ushort> it, end;
for (it = mat_depth.begin<ushort>(), end = mat_depth.end<ushort>(); it != end; ++it)
{
    *it = ...;
}
```

Works well, however it uses a lot of CPU and makes the debugger super slow.
This answer https://stackoverflow.com/a/27099697/518169 points to the source code of the built-in LUT function, but it only covers advanced optimization techniques like IPP and OpenCL.
What I'm looking for is a very simple loop like the first snippet, but for ushorts.
What method do you recommend for solving this problem? I'm not looking for extreme optimization, just something on par with the performance of the single for loop over .data.
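For reference, here is a minimal sketch of the kind of loop I have in mind, with the pointer stepped in 16-bit units instead of bytes. A plain buffer stands in for the Mat here; with OpenCV the pointer would presumably come from something like `mat_depth.ptr<uint16_t>(0)` (valid when the Mat is continuous), and `clamp_depth` is just a hypothetical example of the per-pixel mapping:

```cpp
#include <cstdint>
#include <cstddef>

// Walk the pixel buffer as uint16_t rather than uchar, applying a
// per-pixel operation. clamp_depth is a made-up example mapping;
// substitute whatever LUT-like transform you need.
inline void clamp_depth(uint16_t* data, std::size_t count, uint16_t max_mm)
{
    uint16_t* p = data;
    for (std::size_t i = 0; i < count; ++i, ++p)
    {
        if (*p > max_mm)
            *p = max_mm;  // cap depth values at max_mm
    }
}
```

The open question is whether this pattern (or something equivalent) is the idiomatic way to do it with a cv::Mat.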