jdtremaine/hue-codec

What is this?

Hue Codec is a header-only C++ library that efficiently encodes and decodes 16-bit, single-channel bitmaps (e.g. depth maps, disparity maps, infrared images, or high bit-depth grayscale images) to and from 8-bit, 3-channel bitmaps (i.e. RGB images).

Once in RGB format, conventional lossy RGB image and video codecs can achieve high compression ratios and relatively low signal loss on the encoded data. This approach has the advantage of being simple, fast, and portable, and it allows data inspection and previewing using commonly available image and video viewers. This library uses OpenCV for basic image manipulation but also includes a library-independent encoder and decoder for those who want to use other libraries.

This work builds on a whitepaper and reference code published by Intel. Note that the whitepaper has several errors, and neither the encoder included in the RealSense SDK nor the reference decoder provided along with the whitepaper is correct. The publication of this codec allows it to be used with other sensors not supported by the Intel RealSense SDK.

Why would I use this?

This library may be helpful to you if you are trying to stream or save the output from a depth sensor or another type of sensor that outputs 16-bit grayscale images.

Depth sensors generate large amounts of data. For instance, the Intel RealSense D400 series depth sensors can output 16-bit, 848x480 pixel depth maps at 90 frames per second [D400 Datasheet]. That's a data rate of 73MB/s (586Mbps) or over 4 gigabytes of raw sensor data per minute.
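The figures above follow directly from the frame size and frame rate. As a minimal sketch of that arithmetic (the function name `raw_bytes_per_second` is hypothetical and not part of this library):

```cpp
#include <cassert>
#include <cstdint>

// Raw (uncompressed) data rate of a stream in bytes per second.
// bytes_per_pixel is 2 for 16-bit depth maps.
std::uint64_t raw_bytes_per_second(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t bytes_per_pixel, std::uint64_t fps) {
    assert(width > 0 && height > 0 && bytes_per_pixel > 0 && fps > 0);
    return width * height * bytes_per_pixel * fps;
}
```

For the D400 example, `raw_bytes_per_second(848, 480, 2, 90)` gives 73,267,200 bytes per second, which is about 73MB/s, 586Mbps, or 4.4GB per minute.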

This codec allows you to encode that data to RGB so that conventional image and video codecs may be used to achieve large compression ratios. This technique lets you use hardware-accelerated image and video codecs to achieve high data throughputs.

The naive approach to encoding 16-bit depth data as RGB data would be to scale the 16-bit value to a 24-bit value and use that as the RGB pixel value. Unfortunately, all modern lossy image compression techniques use perceptual compression, which causes losses unevenly across the RGB colour space. Any data loss in the most significant bits causes severe artifacts in the recovered depth maps.
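To make the problem with the naive approach concrete, here is a rough sketch. It assumes one particular packing (shift the 16-bit value left by 8 bits and store the most significant byte in the red channel); the names `Rgb24`, `naive_pack`, and `naive_unpack` are hypothetical and do not appear in this library.

```cpp
#include <cassert>
#include <cstdint>

struct Rgb24 { std::uint8_t r, g, b; };

// Naive packing (assumed layout): scale 16 bits to 24 bits by shifting left 8,
// then split into R (most significant byte), G, and B (least significant byte).
Rgb24 naive_pack(std::uint16_t depth) {
    const std::uint32_t v24 = static_cast<std::uint32_t>(depth) << 8;
    return {static_cast<std::uint8_t>(v24 >> 16),
            static_cast<std::uint8_t>((v24 >> 8) & 0xFF),
            static_cast<std::uint8_t>(v24 & 0xFF)};
}

// Inverse of naive_pack: rebuild the 24-bit value and scale back to 16 bits.
std::uint16_t naive_unpack(Rgb24 c) {
    const std::uint32_t v24 = (static_cast<std::uint32_t>(c.r) << 16) |
                              (static_cast<std::uint32_t>(c.g) << 8) |
                              static_cast<std::uint32_t>(c.b);
    return static_cast<std::uint16_t>(v24 >> 8);
}
```

Under this packing, a compression error of a single step in the red channel shifts the recovered depth by 256 units, which is 0.256 m at a scale of 0.001 m per unit.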

A better approach would be to encode the 16 bits of depth data as hue information, as colour information is well-preserved by modern lossy image and video codecs. This codec transforms the depth values to hues in the RGB colour space, making encoding and decoding fast, simple, and relatively robust to artifacts.

How does it work?

A single-frame version of the pipeline works like this:

| Step | Description | Result |
|---|---|---|
| Read | Get a depth frame from a depth sensor | Depth Map |
| Encode | Hue-encode the depth frame | Hue-Encoded Image |
| Compress | Save the encoded image to a file | webp file (quality=50%) |
| Decompress | Read the image from the file | Hue-Encoded Image |
| Decode | Decode the depth frame | Decoded Depth Map |

In this example, the raw depth data is 115kB. The raw depth data saved as a 16-bit grayscale PNG is 63kB in size, a compression ratio of 1.8. The hue-encoded webp file (quality=50%) is 9kB in size, a compression ratio of 13.2.

A series of depth frames is called a depth stream. Depth streams can be saved using standard video codecs like H.264 and H.265.

What performance can I expect?

Use the benchmarks to get compression rates and frame encoding/decoding times for your data and hardware. Note that these benchmarks cover only hue encoding and decoding and image compression and decompression; they do not include saving the data to disk.

The output of the benchmarks test for an Ubuntu system with an AMD Ryzen 7 2700X CPU and an Nvidia GeForce RTX 3080 is below. For single frames, the JPEG format offers the fastest combined save and load times, while WebP offers the fastest load times and best compression ratios while maintaining fidelity. A compression ratio of 60X can be achieved using WebP image compression without significant compression artifacts.

Image encoding benchmarks

| Encoding | PSNR | CR | save (ms) | load (ms) | save (kB/s) | load (kB/s) |
|---|---|---|---|---|---|---|
| Hue-encoded only (Q=100) | 77.4 | 1.0 | 4.5 | 3.0 | 68.3 | 100.7 |
| Hue-encoded PNG (Q=10) | 77.4 | 4.8 | 548.2 | 4.7 | 0.6 | 65.8 |
| Hue-encoded PNG (Q=9) | 77.4 | 4.8 | 533.7 | 4.6 | 0.6 | 66.3 |
| Hue-encoded PNG (Q=8) | 77.4 | 4.7 | 234.4 | 4.9 | 1.3 | 63.3 |
| Hue-encoded PNG (Q=7) | 77.4 | 4.6 | 76.5 | 4.8 | 4.0 | 63.7 |
| Hue-encoded PNG (Q=6) | 77.4 | 4.5 | 47.1 | 4.8 | 6.5 | 63.4 |
| Hue-encoded PNG (Q=5) | 77.4 | 4.3 | 27.6 | 5.1 | 11.1 | 59.8 |
| Hue-encoded PNG (Q=4) | 77.4 | 4.1 | 20.0 | 5.3 | 15.4 | 58.2 |
| Hue-encoded PNG (Q=3) | 77.4 | 3.8 | 22.5 | 6.8 | 13.7 | 45.0 |
| Hue-encoded PNG (Q=2) | 77.4 | 3.5 | 16.9 | 5.4 | 18.2 | 57.3 |
| Hue-encoded PNG (Q=1) | 77.4 | 3.3 | 16.2 | 5.5 | 19.0 | 55.8 |
| Hue-encoded JPEG (Q=0) | 27.5 | 97.7 | 3.3 | 2.8 | 91.8 | 111.4 |
| Hue-encoded JPEG (Q=10) | 31.3 | 66.8 | 3.5 | 2.9 | 88.7 | 106.1 |
| Hue-encoded JPEG (Q=20) | 35.1 | 47.7 | 3.7 | 2.9 | 82.3 | 105.5 |
| Hue-encoded JPEG (Q=30) | 37.8 | 38.5 | 3.5 | 2.9 | 88.3 | 107.2 |
| Hue-encoded JPEG (Q=40) | 39.7 | 33.1 | 3.5 | 3.0 | 87.5 | 100.9 |
| Hue-encoded JPEG (Q=50) | 43.3 | 29.1 | 3.6 | 2.9 | 86.4 | 105.7 |
| Hue-encoded JPEG (Q=60) | 43.3 | 25.7 | 3.6 | 3.6 | 86.1 | 85.8 |
| Hue-encoded JPEG (Q=70) | 44.8 | 21.9 | 3.6 | 3.0 | 85.0 | 102.9 |
| Hue-encoded JPEG (Q=80) | 47.7 | 17.5 | 3.6 | 3.0 | 85.8 | 102.5 |
| Hue-encoded JPEG (Q=90) | 52.8 | 12.0 | 3.7 | 3.5 | 83.4 | 88.8 |
| Hue-encoded JPEG (Q=100) | 56.5 | 3.5 | 4.3 | 4.7 | 72.1 | 66.0 |
| Hue-encoded WebP (Q=0) | 36.4 | 121.2 | 17.7 | 3.0 | 17.3 | 103.4 |
| Hue-encoded WebP (Q=10) | 39.5 | 82.1 | 19.7 | 2.9 | 15.6 | 104.3 |
| Hue-encoded WebP (Q=20) | 40.2 | 66.0 | 20.0 | 3.1 | 15.4 | 97.8 |
| Hue-encoded WebP (Q=30) | 42.2 | 56.2 | 21.4 | 3.2 | 14.3 | 95.9 |
| Hue-encoded WebP (Q=40) | 44.7 | 49.5 | 20.5 | 3.3 | 15.0 | 94.5 |
| Hue-encoded WebP (Q=50) | 44.7 | 44.4 | 21.6 | 3.3 | 14.2 | 92.0 |
| Hue-encoded WebP (Q=60) | 45.0 | 39.9 | 21.8 | 3.4 | 14.1 | 89.5 |
| Hue-encoded WebP (Q=70) | 47.0 | 35.9 | 23.0 | 3.8 | 13.3 | 81.4 |
| Hue-encoded WebP (Q=80) | 50.8 | 29.1 | 27.6 | 3.8 | 11.1 | 81.8 |
| Hue-encoded WebP (Q=90) | 54.0 | 19.3 | 30.7 | 4.2 | 10.0 | 74.0 |
| Hue-encoded WebP (Q=100) | 55.4 | 8.3 | 37.7 | 6.5 | 8.1 | 46.9 |

Video encoding benchmarks

On the reference system, hue encoding of depth streams adds an overhead of about 0.5 ms per frame (see the "Hue-encoded only" row below), and hue decoding adds a negligible overhead. Codecs are shown below in the format [fourcc code/container format]. All codecs achieved roughly the same PSNR when the default settings were used. The x264 (a.k.a. avc1) codec offers the best combination of speed and compression ratio.

Here are the video benchmarks when ffmpeg is built with vpx, x264, and x265 codecs with CUDA support:

| Encoding | mean PSNR | CR | save (ms) | load (ms) | save (kB/s) | load (kB/s) |
|---|---|---|---|---|---|---|
| Hue-encoded only/ | 0.0 | 1.0 | 0.5 | 0.0 | 247.6 | 9950831.0 |
| Hue-encoded MJPG/avi | 28.7 | 6.8 | 1.4 | 0.6 | 83.2 | 180.7 |
| Hue-encoded XVID/avi | 29.1 | 6.3 | 1.0 | 0.6 | 116.2 | 202.3 |
| Hue-encoded x264/avi | 28.0 | 20.9 | 0.6 | 0.9 | 180.8 | 126.9 |
| Hue-encoded VP80/avi | 29.6 | 7.4 | 19.4 | 0.8 | 5.9 | 138.8 |
| Hue-encoded VP90/avi | 29.1 | 13.2 | 5.3 | 0.9 | 21.8 | 128.6 |
| Hue-encoded mp4v/mp4 | 29.1 | 6.3 | 0.9 | 0.5 | 128.6 | 232.1 |
| Hue-encoded avc1/mp4 | 28.0 | 21.7 | 0.9 | 0.9 | 134.7 | 130.9 |
| Hue-encoded vp09/mp4 | 29.1 | 13.6 | 6.1 | 1.1 | 18.8 | 107.5 |
| Hue-encoded hvc1/mp4 | 29.8 | 5.7 | 0.6 | 1.8 | 190.5 | 62.5 |

Here are the video benchmarks when ffmpeg is built with nvcodec:

| Encoding | mean PSNR | CR | save (ms) | load (ms) | save (kB/s) | load (kB/s) |
|---|---|---|---|---|---|---|
| Hue-encoded x264/avi | 28.0 | 14.6 | 0.6 | 0.6 | 202.4 | 178.7 |
| Hue-encoded avc1/mp4 | 28.0 | 15.0 | 0.7 | 0.7 | 162.8 | 169.1 |
| Hue-encoded hvc1/mp4 | 29.1 | 9.6 | 0.5 | 1.0 | 220.2 | 112.5 |

While the nvcodec codecs offer marginally higher save and load rates, they have lower compression ratios.

How do I use this?

The hue_codec.h file is a header-only library with an external dependency on OpenCV.

You must add the include/hue_codec.h file to your project. You will also need to add OpenCV to your project. How you do this depends on your compiler and setup, but this project includes cross-platform build support using CMake to manage the build and vcpkg to install the dependencies.

Once you have added the hue_codec.h file to your project, add an include directive to your code like so:

#include "hue_codec.h" 

To use the Hue Codec:

This code expects you to use 16-bit depth maps, meaning your depth data will be encoded as unsigned 16-bit integers. Some known scaling factor will be used to convert depth map values into real-world distances. For Intel Depth Sensors, a scaling factor of 0.001 is used to convert between depth map values and metres.

Initialize the codec as so:

const float min_sensor_depth_m = 0.3f;  // Minimum sensor depth in metres
const float max_sensor_depth_m = 10.0f; // Maximum sensor depth in metres
const float depth_scale = 0.001f;       // Scale from 16-bit depth map values to metres
bool inverted = false;                  // Use standard colourization (explained later)
HueCodec codec(min_sensor_depth_m, max_sensor_depth_m, depth_scale, inverted);

During hue encoding, the entire depth field is reduced to 1530 values in a lossy way. To preserve as much fidelity as possible, initialize HueCodec objects using depth_min and depth_max values that accurately represent the depth range of all your data.
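To get a feel for how the chosen depth range affects fidelity, the sketch below estimates the depth resolution per code value. It assumes the range is spread linearly over the 1530 non-zero code values; the library's exact scaling and rounding may differ, and `depth_step_m` is a hypothetical helper rather than part of hue_codec.h.

```cpp
#include <cassert>

// Approximate depth resolution in metres per code value, assuming the range
// [min_depth_m, max_depth_m] is spread linearly over 1530 non-zero codes.
// This is an illustrative assumption, not the library's exact behaviour.
double depth_step_m(double min_depth_m, double max_depth_m) {
    assert(max_depth_m > min_depth_m);
    const double intervals = 1529.0;  // 1530 codes span 1529 intervals
    return (max_depth_m - min_depth_m) / intervals;
}
```

Under that assumption, a 0.3 m to 10 m range gives steps of roughly 6.3 mm, while narrowing the range to 0.3 m to 3 m reduces the step to roughly 1.8 mm.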

You would then retrieve a 16-bit depth frame from your sensor. To encode that depth frame, you would use the following:

cv::Mat encoded_frame = codec.encode(depth_frame); 

The OpenCV Mat encoded_frame is a standard 3-channel, 8-bit BGR image (OpenCV uses BGR instead of RGB). You can now compress the encoded frame with a standard RGB image or video codec and write it to disk. An example of this would be:

cv::imwrite("compressed.jpg", encoded_frame); 

Later, you can read the image or video file from the disk. An example to read the file saved above would be:

cv::Mat encoded_frame = cv::imread("compressed.jpg"); 

And then decode it like so:

cv::Mat decoded_frame = codec.decode(encoded_frame); 

If lossy compression is used, the OpenCV Mat decoded_frame will lose fidelity from compression artifacts.
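One common way to quantify that loss of fidelity is PSNR, which the benchmark tables above report. How the benchmarks compute it (peak value, averaging) is not stated here, so the following is only a sketch: it assumes a peak of 65535 (the full 16-bit range) and works on plain vectors rather than cv::Mat. `psnr_db` is a hypothetical helper, not part of hue_codec.h.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Peak signal-to-noise ratio in dB between two equally sized 16-bit images.
// The default peak of 65535 is an assumption for 16-bit data.
double psnr_db(const std::vector<std::uint16_t>& original,
               const std::vector<std::uint16_t>& decoded,
               double peak = 65535.0) {
    assert(original.size() == decoded.size() && !original.empty());
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < original.size(); ++i) {
        const double diff = static_cast<double>(original[i]) -
                            static_cast<double>(decoded[i]);
        sum_sq += diff * diff;
    }
    const double mse = sum_sq / static_cast<double>(original.size());
    if (mse == 0.0) {
        return std::numeric_limits<double>::infinity();  // identical images
    }
    return 10.0 * std::log10(peak * peak / mse);
}
```

Higher values mean the decoded frame is closer to the original; identical frames yield infinity.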

Flying pixels and median filtering

Compression artifacts in the compressed RGB image can result in a "flying pixel" artifact when decoded. Flying pixels are individual pixels that are much closer to the foreground than they should be. These flying pixels can be cleaned up by postprocessing with a median filter as below.

int kernel_size = 1;
float diff_threshold = 0.10;
cv::Mat cleaned_depth_frame = median_filter(decoded_frame, kernel_size, diff_threshold);

The median filter takes two parameters: a kernel size and a difference threshold. The kernel size is the box size in pixels around the current pixel that will be used to calculate the median. If a kernel size of 1 is specified, the median will be calculated with a box that spans from 1 pixel up and left to 1 pixel below and right of the current pixel (a 3x3 box). The difference threshold is a relative difference, (pixel value - median)/median, above which the pixel will be replaced with the median; a value of 0.10 corresponds to 10%. So a difference threshold of zero will replace all pixels with their local median.
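The behaviour described above can be sketched without OpenCV as follows. This is not the library's `median_filter`; `threshold_median_filter` is a hypothetical stand-in that works on a row-major vector, and it makes two assumptions that the description does not confirm: the window is clipped at the image borders, and the absolute value of the relative difference is compared against the threshold.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Thresholded median filter on a row-major 16-bit image (illustrative sketch).
std::vector<std::uint16_t> threshold_median_filter(
        const std::vector<std::uint16_t>& image, int width, int height,
        int kernel_size, double diff_threshold) {
    assert(width > 0 && height > 0 && kernel_size >= 0);
    assert(image.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint16_t> output(image);
    std::vector<std::uint16_t> window;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Gather the (2k+1) x (2k+1) neighbourhood, clipped at the borders.
            window.clear();
            for (int dy = -kernel_size; dy <= kernel_size; ++dy) {
                for (int dx = -kernel_size; dx <= kernel_size; ++dx) {
                    const int ny = y + dy;
                    const int nx = x + dx;
                    if (ny >= 0 && ny < height && nx >= 0 && nx < width) {
                        window.push_back(image[ny * width + nx]);
                    }
                }
            }
            // Upper median of the window via partial sorting.
            auto mid = window.begin() + window.size() / 2;
            std::nth_element(window.begin(), mid, window.end());
            const double median = *mid;
            if (median == 0.0) continue;  // avoid dividing by zero
            const double pixel = image[y * width + x];
            // Replace the pixel when its relative distance from the median
            // exceeds the threshold.
            if (std::abs(pixel - median) / median > diff_threshold) {
                output[y * width + x] = *mid;
            }
        }
    }
    return output;
}
```

With a kernel size of 1 and a threshold of 0.10, an isolated pixel of 200 surrounded by values of 1000 is replaced by 1000, and so is an isolated zero-valued pixel.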

See below for a comparison of median filter results for different kernel sizes and difference thresholds.

| threshold | kernel size = 1 | kernel size = 4 |
|---|---|---|
| 0.00 | k1t00 | k4t00 |
| 0.05 | k1t05 | k4t05 |
| 0.10 | k1t10 | k4t10 |

As shown above, the median filter also fills zero-valued pixels with the local median value. More information about flying pixels can be found in the Intel whitepaper.

Inverted colourization

Standard colourization scales values to the encoding value range. The depth measurement error of depth sensors scales quadratically with distance [Intel Sensor Tuning Guide]. To match this, it makes sense to scale the inverse of the depth value to the encoding value range instead. This variation of hue encoding is known as inverse colourization; an example is shown below.

| Encoding Type | Result |
|---|---|
| Standard colourization | Hue-Encoded Image |
| Inverse colourization | Hue-Encoded Image |

Note that the background detail is better preserved in the standard encoding, while foreground detail is better preserved in the inverted encoding. More detail about inverted colourization is provided in the Intel whitepaper.
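The trade-off between the two variants can be illustrated with a small sketch. The formulas below are assumptions made for illustration only: standard colourization is modelled as a linear spread of depth over codes 1 to 1530, and inverted colourization as a linear spread of 1/depth over the same codes, with the nearest depth receiving the highest code. The library's actual scaling, rounding, and code orientation may differ, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Standard colourization (assumed form): depth spread linearly over codes 1..1530.
int standard_code(double depth_m, double min_m, double max_m) {
    assert(min_m > 0.0 && max_m > min_m);
    assert(depth_m >= min_m && depth_m <= max_m);
    const double t = (depth_m - min_m) / (max_m - min_m);
    return 1 + static_cast<int>(std::lround(t * 1529.0));
}

// Inverted colourization (assumed form): 1/depth spread linearly over codes
// 1..1530, so nearby depths receive many more distinct codes than far ones.
int inverted_code(double depth_m, double min_m, double max_m) {
    assert(min_m > 0.0 && max_m > min_m);
    assert(depth_m >= min_m && depth_m <= max_m);
    const double t = (1.0 / depth_m - 1.0 / max_m) / (1.0 / min_m - 1.0 / max_m);
    return 1 + static_cast<int>(std::lround(t * 1529.0));
}
```

Under these assumptions and a 0.3 m to 10 m range, the interval from 0.3 m to 1 m covers about 1,100 codes with the inverted mapping but only about 110 with the standard mapping, which is consistent with foreground detail being better preserved by the inverted encoding.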

What is the hue encoding scheme?

The hue encoding method uses a simple, formulaic mapping of values to RGB values in the hue colour space. The mapping covers 1,531 distinct RGB values. Note that a range of 1,531 different values represents about 11 bits of information, so the 16-bit depth values must be scaled to values in the [0-1530] range. You might expect this to incur some data loss, but fortunately, most depth sensors do not have sufficient resolution and range coverage to require all 16 bits of data.

The hue encoding scheme is as follows:

| value | red | green | blue | description |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | black |
| 1 | 255 | 0 | 0 | red |
| 2 - 255 | 255 | v-1 | 0 | red with green ascending |
| 256 | 255 | 255 | 0 | red + green = yellow |
| 257 - 510 | 511-v | 255 | 0 | green with red descending |
| 511 | 0 | 255 | 0 | green |
| 512 - 765 | 0 | 255 | v-511 | green with blue ascending |
| 766 | 0 | 255 | 255 | green + blue = cyan |
| 767 - 1020 | 0 | 1021-v | 255 | blue with green descending |
| 1021 | 0 | 0 | 255 | blue |
| 1022 - 1275 | v-1021 | 0 | 255 | blue with red ascending |
| 1276 | 255 | 0 | 255 | blue + red = purple |
| 1277 - 1530 | 255 | 0 | 1531-v | red with blue descending |

A complete mapping of values to RGB values is included in docs/full_mapping.csv.
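The table above translates almost line for line into code. The sketch below is written directly from the table rather than taken from hue_codec.h, so the names `Rgb`, `value_to_rgb`, and `rgb_to_value` are hypothetical. The decoder shown only inverts colours that lie exactly on the hue ramp; a practical decoder also has to cope with colours perturbed by lossy compression, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };

// Map a code value in [0, 1530] to an RGB triple, following the table above.
Rgb value_to_rgb(int v) {
    assert(v >= 0 && v <= 1530);
    auto u8 = [](int x) { return static_cast<std::uint8_t>(x); };
    if (v == 0)    return {0, 0, 0};               // black
    if (v <= 256)  return {255, u8(v - 1), 0};     // red, green ascending
    if (v <= 511)  return {u8(511 - v), 255, 0};   // green, red descending
    if (v <= 766)  return {0, 255, u8(v - 511)};   // green, blue ascending
    if (v <= 1021) return {0, u8(1021 - v), 255};  // blue, green descending
    if (v <= 1276) return {u8(v - 1021), 0, 255};  // blue, red ascending
    return {255, 0, u8(1531 - v)};                 // red, blue descending
}

// Invert value_to_rgb for colours that lie exactly on the hue ramp.
// Returns 0 for black and for any colour that is not on the ramp.
int rgb_to_value(Rgb c) {
    if (c.r == 255 && c.b == 0) return c.g + 1;     // 1..256
    if (c.g == 255 && c.b == 0) return 511 - c.r;   // 257..511
    if (c.r == 0 && c.g == 255) return 511 + c.b;   // 512..766
    if (c.r == 0 && c.b == 255) return 1021 - c.g;  // 767..1021
    if (c.g == 0 && c.b == 255) return 1021 + c.r;  // 1022..1276
    if (c.r == 255 && c.g == 0) return 1531 - c.b;  // 1277..1530
    return 0;
}
```

Each step along the ramp changes exactly one channel by one, so a small error in any single channel moves the decoded value by a correspondingly small amount.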

Comparison to the RealSense encoder and decoder

This implementation's encoding scheme matches the 1531-point encoding scheme described in the Intel whitepaper, adding a zero value mapping to an all-black RGB value as in the RealSense hue encoder.

The errors in the RealSense SDK encoder cause a mean 10.5dB drop in peak signal-to-noise ratio (PSNR) on the included reference sequence relative to this encoder.

For further examination of the errors in the Intel RealSense SDK encoder and reference decoder, see the comparison test. A copy of the output of the comparison test is included in docs/comparison_output.txt.

What is included in this repository?

| directory | description |
|---|---|
| data | reference depth data for testing |
| env | platform-specific setup scripts |
| docs | documentation |
| include | the hue_codec.h header file |
| src | basic examples |
| test | tests and interactive tools |

How do I get the tests and examples working?

If you don't want to run the tests and examples and only want to use this library in your project, see the usage instructions above under How do I use this?.

The examples and tests use hard-coded paths to simplify cross-platform asset handling.

Ubuntu

Ubuntu installation instructions for examples and tests:

| Action | command |
|---|---|
| 0. Clone the git repo | git clone https://github.com/jdtremaine/hue-codec.git |
| 1. Navigate to the env directory | cd hue-codec/env/ubuntu_22.04 |
| 2. Install platform dependencies | sudo ./install_deps.sh |
| 3. Source environment variables | source ./set_env.sh |
| 4. Install vcpkg libraries | ./vcpkg_install.sh |
| 5. Navigate to the build directory | cd ../../build/ |
| 6. Run cmake (and wait for vcpkg) | cmake -DCMAKE_BUILD_TYPE=Release .. |
| 7. Run make | make -j8 |
| 8. Navigate to the bin directory | cd ../bin |
| 9. Run a test | ./interactive |

Windows

Windows installation instructions for examples and tests:

| Action | command |
|---|---|
| 0. Open PowerShell | |
| 1. Clone the git repo | git clone https://github.com/jdtremaine/hue-codec.git |
| 2. Navigate to the env directory | cd hue-codec/env/windows_10 |
| 3. Run the dependency installer | install_deps.ps1 |
| 4. Open the hue-codec folder in Visual Studio or Visual Studio Code | |
| 5. Follow the GUI prompts to set up a CMake build | |

Docker (Ubuntu)

Dockerfile

A Dockerfile that can be used to build an Ubuntu 22.04 instance with CUDA 12.0.1 support is included in env/ubuntu_22.04. Read the comments in the Dockerfile for more information.

Docker images

A Docker development image and Docker binary images for benchmarking are available on Docker Hub at jdtremaine/hue-codec. Note that each of these images has hue-codec installed in /root/hue-codec with pre-built libraries and binaries. To run a Docker benchmark, make sure that the Docker host has CUDA support.

To run the benchmarks built with standard codecs (vpx, x264, and x265 codecs with CUDA support), run:

docker run \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -it jdtremaine/hue-codec:bin-vidcodecs /bin/bash -c 'cd /root/hue-codec/bin/;./benchmarks'

To run the benchmarks built with nvcodec by Nvidia, run:

docker run \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -it jdtremaine/hue-codec:bin-nvicodecs /bin/bash -c 'cd /root/hue-codec/bin/;./benchmarks'

To run a development environment that includes pre-built libraries and build files, run:

docker run \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -it jdtremaine/hue-codec:dev-vidcodecs \
  /bin/bash

Licence

This project is distributed under the Apache Licence 2.0.
