I came across this syntax for reading a BMP file in C++:

    #include <fstream>

    int main() {
        std::ifstream in("filename.bmp", std::ifstream::binary);

        // Find the file size by seeking to the end, then rewind
        in.seekg(0, in.end);
        std::streamsize size = in.tellg();
        in.seekg(0);

        unsigned char * data = new unsigned char[size];
        // istream::read expects a char*, so the buffer pointer is cast
        in.read(reinterpret_cast<char *>(data), size);

        int width = *(int*)&data[18];
        // omitted remainder for minimal example
    }

and I don't understand what the line

int width = *(int*)&data[18]; 

is actually doing. Why doesn't a simple cast from unsigned char to int, int width = (int)data[18];, work?

  • It is taking the memory address of data[18], treating it as a pointer to an integer, then dereferencing it. Basically, treating it as a number. This seems like UB though, since data is only size 1 Commented Dec 5, 2019 at 0:00
  • "What is *(int*)&data[18] actually doing in this code?" Violating the Strict Aliasing Rule, so it could be doing absolutely anything. Commented Dec 5, 2019 at 0:10
  • @WilliamMiller, unless I'm misreading, data is allocated as an array of 1 unsigned char. I think it should have been new unsigned char[size]. Commented Dec 5, 2019 at 0:11
  • @ChrisMM that was a typo, thanks for pointing it out. Commented Dec 5, 2019 at 0:14
  • Yes, but *(int*)&data[18] will also fail on CPUs that require a 32-bit number to be aligned to a 4-byte address (some CPUs allow misaligned access, but perform it much more slowly). Even assuming data itself is aligned to whatever size the CPU prefers (usually 32 or 64 bits), &data[18] will not be, because 18 is not evenly divisible by 4 (32 bits in bytes). It will also give the wrong value if the CPU is big-endian, because the bytes in the file are then in the opposite order. Commented Dec 5, 2019 at 0:47

1 Answer

Note

As @user4581301 indicated in the comments, this depends on the implementation and will fail in many instances. And as @NathanOliver-ReinstateMonica and @ChrisMM pointed out, this is undefined behavior (UB), so the result is not guaranteed.
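For comparison, here is a minimal sketch of a read that avoids the aliasing, alignment and byte-order problems raised in the comments. It assumes the field is stored least significant byte first (BMP files are little-endian) and that the caller has already checked that the buffer holds at least offset + 4 bytes. The name read_le_i32 is made up for this illustration and does not come from the original code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the original post): assemble a signed
// 32-bit value from four bytes stored least significant byte first.
// The caller must guarantee that data[offset + 3] is inside the buffer.
inline std::int32_t read_le_i32(const unsigned char* data, std::size_t offset) {
    // Each byte is widened to 32 bits before shifting, so no bits are lost
    // and no int-typed pointer ever refers to the byte buffer.
    const std::uint32_t u =
          static_cast<std::uint32_t>(data[offset])
        | static_cast<std::uint32_t>(data[offset + 1]) << 8
        | static_cast<std::uint32_t>(data[offset + 2]) << 16
        | static_cast<std::uint32_t>(data[offset + 3]) << 24;
    // Converting an out-of-range unsigned value to signed is well defined
    // since C++20 and wraps in practice on earlier compilers.
    return static_cast<std::int32_t>(u);
}
```

With this helper the width would be read as int width = read_le_i32(data, 18);, and the result does not depend on the byte order or alignment rules of the CPU.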

According to the bitmap header format, the width of the bitmap in pixels is stored as a signed 32-bit integer beginning at byte offset 18. The syntax

int width = *(int*)&data[18]; 

reads the bytes at offsets 18 through 21, inclusive (assuming a 32-bit int), and interprets them as an integer.

How?

  • &data[18] gets the address of the unsigned char at index 18
  • (int*) casts that address from unsigned char* to int*, so the compiler treats it as pointing to an int rather than to a single byte
  • the leading * dereferences the int*, reading sizeof(int) bytes starting at that address as one int value

So basically, it takes the address of data[18] and reads the bytes at that address as if they were an integer.
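The same "read the bytes at that address as if they were an integer" idea can be written without the pointer cast by copying the bytes into an int with std::memcpy. This is only a sketch: read_int_at is a hypothetical name, the caller is assumed to have checked the buffer length, and the result still depends on the host's byte order and on int being 32 bits.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of the original post): copy sizeof(int)
// bytes starting at `offset` into an int object.
inline int read_int_at(const unsigned char* data, std::size_t offset) {
    int value;
    // memcpy copies raw bytes, so there is no aliasing violation and the
    // source address does not need to be aligned for int.
    std::memcpy(&value, data + offset, sizeof value);
    return value;
}
```

On a little-endian machine with a 32-bit int, read_int_at(data, 18) gives the value the original line was meant to produce, but with defined behavior.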

Why doesn't a simple cast to `int` work?

sizeof(data[18]) is 1, because an unsigned char is one byte (values 0-255). A simple cast, (int)data[18], converts that single value to an int: the result occupies 4 bytes (assuming a 32-bit int), but it carries only the one byte of information stored at offset 18. By contrast, sizeof(&data[18]) is the size of a pointer, typically 4 on a 32-bit system and 8 on a 64-bit one, and that size plays no part in how many bytes are read. What matters is the pointed-to type: after the cast to (int*), the dereference reads sizeof(int) bytes, here the 4 bytes at offsets 18 through 21, inclusive. This is illustrated by the following example:

    #include <bitset>
    #include <iostream>
    #include <string>

    int main() {
        unsigned char data[22] = {};  // stand-in for the file buffer

        // Populate 18-21 with a recognizable pattern for demonstration
        std::bitset<8> _bits(std::string("10011010"));
        unsigned long bits = _bits.to_ulong();
        for (int ii = 18; ii < 22; ii++) {
            data[ii] = static_cast<unsigned char>(bits);
        }

        // The pointer casts below are undefined behavior; they are shown
        // only to illustrate how many bytes each expression reads
        std::cout << "data[18] -> 1 byte " << std::bitset<32>(data[18]) << std::endl;
        std::cout << "*(unsigned short*)&data[18] -> 2 bytes " << std::bitset<32>(*(unsigned short*)&data[18]) << std::endl;
        std::cout << "*(int*)&data[18] -> 4 bytes " << std::bitset<32>(*(int*)&data[18]) << std::endl;
    }
    data[18] -> 1 byte 00000000000000000000000010011010
    *(unsigned short*)&data[18] -> 2 bytes 00000000000000001001101010011010
    *(int*)&data[18] -> 4 bytes 10011010100110101001101010011010
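The difference is easier to see when the four bytes are not all the same. The sketch below contrasts the two readings without any pointer cast. The helper names simple_cast and four_bytes are invented for this illustration, and the four-byte version assumes the bytes are stored least significant first, as they are in a BMP file.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: what a plain (int)data[offset] yields.
// Only the single byte at `offset` contributes to the result.
inline int simple_cast(const unsigned char* data, std::size_t offset) {
    return static_cast<int>(data[offset]);
}

// Hypothetical helper: combine the four bytes at offset .. offset + 3,
// least significant byte first, into one 32-bit value.
inline std::uint32_t four_bytes(const unsigned char* data, std::size_t offset) {
    return  static_cast<std::uint32_t>(data[offset])
          | static_cast<std::uint32_t>(data[offset + 1]) << 8
          | static_cast<std::uint32_t>(data[offset + 2]) << 16
          | static_cast<std::uint32_t>(data[offset + 3]) << 24;
}
```

For an image 640 pixels wide (0x00000280), the bytes at offsets 18 to 21 are 0x80 0x02 0x00 0x00. simple_cast(data, 18) returns 128, the value of the first byte alone, while four_bytes(data, 18) returns 640.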
2 Comments

You should note that this cast is UB. There is no integer there so synthesizing one is illegal.
@NathanOliver-ReinstateMonica Good point, I'm glad others noticed that
