Revisions to Writing hexadecimal values to text file, showing up as binary [duplicate]

added 610 characters in body

edited Mar 7, 2020 at 6:49

This is a bug in Notepad.

The core of the problem is that Notepad misinterprets your text file as UTF-16 instead of ASCII.
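To illustrate why ASCII text read as UTF-16 turns into unrelated characters, here is a minimal sketch that pairs up raw bytes the way a little-endian UTF-16 decoder would. The helper name `as_utf16le_units` is made up for this example; it is not something Notepad exposes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reinterprets raw bytes as UTF-16LE code units: each pair of bytes becomes
// one 16-bit unit, low byte first. A trailing odd byte is ignored.
std::vector<std::uint16_t> as_utf16le_units(const std::string& bytes) {
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const auto lo = static_cast<unsigned char>(bytes[i]);
        const auto hi = static_cast<unsigned char>(bytes[i + 1]);
        units.push_back(static_cast<std::uint16_t>(lo | (hi << 8)));
    }
    return units;
}
```

For instance, the two ASCII bytes `~` (0x7E) and space (0x20) combine into the single code unit 0x207E, which has nothing to do with either original character.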

Notepad has an algorithm that tries to infer which encoding a file uses. The Windows API it invokes is called IsTextUnicode. That API basically checks for a byte order mark (BOM) header. In the absence of a BOM, it applies heuristics to the first few hundred bytes of text to decide whether the file is ASCII, UTF-8, or UTF-16 (what Windows calls "Unicode"). It's essentially guessing.
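The BOM check described above can be sketched in portable C++. This is only an illustration of the idea, not how IsTextUnicode is implemented; the helper name `detect_bom` and its return strings are assumptions made for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Windows API): returns the encoding signalled by a
// leading byte order mark, or "none" if the data does not start with one.
// UTF-32 marks are deliberately left out to keep the sketch short.
std::string detect_bom(const std::string& bytes) {
    auto starts_with = [&bytes](const char* prefix, std::size_t n) {
        return bytes.size() >= n && bytes.compare(0, n, prefix, n) == 0;
    };
    if (starts_with("\xEF\xBB\xBF", 3)) return "UTF-8";
    if (starts_with("\xFF\xFE", 2)) return "UTF-16LE";
    if (starts_with("\xFE\xFF", 2)) return "UTF-16BE";
    return "none";  // without a BOM, a detector has to fall back to guessing
}
```

Plain hex text such as `7E 7F 80` has no such prefix, so it lands in the last branch, which is where the heuristics take over.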

In your case, the first byte of the file is 0x7E (~). If I delete that byte in a hex editor and save the file, Notepad detects the correct encoding and displays the file as expected.

Something about that initial byte sequence of upper ASCII characters throws the algorithm off.

So the correct fix would be to insert a Byte Order Mark into your text stream so Notepad won't try to infer the encoding based on heuristics.

    file.open("display.txt", std::ios::out);
    const char* utf7_bom = "+/v8";
    file << utf7_bom;

That little 4-byte sequence, the UTF-7 byte order mark, is recognized by most editors and tells the text decoder that "this file is ASCII". You can read more about BOM tokens here: https://en.wikipedia.org/wiki/Byte_order_mark
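Put together as a complete function, the idea above might look like the following sketch. The function name `write_hex_file` and the payload (the values 0 to 255 written as hex text) are assumptions for illustration; whether a given editor honors the marker is not verified here.

```cpp
#include <fstream>
#include <string>

// Writes the 4-character marker from the answer, then the values 0..255 as
// hex text separated by spaces. Returns false if the file could not be written.
bool write_hex_file(const std::string& path) {
    std::ofstream file(path, std::ios::out);
    if (!file) return false;
    file << "+/v8";    // the marker goes first, before any other output
    file << std::hex;  // sticky flag: applies to every integer written later
    for (int val = 0; val <= 255; ++val) {
        file << val << ' ';
    }
    return static_cast<bool>(file);
}
```

Calling `write_hex_file("display.txt")` produces a file whose first four bytes are the marker, followed by the hex values.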

added 1007 characters in body

edited Mar 7, 2020 at 6:34

selbie

This is a bug in Notepad.

The core of the problem is that Notepad misinterprets your text file as UTF-16 instead of ASCII.

Notepad has an algorithm that tries to infer which encoding a file uses. It uses a Windows API whose name escapes me at the moment, but it basically checks for a byte order mark (BOM) header. In the absence of a BOM, it applies heuristics to the first few hundred bytes of text to decide whether the file is ASCII, UTF-8, or UTF-16 (what Windows calls "Unicode").

The first byte of the file is 0x7E (~). If I delete that byte in a hex editor and save the file, Notepad detects the correct encoding and displays the file as expected.

So the correct fix would be to insert a Byte Order Mark into your text stream so Notepad won't try to infer the encoding based on heuristics.

    file.open("display.txt", std::ios::out);
    const char* utf7_bom = "+/v8";
    file << utf7_bom;

That little 4-byte sequence, the UTF-7 byte order mark, is recognized by most editors and tells the text decoder that "this file is ASCII". You can read more about BOM tokens here: https://en.wikipedia.org/wiki/Byte_order_mark

answered Mar 7, 2020 at 6:14

selbie

Instead of this:

    for (int val = 0; val <= 255; val += 1)
        file << std::hex << digits[(val / 10) % 10] << " ";

do this:

    for (int val = 0; val <= 255; val += 1) {
        std::stringstream ss;
        ss << std::hex << digits[(val / 10) % 10];
        file << ss.str();
        file << " ";
    }
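The same string-stream idea can be pulled into a small helper. The `digits` array comes from the question and is not shown here, so this sketch formats `val` itself instead, which is an assumption; the name `to_hex` is also made up for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats one integer as lowercase hex text using a
// temporary string stream, so the std::hex flag never touches the file stream.
std::string to_hex(int val) {
    std::ostringstream ss;
    ss << std::hex << val;
    return ss.str();
}
```

Inside the loop, `file << to_hex(val) << " ";` would then write each value as text.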
