selbie

This is a bug in Notepad.

The core of the problem is that Notepad is misinterpreting your text file as UTF-16 instead of ASCII.

enter image description here

enter image description here

Notepad has an algorithm that attempts to infer which encoding a file uses. The Windows API it invokes is called IsTextUnicode. That API basically checks for a byte order mark (BOM) header. In the absence of a BOM, it reads the first few hundred bytes of text and applies heuristics to decide whether the file is ASCII, UTF-8, or UTF-16 (which Windows calls "Unicode"). It's essentially guessing.
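To make the BOM step concrete, here is a minimal, portable sketch of what checking for a BOM header can look like. It is not Notepad's or IsTextUnicode's actual implementation. The function name `detect_bom` and the set of signatures are assumptions chosen for illustration, and it requires C++17 for `std::string_view`.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not a Windows API): returns the encoding implied by a
// leading byte order mark, or an empty string when no known BOM is present.
std::string detect_bom(std::string_view bytes) {
    struct Bom {
        std::string_view signature;
        const char* name;
    };
    // Longer signatures come first, because the UTF-32LE mark (FF FE 00 00)
    // begins with the UTF-16LE mark (FF FE).
    static const Bom known[] = {
        {std::string_view("\xFF\xFE\x00\x00", 4), "UTF-32LE"},
        {std::string_view("\x00\x00\xFE\xFF", 4), "UTF-32BE"},
        {std::string_view("+/v8", 4), "UTF-7"},  // only the variant used in this answer
        {std::string_view("\xEF\xBB\xBF", 3), "UTF-8"},
        {std::string_view("\xFF\xFE", 2), "UTF-16LE"},
        {std::string_view("\xFE\xFF", 2), "UTF-16BE"},
    };
    for (const Bom& bom : known) {
        // substr clips to the available length, so short inputs are safe.
        if (bytes.substr(0, bom.signature.size()) == bom.signature) {
            return bom.name;
        }
    }
    return "";  // no BOM: a real detector would fall back to heuristics here
}
```

A file that starts with `~` and has no BOM returns an empty string from this sketch, which is exactly the case where a detector has to fall back to guessing.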

In your case, the first byte of the file is 0x7E (~). If I delete that byte in a hex editor and save the file, Notepad detects the correct encoding and the file looks like this:

  
enter image description here

Something about that initial byte sequence of upper-ASCII characters throws the algorithm off.

So the correct fix would be to insert a Byte Order Mark into your text stream so Notepad won't try to infer the encoding based on heuristics.

    file.open("display.txt", std::ios::out);
    const char* utf7_bom = "+/v8";
    file << utf7_bom;

That little 4-byte sequence is the UTF-7 byte order mark, which most editors will recognize. It effectively tells text decoders "this file is 7-bit, ASCII-compatible text", so they skip the guessing step. You can read more about BOM tokens here: https://en.wikipedia.org/wiki/Byte_order_mark
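For completeness, here is a small sketch that wraps the same idea in a function with basic error checking. The helper name `write_with_bom` is hypothetical, and opening the file in binary mode is an assumption made here so the bytes land on disk exactly as written; the snippet above uses plain text mode.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes the "+/v8" prefix from this answer, then the text.
// Returns false if the file could not be opened or a write failed.
bool write_with_bom(const std::string& path, const std::string& text) {
    // Binary mode (an assumption) avoids newline translation on Windows.
    std::ofstream file(path, std::ios::out | std::ios::binary);
    if (!file) {
        return false;
    }
    const char* utf7_bom = "+/v8";
    file << utf7_bom << text;
    return static_cast<bool>(file);
}
```

Calling `write_with_bom("display.txt", "~ 0 1 2")` would produce a file whose first four bytes are `+/v8`, followed by the original text.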

Source Link
selbie

Instead of this:

    for (int val = 0; val <= 255; val += 1)
        file << std::hex << digits[(val / 10) % 10] << " ";

This:

    for (int val = 0; val <= 255; val += 1) {
        std::stringstream ss;
        ss << std::hex << digits[(val / 10) % 10];
        file << ss.str();
        file << " ";
    }