Spelling Fix and Format

edit approved May 14, 2017 at 15:20

First of all: your title uses (or used) ANSI, while in the text you refer to ASCII. Please note that ANSI is not the same as ASCII. ANSI incorporates the ASCII set, but the ASCII set itself is limited to the first 128 numeric values (0 - 127).

If all your data is restricted to ASCII (7-bit), it doesn't matter whether you use UTF-8, ANSI or ASCII, as both ANSI and UTF-8 incorporate the full ASCII set. In other words: the numeric values 0 up to and including 127 represent exactly the same characters in ASCII, ANSI and UTF-8.
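A quick way to convince yourself of this is the Python sketch below. It assumes Windows-1252 (`cp1252`) as a stand-in for "ANSI"; the helper name `encodes_identically` is made up for this illustration.

```python
def encodes_identically(text: str) -> bool:
    """Return True if text yields the same bytes in ASCII, cp1252 and UTF-8."""
    ascii_bytes = text.encode("ascii")
    ansi_bytes = text.encode("cp1252")  # assumption: cp1252 stands in for "ANSI"
    utf8_bytes = text.encode("utf-8")
    return ascii_bytes == ansi_bytes == utf8_bytes


# Decode every 7-bit value (0..127) with each of the three encodings.
all_ascii = bytes(range(128))
same_characters = (
    all_ascii.decode("ascii")
    == all_ascii.decode("cp1252")
    == all_ascii.decode("utf-8")
)

print(encodes_identically("Hello, world! 0-9 ~"))
print(same_characters)
```

Both checks should come out true, since for values 0 through 127 the three encodings agree byte for byte.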

If you need characters outside of the ASCII set, you'll need to choose an encoding. You could use ANSI, but then you run into the problems of all the different code pages. Creating a file on machine A and reading it on machine B may (or will) produce funny-looking text if these machines are set up to use different code pages, simply because numeric value nnn represents different characters in these code pages.
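To make the code page problem concrete, here is a small Python sketch. The choice of code pages is only an assumption for the example: machine A is taken to use Windows-1252 (Western European) and machine B Windows-1251 (Cyrillic).

```python
original = "café"

# Machine A writes the text using its own code page (assumed: Windows-1252).
raw = original.encode("cp1252")      # the é becomes the single byte 0xE9

# Machine B reads the same bytes using its code page (assumed: Windows-1251).
misread = raw.decode("cp1251")       # 0xE9 is a Cyrillic letter there

print(raw)
print(misread)                       # no longer "café"
print(raw.decode("cp1252"))          # only the original code page gets it back
```

The bytes in the file never change; only the interpretation of value 0xE9 differs between the two machines.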

This "code page hell" is the reason why the Unicode standard was defined. UTF-8 is but a single encoding of that standard; there are others, such as UTF-16 and UTF-32. UTF-16 is widely used, as it is the native encoding for Windows.
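As a rough illustration of "one standard, several encodings", the Python sketch below encodes the same Unicode code point (the euro sign, picked arbitrarily for this example) in three ways. The little-endian variants are used only to keep the output free of a byte order mark.

```python
euro = "\u20ac"  # the euro sign, code point U+20AC

utf8 = euro.encode("utf-8")        # 3 bytes
utf16 = euro.encode("utf-16-le")   # 2 bytes (one 16-bit code unit)
utf32 = euro.encode("utf-32-le")   # 4 bytes

for name, data in (("UTF-8", utf8), ("UTF-16", utf16), ("UTF-32", utf32)):
    print(name, data.hex(), len(data), "bytes")

# Different bytes on disk, but each decodes back to the same character.
print(utf8.decode("utf-8") == utf16.decode("utf-16-le") == utf32.decode("utf-32-le"))
```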

So, if you need to support anything beyond the 128 characters of the ASCII set, my advice is to go with UTF-8. That way you don't have to worry about which code page your users have set up on their systems.
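In practice, "going with UTF-8" mostly means naming the encoding explicitly whenever you write or read text, rather than relying on the system default. A minimal Python sketch, where the function name `utf8_roundtrip` and the file name `sample.txt` are made up for the example:

```python
import os
import tempfile


def utf8_roundtrip(text: str) -> str:
    """Write text to a temporary file as UTF-8 and read it back."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.txt")  # hypothetical file name
        # Explicit encoding: the result does not depend on the machine's code page.
        with open(path, "w", encoding="utf-8", newline="") as handle:
            handle.write(text)
        with open(path, "r", encoding="utf-8", newline="") as handle:
            return handle.read()


mixed = "Grüße, мир, 日本語"
print(utf8_roundtrip(mixed) == mixed)
```

Because both the writer and the reader state UTF-8, the text should survive regardless of which code page either machine is configured with.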

answered Jul 30, 2011 at 15:21

Marjan Venema
