
Is std::string supposed to hold a set of characters in ASCII encoding on all platforms and standard compilers?

In other words, can I be sure that my C++ program will get a set of ASCII characters if I do this:

std::string input;
std::getline(std::cin, input);

EDIT:

More precisely, I want to make sure that if the user enters "a0" I will get a std::string with two elements: the first one is 97 and the second is 48.
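For reference, here is a minimal sketch that prints the numeric value of each byte read, so you can check this yourself; the printed values depend on your platform's encoding:

#include <iostream>
#include <string>

int main() {
    std::string input;
    std::getline(std::cin, input);
    // Cast through unsigned char so bytes above 127 are not shown as negative.
    for (char c : input)
        std::cout << static_cast<int>(static_cast<unsigned char>(c)) << '\n';
}

On an ASCII-based system, entering "a0" prints 97 and 48.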

  • There's absolutely no guarantee. UTF-8 is a very popular character encoding, and if you type "á0" on such a system your string will contain three elements. Commented Jun 21, 2016 at 20:10
  • @MarkRansom I see. I will post another question about how I can force or ensure that an ASCII string is input. Thanks Commented Jun 21, 2016 at 20:11
  • "I have a variable std::string xml. Does the compiler or the STL enforce that there is only XML strings inside?" - No. The type is `char' not "XML" or "Unicode". Don't confuse type, format or encoding. There is a valid question in there though: "How can I control the standard IO encoding?" Commented Jun 21, 2016 at 20:28
  • @Fozi yes, you are right, and I have asked this: stackoverflow.com/questions/37953843/… Commented Jun 21, 2016 at 20:28
  • @HumamHelfawi: I think the correct behavior is to validate that the input contains only ASCII if that is your precondition, which is easy to do (see the sketch after these comments), and fail with a clear error message if the input doesn't meet your conditions. I don't think you can go "back" from Unicode characters to ASCII representing the user's keystrokes; that would probably be extremely difficult. If you are really asking how to reconfigure the terminal so that it behaves differently, I think that will also be hard and will be platform dependent. Commented Jun 22, 2016 at 0:51
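A minimal sketch of the validation suggested in the last comment, assuming "ASCII" means byte values 0 through 127:

#include <string>

// Returns true if every byte of s is in the 7-bit ASCII range (0-127).
bool is_ascii(const std::string& s) {
    for (unsigned char c : s)
        if (c > 127)
            return false;
    return true;
}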

3 Answers


No. std::string does not hold "characters"; it holds bytes.

Those bytes could form some human-readable string through an encoding such as ASCII, EBCDIC, or Unicode. They could be a binary encoding storing computer-readable information (e.g. a JPEG image). They could be guidelines from aliens on how to use Stack Overflow for three weeks straight without being downvoted even once. They could be total random white noise.

Your program needs to understand what the data it is reading actually means, and how it is encoded. That is part of your task as the programmer.

(It's unfortunate, and nowadays misleading, that char is named char.)
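To make the point concrete, a small sketch; the hex escapes are the two UTF-8 bytes of "á", so no particular source or terminal encoding is assumed:

#include <iostream>
#include <string>

int main() {
    std::string s = "\xC3\xA1";    // the two UTF-8 bytes of "á"
    std::cout << s.size() << '\n'; // prints 2: one visible character, two bytes
}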


12 Comments

"without being downvoted " I happy now that there is no guarantee. At least, I may found the guidelines someday
@HumamHelfawi: Assuming you can write a program to decode those guidelines ;)
Are you saying a char is a byte even if char is a signed type?
@RSahu: Absolutely.
@RSahu: Yes, it does represent a different abstraction than a byte, and that's exactly why the name is wrong: these objects do not carry any encoding. An object of type char is not "abstract", and the type char suggests/implies/requires no encoding. That's why they're not characters! They're just bytes with some numerical value. Any encoding is purely application-determined. Except literals, I'll grant you.

No, there is no guarantee that

std::string input;
std::getline(std::cin, input);

will return only ASCII characters. The range of values that can be held by a char is not limited to the ASCII characters.

If your platform uses a different encoding than ASCII, you'll obviously get a different set of characters.

Even if your platform uses the ASCII encoding, if char on the platform is an unsigned type, then it can very easily hold the extended ASCII characters too.
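A short sketch of why the signedness of char matters when you inspect such bytes (assuming an 8-bit char; the exact negative value is implementation-defined):

#include <iostream>

int main() {
    char c = '\xE9';  // byte value 233, e.g. 'é' in Latin-1
    // Where char is signed this typically prints -23;
    // casting through unsigned char recovers 233 either way.
    std::cout << static_cast<int>(c) << '\n';
    std::cout << static_cast<int>(static_cast<unsigned char>(c)) << '\n';
}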

8 Comments

Thanks. What can I do if I want the input to be treated as ASCII? Just a link would help, if you do not mind. (I am afraid of searching myself because of the tons of wrong and immature content out there.)
@HumamHelfawi, are you asking how you can prevent non-ASCII characters from being read into input?
More precisely, I want to make sure that if the user enters "a0" I will get a string with two elements: the first one is 97 and the second is 48.
@HumamHelfawi, I am afraid you'll have to write code to do that if your platform uses a different character encoding. If your platform uses the ASCII encoding, you will get that by default.
What a std::string can hold or std::cin can read need not even have anything to do with the "platform's encoding", or with ASCII, or with extended ASCII. Try piping in the result of dd, or cat someimage.jpg, and you'll see :) (a sketch follows these comments). The correct answer is that std::string has no notion of encoding at all. And neither does std::cin.
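To see that concretely, a small sketch that slurps whatever arrives on standard input, text or binary alike (note that on some platforms stdin is opened in text mode by default, which can alter binary bytes):

#include <iostream>
#include <iterator>
#include <string>

int main() {
    // Read all of stdin as raw bytes, e.g.:  ./a.out < someimage.jpg
    std::string data((std::istreambuf_iterator<char>(std::cin)),
                     std::istreambuf_iterator<char>());
    std::cout << data.size() << " bytes read\n";
}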

In other words, can I be sure that my C++ program will get a set of ASCII characters if I do this ...

No. std::string is actually an alias for a specialization of std::basic_string<>, declared in namespace std as using string = std::basic_string<char>;. The primary template is:

template<
    class CharT,
    class Traits = std::char_traits<CharT>,
    class Allocator = std::allocator<CharT>
> class basic_string;

It can hold any character type CharT; the Traits parameter supplies the character operations (comparison, copying, and so on).

In short, std::string can contain ASCII character encodings, as well as EBCDIC or any other. But the string itself is agnostic about the encoding; it depends on how you're using it.
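For comparison, the standard library also provides aliases over other character types; a minimal sketch (u16string and u32string require C++11):

#include <string>

int main() {
    std::string    s8  = "abc";  // basic_string<char>:     bytes
    std::wstring   sw  = L"abc"; // basic_string<wchar_t>
    std::u16string s16 = u"abc"; // basic_string<char16_t>: UTF-16 code units
    std::u32string s32 = U"abc"; // basic_string<char32_t>: UTF-32 code points
}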

