3

I have a string as input. I want to check all of its characters and prompt the user if the input string contains any Unicode character.

How can I do this validation in C++?

E.g., in Notepad, if you enter a Unicode character and try to save the file with ANSI encoding, it prompts you about the Unicode character. I want to do a similar validation.

4
  • You need to specify how you are storing the data in the string: is it a std::string with UTF-8 or a std::wstring with UTF-16? You are also probably asking whether a character cannot be represented in 7-bit ASCII (or possibly 8-bit ASCII plus a code page), as all ASCII characters also have Unicode code points. Commented Apr 15, 2014 at 11:53
  • If you think it's possible you could have a string that doesn't have Unicode characters in it, you almost certainly need to read joelonsoftware.com/articles/Unicode.html Commented Apr 15, 2014 at 12:09
  • I am using LPTSTR in VC++. And you got my point correctly: I want to detect characters that can't be represented by 8-bit ASCII or extended ASCII. Commented Apr 15, 2014 at 12:23
  • 1
    There's no such thing as 8 bits ASCII, and there are hundreds of extensions to ASCII. One such extension is UTF-8, and it supports all Unicode characters. Commented Apr 15, 2014 at 13:02

3 Answers

3

You can use the IsTextUnicode function. As far as I know, that's the function Notepad uses.

MSDN-Link: http://msdn.microsoft.com/en-us/library/windows/desktop/dd318672%28v=vs.85%29.aspx

Just pass NULL as the last parameter.

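    #include <string>
    #include <Windows.h>

    int main()
    {
        std::wstring ws = L"Hello!";

        // IsTextUnicode expects the buffer size in bytes, not in characters.
        const int sizeInBytes = static_cast<int>(ws.size() * sizeof(wchar_t));

        // With NULL as the last parameter, all available tests are applied.
        if (::IsTextUnicode(ws.c_str(), sizeInBytes, NULL) != FALSE)
        {
            // the buffer looks like Unicode text
        }
        else
        {
            // the buffer does not look like Unicode text
        }
        return 0;
    }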

3 Comments

Can you please share a link or a sample? I am confused about the last parameter.
Here's a good tutorial about writing a text editor. He also covers encoding and how to handle it. catch22.net/tuts/neatpad
Thanks for the sample code! WideCharToMultiByte() solved my exact requirement.
1

What Notepad warns you about is slightly different: it warns you about Unicode characters that cannot be converted to the desired code page. In other words, a call such as WideCharToMultiByte(CP_ACP, ..., &lpUsedDefaultChar) sets lpUsedDefaultChar to TRUE.

Replace CP_ACP with the code page you want, except CP_UTF8, which makes no sense here: UTF-8 can represent every Unicode character.
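A minimal sketch of that check, in case it helps. The helper name `fitsInCodePage` and the choice of code page 1252 in the usage note are my own assumptions, not something Notepad is known to use. The sketch also passes `WC_NO_BEST_FIT_CHARS` so that "close enough" substitutions are not silently accepted. The `#else` branch is only a stand-in (a plain 7-bit ASCII check) so the snippet still compiles off Windows.

```cpp
#include <cassert>
#include <string>
#ifdef _WIN32
#include <Windows.h>
#endif

// Hypothetical helper: returns true if every character of `text` can be
// converted to `codePage` without falling back to the default character.
bool fitsInCodePage(const std::wstring& text, unsigned int codePage)
{
    if (text.empty())
        return true;
#ifdef _WIN32
    const int length = static_cast<int>(text.size());

    // First call: ask how many bytes the converted text needs.
    const int needed = ::WideCharToMultiByte(codePage, WC_NO_BEST_FIT_CHARS,
                                             text.data(), length,
                                             nullptr, 0, nullptr, nullptr);
    if (needed <= 0)
        return false; // conversion not possible with these arguments

    // Second call: convert and let the API report any substitution.
    std::string buffer(static_cast<size_t>(needed), '\0');
    BOOL usedDefaultChar = FALSE;
    const int written = ::WideCharToMultiByte(codePage, WC_NO_BEST_FIT_CHARS,
                                              text.data(), length,
                                              &buffer[0], needed,
                                              nullptr, &usedDefaultChar);
    return written > 0 && usedDefaultChar == FALSE;
#else
    // Assumption for non-Windows builds only: treat plain ASCII as the
    // sole representable range so the sketch stays compilable.
    (void)codePage;
    for (wchar_t ch : text)
        if (static_cast<unsigned long>(ch) > 0x7F)
            return false;
    return true;
#endif
}
```

You would call it as `fitsInCodePage(userText, 1252)` (or with `CP_ACP` for the system code page) and prompt the user when it returns false.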


1

An easy way is to allow Unicode and store the text as UTF-8. As UTF-8 is a superset of ASCII, it's very easy to find characters that are not ASCII: every byte of such a character has the high bit set.
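A small sketch of the high-bit test, assuming the text is already held as UTF-8 in a `std::string`. The helper name `hasNonAscii` is just an illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: returns true if the UTF-8 string contains at least
// one byte outside the 7-bit ASCII range (i.e. with the high bit set).
bool hasNonAscii(const std::string& utf8)
{
    return std::any_of(utf8.begin(), utf8.end(), [](char c) {
        // Cast first: plain char may be signed on this platform.
        return (static_cast<unsigned char>(c) & 0x80) != 0;
    });
}
```

If `hasNonAscii(input)` returns true, the input contains something beyond plain ASCII and you can prompt the user. Note that this only detects non-ASCII text. It does not tell you whether a particular 8-bit code page could still represent it.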

