OK, following on from the comments above, I think it's highly likely that the input string is in UTF-8 (after all, in an HTML context, what else would it be?).
On that basis, I humbly submit this:
```cpp
#include <string>
#include <codecvt>
#include <locale>

std::string narrow (const std::wstring& ws)
{
    std::wstring_convert <std::codecvt_utf8 <wchar_t>, wchar_t> convert;
    return convert.to_bytes (ws);
}

std::wstring widen (const std::string& s)
{
    std::wstring_convert <std::codecvt_utf8 <wchar_t>, wchar_t> convert;
    return convert.from_bytes (s);
}

std::string detect_Unicode (const std::string& s)
{
    std::wstring ws = widen (s);
    if (ws.empty() || ws.find_first_not_of (L" \t\n\r\f\v\u00A0\u00C2\u00E2\u20AC\u2039") != std::wstring::npos)
        return " ";
    return s;
}

#include <iostream>

int main ()
{
    std::cout << narrow (L"\u00A0 \u00C2 \u00E2 \u20AC \u2039\n\n");
    std::cout << "0.\t\"" << detect_Unicode (u8"abcde") << "\"\n";
    std::cout << "1.\t\"" << detect_Unicode (u8" ​ ​ ") << "\"\n";
    std::cout << "2.\t\"" << detect_Unicode (u8"are   there is something    ​ combination ​") << "\"\n";
    std::cout << "3.\t\"" << detect_Unicode (u8"   ") << "\"\n";
    std::cout << "4.\t\"" << detect_Unicode (u8"​   ​") << "\"\n";
    std::cout << "5.\t\"" << detect_Unicode (u8"  â â") << "\"\n";
}
```
Output:
```
  Â â € ‹

0.	" "
1.	" ​ ​ "
2.	" "
3.	"   "
4.	"​   ​"
5.	"  â â"
```
Now this is not the output the OP expects, but I think that's simply because the logic (as opposed to the implementation) of detect_Unicode() looks flawed. The point here is that converting the input string to a wide string means that you can use standard basic_string operations on it reliably, because there are no multibyte issues now.
An alternative, slightly radical, implementation of detect_Unicode() might be:
```cpp
for (auto wide_char : ws)
{
    if (wide_char > 0xff)
        return " ";
}
return s;
```
But really, now you have a wide string to hand in detect_Unicode, anything is possible, so go wild OP.
Other notes:
- std::wstring_convert and std::codecvt_utf8 are deprecated in C++17, but since there is no other obvious choice in the standard library you might as well run with them. You can always change the implementations of narrow and widen if it comes to it.
- Depending on platform, std::wstring might not be the best choice (wchar_t is 16 bits on Windows, for example, so characters outside the BMP take two elements), but it's probably fine. You could also look at std::u16string and std::u32string.
Live demo.
Inspiration taken from here.
std::wstring std::string doesn't contain Unicode characters but "encoded" bytes (possibly UTF-8), so for multibyte characters you have to use std::search instead of find_first_not_of. wchar is not guaranteed to be 2 bytes, and even in that case, a Unicode character might need several wchars. std::search with string