I have Unicode strings that I want to compare under the following requirement:
Confusable characters [1] should be considered the same character. For example, T (LATIN CAPITAL LETTER T, U+0054) should compare equal to T (GREEK CAPITAL LETTER TAU, U+03A4), and so on.
[1] Example: http://unicode.org/cldr/utility/confusables.jsp?a=TESTt&r=None
I plan to write the code using this file: http://www.unicode.org/Public/security/revision-03/confusablesSummary.txt. However, if a free library already does this, I would prefer to use it.
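A minimal sketch of loading a confusables table into memory, assuming the line layout of Unicode's confusables.txt (`source ; target code points ; type # comment`, with hex code points). confusablesSummary.txt may be laid out differently, so the parser would need adjusting for it. The names `ConfusableMap` and `parse_confusables` are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical type: maps one confusable code point to its prototype sequence.
using ConfusableMap = std::unordered_map<char32_t, std::u32string>;

// Parses lines such as "2474 ;\t0028 0031 0029 ;\tMA\t# comment".
// Assumption: confusables.txt layout; confusablesSummary.txt may differ.
ConfusableMap parse_confusables(std::istream& in) {
    ConfusableMap map;
    std::string line;
    while (std::getline(in, line)) {
        // Drop the trailing comment, if any.
        auto hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);

        // Skip lines that do not contain at least two ';' separators.
        auto first = line.find(';');
        if (first == std::string::npos) continue;
        auto second = line.find(';', first + 1);
        if (second == std::string::npos) continue;

        // Field 1: the source code point in hex.
        auto source = static_cast<char32_t>(std::stoul(line.substr(0, first), nullptr, 16));

        // Field 2: one or more space-separated hex code points.
        std::istringstream targets(line.substr(first + 1, second - first - 1));
        std::u32string prototype;
        std::string hex;
        while (targets >> hex)
            prototype.push_back(static_cast<char32_t>(std::stoul(hex, nullptr, 16)));

        map[source] = prototype;
    }
    return map;
}
```

Usage would be something like `std::ifstream f("confusables.txt"); auto table = parse_confusables(f);`. Note that some prototypes are several code points long, so the table stores a `std::u32string` rather than a single character.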
My idea is to create a temporary ustring in which every confusable character is replaced with the corresponding Latin character, and then compare those temporary strings.
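The replace-then-compare idea can be sketched as follows (Unicode Technical Standard #39 calls such a replaced string a "skeleton"). This is a sketch under assumptions: it uses `std::u32string` instead of `Glib::ustring` to stay self-contained, the five-entry table `kToLatin` is a hypothetical hand-picked subset rather than data from the Unicode file, and it only does one-to-one character replacement, as described in the question. Since iterating a `Glib::ustring` yields `gunichar` code points, the same loop should carry over.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical, hand-picked subset of confusables mapped to Latin look-alikes.
// A real table would be generated from Unicode's confusables data.
const std::unordered_map<char32_t, char32_t> kToLatin = {
    {U'\u0395', U'E'},  // GREEK CAPITAL LETTER EPSILON
    {U'\u03A4', U'T'},  // GREEK CAPITAL LETTER TAU
    {U'\u0415', U'E'},  // CYRILLIC CAPITAL LETTER IE
    {U'\u0422', U'T'},  // CYRILLIC CAPITAL LETTER TE
    {U'\u0405', U'S'},  // CYRILLIC CAPITAL LETTER DZE
};

// Builds the temporary string: every mapped character becomes its Latin look-alike,
// all other characters are copied unchanged.
std::u32string to_latin_skeleton(const std::u32string& s) {
    std::u32string out;
    out.reserve(s.size());
    for (char32_t c : s) {
        auto it = kToLatin.find(c);
        out.push_back(it != kToLatin.end() ? it->second : c);
    }
    return out;
}

// Two strings are treated as equal when their temporary strings are identical.
bool confusable_equal(const std::u32string& a, const std::u32string& b) {
    return to_latin_skeleton(a) == to_latin_skeleton(b);
}
```

For example, `confusable_equal(U"TEST", U"T\u0395ST")` is true with this table.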
In the real program I will be testing 10 × 5000 × 10000 strings, each containing a single word.
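At that volume, rebuilding the temporary string for both sides of every comparison repeats a lot of work. One possible approach, sketched here under the assumption that the words can be held in memory, is to compute each word's replaced string once and bucket words by it with a hash map, so words in the same bucket are the confusable matches. `Skeletonizer` and `group_by_skeleton` are hypothetical names; the replacement function is passed in so any table can be plugged in.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical alias: any function that builds the "replaced with Latin" string.
using Skeletonizer = std::function<std::u32string(const std::u32string&)>;

// Groups word indices by their replaced string. Each word is transformed exactly once;
// afterwards, finding all words confusable with a given one is an average O(1) lookup.
std::unordered_map<std::u32string, std::vector<std::size_t>>
group_by_skeleton(const std::vector<std::u32string>& words, const Skeletonizer& skeleton) {
    std::unordered_map<std::u32string, std::vector<std::size_t>> groups;
    groups.reserve(words.size());
    for (std::size_t i = 0; i < words.size(); ++i)
        groups[skeleton(words[i])].push_back(i);
    return groups;
}
```

The trade-off is memory for the stored keys in exchange for avoiding pairwise work; if only pairwise checks are needed, caching each word's replaced string alongside the word gives a similar saving.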
Test program:
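    #include <glibmm/ustring.h>
    #include <iostream>
    #include <locale>

    int main()
    {
        std::locale::global(std::locale(""));
        std::cout.imbue(std::locale());

        Glib::ustring s1 = "TEST";
        Glib::ustring s2 = "TΕST"; // the second letter is a non-Latin look-alike of 'E'

        // normalize() returns a new string; it does not modify s1/s2 in place.
        s1 = s1.normalize(Glib::NORMALIZE_NFKD);
        s2 = s2.normalize(Glib::NORMALIZE_NFKD);

        std::cout << "1->true, 0->false (s1==s2) => " << (s1 == s2) << "\n";
    }

Test program output: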
    1->true, 0->false (s1==s2) => 0

Output of the locale command (Ubuntu 12.04, 64-bit):

    $ locale
    LANG=en_US.UTF-8
    LANGUAGE=
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=

Thank you for your time!