I need your help. How can I convert Unicode characters like these in C++

Thére Àre sôme spëcial charâcters ïn thìs têxt عربى 

to HTML entity encoding like this?

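Th&amp;eacute;re &amp;Agrave;re s&amp;ocirc;me sp&amp;euml;cial char&amp;acirc;cters &amp;iuml;n th&amp;igrave;s t&amp;ecirc;xt &amp;#1593;&amp;#1585;&amp;#1576;&amp;#1609;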

Your help will be greatly appreciated. Thank you :)

  • Thanks Kevin but this isn't what I want. Commented Aug 5, 2014 at 18:57

1 Answer

Unless you can find a third-party API to handle this for you, you will likely have to write the conversion yourself:

  1. Convert the input string data to codepoint values (i.e., to UTF-32).

  2. For each codepoint value:

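    a. if it is in the printable ASCII range (U+0009, U+000A, U+000D, and U+0020 through U+007E), store/display the value as-is as a single 8-bit ASCII character, except for the HTML-reserved characters &, <, > and ", which should be written as &amp;, &lt;, &gt; and &quot;.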

    b. otherwise, check whether an entity name is associated with the codepoint (the HTML specification publishes a list of named character references) and, if so, store/display that name in &name; format.

    c. otherwise, store/display the codepoint value in &#XXXX; format, where XXXX is the decimal value of the codepoint (or in &#xHHHH; format, where HHHH is its hexadecimal value).
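
The steps above can be sketched in C++ roughly as follows. This is only a minimal illustration under some assumptions: the input is UTF-8, the function names decode_utf8 and html_encode are hypothetical, the entity table is a tiny hand-picked subset rather than the full list, and the decoder does not reject overlong sequences or surrogates. It also looks up the entity table first so that &, <, > and " get escaped, which goes slightly beyond the steps as written.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Step 1: decode UTF-8 bytes into codepoints (UTF-32).
// Sketch only: undecodable bytes become U+FFFD; overlong forms are not rejected.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out.push_back(char32_t(0xFFFD)); ++i; continue; }

        if (i + extra >= in.size()) {  // sequence truncated at end of input
            out.push_back(char32_t(0xFFFD));
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(ok ? cp : char32_t(0xFFFD));
        i += ok ? extra + 1 : 1;
    }
    return out;
}

// Step 2: emit each codepoint as plain ASCII, &name; or &#NNNN;.
std::string html_encode(const std::string& utf8) {
    // Assumption: a deliberately small subset of the named entities.
    static const std::unordered_map<char32_t, const char*> names = {
        {U'&', "amp"},    {U'<', "lt"},     {U'>', "gt"},     {U'"', "quot"},
        {0xC0, "Agrave"}, {0xE2, "acirc"},  {0xE9, "eacute"}, {0xEA, "ecirc"},
        {0xEB, "euml"},   {0xEC, "igrave"}, {0xEF, "iuml"},   {0xF4, "ocirc"},
    };

    std::string out;
    for (char32_t cp : decode_utf8(utf8)) {
        auto it = names.find(cp);
        if (it != names.end()) {
            // Named entity (also covers the HTML-reserved ASCII characters).
            out += '&';
            out += it->second;
            out += ';';
        } else if (cp == 0x09 || cp == 0x0A || cp == 0x0D ||
                   (cp >= 0x20 && cp <= 0x7E)) {
            // Printable ASCII: copy through unchanged.
            out += static_cast<char>(cp);
        } else {
            // Everything else: decimal numeric character reference.
            out += "&#";
            out += std::to_string(static_cast<unsigned long>(cp));
            out += ';';
        }
    }
    return out;
}
```

With this sketch, html_encode("Th\xC3\xA9" "re") would give Th&eacute;re, and the UTF-8 bytes for the Arabic letter U+0639 ("\xD8\xB9") would give &#1593; because no name for it is in the table. If the input is in some other encoding (UTF-16, a Windows code page), only the decoding step needs to change.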

1 Comment

Thank you so much Remy Lebeau :)
