
Please help me read the Unicode escape sequences from a properties file in Java exactly as they are written. For example, if I pass the key "Account.label.register", it should return "\u5BC4\u5B58\u5668" and not the decoded characters "寄存器". Here is my sample properties file:

file_ch.properties

    Account.label.register = \u5BC4\u5B58\u5668
    Account.label.login = \u767B\u5F55
    Account.label.username = \u7528\u6237\u540D
    Account.label.password = \u5BC6\u7801

Thank you.

I am reading the properties file using the following Java code:

    @Override
    public ResourceBundle getTexts(String bundleName) {
        ResourceBundle myResources = null;
        try {
            myResources = ResourceBundle.getBundle(bundleName, getLocale());
        } catch (Exception e) {
            // Fall back to the default bundle if the requested one cannot be loaded
            myResources = ResourceBundle.getBundle(getDefaultBundleKey(), getLocale());
        }
        return myResources;
    }

This approach works fine and I get the Chinese characters. But for some of the Ajax requests in my application, I need to pass the Chinese text in an X-JSON header. Sample code is given below:

    HashMap<String, List<String>> map = new HashMap<String, List<String>>();
    List<String> errors = new ArrayList<String>();
    errors.add(str); /* ex: str = "无效的代码", value taken from properties file through resource bundle */
    map.put("ERROR", errors);
    JSONObject json = JSONObject.fromObject(map);
    response.setCharacterEncoding("UTF-8");
    response.setHeader("X-JSON", json.toString());
    response.setStatus(500);

When I pass English text, for example str = "Invalid Code", the X-JSON header carries the information as is. But if str = "无效的代码" (Chinese or any other non-Latin text), the X-JSON header carries an empty value. This is the response I get for the English text:

    connection:close
    Content-Encoding:gzip
    Content-Type:text/html;charset=UTF-8
    Date:Wed, 08 Jun 2016 10:17:43 GMT
    Server:Apache-Coyote/1.1
    Transfer-Encoding:chunked
    Vary:Accept-Encoding
    X-JSON:{"ERROR":["Invalid Code"]}

However, if the error contains Chinese text, for example "无效的代码", the response is:

    connection:close
    Content-Encoding:gzip
    Content-Type:text/html;charset=UTF-8
    Date:Wed, 08 Jun 2016 10:17:43 GMT
    Server:Apache-Coyote/1.1
    Transfer-Encoding:chunked
    Vary:Accept-Encoding
    X-JSON:{"ERROR":[" "]}    /* expecting the response X-JSON:{"ERROR":["无效的代码"]} */

Since the Chinese text arrives empty, I thought of sending Unicode escapes through the X-JSON header, like this:

{"ERROR":["\u65E0\u6548\u7684\u4EE3\u7801"]} 

After that, I want to decode the Unicode escapes in JavaScript after evaluating the X-JSON header, like this:

    var json;
    try {
        json = xhr.getResponseHeader('X-Json');
    } catch (e) {
        alert(e);
    }
    if (json) {
        var data = eval('(' + json + ')');
        decodeMsg(data);
    }

    function decodeMsg(message) {
        var mssg = message;
        var r = /\\u([\d\w]{4})/gi;
        mssg = mssg.replace(r, function (match, grp) {
            return String.fromCharCode(parseInt(grp, 16));
        });
        mssg = unescape(mssg);
        return mssg;
    }

Please give suggestions. Thank you.

3 Comments

  • Please show what you have tried so far. Commented Aug 17, 2016 at 7:07
  • Why do you want the Unicode escapes? Commented Aug 17, 2016 at 7:08
  • Thank you for the immediate response. Please see my edited post, which explains in more detail why I need to read the Unicode escapes from the properties file. Commented Aug 17, 2016 at 7:38

2 Answers

3

Update of answer:

Originally, .properties files were encoded in ISO-8859-1 (Latin-1), which covers characters such as é and ö. Any character outside that range needed \uXXXX escaping.

However, newer Java versions (Java 9 and later, when the file is loaded through ResourceBundle) try UTF-8 first. So you can keep the .properties file in UTF-8, which is a tremendous improvement.
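
When a file is loaded through `java.util.Properties` directly rather than through `ResourceBundle`, the encoding can be made explicit by passing a `Reader`; `Properties.load(InputStream)` still assumes ISO-8859-1. The following is only a minimal sketch: the class name `Utf8PropertiesDemo`, the helper `loadUtf8` and the in-memory file content are illustrative assumptions, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class: loads .properties content that is stored as UTF-8.
class Utf8PropertiesDemo {

    /** Decodes the raw file bytes explicitly as UTF-8 before parsing them. */
    static Properties loadUtf8(byte[] fileContent) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(fileContent), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Same entry twice: once with literal Chinese characters,
        // once with the classic escaped form as it would appear in the file.
        byte[] plain = "Account.label.login = \u767B\u5F55\n"
                .getBytes(StandardCharsets.UTF_8);
        byte[] escaped = "Account.label.login = \\u767B\\u5F55\n"
                .getBytes(StandardCharsets.UTF_8);

        String a = loadUtf8(plain).getProperty("Account.label.login");
        String b = loadUtf8(escaped).getProperty("Account.label.login");
        System.out.println(a.equals(b)); // both forms decode to the same text
    }
}
```

Both forms of the entry yield the same decoded string, so an existing escaped file keeps working after switching to a UTF-8 reader.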


Original answer (.properties in ISO-8859-1, as it has been since Java 1):

The cause of the error is that HTTP header lines are in ISO-8859-1 (Latin-1). One solution is to percent-encode (%XX) the UTF-8 bytes of the value. For JSON, however, you are better served by simply doing what you intended.
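
For comparison, the %XX route could look roughly like the sketch below. It assumes the client reverses the encoding (for instance with `decodeURIComponent`) before parsing the JSON; the class name `HeaderPercentEncoding` and the method `percentEncode` are made up for illustration. `URLEncoder` performs form encoding, so the `+` it emits for spaces is rewritten to `%20` here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: makes a header value pure ASCII by percent-encoding UTF-8 bytes.
class HeaderPercentEncoding {

    static String percentEncode(String value) {
        try {
            // URLEncoder writes a space as '+'; a literal '+' in the input
            // has already become %2B, so this replacement is unambiguous.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The first character of the sample error message from the question.
        System.out.println(percentEncode("\u65E0")); // prints %E6%97%A0
    }
}
```

Note that this also encodes the JSON punctuation (braces, quotes, colons), so the header is no longer readable JSON until the client decodes it.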

So you want to send u-escaped Unicode in the form \uXXXX. Since not only Java but also JavaScript and JSON understand this convention, the u-escaping only needs to be done in Java on the server.

    static String uescape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 6);
        for (int i = 0; i < s.length(); ++i) {
            char ch = s.charAt(i);
            if (ch < 128) {
                // Plain ASCII is copied unchanged
                sb.append(ch);
            } else {
                // Everything else becomes a backslash-u escape with four hex digits
                sb.append(String.format("\\u%04X", (int) ch));
            }
        }
        return sb.toString();
    }

    errors.add(uescape(str));

This writes every non-ASCII char (>= 128) as \u followed by exactly four zero-padded hex digits, which is the required format.
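
One detail worth checking: if the escaped string is handed to a JSON library afterwards, the serializer will normally escape the backslash itself, so the client receives the escape sequence as literal text and has to decode it a second time. A possible variation, sketched below under the assumption that the JSON text has already been produced, is to escape the final serialized string instead. The class name `JsonHeaderEscaper` and the hand-written JSON literal are illustrative only.

```java
// Hypothetical demo: escape the finished JSON text rather than the raw message.
class JsonHeaderEscaper {

    /** Replaces every char at or above 128 with a four-digit hex escape. */
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < 128) {
                sb.append(ch);
            } else {
                sb.append(String.format("\\u%04X", (int) ch));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for json.toString(); the message is the sample from the question.
        String json = "{\"ERROR\":[\"\u65E0\u6548\u7684\u4EE3\u7801\"]}";
        // The result is pure ASCII and still valid JSON.
        System.out.println(escapeNonAscii(json));
    }
}
```

With this ordering the header would be set from `escapeNonAscii(json.toString())`, and a JSON parser on the client decodes the escapes without extra code.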

Alternatively, use StringEscapeUtils.escapeJava from Apache Commons, which also handles quotes, \n and the like, and is much safer.


1 Comment

Thank You Very much Mr. Joop Eggen Sir. It worked for me. Struggling for this since so many days. It saved me a lot in completing my task. I was so thankful to you sir. I also got clear explanation from you Sir.
1

Escape the backslashes in your properties file by doubling them:

    Account.label.register = \\u5BC4\\u5B58\\u5668
    Account.label.login = \\u767B\\u5F55
    Account.label.username = \\u7528\\u6237\\u540D
    Account.label.password = \\u5BC6\\u7801
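
To see the effect of the doubling, the sketch below parses one such line with `java.util.Properties` from an in-memory string. The class name `EscapedPropertiesDemo` and the helper `parse` are assumptions made for the example; a real application would load the file itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo: a doubled backslash survives loading as a single literal backslash.
class EscapedPropertiesDemo {

    static Properties parse(String fileText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // In Java source each backslash of the file is written twice,
        // so the file's doubled backslash appears as four here.
        String fileText = "Account.label.register = \\\\u5BC4\\\\u5B58\\\\u5668\n";
        String value = parse(fileText).getProperty("Account.label.register");
        // The value is the 18-character escape text, not three Chinese characters.
        System.out.println(value + " (" + value.length() + " chars)");
    }
}
```

The loaded value contains the escape sequences as plain text, which is what the question asks for, at the cost that every consumer of the file now sees the escaped form.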

1 Comment

Hi, I will accept your answer but double escaping them will not translate the text when coding in jsp file like <s:text name="Account.label.register"/>. I need to read Unicode characters only in few cases. Please see my edited post regarding the need for Unicode characters. Please give suggestions if possible. Thank you.
