I am troubled by this typical problem with special characters.
We have an MBean running in a production Tomcat server (installed on Linux) which picks up XML feeds and sends them on for further processing. The problem crops up when the MBean has to process special characters, which are replaced by '??' marks. The same code runs on the local dev and QA servers and works fine there, even though the OS version and the Tomcat version are the same. The part of the code that reads the XML feed and sends it to a JMS queue is pasted below:

    StringBuffer article = new StringBuffer();
    // Decode the pending file explicitly as UTF-8
    InputStreamReader is = new InputStreamReader(new FileInputStream(pendingFile), "utf-8");
    int data;
    while ((data = is.read()) != -1) {
        article.append((char) data);
    }
    is.close();
    is = null;
    log.debug("Read in \n" + article.toString());
    try {
        js.writeTextMessage(article.toString(), "server", hostName, processor);
    } catch (JMSException je) {
        log.error("jms exception: " + je.getMessage());
        // server probably shutdown
        this.stop();
        return;
    }

The above code reads the file referenced by pendingFile, appends its contents to a StringBuffer, writes them to the log, and posts them to the JMS queue. The log file displays the special characters as ??, but only in prod. The XML feed with special characters is as below:

    <?xml version="1.0" encoding="UTF-8"?>
    <hedline>
        <hl1> Hotelliyöpymiset: Missä hinta ja palvelu vastaavat toisiaan (tai eivät) - asiakastyytyväisyyden huippukaupungit </hl1>
    </hedline>

We tried everything we could think of, including:
- Set the URI encoding to UTF-8 in server.xml for Tomcat.
- Verified that the LANG environment variable is en_US.UTF-8 on Linux.
- Verified that the XML file is encoded as UTF-8 without a BOM.
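
One check the list above does not cover is what the Tomcat JVM itself reports, since the LANG value seen in a login shell may not be the one the Tomcat process was started with. Below is a minimal diagnostic sketch, not part of the MBean. The class name `EncodingCheck` and the helper `escapeNonAscii` are hypothetical names introduced only for illustration. It prints the JVM's default charset and renders a string with every non-ASCII character escaped, so the log line stays readable whatever encoding the log file is written in.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic helper; not taken from the MBean code above.
final class EncodingCheck {

    // Returns the text with every char outside printable ASCII (control
    // characters included) replaced by a four-digit hex escape.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x20 && c < 0x7f) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // What the JVM itself believes, independent of the login shell's LANG.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // Sample word from the feed; the non-ASCII letter is written as an
        // escape so this file compiles the same under any compiler encoding.
        String sample = "Hotelliy\u00f6pymiset";
        System.out.println(escapeNonAscii(sample));
    }
}
```

Logging `escapeNonAscii(article.toString())` next to the existing debug line would show whether the file was decoded correctly. If the escaped form in prod contains the expected code point for ö (00f6), the read is fine and the characters are presumably being lost later, when the log file or the JMS message is written.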
We are unable to determine whether the cause lies with the Tomcat server or the Linux OS. Please help.