I am troubled by this typical problem with special characters.

We have an MBean running on a production Tomcat server (installed on Linux) that picks up XML feeds and sends them on for further processing. The problem appears when the MBean processes special characters: they are replaced by '??'. The same code runs on the local dev and QA servers and works fine there, even though the OS version and the Tomcat version are the same. The code that reads the XML feed and sends it to a JMS queue is pasted below:

    StringBuffer article = new StringBuffer();
    InputStreamReader is = new InputStreamReader(new FileInputStream(pendingFile), "utf-8");
    int data;
    while ((data = is.read()) != -1) {
        article.append((char) data);
    }
    is.close();
    is = null;
    log.debug("Read in \n" + article.toString());
    try {
        js.writeTextMessage(article.toString(), "server", hostName, processor);
    } catch (JMSException je) {
        log.error("jms exception: " + je.getMessage());
        // server probably shutdown
        this.stop();
        return;
    }

The code above reads the file referenced by pendingFile, appends its contents to a StringBuffer, writes them to the log, and posts them to the JMS queue. The log file shows the special characters as '??', but only in production. The XML feed with special characters is shown below:

    <?xml version="1.0" encoding="UTF-8"?>
    <hedline>
      <hl1>
        Hotelliyöpymiset: Missä hinta ja palvelu vastaavat toisiaan (tai eivät) - asiakastyytyväisyyden huippukaupungit
      </hl1>
    </hedline>

We tried everything we could think of, including:

  1. Setting the URI encoding to UTF-8 in Tomcat's server.xml.
  2. Verifying that the LANG environment variable is en_US.UTF-8 on Linux.
  3. Verifying that the XML file is encoded as UTF-8 without a BOM.

We are unable to determine whether the cause lies with the Tomcat server or with the Linux OS. Please help.

2 Answers

Don't log the article string just as text. Dump each character out as a hex integer. That way you can tell whether it's the logging which is failing, or the reading which is failing.
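A minimal sketch of such a dump follows. The class name `CharDump` and the helper `toHex` are hypothetical and not part of the original code. In the MBean it could be called as `log.debug(CharDump.toHex(article.toString()))`. Because the output contains only ASCII hex digits, it should survive whatever encoding the log uses.

```java
class CharDump {

    /** Formats every UTF-16 code unit of the string as a 4-digit hex value. */
    static String toHex(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "y\u00f6" is "yö", written as an escape so the source file encoding does not matter
        System.out.println(toHex("y\u00f6")); // prints "0079 00f6"
    }
}
```

If the dump shows `00f6` for "ö", the file was read correctly and the damage happens later, in logging or display. Values such as `003f` (a literal '?'), `fffd` (the Unicode replacement character), or the pair `00c3 00b6` (the UTF-8 bytes of "ö" decoded one byte at a time) would instead point at the reading step.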

It's not clear to me what the JMS queue's behaviour is - is it only the logging which is failing, or the JMS as well?

When you are logging via Log4j, for example with a FileAppender, you can set the encoding of the log file:

    <appender name="SOME_LOG" class="org.apache.log4j.RollingFileAppender">
        <param name="Encoding" value="UTF-8" />

Additionally, there must be an appropriate charset installed for displaying the characters correctly.
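To see why the encoding used for writing matters, here is a small sketch; the class `EncodingDemo` and the method `roundTrip` are hypothetical names used only for illustration. It encodes a string with a given charset and decodes it again. With a charset that cannot represent "ö", such as US-ASCII, Java substitutes '?' during encoding, which is the kind of damage seen in the production log. Printing `Charset.defaultCharset()` on each server is a quick way to check whether the JVMs really run with the same default. The assumption here is that the appender falls back to that default when no encoding is configured.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {

    /** Encodes the text to bytes with the given charset, then decodes it back. */
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "Hotelliy\u00f6pymiset" is "Hotelliyöpymiset" from the sample feed
        String sample = "Hotelliy\u00f6pymiset";

        // The JVM default can differ between servers even when the code is identical
        System.out.println("default charset: " + Charset.defaultCharset());

        // UTF-8 can represent every character, so the round trip is lossless
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample)); // prints "true"

        // US-ASCII cannot represent 'ö', so it is replaced by '?'
        System.out.println(roundTrip(sample, StandardCharsets.US_ASCII)); // prints "Hotelliy?pymiset"
    }
}
```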