-1

I have written Java code that stores an uploaded text document and then returns the text in that file. All the text is in the Sinhala language, encoded as UTF-8.

streamReader = new InputStreamReader(new FileInputStream(new File(filePath)), "utf8" /* Here I have also tried 'UTF-8' and 'utf-8' */);
br = new BufferedReader(streamReader);
PrintStream printStream = new PrintStream(f);
while ((line = br.readLine()) != null) {
    .....
}

The output is sent directly to a JSP page, where it is shown as '????????????????????????????????'.

My environment is Windows 8.1, Tomcat, and Java 7. I have tested the JSP with Sinhala characters typed directly into it, and they display correctly. I have also set UTF-8 in the content type.

I have tried this one, this one, and this one too.

5
  • I would try setting UTF-8 for the output i.e. PrintStream as well. Commented Aug 18, 2015 at 16:27
  • 1
    Unicode characters certainly are recognized in Java, but your Java program can read, manipulate, or output them incorrectly if you so choose. Also, whatever mechanism you are using to examine the result might do the wrong thing with your program's output. Your code fragment looks a bit suspicious, but you haven't given us a complete example to work with, so we can't say much. Commented Aug 18, 2015 at 16:28
  • 1
    @PeterLawrey, PrintStreams and other OutputStreams don't have an encoding associated with them. That is what I find most suspicious, in fact. OutputStreams are for writing binary data; for character data one should use a Writer. Commented Aug 18, 2015 at 16:32
  • You need to show us the JSP, and to show how the output is sent directly to it. Also make sure that the font used in your web page is able to show those "sinhala" characters. Commented Aug 18, 2015 at 16:35
  • @JohnBollinger I was thinking of PrintWriter, so your comment helped me come up with an answer. Commented Aug 18, 2015 at 16:37

3 Answers

1

The JSP must declare UTF-8 as its encoding, and every Reader or Writer that wraps an InputStream or OutputStream must be given the UTF-8 character set explicitly.

<%@ page contentType="text/html; charset=UTF-8" %> 
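To illustrate why a single missing charset can produce the question marks: when Java encodes a string with a charset that cannot represent its characters, each unmappable character is replaced, and for ISO-8859-1 the replacement is '?'. The following is only a sketch; the class name `EncodingLossDemo` and the helper `roundTrip` are made up for illustration, and ISO-8859-1 merely stands in for whatever non-UTF-8 default might be in play on the asker's machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingLossDemo {
    // Hypothetical helper: encode the text to bytes and decode it again
    // using the same charset on both sides.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // unmappable chars become the charset's replacement
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The word "Sinhala" written in Sinhala script, as Unicode escapes
        String sinhala = "\u0DC3\u0DD2\u0D82\u0DC4\u0DBD";

        // UTF-8 can represent every character, so the text survives.
        System.out.println(roundTrip(sinhala, StandardCharsets.UTF_8).equals(sinhala)); // true

        // ISO-8859-1 has no Sinhala characters, so each one is replaced by '?'.
        System.out.println(roundTrip(sinhala, StandardCharsets.ISO_8859_1)); // ?????
    }
}
```

If any step between the file and the browser behaves like the second call, the damage is permanent, which is why every stage needs the charset stated explicitly.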

0

To set the encoding for a Writer you can do

PrintWriter out = new PrintWriter(new OutputStreamWriter(new FileOutputStream(f), "UTF-8")); // assuming f is a File or a file name

You can use a PrintWriter instead of a PrintStream, as it offers the same print and println methods.
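As a rough end-to-end check, the sketch below writes a line through a PrintWriter wrapped around an OutputStreamWriter with UTF-8, then reads it back through an InputStreamReader with the same charset. The class name `Utf8FileRoundTrip`, the helpers `writeLine` and `readFirstLine`, and the temp file are assumptions for illustration, not the asker's actual code.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

class Utf8FileRoundTrip {
    // Hypothetical helper: write one line to the file, encoding it as UTF-8.
    static void writeLine(File f, String line) throws IOException {
        try (PrintWriter out = new PrintWriter(
                new OutputStreamWriter(new FileOutputStream(f), "UTF-8"))) {
            out.println(line);
        }
    }

    // Hypothetical helper: read the first line back, decoding it as UTF-8.
    static String readFirstLine(File f) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(f), "UTF-8"))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("sinhala", ".txt");
        tmp.deleteOnExit();

        // The word "Sinhala" written in Sinhala script, as Unicode escapes
        String text = "\u0DC3\u0DD2\u0D82\u0DC4\u0DBD";
        writeLine(tmp, text);

        // Compare in memory rather than printing the text, because the
        // console itself may not use UTF-8.
        System.out.println(text.equals(readFirstLine(tmp))); // true
    }
}
```

If this prints true on your machine, the file handling is fine and the remaining suspect is the encoding of the HTTP response.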

1 Comment

and the JSP/HTML headers need to specify UTF-8 as well for this to be a complete solution. <%@ page contentType="text/html; charset=UTF-8" %>
0

You need to ensure the correct encoding of the HTTP response.

If you insert the text in JSP, set the JSP encoding at the top of the .jsp file (see also UTF-8 encoding in JSP page):

<%@ page contentType="text/html; charset=UTF-8" %>
...
<c:out value="${myDocumentTextInUnicode}"/>

If generating the response in a servlet, set the encoding there:

response.setContentType("text/plain");
response.setCharacterEncoding("UTF-8");
PrintWriter out = response.getWriter();
while ((line = br.readLine()) != null) {
    out.println(line);
}
