41

I have a Java EE project in which I use message properties files. The encoding of those files is set to UTF-8. In the files I use German umlauts like ä, ö, ü. The problem is that these characters are sometimes replaced with the Unicode escape \uFFFD\uFFFD, but not in every case. Right now I have a case where both ä and ü are replaced with \uFFFD\uFFFD, but not at every occurrence of ä and ü.

The Git diff shows me something like this:

 mail.adresses=E-Mail hinzufügen:
-mail.adresses.multiple=E-Mails durch Kommata getrennt hinzufügen.
+mail.adresses.multiple=E-Mails durch Kommata getrennt hinzuf\uFFFD\uFFFDgen.
 mail.title=Einladungs-E-Mail
 box.preview=Vorschau
 box.share.text=Sie können jetzt die ausgewählten Bilder mit Ihren Freunden teilen.
@@ -6880,7 +6880,7 @@ browser.cancel=Abbrechen
 browser.selectImage=übernehmen
 browser.starImage=merken
 browser.removeImage=Löschen
-browser.searchForSimilarImages=ähnliche
+browser.searchForSimilarImages=\uFFFD\uFFFDhnliche
 browser.clear_drop_box=löschen

Lines that I have not touched are changed as well. I don't understand this behavior. What could be causing it?

My system:

  • Antergos / Arch Linux

    • System encoding UTF-8

      Python 3.5.0 (default, Sep 20 2015, 11:28:25)
      [GCC 5.2.0] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import sys
      >>> sys.getdefaultencoding()
      'utf-8'
  • Eclipse Mars 1

    • Text file encoding: UTF-8
    • Properties file encoding: UTF-8
  • Tomcat 8
  • Java JDK 8

If I use another editor like Atom to edit those message properties files, I don't run into this problem.

I also noticed in one case that if I copy the original value browser.searchForSimilarImages=ähnliche from the Git diff and use it to replace the wrong value browser.searchForSimilarImages=\uFFFD\uFFFDhnliche in Eclipse, then I get the correct umlauts in the message properties file.

11
  • some of the Unicode letters in esponal carries one additional padded character, I would recommend you to use special tools to convert all the letters to escaped string before paste inside the properties file. Otherwise use Java Code new String(value.getBytes("ISO-8859-1"), "UTF-8"); where value is the properties value Commented Jun 30, 2015 at 16:55
  • What special tool do you mean? How should I do new String(value.getBytes("ISO-8859-1"), "UTF-8"); to have it correct in the properties file? Commented Jun 30, 2015 at 17:02
  • Because of the ISO-8859-1 problem I would recommend not use the default properties loader provided by Java. Replace the loading process so that it directly loads everything from UTF-8 files instead: stackoverflow.com/questions/4659929/… Commented Jun 30, 2015 at 17:13
  • My colleagues do not have this problem. I wonder why, and what the cause is. Commented Jun 30, 2015 at 17:25
  • 1
    @BalusC You haven't given your reasons why "you think" it's not good; just saying so is not sufficient. Commented Nov 22, 2015 at 20:16
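The re-decoding suggested in the comments above can be sketched as follows. This is only a minimal illustration, assuming the file on disk really is UTF-8 and was read through the byte-stream overload Properties.load(InputStream), which decodes bytes as ISO-8859-1. The class name RedecodeProperty and the helper loadAsLatin1 are hypothetical names introduced for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class RedecodeProperty {

    // Repairs a value whose UTF-8 bytes were decoded as ISO-8859-1:
    // turn the chars back into the original bytes, then decode them as UTF-8.
    static String redecode(String value) {
        return new String(value.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Hypothetical helper: loads properties through the InputStream overload,
    // which always interprets the bytes as ISO-8859-1.
    static Properties loadAsLatin1(byte[] fileBytes) {
        Properties props = new Properties();
        try (InputStream in = new ByteArrayInputStream(fileBytes)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Simulated content of a UTF-8 encoded properties file
        byte[] utf8File = "browser.searchForSimilarImages=\u00e4hnliche\n"
                .getBytes(StandardCharsets.UTF_8);

        Properties props = loadAsLatin1(utf8File);
        String raw = props.getProperty("browser.searchForSimilarImages");

        // raw has 9 chars (the umlaut became two chars); the repaired value has 8
        System.out.println(raw.length() + " -> " + redecode(raw).length());
    }
}
```

Note that this repairs values at read time only; it does not change what an editor writes into the file.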

8 Answers

54

Root cause:

By default, Eclipse uses the ISO 8859-1 character encoding for properties files (read here), so any character in the file that lies outside ISO 8859-1 will not be processed as expected.

Solution 1

If you use Eclipse, you will notice that it implicitly converts special characters into their \uXXXX equivalents. Try copying

会意字 / 會意字

into a properties file opened in Eclipse.

EDIT: As per comment from OP

Update the encoding in Eclipse as shown below. If you set the encoding to UTF-32, you can even see Chinese characters, which you normally cannot.

How to change the encoding of properties files in Eclipse: see this Eclipse Bugzilla bug for more details. It discusses several other possibilities and in the end suggests what I have highlighted below. enter image description here

Chinese characters can be seen in Eclipse after the encoding is set properly: enter image description here

Solution 2

If the above doesn't work consistently for you (it does work for me, and I never see encoding issues), try an Eclipse plugin that handles the encoding of properties or other files, for example Eclipse ResourceBundle Editor or Extended Resource-Bundle editor.

I would recommend using Eclipse ResourceBundle Editor.

Solution 3

Another way to change the encoding of a file is the Edit --> Set Encoding option. It really matters because it changes the default character set and file encoding. Experiment by changing the encoding with Edit --> Set Encoding and then printing the result from Java with System.out.println("Default Charset=" + Charset.defaultCharset()); and System.out.println(System.getProperty("file.encoding"));

enter image description here
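The two print statements mentioned in Solution 3 can be wrapped into a small runnable class like the one below. This is just a sketch for checking what the running JVM reports; the class name ShowDefaultEncoding and the helper describe are hypothetical, and the values printed depend on the JVM version, the operating system, and how the program is launched.

```java
import java.nio.charset.Charset;

public class ShowDefaultEncoding {

    // Builds a one-line summary of the charset settings the running JVM reports
    static String describe() {
        return "Default Charset=" + Charset.defaultCharset()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once from Eclipse and once from a terminal is a quick way to see whether the two environments disagree.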


As an aside: 1

Convert the properties file so that its content fits the ISO 8859-1 character encoding by using native2ascii - Native-to-ASCII Converter

What native2ascii does: it converts every non-ASCII character into its \uXXXX equivalent. This is a good tool because you do not need to look up the \uXXXX equivalent of each special character.

Usage for UTF-8: native2ascii -encoding utf8 e:\a.txt e:\b.txt
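To make the conversion concrete, here is a simplified Java sketch of the same idea: every character outside 7-bit ASCII is replaced with its Unicode escape. It is not the implementation of native2ascii, only an illustration, and the class name UnicodeEscaper and the method escapeNonAscii are hypothetical.

```java
public class UnicodeEscaper {

    // Replaces every char outside 7-bit ASCII with a backslash, the letter u,
    // and four hex digits. Supplementary characters become two escapes,
    // one per UTF-16 surrogate.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The umlaut is written as an escape so the source file stays pure ASCII
        System.out.println(escapeNonAscii("browser.searchForSimilarImages=\u00e4hnliche"));
    }
}
```

The output is plain ASCII, so it reads the same whether a tool treats the file as ISO 8859-1 or UTF-8.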


As an aside: 2

Every computer program, whether an IDE, application server, web server, browser, etc., understands only bits, so it needs to know how to interpret those bits, because the same bits can represent different characters depending on the encoding used. That is where an encoding comes into the picture: it assigns a unique identifier to each character so that all computer programs, operating systems, etc. know exactly how to interpret it.

So, if you have written a file using some encoding scheme, let's say UTF-8, and then read it with any editor that is also set to UTF-8, you can expect it to be displayed correctly.
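A small Java sketch can illustrate the point that the same bytes yield different characters under different encodings. It is only an illustration, not a diagnosis of the Eclipse behavior in the question; the class name SameBytesDifferentText and its helpers are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SameBytesDifferentText {

    // Decodes raw bytes with the given charset
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Lists the code points of a string, e.g. "U+00E4", to avoid console encoding issues
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e4".getBytes(StandardCharsets.UTF_8);        // bytes C3 A4
        byte[] latin1 = "\u00e4".getBytes(StandardCharsets.ISO_8859_1); // byte E4

        // Matching encoding on write and read: the umlaut survives
        System.out.println(codePoints(decode(utf8, StandardCharsets.UTF_8)));        // U+00E4

        // UTF-8 bytes read as ISO-8859-1: one umlaut becomes two characters
        System.out.println(codePoints(decode(utf8, StandardCharsets.ISO_8859_1)));   // U+00C3 U+00A4

        // ISO-8859-1 byte read as UTF-8: invalid input becomes the replacement character
        System.out.println(codePoints(decode(latin1, StandardCharsets.UTF_8)));      // U+FFFD
    }
}
```

The last case shows one way U+FFFD can appear: a decoder meets bytes that are not valid in the encoding it was told to use.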

Please read this answer of mine for more details, though it is written from a browser-server perspective.


24 Comments

I do not want to have things like \uXXXX in the properties file. I want to have the correct UTF-8 representation in the file.
@BuZZ-dEE I have edited my answer to address your concern. Chinese is an ideographic language; if you can see Chinese characters, then you can see almost everything. Please let me know if it doesn't help.
Got some solution on this ??
Note that you can set the encoding at the file level as well (via the file's Properties from the Package Explorer or Navigator). Also, in your code be sure to use the load/store methods that take Reader/Writer objects, respectively. That ensures you can specify the encoding when reading the file into your app.
Note: in Java 9, UTF-8 is now the default encoding for properties resource bundles docs.oracle.com/javase/9/intl/… - but you may still have to configure Eclipse specifically.
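The Reader/Writer based load and store methods mentioned in the comments above can be used roughly like this. It is a minimal sketch that assumes the properties file is UTF-8 encoded; the class name Utf8Properties and its two helper methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class Utf8Properties {

    // Loads properties through a Reader so the bytes are decoded as UTF-8
    // instead of the ISO-8859-1 used by load(InputStream)
    static Properties loadUtf8(InputStream in) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Stores properties through a Writer so non-ASCII characters are written
    // as UTF-8 text; the caller remains responsible for closing the stream
    static void storeUtf8(Properties props, OutputStream out) {
        try {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            props.store(writer, null);
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A typical call would pass a file or classpath stream, for example loadUtf8(Files.newInputStream(path)), where path points at the properties file.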
4

Add the following arguments to your eclipse.ini file.

-Dclient.encoding.override=UTF-8
-Dfile.encoding=UTF-8

By default Eclipse uses the encoding format picked up by the Java Virtual Machine (JVM). Also, you can set the file encoding to utf-8.

2 Comments

The JVM uses the system encoding and my system uses UTF-8 and also my properties encoding is set to UTF-8.
I have filed a feature request with Oracle to remove the default ISO 8859-1 encoding. No response yet. Let's see if they will fix it.
4

Resolved by making the changes below:

  1. Modify the properties below in eclipse.ini, then close and restart Eclipse:

    -Dclient.encoding.override=UTF-8
    -Dfile.encoding=UTF-8
  2. Set the encoding to UTF-8 [Navigation path: Edit -> Set Encoding]

Comments

2

Properties files are expected to be ISO-8859-1 (Latin-1) encoded. Most likely this is what Eclipse was set to by default as well.

You have to make sure that every tool that runs in the build, or anywhere else, disregards the spec and uses UTF-8 instead.

12 Comments

But there are also ä, ü and ö in the file that are not replaced. Why are those not replaced? How should I find the setting that causes this problem? Do I need to search all Eclipse settings, and those of every Eclipse plugin, to find the wrong one?
My guess is that a tool (maybe a save action?) updates only lines which are somehow touched. But it will get hard to find the culprit.
But there are lines changed, that I have not touched.
\uFFFD is a Java-escaped character. Regular ISO-8859-1 encoded files don't use such escaping. Therefore it must be the editor you use. Make sure you are not using the "Properties File Editor" in Eclipse or a similar external tool.
This has changed: since Java 9 it is expected to be UTF-8 docs.oracle.com/javase/9/intl/…
1

This looks like a mixture of Eclipse and Git encoding, or rather of Git not handling encoding at all.

Git uses raw bytes and doesn't care about encoding. Using git diff you might get characters like those shown here. An example there is R<C3><BC>ckg<C3><A4>ngig # should be "Rückgängig".

As you can see, two bracketed byte values are shown per umlaut. And in your editor, there are always two \uFFFD for each umlaut in the lines starting with +.

So I assume that your UTF-8 editor tries to interpret the Git notation and fails. This in turn leads to the representation \uFFFD, which basically means that this is a character whose value is unknown or unrepresentable (see here).
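The bracketed notation from the example above can be reproduced with a few lines of Java, which makes it easier to see that each umlaut is two bytes in UTF-8. This is only a sketch of the notation, not of how Git or its pager produce it; the class name RawByteView and the method showBytes are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class RawByteView {

    // Renders the UTF-8 bytes of a string: ASCII bytes are shown as characters,
    // every other byte as a bracketed hex value such as <C3>
    static String showBytes(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int value = b & 0xFF; // treat the byte as unsigned
            if (value < 0x80) {
                out.append((char) value);
            } else {
                out.append(String.format("<%02X>", value));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The umlauts are written as escapes so the source file stays pure ASCII
        System.out.println(showBytes("R\u00fcckg\u00e4ngig")); // R<C3><BC>ckg<C3><A4>ngig
    }
}
```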

As suggested in the first link, you can try setting LESSCHARSET=UTF-8 as an environment variable (Windows). On Linux it should probably go into /etc/profile.

1 Comment

I used set LESSCHARSET UTF-8 in the FISH shell and after that I had also \uFFFD\uFFFD instead of correct sign.
0

See "a marker such as FFFD (REPLACEMENT CHARACTER)" in http://unicode.org/faq/utf_bom.html

and see native2ascii --help:

 -encoding encoding_name
     Specifies the name of the character encoding to be used by the conversion procedure. If this option is not present, then the default character encoding (as determined by the java.nio.charset.Charset.defaultCharset method) is used. The encoding_name string must be the name of a character encoding that is supported by the JRE. See Supported Encodings at http://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html

An example:

$ file yourfile.properties
yourfile.properties : ISO-8859 text, with very long lines
$ native2ascii -encoding ISO-8859-1 yourfile.properties yourfile.properties

Comments

0

You could solve that issue by changing your Region settings if you're using Windows 11. Don't know if this works on earlier versions.

Take a look at this full detailed answer

1 Comment

I use Arch Linux.
0

Same problem (or very close):

After switching to Eclipse 2023-09 (4.29.0) from an earlier version, I found that my property files (encoded with UTF-8) are always treated as ISO-8859-1 and that this cannot be changed via Preferences/General/Content Types, Java Properties File. This entry is marked "Locked", and whatever encoding I enter is overwritten with ISO-8859-1.

It can be fixed by creating an override preference file as described here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=68270#c9 :

Create a file in the workspace settings folder named

.metadata\.plugins\org.eclipse.core.runtime\.settings\org.eclipse.core.runtime.prefs

with the content:

content-types/org.eclipse.jdt.core.javaProperties/charset=UTF-8

and restart Eclipse for the changes to take effect. From now on all property files will be considered UTF-8 (at least all my *.properties files are).

Comments
