Eclipse wrong Java properties UTF-8 encoding

Question

I have a Java EE project in which I use message properties files. The encoding of those files is set to UTF-8. In the files I use German umlauts like ä, ö, ü. The problem is that sometimes those characters are replaced with Unicode escapes like \uFFFD\uFFFD, but not every character is affected. Right now I have a case where ä and ü are both replaced with \uFFFD\uFFFD, but not at every occurrence of ä and ü.

The Git diff shows me something like this:

 mail.adresses=E-Mail hinzufügen:
-mail.adresses.multiple=E-Mails durch Kommata getrennt hinzufügen.
+mail.adresses.multiple=E-Mails durch Kommata getrennt hinzuf\uFFFD\uFFFDgen.
 mail.title=Einladungs-E-Mail
 box.preview=Vorschau
 box.share.text=Sie können jetzt die ausgewählten Bilder mit Ihren Freunden teilen.
@@ -6880,7 +6880,7 @@ browser.cancel=Abbrechen
 browser.selectImage=übernehmen
 browser.starImage=merken
 browser.removeImage=Löschen
-browser.searchForSimilarImages=ähnliche
+browser.searchForSimilarImages=\uFFFD\uFFFDhnliche
 browser.clear_drop_box=löschen

Also, lines that I have not touched are changed. I don't understand why I get this behavior. What could be the cause of the problem above?

My system:

Antergos / Arch Linux

System encoding UTF-8

Python 3.5.0 (default, Sep 20 2015, 11:28:25)
[GCC 5.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'

Eclipse Mars 1
- Text file encoding UTF-8
- Properties file encoding UTF-8
Tomcat 8
Java JDK 8

If I use another editor like Atom to edit those message properties files, I don't run into this problem.

I also noticed in one case that if I copy the original value browser.searchForSimilarImages=ähnliche from the Git diff and use it to replace the wrong value browser.searchForSimilarImages=\uFFFD\uFFFDhnliche in Eclipse, then I have the correct umlauts in the message properties file.

Some of the Unicode letters in Spanish carry one additional padded character. I would recommend using a special tool to convert all the letters to escaped strings before pasting them into the properties file. Otherwise, use the Java code new String(value.getBytes("ISO-8859-1"), "UTF-8"); where value is the property value.
– Dickens A S, Commented Jun 30, 2015 at 16:55
What special tool do you mean? How should I use new String(value.getBytes("ISO-8859-1"), "UTF-8"); to get it correct in the properties file?
– BuZZ-dEE, Commented Jun 30, 2015 at 17:02
Because of the ISO-8859-1 problem, I would recommend not using the default properties loader provided by Java. Replace the loading process so that it loads everything directly from UTF-8 files instead: stackoverflow.com/questions/4659929/…
– Robert, Commented Jun 30, 2015 at 17:13
My colleagues do not have this problem. I wonder why, and what the cause is.
– BuZZ-dEE, Commented Jun 30, 2015 at 17:25
@BalusC You haven't provided your reasons for why "you think" that it's not good; just saying so is not at all sufficient.
– hagrawal7777, Commented Nov 22, 2015 at 20:16
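
To make the re-decoding suggestion from the comments concrete, here is a minimal sketch. It assumes the file on disk really is UTF-8 and is read through Properties.load(InputStream), which decodes bytes as ISO-8859-1. The class name Latin1RoundTrip, its helper methods, and the in-memory sample file are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class Latin1RoundTrip {

    // Re-interprets a value that was decoded as ISO-8859-1 although the
    // underlying bytes were UTF-8 (the trick quoted in the comment).
    public static String redecode(String value) {
        return new String(value.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Loads one key from an in-memory UTF-8 "file" (a stand-in for the real
    // properties file) via the InputStream overload, which uses ISO-8859-1.
    public static String loadRaw() {
        // "L\u00F6schen" is "Löschen"; the escape keeps this source file ASCII-only.
        byte[] utf8File = "browser.removeImage=L\u00F6schen\n".getBytes(StandardCharsets.UTF_8);
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(utf8File));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("browser.removeImage");
    }

    public static void main(String[] args) {
        String raw = loadRaw();
        System.out.println("as loaded:  " + raw);            // umlaut arrives as two Latin-1 characters
        System.out.println("re-decoded: " + redecode(raw));  // original umlaut restored
    }
}
```

This only repairs values at runtime; it does not change what an editor writes into the file, so it would not by itself explain or prevent the \uFFFD escapes in the diff.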

Community · Accepted Answer · 2017-05-23 11:53:50Z

Root cause:

By default, Eclipse uses ISO 8859-1 character encoding for properties files (read here), so if a file contains any character outside ISO 8859-1, it will not be processed as expected.

Solution 1

If you use Eclipse, you will notice that it implicitly converts special characters into their \uXXXX equivalents. Try copying

会意字 / 會意字

into a properties file opened in Eclipse.

EDIT: As per comment from OP

Update the encoding in Eclipse as described below. If you set the encoding to UTF-32, you can even see Chinese characters, which you generally cannot see otherwise.

How to change the encoding of properties files in Eclipse: see this Eclipse Bugzilla bug for more details. It discusses several other possibilities and in the end suggests the setting I highlighted in the screenshot.

Chinese characters can be seen in Eclipse after the encoding is set properly.

Solution 2

If the above doesn't work consistently for you (it does work for me, and I never see encoding issues), try an Eclipse plugin that handles the encoding of properties or other files, for example Eclipse ResourceBundle Editor or Extended Resource-Bundle editor.

I would recommend using Eclipse ResourceBundle Editor.

Solution 3

Another way to change the encoding of a file is the Edit --> Set Encoding option. It matters because it changes the default character set and the file encoding. Experiment by changing the encoding with Edit --> Set Encoding and then printing the following from Java: System.out.println("Default Charset=" + Charset.defaultCharset()); and System.out.println(System.getProperty("file.encoding"));
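
The two print statements above need a surrounding class to run. A minimal sketch follows; the class name ShowDefaultEncoding and the describe helper are hypothetical, and the values printed depend entirely on how the JVM was launched, so no particular output is implied.

```java
import java.nio.charset.Charset;

public class ShowDefaultEncoding {

    // Builds one line with both values so they can be compared side by side.
    public static String describe() {
        return "Default Charset=" + Charset.defaultCharset()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        // Run this from Eclipse before and after changing the encoding setting
        // to see whether the launched JVM picks up a different charset.
        System.out.println(describe());
    }
}
```

Running it both inside Eclipse and from a plain terminal is one way to check whether the IDE launches the JVM with different encoding settings than the system default.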

As an aside: 1

Process the properties file so that its content is in ISO 8859-1 character encoding by using native2ascii - Native-to-ASCII Converter

What native2ascii does: it converts all non-ISO 8859-1 characters into their \uXXXX equivalents. This is a good tool because you do not need to look up the \uXXXX equivalent of each special character.

Usage for UTF-8: native2ascii -encoding utf8 e:\a.txt e:\b.txt
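
For readers without the native2ascii binary at hand, the kind of conversion described above can be sketched in a few lines of Java. This is an illustration, not the tool's actual implementation: the class name UnicodeEscaper is hypothetical, and escaping everything above 0x7F (rather than only characters outside ISO 8859-1) is an assumption chosen to keep the output pure ASCII.

```java
public class UnicodeEscaper {

    // Replaces every UTF-16 code unit above 0x7F with a backslash-u escape
    // (four uppercase hex digits); ASCII characters are copied unchanged.
    public static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The literal below is "ähnliche", written with an escape so that this
        // source file compiles regardless of the compiler's -encoding setting.
        System.out.println(escape("browser.searchForSimilarImages=\u00E4hnliche"));
    }
}
```

The escaped form is what Properties.load expects in a Latin-1 file; the case of the hex digits does not matter to the loader.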

As an aside: 2

Every computer program, whether an IDE, application server, web server, browser, etc., understands only bits, so it needs to know how to interpret those bits to make sense of them, because depending on the encoding used, the same bits can represent different characters. That is where encoding comes into the picture: it gives each character a unique identifier so that all programs, across different operating systems, know exactly how to interpret it.

So, if you have written a file using some encoding scheme, let's say UTF-8, and then read it with any editor that is also set to UTF-8, you can expect the content to be displayed correctly.
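
A small sketch can make the "same bits, different characters" point tangible. It encodes one umlaut in two charsets and then decodes the UTF-8 bytes both correctly and incorrectly. The class name SameBytesDifferentText and the hex helper are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class SameBytesDifferentText {

    // Formats bytes as space-separated uppercase hex pairs, e.g. "C3 BC".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String text = "\u00FC"; // the letter u with diaeresis
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 bytes:      " + hex(utf8));   // C3 BC
        System.out.println("ISO-8859-1 bytes: " + hex(latin1)); // FC

        // Same two bytes, two interpretations: one character vs. two.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).length());      // 1
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1).length()); // 2
    }
}
```

The lengths are printed instead of the characters themselves because console output adds yet another encoding step that could obscure the result.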

Please read this answer of mine for more details, though from a browser-server perspective.

I do not want to have things like \uXXXX in the properties file. I want to have the correct UTF-8 representation in the file.
@BuZZ-dEE I have edited my answer to address your concern. Chinese is an ideographic language; if you can see Chinese characters, then you can see almost everything. Please let me know if it doesn't help.
Note that you can set the encoding at the file level as well (via the file's Properties from the Package Explorer or Navigator). Also, in your code be sure to use the load/store methods that take Reader/Writer objects, respectively. That ensures you can specify the encoding when reading the file into your app.
Note: as of Java 9, UTF-8 is the default encoding for properties resource bundles (docs.oracle.com/javase/9/intl/…), but you may still have to configure Eclipse specifically.
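
The Reader-based loading mentioned in the comments can be sketched as follows. Properties.load(Reader) has been available since Java 6 and lets the caller choose the charset. The class name Utf8PropertiesLoader and the in-memory sample data are hypothetical; in a real project the stream would come from the properties file or the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class Utf8PropertiesLoader {

    // Wraps the stream in a UTF-8 Reader so that Properties does not fall
    // back to its ISO-8859-1 decoding of raw InputStreams.
    public static Properties load(InputStream in) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Stand-in for a UTF-8 encoded messages file ("L\u00F6schen" is "Löschen").
        byte[] utf8File = "browser.removeImage=L\u00F6schen\n".getBytes(StandardCharsets.UTF_8);
        Properties props = load(new ByteArrayInputStream(utf8File));
        System.out.println(props.getProperty("browser.removeImage"));
    }
}
```

With this approach the file can keep literal umlauts and no \uXXXX escapes are needed, provided every tool that writes the file also uses UTF-8.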

BuZZ-dEE · Accepted Answer · 2015-11-24 09:39:26Z

Add the following arguments to your eclipse.ini file, each on its own line after the -vmargs entry.

-Dclient.encoding.override=UTF-8
-Dfile.encoding=UTF-8

By default, Eclipse uses the encoding picked up by the Java Virtual Machine (JVM). You can also set the file encoding to UTF-8.

edited Nov 24, 2015 at 9:39

BuZZ-dEE

answered Nov 23, 2015 at 23:49

user1363516

2 Comments

BuZZ-dEE Over a year ago

The JVM uses the system encoding and my system uses UTF-8 and also my properties encoding is set to UTF-8.

user1363516 Over a year ago

I have requested a feature from Oracle to remove the default ISO 8859-1 encoding. No response yet. Let's see if they will fix it.

BuZZ-dEE · Accepted Answer · 2023-11-28 14:31:40Z

Resolved by making the following changes:

Modify the properties below in eclipse.ini (each on its own line), then close and restart Eclipse.
```
-Dclient.encoding.override=UTF-8
-Dfile.encoding=UTF-8
```
Set the encoding to UTF-8 [navigation path: Edit -> Set Encoding]

tilois · Accepted Answer · 2015-06-30 16:52:43Z

Properties files are expected to be ISO-8859-1 (Latin-1) encoded. Most likely this is what Eclipse was set to by default as well.

You have to make sure that every tool that runs in the build, or anywhere else in the workflow, disregards the spec and uses UTF-8 instead.

answered Jun 30, 2015 at 16:52

tilois

12 Comments

BuZZ-dEE Over a year ago

But there are also ä, ü and ö in the file that are not replaced. Why are those not replaced? How should I find the setting that causes this problem? Do I need to search all Eclipse settings, and those of every Eclipse plugin, to find the wrong setting?

tilois Over a year ago

My guess is that a tool (maybe a save action?) updates only lines which are somehow touched. But it will get hard to find the culprit.

BuZZ-dEE Over a year ago

But there are lines changed, that I have not touched.

Robert Over a year ago

\uFFFD is a Java escaped character. Regular ISO-8859-1 encoded files don't use such escaping. Therefore it must be the editor you use. Make sure you are not using the "Properties File Editor" in Eclipse or a similar external tool.

pdem Over a year ago

This has changed: since Java 9, properties resource bundles are expected to be UTF-8: docs.oracle.com/javase/9/intl/…

Calon · Accepted Answer · 2015-11-25 08:49:11Z

This looks like a mixture of Eclipse and git encoding or rather not-encoding.

Git uses raw bytes and doesn't care about encoding. Using git diff you might get characters like those shown here. An example there is R<C3><BC>ckg<C3><A4>ngig # should be "Rückgängig".

As you can see, there are two funny bracket things showing per umlaut. And in your editor, there are always two \uFFFD for each umlaut in the lines starting with +.

So I assume that your UTF-8 editor tries to interpret the git notation and fails. This in turn leads to the representation \uFFFD, which basically means that this is a character whose value is unknown or unrepresentable (see here).
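
One way to see how a single umlaut can turn into exactly two \uFFFD is to decode its two UTF-8 bytes with a decoder that accepts neither byte. The sketch below uses US-ASCII for that purpose; this choice is an assumption for illustration, since the answer does not say which decoder Eclipse or git actually applied. The class name ReplacementCharDemo and its helpers are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {

    // Encodes the text as UTF-8, then decodes the bytes as US-ASCII.
    // Every byte above 0x7F is malformed for ASCII and becomes U+FFFD.
    public static String decodeAsAscii(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    // Counts how many replacement characters a string contains.
    public static long countReplacements(String s) {
        return s.chars().filter(c -> c == 0xFFFD).count();
    }

    public static void main(String[] args) {
        // "R\u00FCckg\u00E4ngig" is the example word from the answer, with two umlauts.
        String broken = decodeAsAscii("R\u00FCckg\u00E4ngig");
        System.out.println(countReplacements(broken)); // 4: two per umlaut
    }
}
```

Once the original bytes have been replaced by U+FFFD the information is lost, which would be consistent with the observation that re-pasting the correct text is the only way to restore the umlauts.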

As suggested in the first link, you can try setting LESSCHARSET=UTF-8 as an environment variable (Windows). Hmm, on Linux it should probably go in /etc/profile?

I used set LESSCHARSET UTF-8 in the fish shell, and after that I also got \uFFFD\uFFFD instead of the correct € sign.

Bruce Zu · Accepted Answer · 2018-06-05 05:17:31Z

See: a marker such as U+FFFD (REPLACEMENT CHARACTER), described in http://unicode.org/faq/utf_bom.html

and see native2ascii --help

 -encoding encoding_name
     Specifies the name of the character encoding to be used by the conversion procedure. If this option is not present, then the default character encoding (as determined by the java.nio.charset.Charset.defaultCharset method) is used. The encoding_name string must be the name of a character encoding that is supported by the JRE. See Supported Encodings at http://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html

An example:

$ file yourfile.properties
yourfile.properties : ISO-8859 text, with very long lines
$ native2ascii -encoding ISO-8859-1 yourfile.properties yourfile.properties

Peter Alli Aguilera · Accepted Answer · 2023-03-06 09:06:03Z

You could solve that issue by changing your Region settings if you're using Windows 11. Don't know if this works on earlier versions.

Take a look at this fully detailed answer.

tequilacat · Accepted Answer · 2024-05-08 22:17:30Z

Same problem (or very close):

After switching to Eclipse 2023-09 (4.29.0) from an earlier version, I found that my property files (encoded in UTF-8) are always treated as ISO-8859-1, and this cannot be changed via Preferences/General/Content Types, Java Properties File. The entry is marked "Locked", and whatever encoding I enter is overwritten with ISO-8859-1.

It can be fixed by creating an override preference file as described here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=68270#c9 :

Create a file in the workspace settings folder named

.metadata\.plugins\org.eclipse.core.runtime\.settings\org.eclipse.core.runtime.prefs

with the content:

content-types/org.eclipse.jdt.core.javaProperties/charset=UTF-8

and restart Eclipse for the changes to take effect. From now on all property files will be considered UTF-8 (at least all my *.properties files are).
