8

I am testing my application's i18n compatibility. I have an English version of Windows 7, which means the system's display language is English, and I set the system locale for non-Unicode applications to Chinese.

My application has problems exporting HTML files containing Chinese characters under JDK 1.6, but works fine under JDK 1.7.

I debugged it and found the direct reason was that Charset.defaultCharset() returned different values.

Under JDK 1.7, Charset.defaultCharset() returned GBK, which is a charset for Chinese.

Under JDK 1.6, Charset.defaultCharset() returned windows-1252, which is a charset for Latin-script (Western European) languages.

I know the problem can be solved by designating a charset, say UTF-8, in code.

But I want to know why Charset.defaultCharset() returns different values under JDK 1.7 and JDK 1.6.

3
  • 1
    At a guess, reading the "locale for non-Unicode application" setting is a new feature in Windows' JRE 7. (I'm guessing because it might not be important enough to mention in the release notes, and the search feature for the bug database doesn't actually search the bug database.) Commented Nov 18, 2011 at 2:40
  • 3
    There have been some Unicode and Internationalization Enhancements in Java 7 - perhaps this was bundled with it. Commented Nov 18, 2011 at 2:53
  • 1
    Can you post what you get by calling System.getProperty("file.encoding") in both JDK 6 and 7? Commented Jan 14, 2012 at 12:15

2 Answers

3

Charset.defaultCharset() returns the default charset of the running JVM, so it is not always the same value. For example, if you run your programs from NetBeans, it will always return UTF-8, since that's the default encoding for Java projects in NetBeans.

I have a setup similar to yours. My Windows is English (menus and dialogs are in English) and I'm using Turkish for non-Unicode applications. When I start the JVM without any flag or system parameter, both the Java 7 and Java 6 runtimes give "CP1254" when Charset.defaultCharset() is called. System.getProperty("file.encoding") and the default IO encoding are also the same. (The locale of the system is different in these two Java versions, but that's another story.)

So I guess your problem is either about how you start your JVM, or about how the JVM decides which default encoding it should use. If you are sure that the problem is not the former (you run the JVM without any encoding parameter and you do not attempt to change the default charset anywhere in your program), then the JVM is fetching the default encoding incorrectly, and most probably that's abnormal behaviour.
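To compare the two runtimes side by side, something like the following could be run once under each JDK. It is only a minimal sketch: the class name EncodingReport and its collect() helper are made up for illustration, and it prints just the values discussed above (default charset, file.encoding, default IO encoding, default locale). It avoids Java 7-only APIs so that it should compile under JDK 6 as well.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, not part of the original application.
public class EncodingReport {

    // Collects the encoding-related values of the running JVM in a fixed order.
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<String, String>();
        report.put("java.version", System.getProperty("java.version"));
        report.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        report.put("file.encoding", System.getProperty("file.encoding"));
        // A writer created without an explicit charset uses the default IO encoding;
        // getEncoding() may report it under its historical name (e.g. "Cp1252").
        report.put("default IO encoding",
                new OutputStreamWriter(new ByteArrayOutputStream()).getEncoding());
        report.put("Locale.getDefault()", Locale.getDefault().toString());
        return report;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : collect().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Running it from a plain command prompt (not from an IDE) with no -D flags rules out the "how you start your JVM" explanation; if the two JDKs still disagree, the difference comes from how each runtime reads the OS settings.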

3

The Java 7 technote says:

The supported encodings vary between different implementations of the Java Platform, Standard Edition 7 (Java SE 7).

The Charset doc says:

Every instance of the Java virtual machine has a default charset, which may or may not be one of the standard charsets. The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.

Also, I've found a "bug" about using -Dfile.encoding with this final evaluation:

This is not a bug. The "file.encoding" property is not required by the J2SE platform specification; it's an internal detail of Sun's implementations and should not be examined or modified by user code. It's also intended to be read-only; it's technically impossible to support the setting of this property to arbitrary values on the command line or at any other time during program execution.

The preferred way to change the default encoding used by the VM and the runtime system is to change the locale of the underlying platform before starting your Java program.
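Given that, the workaround the question already mentions (naming the charset in code) is probably the most robust option, because the output then no longer depends on what Charset.defaultCharset() returns. A minimal sketch follows, assuming the export can be written through an OutputStreamWriter; the HtmlExport class and its write/read methods are hypothetical names, not taken from the asker's application. It uses Charset.forName and try/finally so that it should also compile under JDK 6.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical export helper that never relies on the JVM's default charset.
public class HtmlExport {

    static final Charset UTF8 = Charset.forName("UTF-8");

    // Writes a small HTML page, encoding the characters explicitly as UTF-8.
    static void write(File target, String body) throws IOException {
        Writer out = new OutputStreamWriter(new FileOutputStream(target), UTF8);
        try {
            // Declare the same charset in the page so browsers decode it correctly.
            out.write("<html><head><meta http-equiv=\"Content-Type\" "
                    + "content=\"text/html; charset=UTF-8\"></head><body>");
            out.write(body);
            out.write("</body></html>");
        } finally {
            out.close();
        }
    }

    // Reads the file back with the same explicit charset.
    static String read(File source) throws IOException {
        Reader in = new InputStreamReader(new FileInputStream(source), UTF8);
        try {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            int count;
            while ((count = in.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
            return text.toString();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("export", ".html");
        write(file, "\u4e2d\u6587"); // two Chinese characters, written as escapes
        System.out.println(read(file).contains("\u4e2d\u6587"));
        file.delete();
    }
}
```

The same idea applies to any other place where text is turned into bytes, such as String.getBytes(...) or new String(bytes, ...): passing the charset explicitly keeps the result the same on both JDKs.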
