In Java, I've been trying to write a String to a file using UTF-8 encoding which will later be read by another program written in a different programming language. While doing so I noticed that the bytes created when encoding a String into a byte array didn't seem to have the correct byte values.
I narrowed the problem down to the symbol "£", which seems to produce incorrect bytes when encoded to UTF-8:
    byte[] byteArray = "£".getBytes(Charset.forName("UTF-8"));

    // Print out the byte array of the UTF-8 converted string.
    // Mask each byte with 0xFF to print it as an unsigned value.
    for (byte signedByte : byteArray) {
        System.out.print((signedByte & 0xFF) + " ");
    }

This outputs 6 bytes with the decimal values 239 190 130 239 189 163; in hex this is ef be 82 ef bd a3.
However, http://www.utf8-chartable.de/ says that the UTF-8 bytes for "£" in hex are c2 a3, so the output should be 194 163.

Other strings seem to produce correct bytes when encoded as UTF-8, so I'm wondering why Java is producing these 6 bytes for "£", and how I should go about properly converting my Strings to byte arrays using UTF-8 encoding.
I have also tried

    OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(outputFile), "UTF-8");
    out.write("£");
    out.close();

but this produced the same 6 bytes.
Replace £ with \u00a3 in your code, and I'm sure you'll find it works.
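To illustrate why the escape helps, here is a minimal sketch. The class and method names (`PoundEncodingDemo`, `encodePound`, `toUnsignedDecimal`, `codePointsOf`) are made up for this example. It encodes the escaped pound sign and also decodes the six bytes from the question back into code points. Those six bytes turn out to be valid UTF-8 for two different characters, U+FF82 and U+FF63. That suggests the string literal was already wrong before `getBytes` ran, most likely because the compiler read the source file in a different encoding than the one it was saved in. That last part is an assumption, since the question does not show the build setup.

```java
import java.nio.charset.StandardCharsets;

class PoundEncodingDemo {

    // Encode the pound sign written as a Unicode escape, so the result does
    // not depend on which encoding the compiler assumes for the source file.
    static byte[] encodePound() {
        return "\u00a3".getBytes(StandardCharsets.UTF_8);
    }

    // Format bytes as unsigned decimal values separated by single spaces.
    static String toUnsignedDecimal(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(b & 0xFF);
        }
        return sb.toString();
    }

    // Decode UTF-8 bytes and list the resulting code points as U+XXXX.
    static String codePointsOf(byte[] utf8) {
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        decoded.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The escaped literal encodes to the expected two bytes.
        System.out.println(toUnsignedDecimal(encodePound())); // 194 163

        // The six bytes reported in the question, decoded back to code points.
        byte[] observed = {
            (byte) 0xEF, (byte) 0xBE, (byte) 0x82,
            (byte) 0xEF, (byte) 0xBD, (byte) 0xA3
        };
        System.out.println(codePointsOf(observed)); // U+FF82 U+FF63
    }
}
```

If the mismatch really is in how the source file is read, an alternative to escaping every non-ASCII character is to tell the compiler the file's encoding, for example with `javac -encoding UTF-8`, or the equivalent setting in your IDE or build tool.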