
I want to encode a file, which may be an image or a PDF, and send it to a server. Which type of encoding and decoding should I use? (Both the server and the client are in our company, so we can write logic in both places.) UTF-8 encoding is supported by default in Java, whereas to use Base64 encoding I have to import an external jar. For simple text, both ways work fine. I am using TCP socket programming.

Using UTF-8 Encoding

String str = "This is my Sample application";
String urlEncodedData = URLEncoder.encode(str, "UTF-8"); // Encoding with UTF-8
System.out.println("...after URL encoding..." + urlEncodedData);
String retrievedData = URLDecoder.decode(urlEncodedData, "UTF-8"); // Decoding with UTF-8
System.out.println("...after decoding..." + retrievedData);
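Worth noting: `URLEncoder` does not turn text into UTF-8 bytes for transport. It applies `application/x-www-form-urlencoded` (percent) encoding, producing another string, and the `"UTF-8"` argument only selects the charset used to escape non-ASCII characters as `%XX` sequences. The sketch below (the class name `UrlEncodingDemo` is just for illustration) shows what it actually outputs. Because it is a text-to-text encoding, it cannot take a PDF or image directly; the bytes would first have to become a `String`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UrlEncodingDemo {
    // Percent-encodes text; "UTF-8" only chooses how non-ASCII characters
    // are turned into %XX byte escapes.
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Reverses the percent-encoding, interpreting %XX escapes as UTF-8 bytes.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("This is my Sample application")); // This+is+my+Sample+application
        System.out.println(encode("caf\u00e9"));                     // caf%C3%A9
        System.out.println(decode(encode("caf\u00e9")).equals("caf\u00e9")); // true
    }
}
```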

Using Base-64 (using the Apache Commons Codec jar)

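import org.apache.commons.codec.binary.Base64;
byte[] b = Base64.encodeBase64(str.getBytes("UTF-8")); // Encoding with Base64 (explicit charset, not the platform default)
byte[] decoded = Base64.decodeBase64(b);              // Decoding with Base64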
  • You're comparing apples and pears. Base64 is just a number base in which to express data. UTF-8 is an encoding scheme that encodes numbers (thought of as codepoints) in a byte stream. Commented Jul 22, 2011 at 15:09
  • See the question here. It's tagged as C# but the encoding information applies the same way. Commented Jul 22, 2011 at 15:10
  • Why do you want/need to encode the binary files (PDF and images)? Can't you just send it to the server? Commented Jul 22, 2011 at 15:11
  • It is not only about PDFs; I have image files as well. If a file is big, I send it in chunks. Commented Jul 22, 2011 at 15:29
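Sending a large file in chunks over a TCP socket does not require any text encoding: the raw bytes can be copied through a fixed-size buffer. The sketch below assumes an 8 KB chunk size (an arbitrary choice) and a hypothetical class name `ChunkedCopy`; in real use `in` would be a `FileInputStream` and `out` would be `socket.getOutputStream()`. Note that TCP is a byte stream, so the receiver does not see your chunk boundaries; it needs some other way (for example a length sent up front) to know where a file ends.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class ChunkedCopy {
    // Assumed chunk size; any reasonable buffer size works.
    static final int CHUNK_SIZE = 8192;

    // Copies raw bytes from in to out one chunk at a time and returns the byte count.
    static long copyInChunks(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[CHUNK_SIZE];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) { // read() may fill only part of the buffer
            out.write(buffer, 0, n);          // write exactly the bytes that were read
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i; // arbitrary binary, not text
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyInChunks(new ByteArrayInputStream(data), sink);
        System.out.println(copied + " bytes, identical: " + Arrays.equals(data, sink.toByteArray()));
        // prints "20000 bytes, identical: true"
    }
}
```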

1 Answer


UTF-8 is a text encoding - a way of encoding text as binary data.
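To make "text to binary" concrete: in Java, `String.getBytes(StandardCharsets.UTF_8)` performs UTF-8 encoding, and each character (code point) becomes one to four bytes. A small sketch with a hypothetical class name `Utf8Demo`:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Demo {
    // Text -> bytes: ASCII characters take 1 byte, others take 2 to 4.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes -> text, assuming the bytes really are UTF-8.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "A\u00e9\u20ac"; // 'A', 'é' (U+00E9), '€' (U+20AC)
        byte[] bytes = toUtf8(text);
        System.out.println(bytes.length); // 6 (1 + 2 + 3 bytes)
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) hex.append(String.format("%02X ", b & 0xFF));
        System.out.println(hex.toString().trim()); // 41 C3 A9 E2 82 AC
        System.out.println(fromUtf8(bytes).equals(text)); // true
    }
}
```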

Base64 is in some ways the opposite - it's a way of encoding arbitrary binary data as ASCII text.
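Going the other way, binary to ASCII text: since Java 8 the JDK includes `java.util.Base64`, so on a modern JDK the Commons Codec jar is optional for this. A minimal sketch (class name `Base64Demo` is hypothetical), using the first bytes of a PNG header plus a couple of arbitrary bytes as input:

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Demo {
    // Binary -> text: the output uses only A-Z, a-z, 0-9, '+', '/' and '=' padding.
    static String encode(byte[] binary) {
        return Base64.getEncoder().encodeToString(binary);
    }

    // Text -> the original bytes, unchanged.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] binary = {(byte) 0x89, 'P', 'N', 'G', 0x00, (byte) 0xFF}; // arbitrary bytes
        String text = encode(binary);
        System.out.println(text); // iVBORwD/
        System.out.println(Arrays.equals(binary, decode(text))); // true
    }
}
```

Every 3 input bytes become 4 output characters, so Base64 makes the payload about a third larger; that overhead is only worth paying when the transport really requires text.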

If you need to encode arbitrary binary data as text, Base64 is the way to go - you mustn't try to treat arbitrary binary data as if it's UTF-8 encoded text data.
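The following sketch shows why treating arbitrary bytes as UTF-8 text is unsafe. It decodes a few bytes of the kind found at the start of a JPEG into a `String` and encodes them back; invalid UTF-8 sequences are silently replaced with U+FFFD, so the round trip corrupts the data. The class name `Utf8MisuseDemo` is illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8MisuseDemo {
    // The WRONG approach: pretend arbitrary bytes are UTF-8 text, then convert back.
    static byte[] roundTripAsUtf8(byte[] binary) {
        String pretendText = new String(binary, StandardCharsets.UTF_8); // invalid bytes -> U+FFFD
        return pretendText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] binary = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF}; // JPEG-style leading bytes
        byte[] back = roundTripAsUtf8(binary);
        System.out.println(Arrays.equals(binary, back)); // false: the data was altered
    }
}
```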

However, you may well be able to transfer the file to the server just as binary data in the first place - it depends on what transport you're using.
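Over a plain TCP socket, one option is to skip text encoding entirely and define a small binary application protocol. The sketch below assumes a hypothetical framing (an 8-byte length prefix followed by the raw file bytes) and hypothetical method names `sendFile`/`receiveFile`; it uses `DataOutputStream.writeLong` and `DataInputStream.readFully`, which keeps reading until all the announced bytes have arrived.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class FramedTransfer {
    // Hypothetical protocol: 8-byte big-endian length, then the raw file bytes.
    static void sendFile(OutputStream out, byte[] fileBytes) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeLong(fileBytes.length);
        data.write(fileBytes); // raw binary, no text encoding
        data.flush();          // not closed: closing would also close the socket stream
    }

    static byte[] receiveFile(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        long length = data.readLong();
        if (length < 0 || length > Integer.MAX_VALUE) {
            throw new IOException("Unexpected length: " + length);
        }
        byte[] fileBytes = new byte[(int) length];
        data.readFully(fileBytes); // blocks until every byte has arrived
        return fileBytes;
    }

    public static void main(String[] args) throws IOException {
        // In a real client/server these would be socket.getOutputStream() / socket.getInputStream().
        byte[] file = {0x25, 0x50, 0x44, 0x46, (byte) 0xFF, 0x00}; // "%PDF" followed by arbitrary bytes
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        sendFile(wire, file);
        byte[] received = receiveFile(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(Arrays.equals(file, received)); // true
    }
}
```

This sketch holds each file in memory; for very large files the sender could write the length and then stream the file in chunks, and the receiver could copy exactly that many bytes to disk.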


Comments

I am using TCP socket programming.
@Deepakkk: Well I'm sure you're using some protocol that's slightly higher level than that... depending on what the application protocol is, you may or may not need to perform binary to text encoding.
@JonSkeet Why can't we try to treat arbitrary binary data as if it's UTF-8 while Base64 is assuming the bytes are encoded in ASCII?
@sarahTheButterFly: Not every byte sequence is valid UTF-8-encoded text. There are rules about what's allowed; look up the Wikipedia UTF-8 article for the details. Even if every byte sequence were valid, you'd find that many of the characters produced might be hard to transmit over many transports, whereas Base64 uses only non-control characters within ASCII, which are generally easy to transmit.
@CᴴᴀZ What do you see the difference as? String is a type representing text. Bytes are binary data. "String to bytes" and "Text to binary" are the same thing.
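The comment about not every byte sequence being valid UTF-8 can be checked directly. A `CharsetDecoder` configured with `CodingErrorAction.REPORT` throws instead of substituting U+FFFD, so it works as a strict validity test. A minimal sketch (the class and method names are hypothetical):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Validator {
    // Returns true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of replacing
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf8("h\u00e9llo".getBytes(StandardCharsets.UTF_8))); // true
        System.out.println(isValidUtf8(new byte[] {(byte) 0xFF}));                  // false: never valid in UTF-8
        System.out.println(isValidUtf8(new byte[] {(byte) 0xC3, 0x41}));            // false: missing continuation byte
    }
}
```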