Java: Which performs better for reading a stream into a String?

Question

I wrote a utility method that reads a small amount of data from a stream into a String.

Which implementation performs better?

Write all the data to a byte array, then convert it to a String in one step.

OR

Convert each buffered chunk to a String and concatenate the results.

Implementation 1:

    private String fileToString() throws ... {
        final byte[] buffer = new byte[bufLen];
        int n;
        final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        // Accumulate the raw bytes; decode once after the whole stream has been read.
        while ((n = fileInputStream.read(buffer)) != -1)
            byteArrayOutputStream.write(buffer, 0, n);
        return new String(byteArrayOutputStream.toByteArray(), "UTF-8");
    }

Implementation 2:

    private String fileToString() throws ... {
        final byte[] buffer = new byte[bufLen];
        int n;
        final StringBuilder stringBuilder = new StringBuilder(aProperValue);
        // Decode each chunk separately and append the result.
        while ((n = fileInputStream.read(buffer)) != -1)
            stringBuilder.append(new String(buffer, 0, n, "UTF-8"));
        return stringBuilder.toString();
    }

EDIT:

The second implementation is not correct! I was wrong! See my answer below.

@ThomasS. Perhaps I will do it. But it would be good to make it accessible to everyone on the web.
– Mir-Ismaili, Commented Dec 18, 2017 at 20:36
The second one is incorrect: it might read half a character (since UTF-8 encodes some characters as several bytes) and try to decode that partial sequence as a character. Why don't you use a Reader to read characters? That's what they're for.
– JB Nizet, Commented Dec 18, 2017 at 20:37
Don't forget to read about how to write a correct micro-benchmark before you try it.
– azurefrog, Commented Dec 18, 2017 at 20:38
@Mir-Ismaili fileInputStream.read(buffer) reads bytes, not UTF-8 characters. A single call can indeed return only some of the bytes of a multi-byte UTF-8 character.
– Andrew Henle, Commented Dec 18, 2017 at 21:06
@Mir-Ismaili you didn't get me. Suppose your String has two chars. Suppose encoding the first char to UTF-8 gives the bytes [195, 169], and encoding the second gives [195, 177]. Now suppose that, when reading these bytes, you first get [195, 169, 195]. You'll transform these three bytes to a String, thus trying to decode the trailing byte 195 on its own, which is not a complete character.
– JB Nizet, Commented Dec 18, 2017 at 21:07
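
The boundary problem described in the comments above can be reproduced without a file. The following is only a sketch: the class and method names are made up for illustration, and the fixed chunk size of 3 is an assumption standing in for whatever number of bytes read(buffer) happens to return.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question.
class SplitDecodeDemo {

    // Decodes every chunk on its own, the way Implementation 2 does.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two characters (e-acute, n-tilde), each encoded as two bytes in UTF-8.
        String original = "\u00e9\u00f1";
        byte[] data = original.getBytes(StandardCharsets.UTF_8);

        // Decoding all bytes at once, as Implementation 1 does, round-trips.
        String whole = new String(data, StandardCharsets.UTF_8);
        System.out.println(whole.equals(original));   // true

        // A chunk of 3 bytes cuts the second character in half.
        String chunked = decodePerChunk(data, 3);
        System.out.println(chunked.equals(original)); // false
    }
}
```

Note that new String(...) does not throw on the malformed input; it substitutes the replacement character U+FFFD, so the corruption is silent.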

Mir-Ismaili · Accepted Answer · 2020-12-04 12:27:43Z

The second implementation is wrong: it breaks when a multi-byte character straddles a buffer boundary. Thanks to @JB Nizet and @Andrew Henle; see their comments under my question.
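
One way to keep the chunk-by-chunk shape of the second implementation is to read chars through a Reader, as suggested in the comments. This is a minimal sketch, not a benchmarked solution: the class name, method name, and parameters are hypothetical, and it assumes the input is UTF-8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here for illustration only.
class ReaderToString {

    // Reads the whole stream as UTF-8 text. bufLen must be greater than zero.
    static String streamToString(InputStream in, int bufLen) throws IOException {
        char[] buffer = new char[bufLen];
        StringBuilder sb = new StringBuilder();
        // InputStreamReader keeps decoder state between reads, so a character
        // whose bytes arrive in two separate reads is still decoded correctly.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }
}
```

The try-with-resources block closes the underlying stream as well; drop it if the caller should keep ownership of the stream.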

edited Dec 4, 2020 at 12:27

answered Dec 18, 2017 at 21:20

Mir-Ismaili

Oleksandr · Accepted Answer · 2017-12-18 20:52:03Z

The best way is to use a library. I tried to solve the same problem myself, but my version was very slow. Consider using CharStreams.toString from the Guava library; the speedup is noticeable even without measuring.

Why add a library hog if you can solve a tiny problem with some thinking?
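
For small inputs, the JDK alone may be enough. The sketch below assumes Java 9 or later (for InputStream.readAllBytes) and UTF-8 input; the class and method names are hypothetical, and it has not been benchmarked against Guava or against either implementation in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here for illustration only.
class ReadAllBytesToString {

    // Same idea as Implementation 1: collect every byte, then decode once.
    static String streamToString(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "\u00e9\u00f1 abc".getBytes(StandardCharsets.UTF_8);
        String text = streamToString(new ByteArrayInputStream(bytes));
        System.out.println(text.length()); // 6
    }
}
```

Because decoding happens only once over the complete byte array, no character can be split at a buffer boundary.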
