0

I wrote a utility method that reads a small amount of data from a stream into a String.

Which implementation performs better?

  1. Write all the data to a byte array and then convert it to a String in one step.

OR

  2. Convert each buffered chunk to a String and concatenate the results.

Implementation 1:

    private String fileToString() throws ... {
        final byte[] buffer = new byte[bufLen];
        int n;
        // Collect all bytes first, then decode them once at the end.
        final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        while ((n = fileInputStream.read(buffer)) != -1)
            byteArrayOutputStream.write(buffer, 0, n);
        return new String(byteArrayOutputStream.toByteArray(), "UTF-8");
    }

Implementation 2:

    private String fileToString() throws ... {
        final byte[] buffer = new byte[bufLen];
        int n;
        // Decode each chunk on its own and append it (see the EDIT below).
        final StringBuilder stringBuilder = new StringBuilder(aProperValue);
        while ((n = fileInputStream.read(buffer)) != -1)
            stringBuilder.append(new String(buffer, 0, n, "UTF-8"));
        return stringBuilder.toString();
    }

EDIT:

The second implementation is not correct! I was wrong! See my answer below.

10
  • @ThomasS. Perhaps I will do it. But it would be good to make it accessible to everyone on the web. Commented Dec 18, 2017 at 20:36
  • 7
    The second one is incorrect: it might read half a character (since UTF8 encodes some characters to several bytes) and try to transform this half sequence to a character. Why don't you use a Reader to read characters? That's what they're for. Commented Dec 18, 2017 at 20:37
  • 1
    Don't forget to read about how to write a correct micro-benchmark before you try it. Commented Dec 18, 2017 at 20:38
  • 1
    @Mir-Ismaili fileInputStream.read(buffer) reads bytes, not UTF-8 characters. A single fileInputStream.read(buffer) call can indeed end in the middle of a multi-byte UTF-8 character. Commented Dec 18, 2017 at 21:06
  • 2
    @Mir-Ismaili you didn't get me. Suppose your String has two chars. Suppose encoding the first char to UTF8 gives the bytes [192, 128], and the encoding of the second gives [193, 129]. Now suppose that, when reading these bytes, you first get [192, 128, 193]. You'll transform these three bytes to a String, thus trying to decode the byte 193 as a character, which is invalid. Commented Dec 18, 2017 at 21:07
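
The boundary problem described in the comments above can be reproduced with a small sketch. It assumes a hypothetical helper, decodeChunkwise, that mimics Implementation 2 by decoding fixed-size chunks of a byte array one at a time; the class name, the helper, and the sample string are illustrative only and not part of the original code.

```java
import java.nio.charset.StandardCharsets;

public class SplitCharDemo {
    // Hypothetical helper: decodes each chunk of at most chunkSize bytes
    // on its own, the way Implementation 2 decodes each buffer.
    public static String decodeChunkwise(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "a" is 1 byte in UTF-8, U+00E9 (e with acute accent) is 2 bytes: 0xC3 0xA9.
        String original = "a\u00e9";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);

        // Decoding everything at once (Implementation 1) round-trips correctly.
        String whole = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(whole.equals(original)); // prints "true"

        // A chunk size of 2 cuts the two-byte character in half, so each half
        // is decoded as malformed input and replaced with U+FFFD.
        String chunked = decodeChunkwise(bytes, 2);
        System.out.println(chunked.equals(original)); // prints "false"
    }
}
```

Whether a real stream splits a character this way depends on the buffer size and on how many bytes each read happens to return, so the bug may only show up occasionally.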

2 Answers

1

The second implementation is wrong. It breaks when a multi-byte character is split across two reads at a buffer boundary. Thanks to @JB Nizet and @Andrew Henle; see their comments under my question.
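
One way to keep the chunk-by-chunk structure without the boundary bug is the Reader approach suggested in the comments. The following is only a sketch, not the code from the question: the class name, the streamToString helper, and its parameters are made up for illustration, and it assumes bufLen is greater than zero.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class StreamToString {
    // Hypothetical helper: reads the whole stream as UTF-8 text.
    // Assumes bufLen > 0.
    public static String streamToString(InputStream in, int bufLen) throws IOException {
        char[] buffer = new char[bufLen];
        StringBuilder sb = new StringBuilder();
        // InputStreamReader does the decoding and carries incomplete byte
        // sequences over to the next read, so a character is never cut in half.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // One-, two- and three-byte UTF-8 characters: a, U+00E9, U+20AC.
        String original = "a\u00e9\u20ac";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        // Even with a one-char buffer the text round-trips.
        String result = streamToString(new ByteArrayInputStream(bytes), 1);
        System.out.println(result.equals(original)); // prints "true"
    }
}
```

Whether this is faster or slower than Implementation 1 would still need a proper micro-benchmark; the sketch only addresses correctness.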


0

The best way is to use a library. I tried to solve the same problem myself, but the performance was really slow. Consider using CharStreams.toString from the Guava library; in my experience the speedup was obvious even without measuring.

1 Comment

Why add a library hog if you can solve a tiny problem with some thinking?
