I've been toying around with some .NET features (namely Pipelines, Memory, and Array Pools) for high-speed file reading/parsing. I came across something interesting while playing around with Array.Copy, Buffer.BlockCopy and ReadOnlySequence.CopyTo. The IO Pipeline reads data as bytes and I'm attempting to efficiently turn them into chars.

While playing around with Array.Copy I found that I am able to copy from byte[] to char[] and the compiler (and runtime) are more than happy to do it.

char[] outputBuffer = ArrayPool<char>.Shared.Rent(buffer.Length);
Array.Copy(buffer, 0, outputBuffer, 0, buffer.Length);

This code runs as expected, though I'm sure there are some UTF edge cases not properly handled here.

My curiosity comes with Buffer.BlockCopy:

char[] outputBuffer = ArrayPool<char>.Shared.Rent(buffer.Length);
Buffer.BlockCopy(buffer, 0, outputBuffer, 0, buffer.Length);

The resulting contents of outputBuffer are garbage. For example, with the example contents of buffer as

{ 50, 48, 49, 56, 45 } 

The contents of outputBuffer after the copy are

{ 12338, 14385, 12333, 11575, 14385 } 

I'm just curious what is happening "under the hood" inside the CLR that causes these two calls to produce such different results.

  • Call Array.Clear before you copy, then the problem will become evident immediately. Commented Aug 12, 2018 at 15:58
  • As usual, what looks like garbage in decimal makes sense in hexadecimal Commented Aug 12, 2018 at 16:35

1 Answer

Array.Copy() is smarter about the element type. It uses the memmove() CRT function when it can, but falls back to a loop that copies each element when it can't, converting as necessary; it considers boxing and primitive type conversions. So one element in the source array becomes one element in the destination array.

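Buffer.BlockCopy() skips all that and blasts the raw bytes across with memmove(). No conversions are considered, and its offsets and count are measured in bytes, not elements. That is why it can be slightly faster, and also why it can more easily mislead you about the array content: every two source bytes land in a single 2-byte char. Do note that the ASCII/UTF-8 encoded character data is still visible in that array once you read the values in hexadecimal and account for little-endian byte order: 12338 == 0x3032 == "20", 14385 == 0x3831 == "18", etc. Easier to see with Debug > Windows > Memory > Memory 1.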
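A minimal sketch of the difference, assuming a little-endian machine and an input of "2018-07-18" as ASCII bytes. That input is my guess at the question's data, chosen because it reproduces the posted values; the variable names are illustrative only.

```csharp
using System;

// Assumed input: the ASCII bytes of "2018-07-18" (not confirmed by the question).
byte[] buffer = { 50, 48, 49, 56, 45, 48, 55, 45, 49, 56 };

// Array.Copy works per element: each byte is widened to one char.
char[] viaArrayCopy = new char[buffer.Length];
Array.Copy(buffer, 0, viaArrayCopy, 0, buffer.Length);

// Buffer.BlockCopy works per byte: the last argument is a byte count,
// so ten bytes fill only the first five 2-byte chars.
char[] viaBlockCopy = new char[buffer.Length];
Buffer.BlockCopy(buffer, 0, viaBlockCopy, 0, buffer.Length);

// Render both results so they can be compared.
string copied = new string(viaArrayCopy);
string blockCopied = string.Join(", ", Array.ConvertAll(viaBlockCopy, c => (int)c));

Console.WriteLine(copied);      // 2018-07-18
Console.WriteLine(blockCopied); // little-endian: 12338, 14385, 12333, 11575, 14385, 0, 0, 0, 0, 0
```

On a big-endian machine the BlockCopy values would differ, because the two bytes inside each char would be read in the opposite order.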

Perhaps noteworthy is that this type coercion is a feature. Say you receive an int[] through a socket or pipe but have the data in a byte[] buffer: Buffer.BlockCopy() is by far the fastest way to turn those bytes back into an int[].
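A rough sketch of that int[] scenario. The payload here is hypothetical and produced locally to stand in for data read from a socket or pipe; it also assumes sender and receiver share the same byte order.

```csharp
using System;

// Hypothetical payload: three ints serialized as raw bytes, standing in
// for what a socket or pipe read would leave in a byte[] buffer.
int[] original = { 1, 2018, -5 };
byte[] received = new byte[original.Length * sizeof(int)];
Buffer.BlockCopy(original, 0, received, 0, received.Length);

// Reinterpret the bytes as ints with a single block copy, no per-element loop.
// The count argument is in bytes, so the byte[] length is passed directly.
int[] values = new int[received.Length / sizeof(int)];
Buffer.BlockCopy(received, 0, values, 0, received.Length);

string restored = string.Join(", ", values);
Console.WriteLine(restored); // 1, 2018, -5
```

If the sender uses a different byte order, each value would need its bytes swapped after the copy.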

1 Comment

Thanks for the insight!
