3

I'm looking for the most efficient way to store and manage a large byte array in memory. I will have the need to both insert and delete bytes from any position within the array.

At first, I was thinking that a regular array was best.

byte[] buffer = new byte[ArraySize]; 

This would allow me to access any byte within the array. I can also resize the array. However, there doesn't appear to be any built-in support for shifting or moving items within the array.

One option is to have a loop to move items one by one but that sounds horribly inefficient in C#. Another option is to create a new array and copy bytes over to the correct position, but that requires copying all data in the array.

Is there no better option?

3
  • Um, what about List<byte>? Doesn't it use an underlying array? Commented Nov 12, 2012 at 21:31
  • I haven't researched it in detail but am assuming that would not be as efficient as a regular array. Commented Nov 12, 2012 at 21:32
  • @LewsTherin yes. List is basically an abstraction over an array. He would be correct that an array is (barely) faster than a List<T>. Not enough to usually worry about though Commented Nov 12, 2012 at 21:47

2 Answers 2

2

Actually, I just found the Buffer Class, which appears ideal for what I need.

It looks like the BlockCopy method will block copy a bunch of items and supports copying within the same array, and even correctly handles overlapping items.

Sign up to request clarification or add additional context in comments.

6 Comments

That's not how it seems to me. Most of the lower level .NET methods are written in hand-optimized assembly language (and there are specific machine instructions for moving blocks of bytes). I would expect this to be many times faster than copying each byte in a loop.
That doesn't appear to allow for shifting.
@smdrager: I'd have to implement some logic myself. For example, if I insert a byte at the start then I'd copy starting from the first byte to an area starting with the second byte. It looks like BlockCopy will ensure items are copied in the right order so this overlap won't cause a problem. And then I can add the new item. Similar deal for delete.
Yes, for regular copying this would be true. A single-dimension array has IL opcodes specifically for copying bits around. But, it doesn't handle things like shifting or inserting elements... An array is just the wrong data structure if you're doing this with huge data sets
@JonathanWood You have very little faith in optimizing compilers (and JITs)
|
1

I think the best option in this case is a hybrid between a regular array and a list. This would only be necessary with megabyte sized arrays though.

So you could do something like this:

List<byte[]> buffer; 

And have each element of the list just a chunk of the data(say 64K or something small and manageable)

It'd require quite a bit of custom code, but would definitely be the fastest option when having to shift data around in a large array.

Also, if you're doing a lot more shifting of bytes than anything else, LinkedList<T> may work better (but it's famously bad for everything but a specific set of cases)

To clarify why this is more correct than an array, consider inserting 1 byte to the beginning of an array. You must allocate another array (double memory consumption) and then copy every byte to the new array after inserting the new byte, and then free the old array (possible heap corruption depending on size)

Consider now this method with lists.

If you have to insert a lot of bytes, you'll probably want to insert at the beginning of the buffer list. This is an O(n) operation, so your ending efficiency for this operation is O(n/CHUNK_SIZE)

Or, if you just need to insert a single byte, you can just get the first element of the list and copy the array as normal. Then, the speed is O(CHUNK_SIZE), which isn't horrible, especially if n in comparison is very large (megabytes of data)

2 Comments

I've written a hex editor (Cygnus Hex Editor) in C++ that uses tricks to quickly manage many megabytes of data. But that seems beyond my current needs. I just want a nice compact way to store a bunch of bytes and do some shifting around. In my question, I said large array but I was actually thinking of much less than megabytes. A linked list would use several bytes per byte of storage and would involve significant overhead in any conversion to actual bytes (such as to a file). That doesn't seem the right approach to me.
@JonathanWood I'm not suggesting a linked list of bytes. I'm suggesting a linked (well probably just regular) list of byte arrays. If you're dealing with less than a megabyte of data though, unless it's super performance critical, I don't think my answer will help you at all though

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.