8

So it turns out not all arrays are created equal: multi-dimensional arrays can have non-zero lower bounds. See for example the Excel PIA's Range.Value property, which returns a 1-based array:

```csharp
object[,] rectData = myRange.Value;
```
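You can build such an array without Excel by calling Array.CreateInstance, as the question's own Foo() does further down. The minimal sketch below shows that indexing starts at the lower bounds rather than at 0. MakeOneBased and the 10 * i + j fill values are made up for illustration.

```csharp
using System;

static class LowerBoundDemo
{
    // Hypothetical helper: builds a rows x cols int[,] whose indices start at 1
    // in both dimensions, similar in shape to what Excel's Range.Value returns
    // (no Excel interop involved here).
    public static int[,] MakeOneBased(int rows, int cols)
    {
        var rect = (int[,])Array.CreateInstance(
            typeof(int), new[] { rows, cols }, new[] { 1, 1 });

        // Valid indices run from GetLowerBound(d) to GetUpperBound(d), inclusive.
        for (int i = rect.GetLowerBound(0); i <= rect.GetUpperBound(0); i++)
            for (int j = rect.GetLowerBound(1); j <= rect.GetUpperBound(1); j++)
                rect[i, j] = 10 * i + j;
        return rect;
    }
}
```

For example, MakeOneBased(3, 2)[1, 1] is 11 and [3, 2] is 32, while [0, 0] throws IndexOutOfRangeException.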

I need to convert these data into a jagged array. My first try below smells of complexity. Any suggestions to optimize it? It needs to handle the general case where lower bounds may not be zero.

I have this extension method:

```csharp
public static T[][] AsJagged<T>(this T[,] rect)
{
    int row1 = rect.GetLowerBound(0);
    int rowN = rect.GetUpperBound(0);
    int col1 = rect.GetLowerBound(1);
    int colN = rect.GetUpperBound(1);

    int height = rowN - row1 + 1;
    int width = colN - col1 + 1;
    T[][] jagged = new T[height][];

    int k = 0;
    int l;
    for (int i = row1; i < row1 + height; i++)
    {
        l = 0;
        T[] temp = new T[width];
        for (int j = col1; j < col1 + width; j++)
            temp[l++] = rect[i, j];
        jagged[k++] = temp;
    }
    return jagged;
}
```

Used like this:

```csharp
public void Foo()
{
    int[,] iRect1 = {
        { 1, 1, 1, 1 }, { 1, 1, 1, 1 }, { 1, 1, 1, 1 }, { 1, 1, 1, 1 },
        { 1, 1, 1, 1 }, { 1, 1, 1, 1 }, { 1, 1, 1, 1 }, { 1, 1, 1, 1 }
    };
    int[][] iJagged1 = iRect1.AsJagged();

    // 3 x 5 array whose indices start at [7, 8]
    int[] lengths = { 3, 5 };
    int[] lowerBounds = { 7, 8 };
    int[,] iRect2 = (int[,])Array.CreateInstance(typeof(int), lengths, lowerBounds);
    int[][] iJagged2 = iRect2.AsJagged();
}
```

I'm also curious whether Buffer.BlockCopy() would work, or be faster.

Edit: AsJagged needs to handle reference types.

Edit: Found a bug in AsJagged(). Added int l; and changed the inner loop bound to col1 + width.

2 Answers

7

A few caveats/assumptions up front:

  • You seem to use only int as your data type (or at least seem to be OK with using Buffer.BlockCopy, which would imply you can live with primitive types in general).
  • For the test data you show, I don't think there will be much difference between any reasonably sane approaches.

That said, the following implementation (which has to be specialized for a specific primitive type, here int, because it uses fixed) is around 10 times faster than the approach using the inner loop:

```csharp
using System.Runtime.InteropServices; // for DllImport

unsafe public static int[][] AsJagged2(int[,] rect)
{
    int row1 = rect.GetLowerBound(0);
    int rowN = rect.GetUpperBound(0);
    int col1 = rect.GetLowerBound(1);
    int colN = rect.GetUpperBound(1);

    int height = rowN - row1 + 1;
    int width = colN - col1 + 1;
    int[][] jagged = new int[height][];

    int k = 0;
    for (int i = row1; i < row1 + height; i++)
    {
        int[] temp = new int[width];
        // A row of an int[,] is contiguous in memory, so copy it as one block
        // of width * sizeof(int) bytes, starting at element [i, col1].
        fixed (int* dest = temp, src = &rect[i, col1])
        {
            MoveMemory(dest, src, width * sizeof(int));
        }
        jagged[k++] = temp;
    }
    return jagged;
}

[DllImport("kernel32.dll", EntryPoint = "RtlMoveMemory")]
unsafe internal static extern void MoveMemory(void* dest, void* src, int length);
```
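If the P/Invoke part is the main objection, the same idea could be written with Buffer.MemoryCopy (in the BCL since .NET Framework 4.6) instead of RtlMoveMemory. The unmanaged generic constraint (C# 7.3) can then replace one overload per element type. This is a swapped-in technique, not the version benchmarked in this answer. It still needs unsafe code (compile with /unsafe), and AsJaggedMemoryCopy is a made-up name.

```csharp
using System;

static class JaggedUnsafe
{
    // Sketch: one generic method instead of one overload per primitive type.
    // Assumes C# 7.3+ (unmanaged constraint), Buffer.MemoryCopy, and /unsafe.
    public static unsafe T[][] AsJaggedMemoryCopy<T>(this T[,] rect) where T : unmanaged
    {
        int row1 = rect.GetLowerBound(0);
        int col1 = rect.GetLowerBound(1);
        int height = rect.GetLength(0);
        int width = rect.GetLength(1);
        long rowBytes = (long)width * sizeof(T);

        T[][] jagged = new T[height][];
        for (int k = 0; k < height; k++)
        {
            T[] temp = new T[width];
            if (width > 0)
            {
                // Rows of a T[,] are stored contiguously (row-major), so one
                // row is a single block of width * sizeof(T) bytes.
                fixed (T* dest = temp)
                fixed (T* src = &rect[row1 + k, col1])
                {
                    // MemoryCopy(source, destination, destinationSizeInBytes, sourceBytesToCopy)
                    Buffer.MemoryCopy(src, dest, rowBytes, rowBytes);
                }
            }
            jagged[k] = temp;
        }
        return jagged;
    }
}
```

It handles non-zero lower bounds the same way as the original, by starting each row at [row1 + k, col1].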

Using the following "test code":

```csharp
using System;
using System.Diagnostics;

static void Main(string[] args)
{
    Random rand = new Random();
    int[,] data = new int[100, 1000];
    for (int i = 0; i < data.GetLength(0); i++)
    {
        for (int j = 0; j < data.GetLength(1); j++)
        {
            data[i, j] = rand.Next(0, 1000);
        }
    }

    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < 100; i++)
    {
        int[][] dataJagged = AsJagged(data);
    }
    Console.WriteLine("AsJagged: " + sw.Elapsed);

    sw = Stopwatch.StartNew();
    for (int i = 0; i < 100; i++)
    {
        int[][] dataJagged2 = AsJagged2(data);
    }
    Console.WriteLine("AsJagged2: " + sw.Elapsed);
}
```

Where AsJagged (the first case) is your original function, I get the following output:

```
AsJagged:  00:00:00.9504376
AsJagged2: 00:00:00.0860492
```
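Before trusting a speed-up like this, it's worth checking that both functions return the same data. A copy that moves too few bytes is fast for the wrong reason. A small helper could be called once before the timing loops, for example as a check on SameContents(AsJagged(data), AsJagged2(data)). The helper is hypothetical and not part of the original test code.

```csharp
using System.Linq;

static class JaggedCheck
{
    // Hypothetical helper: true when two jagged arrays have the same number
    // of rows and each pair of rows has the same elements in the same order.
    public static bool SameContents<T>(T[][] a, T[][] b)
    {
        if (a.Length != b.Length) return false;
        for (int k = 0; k < a.Length; k++)
            if (!a[k].SequenceEqual(b[k])) return false; // also compares row lengths
        return true;
    }
}
```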

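So there is indeed a faster way of doing it. However, depending on the size of the data, how often you actually perform this operation, and your willingness to allow unsafe and P/Invoke code, you're probably not going to need it. One caveat on the numbers: the byte count passed to MoveMemory has to cover a whole row (width * sizeof(int)). A count of rowN * sizeof(int) would copy only 99 of the 1000 ints per row for this test data, and timings taken that way would overstate the speed-up.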

That said, we were using large matrices of double (say 7000x10000 elements), where it really did make a huge difference.

Update: about using Buffer.BlockCopy

I might be overlooking some Marshal or other trick, but I don't think using Buffer.BlockCopy is possible here, because it requires both the source and the destination to, well, be an Array.

In our example, the destination is an array (e.g. int[] temp = ...), but the source row is not. We "know" that in a two-dimensional array of a primitive type each "row" (i.e. each index of the first dimension) is laid out in memory like an array of that type. Even so, there is no safe (as opposed to unsafe) way to get at that row as an array without the overhead of copying it first. So we basically need a function that simply deals with memory and doesn't care about its actual content, like MoveMemory. BTW, the internal implementation of Buffer.BlockCopy does something similar.
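That said, one such trick does seem to exist. Buffer.BlockCopy accepts the whole int[,] as its source, because it takes any array of primitive elements, multidimensional ones included, and addresses it as one flat, row-major block of bytes. Each row can then be copied by byte offset without unsafe code. The sketch below is written for int and has not been benchmarked here. It only works for primitive element types (Buffer.BlockCopy throws ArgumentException otherwise), so it does not cover the reference-type requirement. AsJaggedBlockCopy is a made-up name.

```csharp
using System;

static class JaggedBlockCopy
{
    // Sketch: byte offsets passed to Buffer.BlockCopy are counted from the
    // first stored element, so the lower bounds never enter the arithmetic.
    public static int[][] AsJaggedBlockCopy(this int[,] rect)
    {
        int height = rect.GetLength(0);
        int width = rect.GetLength(1);
        int rowBytes = width * sizeof(int);

        int[][] jagged = new int[height][];
        for (int k = 0; k < height; k++)
        {
            jagged[k] = new int[width];
            // Row k starts k * rowBytes bytes into the rectangular array.
            Buffer.BlockCopy(rect, k * rowBytes, jagged[k], 0, rowBytes);
        }
        return jagged;
    }
}
```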


4 Comments

Thanks for the answer. The use case in my question is limited; AsJagged() does need to handle reference types. How would that change your solution? (PS: I will update the question.)
I think you'd be better off not resorting to unsafe code, unless it's absolutely crucial.
@dFlag: my solution only works for non-reference types, or more precisely primitive types. If you need to support multiple different types, you would need overloads of the AsJagged2 function for each type. However, note that you should really measure against your anticipated requirements (i.e. size of arrays) before dumping your approach.
@zmbq Well, that is quite general advice, and thus never really wrong ;-) But seriously, I hope I made it somewhat clear in my answer that one really needs to measure (in his problem domain) before resorting to such measures (I "highlighted" the statement just in case).
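Reference types rule out both pointers and Buffer.BlockCopy, so for the general case the element loop from the question is what's left. The tidied sketch below indexes the destination by offset from the lower bounds instead of keeping separate k/l counters. It has the same O(N*M) element-by-element copy as the question's AsJagged. AsJaggedLoop is a made-up name.

```csharp
using System;

static class JaggedSafe
{
    // Works for any T, including reference types: each element is copied
    // individually, so only references (not the objects) are duplicated.
    public static T[][] AsJaggedLoop<T>(this T[,] rect)
    {
        int row1 = rect.GetLowerBound(0), rowN = rect.GetUpperBound(0);
        int col1 = rect.GetLowerBound(1), colN = rect.GetUpperBound(1);
        int width = colN - col1 + 1;

        T[][] jagged = new T[rowN - row1 + 1][];
        for (int i = row1; i <= rowN; i++)
        {
            T[] row = new T[width];
            for (int j = col1; j <= colN; j++)
                row[j - col1] = rect[i, j];   // shift column index to 0-based
            jagged[i - row1] = row;           // shift row index to 0-based
        }
        return jagged;
    }
}
```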
6

Your complexity is O(N*M), where N is the number of rows and M the number of columns. That's the best you can get when copying N*M values.

Buffer.BlockCopy might be faster than your inner for loop, but I wouldn't be surprised if the compiler already handles this code well, so you might not gain any further speed. You should test it to make sure.

You may be able to achieve better performance by not copying the data at all (at the potential expense of slightly slower lookups). If you create an 'array row' class that holds your rect and a row number, and provides an indexer that accesses the correct column, you can create an array of such rows and save yourself the copying altogether.

The complexity of creating such an array of 'array rows' is O(N).

EDIT: An ArrayRow class, just because it bugs me...

The ArrayRow could look something like this:

```csharp
class ArrayRow<T>
{
    private T[,] _source;
    private int _row;

    public ArrayRow(T[,] rect, int row)
    {
        _source = rect;
        _row = row;
    }

    public T this[int col]
    {
        get { return _source[_row, col]; }
    }
}
```

Now you create an array of ArrayRows without copying anything at all, and the optimizer has a good chance of optimizing sequential access to an entire row.
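Here is a sketch of the "array of ArrayRows" step. It repeats the class so the snippet compiles on its own, and AsRows is a hypothetical helper name. Building the views is O(N), one small object per row. Because nothing is copied, the views share storage with the rectangular array, so later writes to it show through. The indexer also keeps the source's own column indices, which start at GetLowerBound(1).

```csharp
using System;

class ArrayRow<T>
{
    private readonly T[,] _source;
    private readonly int _row;

    public ArrayRow(T[,] rect, int row) { _source = rect; _row = row; }

    // col is an actual column index of the source array, not a 0-based offset.
    public T this[int col] { get { return _source[_row, col]; } }
}

static class ArrayRowExtensions
{
    // Hypothetical helper: one view per row, numbered from 0 in the returned
    // array, each pointing at the corresponding row of the source.
    public static ArrayRow<T>[] AsRows<T>(this T[,] rect)
    {
        int row1 = rect.GetLowerBound(0);
        var rows = new ArrayRow<T>[rect.GetLength(0)];
        for (int k = 0; k < rows.Length; k++)
            rows[k] = new ArrayRow<T>(rect, row1 + k);
        return rows;
    }
}
```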

8 Comments

+1 for the mathematical thing. This is a non-trivial operation however you turn it, because by definition you have to copy all the data. The best you can do is have this done with block methods rather than a manual item-by-item copy, but it IS a slow operation. Nothing in the world will change that.
+1 also for the O-notation analysis. Note, however, that the C# compiler actually optimizes much less than is commonly assumed. Most optimizations are actually done when jitting the IL to machine code. So unless you are good at reading assembler (x86, x64, whatever), it is not easy to actually prove what is done and what is not.
Guys, take a look at my last suggestion, it avoids copying the data altogether.
@zmbq For a list of possible things the compiler does optimize, see here. BTW, I'm not trying to counter your argument that optimization might happen. I'm just saying that there are already too many (unproven) myths around about what might be optimized (by the compiler or the jitter). So without some proof or analysis attached, I'm always suspicious. No offense. ;-)
It's OK, suspicion is good for a programmer... I agree it needs to be tested.