12
$\begingroup$

I have a bunch of datasets which look like:

3 7 1 6 5 8 2 4 1 2 8 1 5 2 5 ... 

and I need to make a list of pairs such that the first element makes a pair with other elements in the same line.

3 7 3 1 3 6 ... 

I usually use Table because it is fast enough for $10^3$ elements, but as my dataset gets bigger, it's no longer efficient. For example:

data = RandomInteger[{1, 10}, {10^7, 5}]; Flatten[Table[{data[[i, 1]], data[[i, j]]}, {i, 1, Length@data}, {j, 2, 5}], 1]//AbsoluteTiming//First (* 39.828744 *) 

What is the most efficient way to find this pair list?

Timing

Finally I had a chance to do a timing on my real machine:

(* Kuba *) 0.726869 (* ciao *) 1.297926 (* nested table*) 2.146111 (* Shutao Tang *) 11.290944 (* Mr.Wizard♦ *) 57.232387 

Although it seems results can be (completely) different if I do a timing on my old laptop!

$\endgroup$
2
  • $\begingroup$ Are the elements always a machine size Integer? Or always a machine-precision Real? Or mixture of different types of data? $\endgroup$ Commented Apr 11, 2015 at 21:38
  • 2
    $\begingroup$ @MichaelE2: They're always Integer. $\endgroup$ Commented Apr 11, 2015 at 21:49

6 Answers 6

16
$\begingroup$

About 4x faster:

Partition[Flatten @ data[[All, {1, 2, 1, 3, 1, 4, 1, 5}]], 2] 
$\endgroup$
5
  • 1
    $\begingroup$ Shouldn't you make this more general? $\endgroup$ Commented Apr 11, 2015 at 22:32
  • $\begingroup$ It depends. If it is slow, probably doesn't matter. If it is fast, ok. But atm I'm confused because I'm testing your and rasher's method with result that your is 20x slower than OP's. So as soon as I figure it out I will tell you :P $\endgroup$ Commented Apr 11, 2015 at 22:35
  • $\begingroup$ LOL -- looks like I wrote a function for the wrong shape of data. Okay, +1, and back to the drawing board. $\endgroup$ Commented Apr 11, 2015 at 22:43
  • $\begingroup$ @Mr.Wizard Ok :) I have to go but I will make it more general tomorrow, if it is worth it :). Feel free to delete my non relevant comments when you change something. Good night. :) $\endgroup$ Commented Apr 11, 2015 at 22:46
  • $\begingroup$ Ah, clean. I feel herp-a-derp for not thinking of that... +1 $\endgroup$ Commented Apr 12, 2015 at 1:54
10
$\begingroup$

I think you'll find this faster:

pairem[data_] := Module[{c = ConstantArray[0, 8*(Length@data)], p = Flatten@data[[All, 2 ;;]], p2 = data[[All, 1]]}, c[[2 ;; ;; 2]] = p; c[[1 ;; ;; 2]] = Flatten@Transpose[ConstantArray[p2, 4]]; Partition[c, 2]] 
$\endgroup$
1
  • $\begingroup$ @Pickett I don't know, I still get only 2.5 speed up here, maybe it is hardware related. $\endgroup$ Commented Apr 11, 2015 at 22:27
5
$\begingroup$

Edit: this method is optimized for long sublists which is exactly the opposite of your example. Sorry. I'll post if I find anything applicable.


This is fairly clean and on larger data at least as fast as rasher's code:

fn = ArrayFlatten[{{First@#, Rest@# ~Partition~ 1 }}] &; 

Test:

fn /@ {{3, 7, 1, 6, 5}, {8, 2, 4, 1, 2}, {8, 1, 5, 2, 5}} 
{{{3, 7}, {3, 1}, {3, 6}, {3, 5}}, {{8, 2}, {8, 4}, {8, 1}, {8, 2}}, {{8, 1}, {8, 5}, {8, 2}, {8, 5}}} 
x = RandomInteger[99, {5000, 5000}]; fn /@ x // AbsoluteTiming // First pairem[x] // AbsoluteTiming // First 
0.334019 0.383022 

A related question: Prepend 0 to sublists

$\endgroup$
2
  • 3
    $\begingroup$ Try with x = RandomInteger[99, {10^6, 5}] which is really the case. $\endgroup$ Commented Apr 11, 2015 at 22:39
  • $\begingroup$ @Kuba, x = RandomInteger[99, {10^6, 5}];pairem[x]; // AbsoluteTiming fn /@ x; // AbsoluteTiming ===> {0.2158204, Null},{13.1357421, Null} $\endgroup$ Commented Apr 12, 2015 at 5:12
4
$\begingroup$

Here is a refinement of @Kuba's clever Part syntax:

data = RandomInteger[{1,10},{10^7,5}]; r1 = Partition[ Flatten @ data[[All,{1,2,1,3,1,4,1,5}]], 2 ]; //AbsoluteTiming r2 = ArrayReshape[ data[[All, {1,2,1,3,1,4,1,5}]], {4 Length[data], 2} ]; //AbsoluteTiming r1===r2 

{1.55516, Null}

{0.568824, Null}

True

$\endgroup$
3
$\begingroup$

My trail:

data = RandomInteger[{1, 10}, {8, 5}] 
{{1, 1, 9, 10, 6}, {7, 2, 8, 5, 4}, {2, 1, 10, 1, 7}, {9, 1, 4, 5, 2}, {6, 10, 6, 5, 10}, {2, 10, 1, 7, 4}, {3, 5, 3, 2, 2}, {1, 2, 9, 6, 2}} 
Thread@{#1, {##2}} & @@@ data 
{ {{1, 1}, {1, 9}, {1, 10}, {1, 6}}, {{7, 2}, {7, 8}, {7, 5}, {7, 4}}, {{2, 1}, {2, 10}, {2, 1}, {2, 7}}, {{9, 1}, {9, 4}, {9, 5}, {9, 2}}, {{6, 10}, {6, 6}, {6, 5}, {6, 10}}, {{2, 10}, {2, 1}, {2, 7}, {2, 4}}, {{3, 5}, {3, 3}, {3, 2}, {3, 2}}, {{1, 2}, {1,9}, {1, 6}, {1, 2}} } 

Performance test

x = RandomInteger[99, {10^6, 5}]; Thread@{#1, {##2}} & @@@ x; // AbsoluteTiming 
{2.3535156, Null} 
$\endgroup$
2
  • $\begingroup$ Pretty, so +1, but much slower even than OP code, who is looking for efficiency. $\endgroup$ Commented Apr 12, 2015 at 5:01
  • $\begingroup$ @rasher, The high-efficiency methods are pairem and @Kuba method currently $\endgroup$ Commented Apr 12, 2015 at 5:22
2
$\begingroup$

Just for fun a solution with Riffle:

riffle[d_] := Module[{data = Transpose[d]}, Partition[Flatten[Transpose[Rest[Riffle[data, {First[data]}]]]], 2] ] 
$\endgroup$

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.