1

I'm trying to see if I can improve the speed of a for loop and an if condition statement. Basically it does a lookup on non repeating key values into an array and gets the value from another column.

If I run 100000 values it takes about 13 seconds see code below. Is there a way to make this more efficient? Ps i'm using octave 3.8.1 which works with matlab

%test if lookup statment clear all, clc, tic, clf; num_to_test=100000 %amount of numbers to test a1=(1:1:num_to_test)'; a2=(a1.*num_to_test); array=[a1,a2]; %array where values are stored lookupval=(randperm(num_to_test,num_to_test/2)/4)'; %lookup these random values of non repeating integers and floats and get another value amp=[]; freq=[]; found_array=[]; notfound_array=[]; for ii=1:1:rows(lookupval) if (find(lookupval(ii)==array(:,1))) %if you find a lookup value in array %disp('found'); [row,col] = find(lookupval(ii) == array(:,1)); amp=[amp;array(row,2)]; freq=[freq;array(row,1)]; found_array=[freq,amp]; else %add lookup value to another array and make amp value zero notfound_arraytmp=[lookupval(ii),0]; notfound_array=[notfound_array;notfound_arraytmp]; endif end comb_array=[found_array;notfound_array]; sort_comb_array=sortrows(comb_array,1); %sort array by first col incrementing fprintf('\nfinally Done-elapsed time -%4.4fsec- or -%4.4fmins- or -%4.4fhours-\n',toc,toc/60,toc/3600); 
0

4 Answers 4

2

Approach #1

This could be really efficient with ismember -

lookupval = sort(lookupval); %// Do sorting at the start sort_comb_array = [lookupval zeros(size(lookupval))]; %// Setup output array [idA,idB] = ismember(array(:,1),lookupval); %// Get matching IDs sort_comb_array(idB(idA),2) = array(idA,2); %// Index into second column %// of array and get corresponding values 

Approach #2

I would thrown in my favorite bsxfun too, but for such huge datasizes of 100,000, its memory inefficiency could make it slower -

lookupval = sort(lookupval); sort_comb_array = [lookupval zeros(size(lookupval))]; [idA,idB] = find(bsxfun(@eq,array(:,1),lookupval(:).')); %//'# Get matching IDs sort_comb_array(idB,2) = array(idA,2); 
Sign up to request clarification or add additional context in comments.

Comments

2

Several issues but the main one is probably that you don't preallocate - appending like this: amp=[amp;array(row,2)]; is generally slow in MATLAB. You don't need a loop here, though.

Let's start with a simple array, A:

1 500 2 700 3 900 7 1000 9 800 

And our lookup values are [2 6 3 9 7]; We want our output to show these lookup values, sorted, in the first column, and the second column to be either the values from the second column of A (where they exist) or zero.

lookup = sort(lookup); output = zeros(length(lookup),2); output(:,1) = lookup; [c a b ] = intersect(A(:,1),lookup); output(b,2) = A(a,2); 

The output is:

2 700 3 900 6 0 7 1000 9 800 

8 Comments

Thanks for the help. I ran your code online and the output doesn't match up (it makes all the lookup values 0) did I miss something? here's a link to what I ran ideone.com/bxMhig
@RickT I'd be interested to see how the other ismember based solution performs on Octave for this problem!
@Divakar Sure testing and comparing them now
@Divakar I posted the results as an answer
@RickT Make sure to clear out memory before the start of a new approach and sufficient number of iterations to make the timings around or more than 1sec.
|
0

Purely from an efficiency standpoint, I would rewrite the for loop as follows:

m = 0; % number of omitted values n = 0; % number of found values for ii=1:1:rows(lookupval) [row,col] = find(lookupval(ii) == array(:,1)); if ~isempty(row) %if you find a lookup value in array %disp('found'); n=n+1; amp(n)=array(row,2); freq(n)=;array(row,1); found_array=[freq,amp]; else %add lookup value to another array and make amp value zero m=m+1; notfound_array(2*m-1:2*m)=[lookupval(ii);0]; endif end 

This saves you a find call by using its output directly rather than recomputing it when find returns a position, and grows the arrays in a more efficient way (as shown in this question).

Comments

0

This is a test Divakar suggest I do to see the speed it takes octave 3.8.1 to run this. Results are below along with the code.

1) Using ismember with 2,000,000 is faster but uses more memory
-elapsed time -0.2306sec- or -0.0038mins-
Total is 15000001 elements using 106000008 bytes

2) Using intersect with 2,000,000 is slower but uses less memory.
-elapsed time -0.3057sec- or -0.0051mins-
Total is 11749047 elements using 93992376 bytes

3) Using bskfun with 100,000 produces an error: out of memory or dimensions too large for octave's index type

First test results:

clear all, clc, tic, clf; num_to_test=2000000 %amount of numbers to test a1=(1:1:num_to_test)'; a2=(a1.*num_to_test); array=[a1,a2]; %array where values are stored lookupval=(randperm(num_to_test,num_to_test/2)/4)'; %lookup these random vaules of intergers and floats and get another value lookupval = sort(lookupval); sort_comb_array = [lookupval zeros(size(lookupval))]; [idA1,idB1] = ismember(array(:,1),lookupval); sort_comb_array(idB1(idA1),2) = array(idA1,2); fprintf('\nfinally Done-elapsed time -%4.4fsec- or -%4.4fmins- or -%4.4fhours-\n',toc,toc/60,toc/3600); whos >>>num_to_test = 2000000 >>> finally Done-elapsed time -0.2306sec- or -0.0038mins- or -0.0001hours- >>>Variables in the current scope: Attr Name Size Bytes Class ==== ==== ==== ===== ===== a1 2000000x1 16000000 double a2 2000000x1 16000000 double array 2000000x2 32000000 double idA1 2000000x1 2000000 logical idB1 2000000x1 16000000 double lookupval 1000000x1 8000000 double num_to_test 1x1 8 double sort_comb_array 1000000x2 16000000 double Total is 15000001 elements using 106000008 bytes ======================================================================== 

Second test results:

clear all, clc, tic, clf; num_to_test=2000000 %amount of numbers to test a1=(1:1:num_to_test)'; a2=(a1.*num_to_test); array=[a1,a2]; %array where values are stored lookupval=(randperm(num_to_test,num_to_test/2)/4)'; %lookup these random vaules of intergers and floats and get another value lookupval = sort(lookupval); output = zeros(length(lookupval),2); output(:,1) = lookupval; [c a b ] = intersect(array(:,1),lookupval); output(b,2) =array(a,2); fprintf('\nfinally Done-elapsed time -%4.4fsec- or -%4.4fmins- or -%4.4fhours-\n',toc,toc/60,toc/3600); whos >>>num_to_test = 2000000 >>> finally Done-elapsed time -0.3057sec- or -0.0051mins- or -0.0001hours- >>>Variables in the current scope: Attr Name Size Bytes Class ==== ==== ==== ===== ===== a 250005x1 2000040 double a1 2000000x1 16000000 double a2 2000000x1 16000000 double array 2000000x2 32000000 double b 250005x1 2000040 double c 250005x1 2000040 double lookupval 1000000x1 8000000 double num_to_test 1x1 8 double output 1000000x2 16000000 double Total is 11750016 elements using 94000128 bytes ======================================================================= 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.