Storing results from a nested (double) loop

Question

I have a double loop (it will become a triple loop eventually, but one thing at a time) and I need to save its results into a data frame. I have no trouble doing this for a single loop, but I run into problems as soon as the loop is nested. I managed to write a reproducible example:

    # first I create a sample df with 3 random variables and two index columns
    df=data.frame("var1"=runif(18,min=0,max=1),
                  "var2"=runif(18,min=0,max=1),
                  "var3"=runif(18,min=0,max=1),
                  "index2"=c(rep(c("A","B","C"),6)),
                  "index1"=c(rep(1,9),rep(2,9)))

    # lists for subsetting data in loops
    list.1=list(1,2)
    list.2=list("A","B","C")

    # first loop based on list.2
    for (i in 1:length(list.2)){
      i2=list.2[i] # indicator for the inside loop to subset based on letter
      for (i in 1:length(list.1)){
        x=subset(df,df$index1 %in% list.1[i] & df$index2 %in% i2) # subsetting data
        x=subset(x,select=c("var1","var2")) # second subset is not needed for the example but it exists in my loop
        MyCalcs=data.frame(
          "INDEX1"=list.1[i],
          "CALC1"=mean(x$var1+x$var2),
          "CALC2"=mean(x$var1-x$var2),
          "CALC3"=mean(x$var1*x$var2)
        ) # here I make some simple calculations
        print(MyCalcs) # this is what I want to put into a data.frame
      }
    }

For a single loop, using do.call(rbind,list) works well, but in this case the result contained only the last 2 rows of print(MyCalcs). I also tried assign, with no success.

Can you show the desired output based on your data sample?

– Z.Lin, commented Aug 29, 2017 at 14:25

Rieneke · Accepted Answer · 2017-08-31 12:38:21Z

I would solve this by initializing the result data frame up front and then filling in one row per iteration. This avoids the use of rbind. The approach relies on computing the row index correctly, so I changed the index variable of your inner loop to be different from the index variable of your outer loop.

    # first I create a sample df with 3 random variables and two index columns
    df=data.frame("var1"=runif(18,min=0,max=1),
                  "var2"=runif(18,min=0,max=1),
                  "var3"=runif(18,min=0,max=1),
                  "index2"=c(rep(c("A","B","C"),6)),
                  "index1"=c(rep(1,9),rep(2,9)))

    # lists for subsetting data in loops
    list.1=list(1,2)
    list.2=list("A","B","C")

    # here I initialize the dataset: one row per combination, four columns
    MyCalcs.tot <- as.data.frame(matrix(rep(NA, length(list.1)*length(list.2)*4), ncol = 4))
    names(MyCalcs.tot) <- c("INDEX1","CALC1", "CALC2", "CALC3")

    # first loop based on list.2
    for (i in 1:length(list.2)){
      i2=list.2[i] # indicator for the inside loop to subset based on letter
      # your second loop used the same index as the first,
      # this might lead to confusion, thus I changed it to a j
      for (j in 1:length(list.1)){
        x=subset(df,df$index1 %in% list.1[j] & df$index2 %in% i2) # subsetting data
        x=subset(x,select=c("var1","var2")) # second subset is not needed for the example but it exists in my loop
        MyCalcs=data.frame(
          "INDEX1"=list.1[j],
          "CALC1"=mean(x$var1+x$var2),
          "CALC2"=mean(x$var1-x$var2),
          "CALC3"=mean(x$var1*x$var2)
        ) # here I make some simple calculations
        MyCalcs.tot[(i - 1)*length(list.1) + j,] <- MyCalcs # adding your calculations to the next row
        print(MyCalcs) # this is what I want to put into a data.frame
      }
    }

After the loops finish, MyCalcs.tot is the required data frame.
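If computing the row index by hand feels fragile, the rows can instead be collected in a list with a running counter and bound together once after both loops. This is only a sketch of that alternative, not the approach above: it does call rbind, but only once. The names results, k and MyCalcs.all are hypothetical, the INDEX2 column is an addition, and set.seed is there only to make the sample reproducible.

```r
set.seed(1)  # assumption: added only so the sample data is reproducible
df <- data.frame(var1 = runif(18), var2 = runif(18), var3 = runif(18),
                 index2 = rep(c("A", "B", "C"), 6),
                 index1 = c(rep(1, 9), rep(2, 9)))
list.1 <- list(1, 2)
list.2 <- list("A", "B", "C")

# hypothetical names: results (list of one-row data frames), k (running counter)
results <- vector("list", length(list.1) * length(list.2))
k <- 0
for (i in seq_along(list.2)) {
  for (j in seq_along(list.1)) {
    # [[ ]] extracts the element itself rather than a one-element list
    x <- df[df$index1 == list.1[[j]] & df$index2 == list.2[[i]],
            c("var1", "var2")]
    k <- k + 1
    results[[k]] <- data.frame(INDEX1 = list.1[[j]],
                               INDEX2 = list.2[[i]],
                               CALC1 = mean(x$var1 + x$var2),
                               CALC2 = mean(x$var1 - x$var2),
                               CALC3 = mean(x$var1 * x$var2),
                               stringsAsFactors = FALSE)
  }
}

# bind all one-row data frames in a single call
MyCalcs.all <- do.call(rbind, results)
```

Because the counter k is shared by both loops, the same pattern should carry over to a third nested loop without changing any index arithmetic.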

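You could also avoid loops altogether and use the apply family of functions. The example below adds a third index (index3) to show how this extends to the triple loop: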

    # first I create a sample df with 3 random variables and three index columns
    df=data.frame("var1"=runif(48,min=0,max=1),
                  "var2"=runif(48,min=0,max=1),
                  "var3"=runif(48,min=0,max=1),
                  "index3"=c(rep(c("do","re","mi","fa"),12)),
                  "index2"=c(rep(c("A","B","C"),16)),
                  "index1"=c(rep(1,24),rep(2,24)))

    # list.1 and list.2 are the lists defined above; list.3 is not defined in the
    # original snippet and is assumed here to hold the four index3 values
    list.3=list("do","re","mi","fa")

    # one row per combination of the three indices
    comb <- as.data.frame(cbind(
      unlist(lapply(list.1,function(x)rep(x,length(list.2)*length(list.3)))),
      rep(unlist(lapply(list.2,function(x)rep(x,length(list.3)))),length(list.1)),
      rep(unlist(list.3),length(list.1)*length(list.2))))
    names(comb) <- c("INDEX1","INDEX2","INDEX3")

    # for each combination, average the row-wise sum, difference and product
    # of var1 and var2 over the matching rows of df
    comb$CALC1 <- apply(comb,1,function(x)
      mean(apply(df[,1:2],1,function(y)y[1]+ y[2])[
        which(df$index1 == x[1] & df$index2 == x[2] & df$index3 == x[3])]))
    comb$CALC2 <- apply(comb,1,function(x)
      mean(apply(df[,1:2],1,function(y)y[1]- y[2])[
        which(df$index1 == x[1] & df$index2 == x[2] & df$index3 == x[3])]))
    comb$CALC3 <- apply(comb,1,function(x)
      mean(apply(df[,1:2],1,function(y)y[1]* y[2])[
        which(df$index1 == x[1] & df$index2 == x[2] & df$index3 == x[3])]))
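The combination table above is built by hand with rep and lapply, and cbind coerces every index to character. Base R's expand.grid can produce the same set of combinations while keeping INDEX1 numeric. The following is a hedged sketch on a self-contained sample; calc_row and calcs are hypothetical names, set.seed is added only for reproducibility, and the row order differs from the table above because expand.grid varies the first index fastest.

```r
set.seed(1)  # assumption: added only so the sample data is reproducible
df <- data.frame(var1 = runif(48), var2 = runif(48), var3 = runif(48),
                 index3 = rep(c("do", "re", "mi", "fa"), 12),
                 index2 = rep(c("A", "B", "C"), 16),
                 index1 = c(rep(1, 24), rep(2, 24)))

# all 2 * 3 * 4 = 24 combinations of the three indices
comb <- expand.grid(INDEX1 = c(1, 2),
                    INDEX2 = c("A", "B", "C"),
                    INDEX3 = c("do", "re", "mi", "fa"),
                    stringsAsFactors = FALSE)

# hypothetical helper: the three summaries for one combination of index values
calc_row <- function(i1, i2, i3) {
  x <- df[df$index1 == i1 & df$index2 == i2 & df$index3 == i3, ]
  c(CALC1 = mean(x$var1 + x$var2),
    CALC2 = mean(x$var1 - x$var2),
    CALC3 = mean(x$var1 * x$var2))
}

# mapply returns a 3 x 24 matrix; transpose it to one row per combination
calcs <- t(mapply(calc_row, comb$INDEX1, comb$INDEX2, comb$INDEX3))
comb <- cbind(comb, calcs)
```

Each combination subsets df once and computes all three summaries from that subset, instead of recomputing the row-wise sums over the whole of df for every combination.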

It worked well; however, I ran into trouble when I tried to adapt it to the more complex version of my code. Luckily, it is easy to reproduce with my example. Say I want to add a new column to MyCalcs holding the values of list.2, with list.2=c("A","B","C"). I adjusted the initialization to MyCalcs.tot <- as.data.frame(matrix(rep(NA, length(list.1)*length(list.2)*5), ncol = 5)) and names(MyCalcs.tot) <- c("INDEX1","INDEX2","CALC1", "CALC2", "CALC3") (replace these in your code), and added another column to the MyCalcs data frame: "INDEX2"=list.2[j] (just under the line INDEX1=list.1[i]). The "INDEX2" column of MyCalcs.tot should then contain the values "A","B","C", but instead they are all ones.
Thank you very much for all your effort. I am still not sure which of the two approaches I will use in my code, but this has had real educational value for me.
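A likely explanation for the ones, assuming an R version before 4.0.0: data.frame() converted strings to factors by default, so each one-row MyCalcs held a one-level factor. Assigning that into a row of the NA-filled MyCalcs.tot stores the factor's integer code, which is always 1. Separately, list.2 is walked by the outer loop, so the new column should be indexed with i rather than j. The sketch below reproduces the effect by setting stringsAsFactors = TRUE explicitly and then shows the fix; tot, fixed and row are hypothetical names, and the CALC1 values are placeholders.

```r
list.2 <- c("A", "B", "C")

# Reproduce: stringsAsFactors = TRUE was the default before R 4.0.0
tot <- as.data.frame(matrix(NA, nrow = 3, ncol = 2))
names(tot) <- c("INDEX2", "CALC1")
for (i in seq_along(list.2)) {
  row <- data.frame(INDEX2 = list.2[i], CALC1 = i * 10,
                    stringsAsFactors = TRUE)
  tot[i, ] <- row  # the factor's integer code is stored, not its label
}
tot$INDEX2  # codes instead of letters

# Fix: keep strings as character when building each row
fixed <- as.data.frame(matrix(NA, nrow = 3, ncol = 2))
names(fixed) <- c("INDEX2", "CALC1")
for (i in seq_along(list.2)) {
  row <- data.frame(INDEX2 = list.2[i], CALC1 = i * 10,
                    stringsAsFactors = FALSE)
  fixed[i, ] <- row
}
fixed$INDEX2  # the letters themselves
```

In the nested-loop code this would mean writing "INDEX2"=list.2[[i]] and passing stringsAsFactors = FALSE to the data.frame call that builds MyCalcs.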
