Calculation in columns using previous row value without loops

Question

I have data in columns which I need to do calculations on. Is it possible to do this using previous row values without using a loop? E.g. if in the first column the value is 139, calculate the median of the last 5 values and the percent change of the value 5 rows above and the value in the current row?

ID Data PF 135 5 123 136 4 141 137 5 124 138 6 200 139 1 310 140 2 141 141 4 141

So here in this dataset you would do:

Find 139 in ID column
Return average of last 5 rows in Data (Gives 4.2)
Return performance of values in PF 5 rows above to current value (Gives 152%)

If I would do a loop it looks like this:

for (i in 1:nrow(data)){ if(data$ID == "139" & i>=3) {data$New_column <- data[i,"PF"] / data[i-4,"PF"] - 1 }

The problem is that the loop takes too long due to to many data points. The ID 139 will appear several times in the dataset.

Many thanks. Carlos

Can you define what you mean by performance of values in PF 5 rows above current value? Its mean? Its median? In any case you don't have 5 rows above 139 just 4. — aichao
– aichao, Commented Aug 3, 2016 at 13:43
Sorry, you are right 4 columns above, the current value should be included — carlosmaria
– carlosmaria, Commented Aug 3, 2016 at 13:44

aichao · Accepted Answer · 2016-08-03 14:33:15Z

As pointed out by Tutuchacn and Sotos, use the package zoo to get the mean of the Data in the last N rows (inclusive of the row) you are querying (assuming your data is in the data frame df):

library(zoo) ind <- which(df$ID==139) ## this is the row you are querying N <- 5 ## here, N is 5 res <- rollapply(df$Data, width=N, mean)[ind-(N-1)] print(res) ## [1] 4.2

rollapply(..., mean) returns the rolling mean of the windowed data of width=N. Note that the index used to query the output from rollapply is lagged by N-1 because the rolling mean is applied forward in the series.

To get the percent performance from PF as you specified:

percent.performance <- function(x) { z <- zoo(x) ## create a zoo series lz <- lag(z,4) ## create the lag version return(z/lz - 1) } res <- as.numeric(percent.performance(df$PF)[ind]) print(res) ## [1] 1.520325

Here, we define a function percent.performance that returns what you want for all rows of df for which the computation makes sense. We then extract the row we want using ind and convert it to a number.

Hope this helps.

Robert · Accepted Answer · 2016-08-03 13:49:27Z

Is that what you want?

ntest=139 sol<-sapply(5:nrow(df),function(ii){#ii=6 tdf<-df[(ii-4):ii,] if(tdf[5,1]==ntest) c(row=ii,aberage=mean(tdf[,"Data"]),performance=round(100*tdf[5,"PF"]/tdf[1,"PF"]-1,0)) }) sol<- sol[ ! sapply(sol, is.null) ] #remove NULLs sol [[1]] row aberage performance 5.0 4.2 251.0

Abdou · Accepted Answer · 2016-08-03 14:03:16Z

This could be a decent start:

mytext = "ID,Data,PF 135,5,123 136,4,141 137,5,124 138,6,200 139,1,310 140,2,141 141,4,141" mydf <- read.table(text=mytext, header = T, sep = ",") do.call(rbind,lapply(mydf$ID[which(mydf$ID==139):nrow(mydf)], function(x) { tempdf <- mydf[1:which(mydf$ID==x),] data.frame(ID=x,Data=mean(tempdf$Data),PF=100*(tempdf[nrow(tempdf),"PF"]-tempdf[(nrow(tempdf)-4),"PF"])/tempdf[(nrow(tempdf)-4),"PF"]) })) ID Data PF 139 4.200000 152.03252 140 3.833333 0.00000 141 3.857143 13.70968

The idea here is: You take ID's starting from 139 to the end and use the lapply function on each of them by generating a temporary data.frame which includes all the rows above that particular ID (including the ID itself). Then you grab the mean of the Data column and the rate of change (i.e. what you call performance) of the PF column.

Collectives™ on Stack Overflow

Calculation in columns using previous row value without loops

3 Answers 3

Comments

Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Related