2

I have data in columns which I need to do calculations on. Is it possible to do this using previous row values without using a loop? E.g. if in the first column the value is 139, calculate the median of the last 5 values and the percent change of the value 5 rows above and the value in the current row?

ID Data PF 135 5 123 136 4 141 137 5 124 138 6 200 139 1 310 140 2 141 141 4 141 

So here in this dataset you would do:

  1. Find 139 in ID column
  2. Return average of last 5 rows in Data (Gives 4.2)
  3. Return performance of values in PF 5 rows above to current value (Gives 152%)

If I would do a loop it looks like this:

for (i in 1:nrow(data)){ if(data$ID == "139" & i>=3) {data$New_column <- data[i,"PF"] / data[i-4,"PF"] - 1 } 

The problem is that the loop takes too long due to to many data points. The ID 139 will appear several times in the dataset.

Many thanks. Carlos

4
  • Please add a reproducible example and expected output. Commented Aug 3, 2016 at 12:40
  • 2
    Check out rollapply in the zoo package. Commented Aug 3, 2016 at 12:41
  • Can you define what you mean by performance of values in PF 5 rows above current value? Its mean? Its median? In any case you don't have 5 rows above 139 just 4. Commented Aug 3, 2016 at 13:43
  • Sorry, you are right 4 columns above, the current value should be included Commented Aug 3, 2016 at 13:44

3 Answers 3

2

As pointed out by Tutuchacn and Sotos, use the package zoo to get the mean of the Data in the last N rows (inclusive of the row) you are querying (assuming your data is in the data frame df):

library(zoo) ind <- which(df$ID==139) ## this is the row you are querying N <- 5 ## here, N is 5 res <- rollapply(df$Data, width=N, mean)[ind-(N-1)] print(res) ## [1] 4.2 

rollapply(..., mean) returns the rolling mean of the windowed data of width=N. Note that the index used to query the output from rollapply is lagged by N-1 because the rolling mean is applied forward in the series.

To get the percent performance from PF as you specified:

percent.performance <- function(x) { z <- zoo(x) ## create a zoo series lz <- lag(z,4) ## create the lag version return(z/lz - 1) } res <- as.numeric(percent.performance(df$PF)[ind]) print(res) ## [1] 1.520325 

Here, we define a function percent.performance that returns what you want for all rows of df for which the computation makes sense. We then extract the row we want using ind and convert it to a number.

Hope this helps.

Sign up to request clarification or add additional context in comments.

Comments

0

Is that what you want?

ntest=139 sol<-sapply(5:nrow(df),function(ii){#ii=6 tdf<-df[(ii-4):ii,] if(tdf[5,1]==ntest) c(row=ii,aberage=mean(tdf[,"Data"]),performance=round(100*tdf[5,"PF"]/tdf[1,"PF"]-1,0)) }) sol<- sol[ ! sapply(sol, is.null) ] #remove NULLs sol [[1]] row aberage performance 5.0 4.2 251.0 

Comments

0

This could be a decent start:

mytext = "ID,Data,PF 135,5,123 136,4,141 137,5,124 138,6,200 139,1,310 140,2,141 141,4,141" mydf <- read.table(text=mytext, header = T, sep = ",") do.call(rbind,lapply(mydf$ID[which(mydf$ID==139):nrow(mydf)], function(x) { tempdf <- mydf[1:which(mydf$ID==x),] data.frame(ID=x,Data=mean(tempdf$Data),PF=100*(tempdf[nrow(tempdf),"PF"]-tempdf[(nrow(tempdf)-4),"PF"])/tempdf[(nrow(tempdf)-4),"PF"]) })) ID Data PF 139 4.200000 152.03252 140 3.833333 0.00000 141 3.857143 13.70968 

The idea here is: You take ID's starting from 139 to the end and use the lapply function on each of them by generating a temporary data.frame which includes all the rows above that particular ID (including the ID itself). Then you grab the mean of the Data column and the rate of change (i.e. what you call performance) of the PF column.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.