Some sample data first

    yr1 <- sample(0:1, 365, replace = TRUE)
    yr2 <- sample(0:1, 365, replace = TRUE)
    yr3 <- sample(0:1, 365, replace = TRUE)
    yr4 <- sample(0:1, 365, replace = TRUE)
    value <- c(yr1, yr2, yr3, yr4)
    yr <- rep(2000:2003, each = 365)
    doy <- rep(1:365, times = 4)
    foo <- as.data.frame(cbind(value, yr, doy))

foo contains 3 columns. Column 1 has an arbitrary value, either 1 or 0. Column 2 contains the year, and column 3 the day of the year (1 to 365).

I have three vectors with the start, mid, and end days, given as Julian days (day of year):

    start <- c(258, 258, 258, 258)
    mid <- c(279, 281, 285, 288)
    end <- c(286, 295, 300, 320)
    range.val <- as.data.frame(cbind(start, mid, end))
    range.val$yr <- c(2000, 2001, 2002, 2003)

range.val gives me the Julian days between which I have to sum the values for each year in foo.

For example, for 2000, I need to sum foo$value from day 258 to day 279 and then from day 279 to day 286. Similarly, for 2001, I need to sum foo$value from day 258 to day 281 and then from day 281 to day 295.

I also need to calculate the length of the longest continuous run of 1s between these indices for each year.

I did this:

    for (yr in 2000:2003) {
      range.sub <- range.val[range.val$yr == yr, ]
      foo.sub <- foo[foo$yr == yr, ]

      sum.1 <- sum(foo.sub[range.sub$start:range.sub$mid, "value"])
      sum.2 <- sum(foo.sub[range.sub$mid:range.sub$end, "value"])

      # longest run (of either value) from start to mid
      length.1 <- rle(foo.sub[range.sub$start:range.sub$mid, "value"])
      max.spell.length <- max(length.1$lengths)

      # longest run (of either value) from mid to end
      length.1 <- rle(foo.sub[range.sub$mid:range.sub$end, "value"])
      max.spell.length1 <- max(length.1$lengths)
    }

In my ongoing effort to minimise the use of for-loops, I wonder whether the code above can be shortened using some other function.

1 Answer

Here's a solution using dplyr.

Create a joined data frame and flag whether each yr-doy combination falls in range 1 (start to mid), range 2 (mid to end), or neither.

    library(dplyr)

    df <- left_join(foo, range.val, by = "yr")
    df <- df %>%
      mutate(in.range1 = doy >= start & doy <= mid,
             in.range2 = doy >= mid & doy <= end)
    # Note: I'm not sure if the ranges are supposed to be inclusive on both ends,
    # but you should be able to change that easily
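With inclusive comparisons on both ends, the mid day is counted in both ranges. Here is a minimal base-R sketch of that, using the year-2000 bounds from the question; `in.range2.excl` is a hypothetical name for a variant that leaves mid out of the second range, in case that is what you need:

```r
# Year-2000 bounds taken from range.val in the question
start <- 258; mid <- 279; end <- 286
doy <- start:end

# Inclusive on both ends, as in the mutate() call
in.range1 <- doy >= start & doy <= mid
in.range2 <- doy >= mid & doy <= end

# Day(s) that fall in both ranges
overlap <- doy[in.range1 & in.range2]
print(overlap)  # 279

# Hypothetical variant: exclude mid from the second range
in.range2.excl <- doy > mid & doy <= end

# Number of days in each range: 22, 8, and 7
print(c(sum(in.range1), sum(in.range2), sum(in.range2.excl)))
```

Whether mid should be counted once or twice depends on how the two periods are defined, so treat the exclusive version as one option rather than the correct one.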

For total value in range X for each year, filter for range & summarise by year:

    df.sum.1 <- df %>%
      filter(in.range1) %>% #change to in.range2 for mid-end
      group_by(yr) %>%
      summarise(value = sum(value))

    > df.sum.1
    # A tibble: 4 x 2
         yr value
      <dbl> <int>
    1  2000    12
    2  2001    12
    3  2002    10
    4  2003    10
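If you want both totals in one pass rather than running the pipeline once per range, one possible sketch is to subset `value` inside a single `summarise()`. The sample data is rebuilt here so the block runs on its own; `df.sums`, `sum.1`, and `sum.2` are illustrative names (the latter two borrowed from the question's loop), and the totals depend on the random draw:

```r
library(dplyr)

# Assumption: rebuilt sample data; any seed works, the totals just differ
set.seed(123)
foo <- data.frame(
  value = sample(0:1, 4 * 365, replace = TRUE),
  yr    = rep(2000:2003, each = 365),
  doy   = rep(1:365, times = 4)
)
range.val <- data.frame(
  yr    = 2000:2003,
  start = c(258, 258, 258, 258),
  mid   = c(279, 281, 285, 288),
  end   = c(286, 295, 300, 320)
)

# One row per year, one column per range
df.sums <- foo %>%
  left_join(range.val, by = "yr") %>%
  group_by(yr) %>%
  summarise(
    sum.1 = sum(value[doy >= start & doy <= mid]),  # start to mid, inclusive
    sum.2 = sum(value[doy >= mid & doy <= end])     # mid to end, inclusive
  )

print(df.sums)
```

This keeps the same inclusive bounds as the filter-based version, so the mid day still contributes to both columns.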

For longest run of 1's, filter for range & do rle on values for each year. Note that we should filter for value == 1 first, else if there's a longer run of 0's, you may get that instead:

    df.spell.length1 <- df %>%
      filter(in.range1) %>% #change to in.range2 for mid-end
      group_by(yr) %>%
      arrange(doy) %>%
      do(data.frame(unclass(rle(.$value)))) %>%
      filter(values == 1) %>%
      filter(lengths == max(lengths)) %>%
      unique()

    > df.spell.length1
    # A tibble: 4 x 3
    # Groups:   yr [4]
         yr lengths values
      <dbl>   <int>  <int>
    1  2000       7      1
    2  2001       3      1
    3  2002       3      1
    4  2003       3      1

(For reproducibility, the sample data was generated with set.seed(123).)
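If you would rather not depend on dplyr, the same per-year figures can be sketched in base R with `mapply`. This is only one possible alternative, not the pipeline shown above: `longest_run` and `summarise_window` are hypothetical helper names, and the sample data is rebuilt here so the block runs on its own (the exact numbers depend on the seed and R version).

```r
# Assumption: rebuilt sample data, same layout as foo and range.val
set.seed(123)
foo <- data.frame(
  value = sample(0:1, 4 * 365, replace = TRUE),
  yr    = rep(2000:2003, each = 365),
  doy   = rep(1:365, times = 4)
)
range.val <- data.frame(
  yr    = 2000:2003,
  start = c(258, 258, 258, 258),
  mid   = c(279, 281, 285, 288),
  end   = c(286, 295, 300, 320)
)

# Length of the longest run of `target` in x; 0 if target never occurs
longest_run <- function(x, target = 1) {
  runs <- rle(x)
  hits <- runs$lengths[runs$values == target]
  if (length(hits) == 0) 0L else max(hits)
}

# Total and longest run of 1s for one year between two days (inclusive).
# Relies on foo being ordered by doy within each year.
summarise_window <- function(year, from, to) {
  v <- foo$value[foo$yr == year & foo$doy >= from & foo$doy <= to]
  c(total = sum(v), max.spell = longest_run(v))
}

# mapply walks the rows of range.val; t() gives one row per year
res.1 <- t(mapply(summarise_window, range.val$yr, range.val$start, range.val$mid))
res.2 <- t(mapply(summarise_window, range.val$yr, range.val$mid, range.val$end))

result <- data.frame(yr = range.val$yr, range1 = res.1, range2 = res.2)
print(result)
```

Restricting `rle()` output to `values == target` inside the helper does the same job as the `filter(values == 1)` step, and returning 0 covers a window with no 1s at all. A helper like `longest_run` could also be called inside `summarise()` if you prefer to stay in dplyr.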

1 Comment

Thanks. Just one note: if plyr is loaded, run detach(package:plyr) first, otherwise you will get some odd results.
