R: using apply family instead of for-loops

Question

Some sample data first

    yr1 <- sample(0:1, 365, replace = TRUE)
    yr2 <- sample(0:1, 365, replace = TRUE)
    yr3 <- sample(0:1, 365, replace = TRUE)
    yr4 <- sample(0:1, 365, replace = TRUE)
    value <- c(yr1, yr2, yr3, yr4)
    yr <- rep(2000:2003, each = 365)
    doy <- rep(1:365, times = 4)
    foo <- as.data.frame(cbind(value, yr, doy))

foo contains 3 columns. Column 1 holds an arbitrary value that is either 1 or 0, column 2 holds the year, and column 3 holds the day of the year (1 to 365).

I have vectors with the start, mid and end days, given as Julian days:

    start <- c(258, 258, 258, 258)
    mid <- c(279, 281, 285, 288)
    end <- c(286, 295, 300, 320)
    range.val <- as.data.frame(cbind(start, mid, end))
    range.val$yr <- c(2000, 2001, 2002, 2003)

range.val gives me the Julian days between which I have to sum the values for each year in foo.

For example, for 2000, I need to sum foo$value from day 258 to day 279 and then from day 279 to day 286. Similarly, for 2001, I need to sum foo$value from day 258 to day 281 and then from day 281 to day 295.

I also need to calculate the length of the longest continuous run of 1s between these indices for each year.

I did this:

    for (yr in 2000:2003) {
      range.sub <- range.val[range.val$yr == yr, ]
      foo.sub <- foo[foo$yr == yr, ]

      # sums over start..mid and mid..end (row position equals day of year)
      sum.1 <- sum(foo.sub[range.sub$start:range.sub$mid, "value"])
      sum.2 <- sum(foo.sub[range.sub$mid:range.sub$end, "value"])

      # longest run of any value (0 or 1) in each window
      length.1 <- rle(foo.sub[range.sub$start:range.sub$mid, "value"])
      max.spell.length <- max(length.1$lengths)
      length.1 <- rle(foo.sub[range.sub$mid:range.sub$end, "value"])
      max.spell.length1 <- max(length.1$lengths)
    }

In my continuing effort to minimise the use of for-loops, I wonder if I can shorten the above code using some other function.

Z.Lin · Accepted Answer · 2018-01-10 04:52:05Z

Here's a solution using dplyr.

Create a joined data frame and flag whether each yr-doy combination is in range 1 (start to mid), range 2 (mid to end), or neither.

    library(dplyr)

    df <- left_join(foo, range.val, by = "yr")
    df <- df %>%
      mutate(in.range1 = doy >= start & doy <= mid,
             in.range2 = doy >= mid & doy <= end)
    # Note: I'm not sure if the ranges are supposed to be inclusive on both ends, but you
    # should be able to change that easily
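On the inclusivity point, here is a small base R sketch. The numbers are borrowed from the 2000 row of range.val, and the `overlap` and `in.range2.excl` names are made up for illustration. With both ends inclusive, the mid day belongs to both ranges; making the second condition strict at its lower end removes the overlap:

```r
# Toy day-of-year vector and the 2000 boundaries from range.val
doy   <- 255:290
start <- 258
mid   <- 279
end   <- 286

in.range1 <- doy >= start & doy <= mid   # days 258..279, inclusive
in.range2 <- doy >= mid   & doy <= end   # days 279..286, inclusive

# Days that fall in both ranges: only the shared boundary day
overlap <- doy[in.range1 & in.range2]

# Hypothetical alternative: make range 2 exclusive at its lower end
in.range2.excl <- doy > mid & doy <= end
overlap.excl   <- doy[in.range1 & in.range2.excl]
```

Whether the boundary day should count once or twice depends on what the two sums are meant to represent, so treat the strict version as one option rather than the correct one.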

For total value in range X for each year, filter for range & summarise by year:

    df.sum.1 <- df %>%
      filter(in.range1) %>%   # change to in.range2 for mid-end
      group_by(yr) %>%
      summarise(value = sum(value))

    > df.sum.1
    # A tibble: 4 x 2
         yr value
      <dbl> <int>
    1  2000    12
    2  2001    12
    3  2002    10
    4  2003    10

For longest run of 1's, filter for range & do rle on values for each year. Note that we should filter for value == 1 first, else if there's a longer run of 0's, you may get that instead:

    df.spell.length1 <- df %>%
      filter(in.range1) %>%   # change to in.range2 for mid-end
      group_by(yr) %>%
      arrange(doy) %>%
      do(data.frame(unclass(rle(.$value)))) %>%
      filter(values == 1) %>%
      filter(lengths == max(lengths)) %>%
      unique()

    > df.spell.length1
    # A tibble: 4 x 3
    # Groups:   yr [4]
         yr lengths values
      <dbl>   <int>  <int>
    1  2000       7      1
    2  2001       3      1
    3  2002       3      1
    4  2003       3      1
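To see why the values == 1 filter matters, here is a toy base R sketch. The vector `x` and the helper name `longest_run_of_ones` are made up for illustration. It shows what rle returns and how the longest run overall can differ from the longest run of 1s:

```r
# Toy 0/1 vector: the longest run of 0s is 4, the longest run of 1s is 3
x <- c(1, 1, 0, 0, 0, 0, 1, 1, 1, 0)

runs <- rle(x)
# runs$lengths: 2 4 3 1
# runs$values:  1 0 1 0

longest.any <- max(runs$lengths)                    # 4, but that is a run of 0s
longest.one <- max(runs$lengths[runs$values == 1])  # 3, the longest run of 1s

# Guard for a window with no 1s at all, where max() of an empty vector
# would warn and return -Inf
longest_run_of_ones <- function(v) {
  r <- rle(v)
  ones <- r$lengths[r$values == 1]
  if (length(ones) == 0) 0L else max(ones)
}
```

The guard is only a suggestion; with random 0/1 data a window of twenty or more days with no 1s is unlikely, but real data may behave differently.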

(For reproducibility, the sample data was generated with set.seed(123).)
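Since the question title asks about the apply family, here is a minimal base R sketch of the same calculation with mapply, for comparison. It assumes one row per year in range.val and ranges that are inclusive at both ends. `window_stats` and `run_of_ones` are hypothetical helper names, not part of any package. The data are regenerated inside the block so it runs on its own, so the numbers need not match the output shown above.

```r
set.seed(123)  # used here only to make this run repeatable
foo <- data.frame(
  value = sample(0:1, 365 * 4, replace = TRUE),
  yr    = rep(2000:2003, each = 365),
  doy   = rep(1:365, times = 4)
)
range.val <- data.frame(
  start = c(258, 258, 258, 258),
  mid   = c(279, 281, 285, 288),
  end   = c(286, 295, 300, 320),
  yr    = 2000:2003
)

# Longest run of 1s in a 0/1 vector (0 if there are none)
run_of_ones <- function(v) {
  r <- rle(v)
  ones <- r$lengths[r$values == 1]
  if (length(ones) == 0) 0L else max(ones)
}

# Sum and longest run of 1s for days from..to (inclusive) of one year
window_stats <- function(yr, from, to) {
  v <- foo$value[foo$yr == yr & foo$doy >= from & foo$doy <= to]
  c(sum = sum(v), spell = run_of_ones(v))
}

# mapply walks the rows of range.val in parallel; each call returns a
# named vector of length 2, so the result is a 2 x 4 matrix
r1 <- mapply(window_stats, range.val$yr, range.val$start, range.val$mid)
r2 <- mapply(window_stats, range.val$yr, range.val$mid,   range.val$end)

res <- data.frame(
  yr      = range.val$yr,
  sum.1   = r1["sum", ],
  spell.1 = r1["spell", ],
  sum.2   = r2["sum", ],
  spell.2 = r2["spell", ]
)
res
```

This keeps the per-year logic in one small function and leaves the iteration to mapply, which is roughly what the original loop does by hand. Note that window_stats relies on foo being sorted by doy within each year, as it is in the sample data.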

Thanks. Just as a comment: if plyr is attached, run detach(package:plyr) first, otherwise you may get some odd results.
