Filtering df on multiple columns from the same row in another table?

Question

I have a dataset with start and end times for events (called df_time), and another dataset recording when an event happened (df_val). I want to use df_time to filter df_val down to only the events that happened within the recorded time intervals.

I'm a bit lost on how to accomplish this though.

    library(dplyr)

    start = c(1, 5, 7, 4)
    end = c(2, 7, 11, 7)
    df_time = data.frame(start, end)

    time = c(3, 6, 2, 10, 11)
    val = c(100, 20, 30, 40, 50)
    df_val = data.frame(time, val)

    df_val %>%
      select_all() %>%
      filter(time >= df_time$start & time <= df_time$end)

Output:

      time val
    1    6  20
    Warning messages:
    1: In time >= df_time$start :
      longer object length is not a multiple of shorter object length
    2: In time <= df_time$end :
      longer object length is not a multiple of shorter object length

The code runs, but with the warning messages shown, and it gives me the wrong output (it seems to ignore starts/ends that are equal to value timestamps). Every row except the one with time 3 should be printed.

I'm unsure how to fix this, and would appreciate any help/resources!

Tyler Smith · Accepted Answer · 2019-07-25 19:29:57Z

Is this what you are trying to accomplish?

    library(tidyverse)

    start = c(1, 5, 7, 4)
    end = c(2, 7, 11, 7)
    df_time = data.frame(start, end)

    time = c(3, 6, 2, 10, 11)
    val = c(100, 20, 30, 40, 50)
    df_val = data.frame(time, val)

    # return one row for each start/end pair that time falls between
    map2_dfr(start, end, ~filter(df_val, time >= .x, time <= .y) %>%
               mutate(start = .x, end = .y))
    #>   time val start end
    #> 1    2  30     1   2
    #> 2    6  20     5   7
    #> 3   10  40     7  11
    #> 4   11  50     7  11
    #> 5    6  20     4   7

    # return unique pairs
    map2_dfr(start, end, ~filter(df_val, time >= .x, time <= .y)) %>%
      unique()
    #>   time val
    #> 1    2  30
    #> 2    6  20
    #> 3   10  40
    #> 4   11  50

    # simpler method, probably
    df_val %>%
      filter(map_lgl(time, ~any((.x >= start) & .x <= end)))
    #>   time val
    #> 1    6  20
    #> 2    2  30
    #> 3   10  40
    #> 4   11  50

Created on 2019-07-25 by the reprex package (v0.2.1)

Edit: added some alternatives
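
For context on why the original attempt misbehaves: `time >= df_time$start` compares the two vectors element by element, recycling the shorter one, instead of checking each time against every interval. That is where the "longer object length" warnings come from. A minimal base-R sketch of the same "falls in any interval" check, without purrr, might look like the following. The helper name `in_any_interval` is hypothetical and not part of the answer above.

```r
# Data from the question
df_time <- data.frame(start = c(1, 5, 7, 4), end = c(2, 7, 11, 7))
df_val  <- data.frame(time = c(3, 6, 2, 10, 11), val = c(100, 20, 30, 40, 50))

# Hypothetical helper: TRUE for each element of x that lies inside at least
# one closed interval [start[i], end[i]]
in_any_interval <- function(x, start, end) {
  vapply(x, function(t) any(t >= start & t <= end), logical(1))
}

# Check every event time against all intervals, then subset the rows
keep <- in_any_interval(df_val$time, df_time$start, df_time$end)
result <- df_val[keep, ]
print(result)
```

With the question's data this keeps every row except the one with time 3, which is the result the question asks for.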

It is! This answers my question, but I'm curious: suppose that, in addition to filtering between the start and end times, I also needed to match on an additional ID column (both df_time and df_val have a column called ID). Is there a way to do that with the method above?
Sure, you could also just do something like left_join(df_val, df_time, by = "ID") %>% filter(time >= start, time <= end). Without the ID you could also do crossing(df_val, df_time) %>% filter(time >= start, time <= end) to get something similar to the first solution. I probably wouldn't suggest crossing if the tables are very large, though.
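
A sketch of the join-then-filter idea from the comment above, written in base R with `merge()` so it runs without extra packages. The `ID` values below are made up for illustration, since the question does not show them.

```r
# Hypothetical ID columns added to the question's data (values are assumptions)
df_time <- data.frame(ID = c(1, 1, 2, 2),
                      start = c(1, 5, 7, 4),
                      end = c(2, 7, 11, 7))
df_val <- data.frame(ID = c(1, 1, 1, 2, 2),
                     time = c(3, 6, 2, 10, 11),
                     val = c(100, 20, 30, 40, 50))

# Pair every event with every interval that shares its ID ...
joined <- merge(df_val, df_time, by = "ID")

# ... then keep only the pairs where the event falls inside the interval
matched <- subset(joined, time >= start & time <= end)

# One row per event, even if an event matched several intervals
events <- unique(matched[, c("ID", "time", "val")])
print(events)
```

The join only pairs rows that share an ID, so it produces far fewer intermediate rows than a full cross of the two tables would.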

chinsoon12 · Accepted Answer · 2020-03-23 07:50:49Z

Here is another option, using a non-equi inner join with data.table:

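    library(data.table)

    # Convert both data frames to data.tables by reference
    setDT(df_time)
    setDT(df_val)

    # Non-equi join on ID and the interval bounds. Note that < and > are strict,
    # so times equal to a start or end are excluded; use <= and >= to include them.
    df_time[df_val,
            on = .(ID, start < time, end > time),
            nomatch = 0L,
            c(mget(paste0("x.", names(df_time))),
              mget(paste0("i.", names(df_val))))]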

output:

       x.ID x.start x.end i.ID i.time i.val
    1:    1       5     7    1      6    20
    2:    1       4     7    1      6    20
    3:    1       7    11    1     10    40
