find the shortest time difference between two dataframes

Question

Suppose I have two dataframes,

df1 id time1 1 2016-04-07 21:39:10 1 2016-04-05 11:19:17 2 2016-04-03 10:58:25 2 2016-04-02 21:39:10 df2 id time2 1 2016-04-07 21:39:11 1 2016-04-05 11:19:18 1 2016-04-06 21:39:11 1 2016-04-04 11:19:18 2 2016-04-03 10:58:26 2 2016-04-02 21:39:11 2 2016-04-04 10:58:26 2 2016-04-05 21:39:11

I want to find for each entry in df1, the shortest time difference in df2. Suppose we take the first entry, it has id 1, so I want to loop through df2, filter for id 1, then check the time difference between one entry of df1 and remaining entries of df2 and find the shortest difference and fetch the corresponding entry. My sample output should be

id time time2 diff(in secs) 1 2016-04-07 21:39:10 2016-04-07 21:39:10 1 1 2016-04-05 11:19:17 2016-04-05 11:19:17 1 2 2016-04-03 10:58:25 2016-04-03 10:58:25 1 2 2016-04-02 21:39:10 2016-04-02 21:39:10 1

the following is my try,

for(i in unique(df1$id)){ temp1 = df1[df1$id == i,] temp2 = df2[df2$id == i,] for(j in unique(df1$time1){ for(k in unique(df2$time2){ diff = abs(df1$time1[j] - df2$time2[k] print(diff)}}}

I am not able to progress after this, getting many errors. Can anybody help me in correcting this? May be suggest a better efficient way to do this? Any help would be appreciated.

Update:

Reproducable data:

 df1 <- data.frame( id = c(1,1,2,2), time1 = c('2016-04-07 21:39:10', '2016-04-05 11:19:17', '2016-04-03 10:58:25', '2016-04-02 21:39:10') ) df2 <- data.frame( id = c(1,1,1,1,2,2,2,2), time2 = c('2016-04-07 21:39:11', '2016-04-05 11:19:18','2016-04-07 21:39:11', '2016-04-05 11:19:18', '2016-04-03 10:58:26', '2016-04-02 21:39:11','2016-04-03 10:58:26', '2016-04-02 21:39:11') ) df1$time1 = as.POSIXct(df1$time1) df2$time2 = as.POSIXct(df2$time2)

do the ids matter at all?? sounds like shortest difference within id — jaimedash
– jaimedash, Commented May 12, 2016 at 23:03

chappers · Accepted Answer · 2016-05-12 23:20:27Z

You can achieve this using dplyr. Basically the idea is since we want to produce an entry we'll assign each element in df1 a new id (in this case I just called it rowname).

After this, all we're interested in is joining the two dataframes on the id and filter them based on the minimum absolute difference.

library(dplyr) df1$time1 <- as.POSIXct(as.character(df1$time1)) df2$time2 <- as.POSIXct(as.character(df2$time2)) df1 %>% add_rownames("rowname") %>% left_join(df2, "id") %>% mutate(diff=time2-time1) %>% group_by(rowname) %>% filter(min(abs(diff)) == abs(diff)) %>% distinct

This is my output:

Source: local data frame [4 x 5] Groups: rowname [4] rowname id time1 time2 diff (chr) (dbl) (time) (time) (dfft) 1 1 1 2016-04-07 21:39:10 2016-04-07 21:39:11 1 secs 2 2 1 2016-04-05 11:19:17 2016-04-05 11:19:18 1 secs 3 3 2 2016-04-03 10:58:25 2016-04-03 10:58:26 1 secs 4 4 2 2016-04-02 21:39:10 2016-04-02 21:39:11 1 secs

Community · Accepted Answer · 2017-05-23 10:28:15Z

You can also do this in base R. To generate random dates (useful), I borrowed and edited a nice function from elsewhere on StackOverflow:

latemail <- function(N, st="2011/01/01", et="2016/12/31") { st <- as.POSIXct(as.Date(st)) et <- as.POSIXct(as.Date(et)) dt <- as.numeric(difftime(et,st,unit="sec")) ev <- sort(runif(N, 0, dt)) return(st + ev) } df1 <- data.frame(id=c(1,1,2,2), time1=latemail(4)) df2 <- data.frame(id=c(rep(1,4), rep(2,4)), time2=latemail(8))

And then your answer can be achieved in two lines:

shortest <- sapply(df1$time1, function(x) which(abs(df2$time2 - x) == min(abs(df2$time2 - x)))) cbind(df1, df2[shortest,])

Output:

id time1 id time2 1 2011-10-08 02:00:21 1 2011-08-17 18:07:47 1 2012-05-06 17:49:03 1 2012-09-04 19:52:40 2 2013-10-29 13:14:51 1 2012-10-29 20:09:31 2 2016-06-17 19:23:43 2 2015-11-24 02:07:15

Bulat · Accepted Answer · 2016-05-13 02:22:39Z

If you work with data.table:

library(data.table) df1 <- data.table( id = c(1,1,2,2), time1 = c('2016-04-07 21:39:10', '2016-04-05 11:19:17', '2016-04-03 10:58:25', '2016-04-02 21:39:10') ) df2 <- data.table( id = c(1,1,1,1,2,2,2,2), time2 = c('2016-04-07 21:39:11', '2016-04-05 11:19:18','2016-04-07 21:39:11', '2016-04-05 11:19:18', '2016-04-03 10:58:26', '2016-04-02 21:39:11','2016-04-03 10:58:26', '2016-04-02 21:39:11') ) df1$time1 = as.POSIXct(df1$time1) df2$time2 = as.POSIXct(df2$time2) res <- df1[df2, .(time1, time2), by = .EACHI, on = "id"][, diff:= abs(time2 -time1)] setkey(res, id, time1, diff) res <- res[, row := seq_along(.I), by = .(id, time1)][row == 1]

Collectives™ on Stack Overflow

find the shortest time difference between two dataframes

3 Answers 3

Comments

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Linked

Related