1

Suppose I have two dataframes,

df1 id time1 1 2016-04-07 21:39:10 1 2016-04-05 11:19:17 2 2016-04-03 10:58:25 2 2016-04-02 21:39:10 df2 id time2 1 2016-04-07 21:39:11 1 2016-04-05 11:19:18 1 2016-04-06 21:39:11 1 2016-04-04 11:19:18 2 2016-04-03 10:58:26 2 2016-04-02 21:39:11 2 2016-04-04 10:58:26 2 2016-04-05 21:39:11 

I want to find for each entry in df1, the shortest time difference in df2. Suppose we take the first entry, it has id 1, so I want to loop through df2, filter for id 1, then check the time difference between one entry of df1 and remaining entries of df2 and find the shortest difference and fetch the corresponding entry. My sample output should be

id time time2 diff(in secs) 1 2016-04-07 21:39:10 2016-04-07 21:39:10 1 1 2016-04-05 11:19:17 2016-04-05 11:19:17 1 2 2016-04-03 10:58:25 2016-04-03 10:58:25 1 2 2016-04-02 21:39:10 2016-04-02 21:39:10 1 

the following is my try,

for(i in unique(df1$id)){ temp1 = df1[df1$id == i,] temp2 = df2[df2$id == i,] for(j in unique(df1$time1){ for(k in unique(df2$time2){ diff = abs(df1$time1[j] - df2$time2[k] print(diff)}}} 

I am not able to progress after this, getting many errors. Can anybody help me in correcting this? May be suggest a better efficient way to do this? Any help would be appreciated.

Update:

Reproducable data:

 df1 <- data.frame( id = c(1,1,2,2), time1 = c('2016-04-07 21:39:10', '2016-04-05 11:19:17', '2016-04-03 10:58:25', '2016-04-02 21:39:10') ) df2 <- data.frame( id = c(1,1,1,1,2,2,2,2), time2 = c('2016-04-07 21:39:11', '2016-04-05 11:19:18','2016-04-07 21:39:11', '2016-04-05 11:19:18', '2016-04-03 10:58:26', '2016-04-02 21:39:11','2016-04-03 10:58:26', '2016-04-02 21:39:11') ) df1$time1 = as.POSIXct(df1$time1) df2$time2 = as.POSIXct(df2$time2) 
12
  • Can you please add the code to generate df1 and df2 Commented May 12, 2016 at 23:03
  • do the ids matter at all?? sounds like shortest difference within id Commented May 12, 2016 at 23:03
  • @jaimedash yes along with the corresponding times Commented May 12, 2016 at 23:04
  • @Divi yeah will do it Commented May 12, 2016 at 23:04
  • Please provide data using dput. Commented May 12, 2016 at 23:06

3 Answers 3

2

You can achieve this using dplyr. Basically the idea is since we want to produce an entry we'll assign each element in df1 a new id (in this case I just called it rowname).

After this, all we're interested in is joining the two dataframes on the id and filter them based on the minimum absolute difference.

library(dplyr) df1$time1 <- as.POSIXct(as.character(df1$time1)) df2$time2 <- as.POSIXct(as.character(df2$time2)) df1 %>% add_rownames("rowname") %>% left_join(df2, "id") %>% mutate(diff=time2-time1) %>% group_by(rowname) %>% filter(min(abs(diff)) == abs(diff)) %>% distinct 

This is my output:

Source: local data frame [4 x 5] Groups: rowname [4] rowname id time1 time2 diff (chr) (dbl) (time) (time) (dfft) 1 1 1 2016-04-07 21:39:10 2016-04-07 21:39:11 1 secs 2 2 1 2016-04-05 11:19:17 2016-04-05 11:19:18 1 secs 3 3 2 2016-04-03 10:58:25 2016-04-03 10:58:26 1 secs 4 4 2 2016-04-02 21:39:10 2016-04-02 21:39:11 1 secs 
Sign up to request clarification or add additional context in comments.

Comments

1

You can also do this in base R. To generate random dates (useful), I borrowed and edited a nice function from elsewhere on StackOverflow:

latemail <- function(N, st="2011/01/01", et="2016/12/31") { st <- as.POSIXct(as.Date(st)) et <- as.POSIXct(as.Date(et)) dt <- as.numeric(difftime(et,st,unit="sec")) ev <- sort(runif(N, 0, dt)) return(st + ev) } df1 <- data.frame(id=c(1,1,2,2), time1=latemail(4)) df2 <- data.frame(id=c(rep(1,4), rep(2,4)), time2=latemail(8)) 

And then your answer can be achieved in two lines:

shortest <- sapply(df1$time1, function(x) which(abs(df2$time2 - x) == min(abs(df2$time2 - x)))) cbind(df1, df2[shortest,]) 

Output:

id time1 id time2 1 2011-10-08 02:00:21 1 2011-08-17 18:07:47 1 2012-05-06 17:49:03 1 2012-09-04 19:52:40 2 2013-10-29 13:14:51 1 2012-10-29 20:09:31 2 2016-06-17 19:23:43 2 2015-11-24 02:07:15 

Comments

0

If you work with data.table:

library(data.table) df1 <- data.table( id = c(1,1,2,2), time1 = c('2016-04-07 21:39:10', '2016-04-05 11:19:17', '2016-04-03 10:58:25', '2016-04-02 21:39:10') ) df2 <- data.table( id = c(1,1,1,1,2,2,2,2), time2 = c('2016-04-07 21:39:11', '2016-04-05 11:19:18','2016-04-07 21:39:11', '2016-04-05 11:19:18', '2016-04-03 10:58:26', '2016-04-02 21:39:11','2016-04-03 10:58:26', '2016-04-02 21:39:11') ) df1$time1 = as.POSIXct(df1$time1) df2$time2 = as.POSIXct(df2$time2) res <- df1[df2, .(time1, time2), by = .EACHI, on = "id"][, diff:= abs(time2 -time1)] setkey(res, id, time1, diff) res <- res[, row := seq_along(.I), by = .(id, time1)][row == 1] 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.