
I am trying to create a new column in df_test that indicates whether a matching row exists in a second data frame, df_prod. A row only counts as a match if all three of the following conditions are true:

  1. The names must be equal in both tables (df_test$name == df_prod$name)
  2. df_prod$birthday must be on or after 2020-01-01 (the person's birthday cannot be earlier than 2020-01-01)
  3. df_test$import_date must be earlier than (smaller than) df_prod$sent_date

I think of it as a VLOOKUP with multiple conditions.

This is what I have so far:

```r
df_test$V5 = case_when(
  df_test$name %in% df_prod$name &
    df_test$import_date < df_prod$sent_date &
    df_prod$birthday > as.Date('2020-01-01') ~ 1,
  TRUE ~ 0
)
```

Does anyone know how to proceed?

  • Do the two datasets have the same number of rows? Commented Mar 19, 2021 at 18:48
  • No, they don't. They also have a different number of columns. Commented Mar 19, 2021 at 18:49
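The row count matters because comparisons such as `df_test$import_date < df_prod$sent_date` in the attempt above are element-wise: R pairs position 1 with position 1, position 2 with position 2, and recycles the shorter vector, instead of looking up the row with the same name. A minimal illustration with made-up dates (the two vectors are hypothetical stand-ins for the real columns):

```r
# Hypothetical stand-ins: df_test$import_date (length 2)
# and df_prod$sent_date (length 4)
import_date <- as.Date(c("2021-01-10", "2021-02-01"))
sent_date   <- as.Date(c("2021-01-20", "2021-01-05", "2021-03-01", "2021-01-01"))

# import_date is recycled to length 4, so position 3 compares
# import_date[1] with sent_date[3], regardless of any names
cmp <- import_date < sent_date
print(cmp)
```

Because 4 is a multiple of 2, R does not even warn here; with non-multiple lengths it only emits a "longer object length is not a multiple" warning. Either way, the result is not a lookup.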

1 Answer


This may be a case for a join. Assuming there are no duplicate values of 'name' in df_prod, do a left_join from df_test with the selected columns of df_prod, then create V5 by converting the compound logical expression to binary with as.integer:

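```r
library(dplyr)

# Bring the lookup columns from df_prod into df_test by name
df_test2 <- left_join(df_test,
                      df_prod %>% select(name, sent_date, birthday),
                      by = 'name') %>%
  # 1 if both remaining conditions hold, 0 otherwise
  mutate(V5 = as.integer(import_date < sent_date &
                           birthday >= as.Date('2020-01-01'))) %>%
  # drop the helper columns again
  select(-sent_date, -birthday)
```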
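For readers without dplyr, the same join-then-compare idea can be sketched in base R with merge(). The sample data is hypothetical, and this sketch makes two assumptions of its own: unmatched names get V5 = 0 rather than NA (following the `TRUE ~ 0` default in the question), and, like the join above, df_prod has at most one row per name. Unlike left_join(), merge() sorts the result by the key column.

```r
# Hypothetical sample data; column names follow the question
df_test <- data.frame(
  name = c("Ann", "Bob", "Cid"),
  import_date = as.Date(c("2021-01-10", "2021-02-01", "2021-03-01")),
  stringsAsFactors = FALSE
)
df_prod <- data.frame(
  name = c("Ann", "Bob"),
  birthday = as.Date(c("2020-05-01", "2020-02-02")),
  sent_date = as.Date(c("2021-01-20", "2021-01-15")),
  stringsAsFactors = FALSE
)

# Left join on name: all.x = TRUE keeps every df_test row
joined <- merge(df_test, df_prod[, c("name", "sent_date", "birthday")],
                by = "name", all.x = TRUE)

# Both conditions must hold; names without a match give NA, mapped to 0
ok <- joined$import_date < joined$sent_date &
  joined$birthday >= as.Date("2020-01-01")
joined$V5 <- as.integer(!is.na(ok) & ok)

# Drop the helper columns again
df_test2 <- joined[, setdiff(names(joined), c("sent_date", "birthday"))]
print(df_test2)
```

With duplicate names in df_prod, merge() multiplies rows exactly like left_join() does, so the same caveat applies.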

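Or we can do this with data.table:

```r
library(data.table)

# Update join: for each df_test row with a matching name in df_prod,
# set V5 from the two remaining conditions
# (rows with no matching name keep V5 = NA)
setDT(df_test)[df_prod,
               V5 := as.integer(import_date < sent_date &
                                  birthday >= as.Date('2020-01-01')),
               on = .(name)]
```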

2 Comments

Thanks for your help! The only problem is that df_prod$name contains duplicates, so the join multiplies the rows. Is there any way I can get around this?
@Gabriela If both tables have duplicates, the question is which row each name should be matched to. Could you use only the distinct rows per name (distinct() keeps the first row for each name), i.e. `left_join(df_test, df_prod %>% select(name, sent_date, birthday) %>% distinct(name, .keep_all = TRUE), by = 'name')`?
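Another option, sketched here rather than taken from the answer: skip the join and check, for each df_test row, whether any df_prod row satisfies all three conditions. This can never duplicate df_test rows, and it does not depend on which duplicate is kept. The sample data is made up; with it, keeping only the first "Ann" row via distinct() would give Ann V5 = 0 (that row fails the birthday condition), while checking every row gives 1.

```r
# Hypothetical sample data: "Ann" appears twice in df_prod
df_test <- data.frame(
  name = c("Ann", "Bob", "Cid"),
  import_date = as.Date(c("2021-01-10", "2021-02-01", "2021-03-01")),
  stringsAsFactors = FALSE
)
df_prod <- data.frame(
  name = c("Ann", "Ann", "Bob"),
  birthday = as.Date(c("2019-12-31", "2020-05-01", "2020-02-02")),
  sent_date = as.Date(c("2021-01-20", "2021-01-20", "2021-01-15")),
  stringsAsFactors = FALSE
)

# For each df_test row, 1 if ANY df_prod row meets all three conditions,
# else 0; the result has exactly one value per df_test row
df_test$V5 <- vapply(seq_len(nrow(df_test)), function(i) {
  hit <- df_prod$name == df_test$name[i] &
    df_prod$birthday >= as.Date("2020-01-01") &
    df_test$import_date[i] < df_prod$sent_date
  as.integer(any(hit))
}, integer(1))
print(df_test)
```

This compares every df_test row against every df_prod row, which is fine for moderate sizes; for very large tables a grouped or non-equi join would likely scale better.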
