1

I have this dataset (example):

library(data.table)

dt <- data.table(
  ID        = c(1, 1, 1, 2, 2, 3, 4, 5, 5, 5),
  diagnosis = rep("cancer", 10),
  Date      = c(2008, 2001, 2013, 2008, 2013, 2013, 2013, 2001, 2002, 2013)
)

I ONLY want patients with a diagnosis in 2013, so rows from any other year should be removed from the dataset.

However, a patient should not be included in the new dataset if they also have a diagnosis in 2008. If the patient had a diagnosis before 2008, we will keep them, with their 2013 diagnosis.

So the final dataset will look like this:

 ID diagnosis Date
  3    cancer 2013
  4    cancer 2013
  5    cancer 2013

How can I do this using data.table?

  • Not clear what you need to do. Commented Feb 8, 2023 at 11:51
  • I don't understand. Why is there no ID 1 anymore? Also, please share your example data in a ready-to-use format, e.g. with dput(). Commented Feb 8, 2023 at 11:57
  • Because if you have a diagnosis in both 2008 and 2013, you cannot be in the final data. You are only kept if you have a diagnosis in 2013 alone, or before 2008 and in 2013. Does that make sense? Commented Feb 8, 2023 at 12:16

4 Answers

1

Updated code:

dt <- data.table(
  ID        = c(1, 1, 1, 2, 2, 3, 4, 5, 5, 5),
  diagnosis = rep("cancer", 10),
  Date      = c(2008, 2001, 2013, 2008, 2013, 2013, 2013, 2001, 2002, 2013)
)

dt[diagnosis == "cancer" & Date == 2013 &
     !(ID %in% dt[diagnosis == "cancer" & Date == 2008, ID])]

Output:

   ID diagnosis Date
1:  3    cancer 2013
2:  4    cancer 2013
3:  5    cancer 2013

2 Comments

Thank you, but not really. I have edited my question :)
Please see the code, I have updated it @Hellihansen

1

Using a not-join (see ?data.table):

dt[Date == 2013][!dt[Date == 2008], on=.(ID)] 

Output

      ID diagnosis  Date
   <num>    <char> <num>
1:     3    cancer  2013
2:     4    cancer  2013
3:     5    cancer  2013

I guess it's more efficient to use a filter like this than a grouped condition like any().
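To make the comparison concrete, here is a minimal sketch that runs both styles on the question's example data and checks that they keep the same patients. The names by_join and by_group are only illustrative, and the sketch does not measure speed; on a larger table you could wrap each call in system.time() to check the efficiency guess yourself.

```r
library(data.table)

# Example data from the question
dt <- data.table(
  ID        = c(1, 1, 1, 2, 2, 3, 4, 5, 5, 5),
  diagnosis = rep("cancer", 10),
  Date      = c(2008, 2001, 2013, 2008, 2013, 2013, 2013, 2001, 2002, 2013)
)

# Not-join: keep the 2013 rows whose ID has no matching 2008 row
by_join <- dt[Date == 2013][!dt[Date == 2008], on = .(ID)]

# Grouped condition: any() is evaluated once per ID
by_group <- dt[, .SD[!any(Date == 2008) & Date == 2013], by = ID]

# Both approaches keep the same patients on this data
setequal(by_join$ID, by_group$ID)
```

The not-join filters twice and then drops matching IDs in one vectorised step, while the grouped version subsets .SD once per ID, which is where the extra cost would come from if there are many IDs.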


0
dt[, .SD[!2008 %in% unlist(.SD) & Date == 2013], ID] 

or, probably a bit better after seeing Waldi's answer:

dt[, .SD[!any(Date == 2008) & Date == 2013], ID] 

results

#    ID diagnosis Date
# 1:  3    cancer 2013
# 2:  4    cancer 2013
# 3:  5    cancer 2013


0
dt[, .SD[Date == 2013 & !any(between(Date, 2008, 2012)),], ID]
#       ID diagnosis  Date
#    <num>    <char> <num>
# 1:     3    cancer  2013
# 2:     4    cancer  2013
# 3:     5    cancer  2013
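Note that this version excludes any diagnosis from 2008 through 2012, not only 2008, so it can differ from the other answers. A small sketch of the difference, assuming a hypothetical patient 6 (not in the question's data) with diagnoses in 2010 and 2013:

```r
library(data.table)

# Hypothetical data: patient 6 does not appear in the question's example
dt2 <- data.table(
  ID        = c(3, 5, 5, 6, 6),
  diagnosis = rep("cancer", 5),
  Date      = c(2013, 2001, 2013, 2010, 2013)
)

# Exclude only patients with a 2008 diagnosis
only_2008 <- dt2[, .SD[Date == 2013 & !any(Date == 2008)], by = ID]

# Exclude patients with any diagnosis in 2008-2012 (between() is inclusive)
range_2008_2012 <- dt2[, .SD[Date == 2013 & !any(between(Date, 2008, 2012))], by = ID]

only_2008$ID        # patient 6 is kept
range_2008_2012$ID  # patient 6 is dropped
```

Which of the two is right depends on whether "a diagnosis in 2008" in the question really means 2008 only or any year from 2008 onward before 2013.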
