
I'm trying to back into a fake birthdate based on a consumer's age, using the lubridate package. Here is my code:

ymd(today) - years(df$age) - months(sample(1:12, 1)) - days(sample(1:31, 1))

I want to use this to generate a different date of birth (dob) for each row that matches that row's age. When I run this inline, every row gets the same month and day and only the year differs. I want the month and day to vary as well.

  • Because you are sampling a vector of length 1, which is recycled across all rows. Commented Aug 4, 2019 at 19:35
  • So I guess I need to build a loop to take a range of samples? Commented Aug 4, 2019 at 19:45
  • You can try months(sample(1:12, nrow(df), replace = TRUE)) - days(sample(1:31, nrow(df), replace = TRUE)) Commented Aug 4, 2019 at 19:54
  • No loop is required; take advantage of R's vectorized calculations. If you're doing this in a dplyr pipeline, perhaps replace 1 with n() in your two calls to sample. Commented Aug 4, 2019 at 20:04
  • The logic is completely wrong. The years function does not return a date but rather a number, which is not going to correspond with the scale of Date values that are returned by ymd. You might get closer by multiplying by 365 for years, 30 for months, and making the month and date samples the correct length. Commented Aug 4, 2019 at 20:12
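A minimal sketch of the vectorized fix the comments describe, assuming a fixed reference date of 2019-08-04 (the date of this thread) in place of today() so the result is reproducible. In current lubridate, years() returns a Period, and subtracting a Period from a Date gives a Date, so years(df$age) already varies per row; the month and day only vary if sample() draws one value per row. This sketch also swaps the separate month and day offsets for a single offset of 0 to 364 days, which keeps every birthdate within the stated age.

```r
library(lubridate)

set.seed(5)
df <- data.frame(age = c(18, 33, 58, 63))
n <- nrow(df)

# Assumed fixed reference date so results are reproducible; use today() in practice
ref <- ymd("2019-08-04")

# sample(x, 1) returns a single value that R recycles across every row,
# which is why every row got the same month and day. Draw one value per row
# instead; replace = TRUE keeps this working when n exceeds length(x).
offset_days <- sample(0:364, n, replace = TRUE)

# Latest possible birthday for each age, minus a per-row random offset.
# years() and days() return Periods; Date - Period gives a Date.
df$dob <- ref - years(df$age) - days(offset_days)
df
```

If you keep separate month offsets instead, note that subtracting months() can give NA when the target day does not exist (for example, March 31 minus one month); lubridate's %m-% operator rolls back to the last valid day of the month instead.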

2 Answers


You can make a date for January 1 of the birth year and then add a random number of days to it.

library(lubridate)
library(dplyr)

set.seed(5)
df <- data.frame(age = c(18, 33, 58, 63))

df %>%
  mutate(dob = make_date(year(Sys.Date()) - age, 1, 1) +
           duration(sample(0:364, n()), unit = "days"))
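One caveat worth checking against your definition of age: because the offset counts forward from January 1 of the birth year, any birthdate later in the year than the current date belongs to someone who has not had this year's birthday yet, so their age comes out one less than `age`. A minimal sketch comparing this approach with a variant that counts back from the latest possible birthday, assuming a fixed reference date of 2019-08-04 and the same example ages; `age_on()` is a hypothetical helper written only for this check.

```r
library(lubridate)
library(dplyr)

set.seed(5)
df <- data.frame(age = c(18, 33, 58, 63))
ref <- ymd("2019-08-04")  # assumed reference date, stands in for Sys.Date()

# Hypothetical helper: completed years at `ref`, i.e. the year difference
# minus 1 when this year's birthday (month-day) has not come yet
age_on <- function(dob, ref) {
  year(ref) - year(dob) - (format(dob, "%m%d") > format(ref, "%m%d"))
}

df <- df %>%
  mutate(
    # the answer's approach: January 1 of the birth year plus 0-364 days
    dob_jan1 = make_date(year(ref) - age, 1, 1) + sample(0:364, n(), replace = TRUE),
    # variant: count back 0-364 days from the latest possible birthday
    dob_back = ref - years(age) - days(sample(0:364, n(), replace = TRUE)),
    age_jan1 = age_on(dob_jan1, ref),
    age_back = age_on(dob_back, ref)
  )
df
```

In this sketch `age_back` always equals `age`, while `age_jan1` can be `age - 1`; whether that matters depends on how strictly the fake birthdates need to match.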

2 Comments

I think this is a good approach, but you shouldn't need to bother with duration and the unit specification. A Date is just a number anyway, so you can simply do df %>% mutate(dob = make_date(year(Sys.Date()) - age, 1, 1) + sample(0:364, n()))
Thank you both for contributing. This worked for me once I made the small change of allowing sampling with replacement.
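Both comments can be checked directly in base R: adding an integer to a Date shifts it by that many days, and sample() needs replace = TRUE once you ask for more values than it is given. A small sketch with arbitrary example values:

```r
start <- as.Date("2001-01-01")

# Adding an integer to a Date moves it forward by that many days
start + 31               # "2001-02-01"
start + c(0, 58, 364)    # "2001-01-01" "2001-02-28" "2001-12-31"
class(start + 31)        # "Date"

# sample() without replacement cannot return more values than it is given,
# so sample(0:364, 1000) would fail; replace = TRUE allows repeats
set.seed(1)
offsets <- sample(0:364, 1000, replace = TRUE)
length(offsets)          # 1000
```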

In base R, we can subtract each age from the current year, pick a random month and day, paste the values together and convert the result to a Date object.

set.seed(123)
df <- data.frame(age = sample(100, 5))

as.Date(paste(as.integer(format(Sys.Date(), "%Y")) - df$age,
              sprintf("%02d", sample(12, nrow(df))),
              sprintf("%02d", sample(30, nrow(df))),
              sep = "-"))
#[1] "1990-01-29" "1940-06-14" "1978-09-19" "1933-05-16" "1928-04-03"

However, this can produce invalid dates such as February 30, so you might need an extra check for February; to be safe, you could sample days only from 1 to 28 instead of 1 to 30.
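A base-R sketch that avoids invalid dates such as February 30 altogether: build each person's latest possible birthday by keeping the reference date's month and day and subtracting the age from its year, then subtract a random 0 to 364 days. This swaps the answer's random month and day for a single random day offset, and assumes a fixed reference date of 2019-08-04; with Sys.Date() the reference would need special handling on February 29, since that day does not exist in most birth years.

```r
set.seed(123)
df <- data.frame(age = c(18, 33, 58, 63))
n <- nrow(df)

ref <- as.Date("2019-08-04")  # assumed reference date; Sys.Date() in practice

# Same month and day as ref, `age` years earlier, e.g. "2001-08-04" for age 18
latest_birthday <- as.Date(sprintf("%d-%s",
                                   as.integer(format(ref, "%Y")) - as.integer(df$age),
                                   format(ref, "%m-%d")))

# Go back a random 0-364 days per row; replace = TRUE allows more than 365 rows
df$dob <- latest_birthday - sample(0:364, n, replace = TRUE)
df
```

Counting back less than a full year from the latest possible birthday means each generated date still gives the stated age on the reference date.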

Comments
