3

I wanna create a new column based on the data in column 2 - 4, conditioning if the date column is before or after the previous row. My data looks as follows:

date city1 city2 city3 2022-01-25 Paris London Berlin 2022-01-28 Paris London Berlin 2022-02-04 Paris London Berlin 2022-01-26 Paris London Berlin 2022-02-08 Paris London Berlin 2022-02-02 Paris London Berlin 2022-02-04 Paris London Berlin 2022-02-06 Paris London Berlin 

The expected output looks as follow:

date city1 city2 city3 NewColumn 2022-01-25 Paris London Berlin Paris 2022-01-28 Paris London Berlin Paris 2022-02-04 Paris London Berlin Paris 2022-01-26 Paris London Berlin London 2022-02-08 Paris London Berlin London 2022-02-02 Paris London Berlin Berlin 2022-02-04 Paris London Berlin Berlin 2022-02-06 Paris London Berlin Berlin 

This is what I have tried:

for (i in 2:4){ new_data <- data %>% mutate(NewColumn = ifelse( as.Date(date) > lag(as.Date(date)), data[,i], data[,i+1]))} 

But this only generates:

date city1 city2 city3 NewColumn 2022-01-25 Paris London Berlin <NA> 2022-01-28 Paris London Berlin Berlin 2022-02-04 Paris London Berlin Berlin 2022-01-26 Paris London Berlin London 2022-02-08 Paris London Berlin London 2022-02-02 Paris London Berlin Berlin 2022-02-04 Paris London Berlin Berlin 2022-02-06 Paris London Berlin Berlin 

How can I solve this? Any advice?

3
  • tidyr::fill(NewColumn, .direction = "up") would be a quick fix. Commented Aug 14, 2022 at 8:55
  • Could you illustrate this further, given my sample code. I don¨t get it to work. Commented Aug 14, 2022 at 8:59
  • 1
    Hi @KalleBlomkvist, I rolled back your edit removing your question text — even though your question has been answered, leaving it here will help others with similar questions in the future. If you really want to remove it, you could delete it (rather than editing it to remove all the content), but I would encourage you to leave it up. Commented Aug 15, 2022 at 5:40

2 Answers 2

2

Using dplyr. Creating a custom group, we can use it to mutate the new values into the column

library(dplyr) df |> group_by(grp = cumsum(c(1, diff(lubridate::ymd(date))) < 0) + 1) |> rowwise() |> mutate(NewColumn = c_across(city1:city3)[grp]) |> ungroup() |> select(-grp) 
 date city1 city2 city3 NewColumn <chr> <chr> <chr> <chr> <chr> 1 2022-01-25 Paris London Berlin Paris 2 2022-01-28 Paris London Berlin Paris 3 2022-02-04 Paris London Berlin Paris 4 2022-01-26 Paris London Berlin London 5 2022-02-08 Paris London Berlin London 6 2022-02-02 Paris London Berlin Berlin 7 2022-02-04 Paris London Berlin Berlin 8 2022-02-06 Paris London Berlin Berlin 
Sign up to request clarification or add additional context in comments.

Comments

0
  • You can try this
j <- 2 lag_d <- c(0 , df$date) for(i in 1:(length(lag_d)-1)){ if(lag_d[i+1] > lag_d[i]) df$NewColumn[i] <- df[i,j] else { j <- j + 1 df$NewColumn[i] <- df[i,j] } } 
  • Output
 date city1 city2 city3 NewColumn 1 2022-01-25 Paris London Berlin Paris 2 2022-01-28 Paris London Berlin Paris 3 2022-02-04 Paris London Berlin Paris 4 2022-01-26 Paris London Berlin London 5 2022-02-08 Paris London Berlin London 6 2022-02-02 Paris London Berlin Berlin 7 2022-02-04 Paris London Berlin Berlin 8 2022-02-06 Paris London Berlin Berlin 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.