0

Problems with conditional merging of data in R using a loop

I have a main dataset of 48134 unique observations with a total of 35 variables (a surgical population 2015-2020). One is LopNr which is a unique case identifier, two others are OPERATION_START (POSIXct %d%B%Y:%H:%M:%S) and YearOfSurgery (character) which both include the year a patient underwent surgery.

I now want to include additional socioeconomic data i have stored as six separate csv files, one for each inclusion year. I want to add SES data for each case based on the year it underwent surgery. If surgery in 2015, extract data from the 2015 csv file, for example. I also want the variables i add to be regarded as one, even though var1 can be extracted from any of the six csv files depending on year of surgery.

I'm using tidyverse and the last loop i tried was as follows (Raks_SummaInk is one of the variables i want to extract):

#TEST OF LOOP 230523 08:24 years <- c(2015, 2016, 2017, 2018, 2019, 2020) HIP2_SPOR_SES <- HIP2_SPOR # Create a new dataset to store the merged data (basically a copy of HIP2_SPOR) for (year in years) { csv_file <- paste0("MC_Lev_LISA_", year, ".csv") socioeco_data <- read_csv(csv_file) %>% select(Lopnr, Raks_SummaInk) merged_data <- HIP2_SPOR_SES %>% filter(YearOfSurgery == as.character(year)) %>% left_join(socioeco_data, by = c("LopNr" = "Lopnr")) %>% summarize(Raks_SummaInk = sum(Raks_SummaInk, na.rm = TRUE)) HIP2_SPOR_SES <- merge(HIP2_SPOR_SES, merged_data, all = TRUE) # Merge the new data with the existing dataset } 

This resulted in a new dataset HIP2_SPOR_SES with 45 variables, all the new ones named Raks_SummaInk but with different suffixes, and unfortunately just NAs...

2
  • It should help to first bind_rows the imported csv data and then join the entire set with the soceco data. Commented May 23, 2023 at 10:46
  • Could you elaborate please? Commented May 23, 2023 at 11:15

1 Answer 1

0

one approach, taking advantage of Mapping and Reduceing:

library(dplyr) 
  • create example surgic. data:
main_data <- data.frame(LopNr = paste0('ID_', 1:2), OPERATION_START = as.Date(c('2015/04/08 10:00:00', '2016/02/16 11:00:00')), YearOfSurgery = as.character(2015:2016) ) 
# > main_data LopNr OPERATION_START YearOfSurgery 1 ID_1 2015-04-08 2015 2 ID_2 2016-02-16 2016 
  • set path to the folder with socio-economic files:
data_dir <- 'csv_dir' ## (replace with path to your soceco csv folder) 
  • create some example csv files with socioeconomic data:
read.table(text = 'RaksSummaInk,Lopnr\n123,ID_1\n516,ID_2\n', sep = ',', header = TRUE) |> write.csv(file = 'csv_dir/MC_Lev_LISA_2015.csv') read.table(text = 'RaksSummaInk,Lopnr\n725,ID_1\n400,ID_2\n', sep = ',', header = TRUE) |> write.csv(file = 'csv_dir/MC_Lev_LISA_2016.csv') 
  • read in, row-bind and transform (rename, make unique) the soceco files:
soceco_data <- c(2015, 2016) |> Map(f = \(yr) file.path(data_dir, sprintf('MC_Lev_LISA_%s.csv', yr)) |> read.csv() |> mutate(year = yr) ) |> Reduce(f = bind_rows) |> distinct(LopNr = Lopnr, ## match captialisation in main_data RaksSummaInk, YearOfSurgery = as.character(year)) 
> soceco_data RaksSummaInk LopNr YearOfSurgery 1 123 ID_1 2015 2 516 ID_2 2015 3 725 ID_1 2016 4 400 ID_2 2016 
  • join soceco data to main data:
main_data |> left_join(soceco_data, c('LopNr', 'YearOfSurgery')) 
 LopNr OPERATION_START YearOfSurgery RaksSummaInk 1 ID_1 2015-04-08 2015 123 2 ID_2 2016-02-16 2016 400 
Sign up to request clarification or add additional context in comments.

2 Comments

This worked. So thankful for your help! Had some trouble reading the csv files (probably due to some odd Swedish letters included), but could solve this by using read_csv instead. Happy coding and thanks again for the assistance, this seems like a great community to be part of.
You're welcome :-) Find more about the "character encoding hell" here: stackoverflow.com/questions/18789330/…

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.