I am attempting to migrate a database and would like to use R to assist in the process. As part of the migration, I need to update "Item IDs" because they have changed. I have created a function to map the old IDs to the new ones:

    old_to_new <- function(id, df) {
      return(df[which(df$Old == id), ]$New)
    }

However, whenever I attempt to apply it to add a new column to my data frame (loaded from a database table):

    library(tidyverse)
    library(RODBC)

    cn <- odbcDriverConnect(connection="Driver={SQL Server Native Client 11.0};server=xxx;database=xxx;uid=xxx;pwd=xxx;")
    df <- sqlQuery(cn, "SELECT * FROM [MaintDB_New].[dbo].[Priority]")
    ticket_df <- sqlQuery(cn, "SELECT * FROM [MaintDB_New].[dbo].[Tickets]")
    ticket_details_df <- sqlQuery(cn, "SELECT * FROM [MaintDB_New].[dbo].[Ticket_Details]")
    new_items <- read_csv("./ticket_itm_export_temp.csv", col_names = c("Old", "Name", "New"))

    ticket_df_new <- ticket_df %>%
      mutate(item_id = old_to_new(itemID, new_items))

I receive the following error:

    Error in `[[<-.data.frame`(`*tmp*`, col, value = c(NA_integer_, NA_integer_, :
      replacement has 280 rows, data has 69430
    In addition: Warning message:
    In df$Old == id :
      longer object length is not a multiple of shorter object length

What am I doing wrong, and what is the proper approach? I received a similar error while attempting to use ddplyr.
I am new to R, so I apologize if this is an obvious question.
EDIT - Added data structure:

    head(ticket_df)
      ticketID propertyID itemID roomNumber assignedToID isOpen openID latestID
    1       11         10      1       <NA>           NA      0     22       23
    2       12         17      1       <NA>           NA      0     24      289
    3       13         17      1       <NA>           NA      0     25      292
    4       14         17     17       <NA>           NA      0     26     4411
    5       15         17     68       <NA>           NA      0     27      296
    6       16         17     74       <NA>           NA      0     28      294

    head(new_items)
        Old Name                    New
      <int> <chr>                 <int>
    1   257 Register Cash Drawers   425
    2   253 Alarm System            426
    3   135 CREDENZA/ ARMOIRE       427
    4    55 Back Office PC          428
    5   183 Backup All Data         429
    6   260 Base Boards             430

Try left_join, something like ticket_df %>% left_join(new_items, by = c("itemID" = "Old")) %>% mutate(item_id = New). Also make sure new_items doesn't have duplicate Old entries or you'll end up with more rows than you started with. If this doesn't work, please post reproducible sample data so we can see what's going on. Use dput to give us a copy/pasteable version of the first 10 rows of the relevant data frames (looks like ticket_df and new_items are the relevant ones here).

In the sample data there are no matching values between ticket_df$itemID and new_items$Old, so any code we might work on will do nothing. (I was trying to infer the "key" based on finding any columns with matches. Thank you for clarifying the underlying data structure, so we now just need a more representative data sample ... and please use dput.)
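
For what it's worth, the error most likely arises because mutate() hands old_to_new() the entire itemID column at once rather than one ID at a time. df$Old == id then compares two vectors of different lengths, and the function returns one value per row of new_items (280) instead of one per ticket (69430). Below is a minimal sketch of the join approach suggested above, plus a base R equivalent using match(). The toy data frames are made-up stand-ins for the real tables, and the value 999 is a hypothetical unmapped ID included only to show what happens when there is no match.

```r
library(dplyr)

# Toy stand-ins for the real tables (values are illustrative only)
ticket_df <- data.frame(
  ticketID = c(11L, 12L, 13L, 14L),
  itemID   = c(257L, 253L, 257L, 999L)  # 999 deliberately has no mapping
)
new_items <- data.frame(
  Old  = c(257L, 253L, 135L),
  Name = c("Register Cash Drawers", "Alarm System", "CREDENZA/ ARMOIRE"),
  New  = c(425L, 426L, 427L)
)

# Duplicate keys in the lookup would multiply rows in the join, so check first
stopifnot(anyDuplicated(new_items$Old) == 0)

# Join on itemID == Old, keeping every ticket row; unmatched IDs become NA
ticket_df_new <- ticket_df %>%
  left_join(select(new_items, Old, New), by = c("itemID" = "Old")) %>%
  rename(item_id = New)

# Base R equivalent: match() is vectorised over the whole itemID column
item_id_base <- new_items$New[match(ticket_df$itemID, new_items$Old)]
```

Either version returns exactly one item_id per ticket row, with NA wherever an old ID has no entry in new_items, so it is worth checking sum(is.na(ticket_df_new$item_id)) before writing anything back to the database. To share sample data as requested, dput(head(ticket_df, 10)) prints a copy/pasteable version.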