1

I have a table with the below structure.

ID Entity UserName UserRole UserStatus UpdatedDate 

Clustered key is defined on columns Entity, UserName, UserRole.

UserStatus for an Entity-UserName-UserRole could be either Active or Inactive.

I could implement the INSERT and UPDATE statements for a dataframe df as follows:

Insert:

values <- paste("('",df$Entity,"','", df$UserName,"','", ,df$UserRole,"','" ,df$UserStatus,"','" , df$UpdatedDate"')", sep="", collapse=",") sqlStr_Insert<- sqlQuery(con, sprintf(paste("Insert into TBL (Entity, UserName,UserRole,UserStatus,UpdatedDate) values ",values))) 

Update:

sql_string <- "UPDATE TBL SET UserStatus = '%s', UpdatedDate = '%s' WHERE Entity='%s' AND UserName ='%s' AND UserRole = '%s';" sqlStr_update <-sqlQuery(con, paste(sprintf(sql_string, UserStatus, UpdatedDate,df$Entity,df$UserName,df$UserRole), collapse="")) 

The code works fine but I would like it to check if for every row in the df for a combination of Entity-UserName-UserRole already exists in the TBL. If found, run the UPDATE statement else run the INSERT statement.

For example - SQL Server table data:

ID Entity UserName UserRole UserStatus UpdatedDate --------------------------------------------------- 1 A Ford Analyst Active 1/1/2020 2 B Acura Analyst Active 1/5/2020 3 A Tesla Engineer Inactive 1/6/2020 4 A Kia Analyst Active 1/1/2020 

df to be inserted or updated:

 Entity UserName UserRole UserStatus UpdatedDate --------------------------------------------------- A Accord Analyst Active 1/10/2020 B Acura Analyst Active 1/10/2020 C BMW Engineer Active 1/10/2020 

What should happen:

Only rows 1st and 3rd should get 'INSERTED' to the database TBL. UPDATE should only happen if the 'UserStatus' changes for an existing Entity-UserName-UserRole Combination.

I guess I could run a 'FOR' loop for every row of the dataframe df but I think that would be slow and not an optimum solution.

5
  • 1
    You need MERGE. Or two separate statements: an INSERT...WHERE NOT EXISTS and a joined UPDATE. You may want to look at passing through a Table-Valued Parameter and do it all in one batch. And your current code is wide open to SQL injection and syntax errors, use parameterized queries instead Commented Mar 17, 2021 at 6:20
  • okay good point. I do perform numerous data checks before Insert/Update, so the df gets populated with 'valid' data. The above code passes the entire df in one go to the SQL server. How would I write the MERGE statement in this scenario? Commented Mar 17, 2021 at 13:30
  • 1
    Does this answer your question? SQL MERGE statement to update data Commented Mar 17, 2021 at 13:47
  • partly as it does not mention how to perform SQL merge statement in R using the data frame. I will try it anyways. Commented Mar 17, 2021 at 14:40
  • CodeMaster, I suspect that the sql "merge" command is what you need. In many circles, this operation is referred to as an "UPSERT", where you update if it exists (based on one or more keys) and insert if not. I know of no R function that does it natively (though the dbx package supports other DBMSes, not sqlserver, perhaps they'll add it at some point). Waldi's two-step approach is what I've been using for a few years. Commented Mar 18, 2021 at 18:02

1 Answer 1

2

You could create a temporary table (use # prefix in SQL Server) and send a MERGE query as suggested by @CharlieFace:

data.new <- read.table(text=' Entity UserName UserRole UserStatus UpdatedDate A Accord Analyst Active 1/10/2020 B Acura Analyst Active 1/10/2020 C BMW Engineer Active 1/10/2020', header = T) library(DBI) conn <- dbConnect(...) # Create temporary table dbWriteTable(conn,'#NEWDATA',data.new) dbExecute(conn," MERGE INTO TBL WITH (HOLDLOCK) AS target USING #NEWDATA AS source ON target.Entity = source.Entity AND target.UserName = source.UserName AND target.UserRole = source.UserRole WHEN MATCHED THEN UPDATE SET target.UserStatus = source.UserStatus WHEN NOT MATCHED BY TARGET THEN INSERT (Entity,UserName,UserRole,UserStatus,UpdatedDate) VALUES (Entity,UserName,UserRole,UserStatus,UpdatedDate); " ) # Remove temporary table dbExecute(conn,"DROP TABLE #NEWDATA" ) # Check results result <- dbGetQuery(conn,"SELECT * from TBL") result ID Entity UserName UserRole UserStatus UpdatedDate 1 1 A Ford Analyst Active 1/1/2020 2 2 B Acura Analyst Active 1/5/2020 3 3 A Tesla Engineer Inactive 1/6/2020 4 4 A Kia Analyst Active 1/1/2020 5 NA A Accord Analyst Active 1/10/2020 6 NA C BMW Engineer Active 1/10/2020 

Not sure how ID is managed on the server : autonumber?

Sign up to request clarification or add additional context in comments.

3 Comments

Thnks! yes. ID is autonumber. Will this process suffer if for some reason multiple people login to make changes at the same time; table creation and deletion...
According to the post shared by @CharlieFace, HOLDLOCK should protect against concurrency problems.
I implemented the above method and it worked! I also did try with the 'for loop' that checks if a certain row already exists and it worked as well. Of course, the MERGE option is way faster, in my opinion and that will be my approach. thnx for the suggestions @waldi

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.