51

I have tried the readLines and the read.csv functions, but they don't work.

Here are the contents of the my_script.sql file:

SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees WHERE HireDate >= '1-july-1993' 

and it is saved on my Desktop.

Now I want to run this query from my R script. Here is what I have:

conn = connectDb()

fileName <- "C:\\Users\\me\\Desktop\\my_script.sql"
query <- readChar(fileName, file.info(fileName)$size)
query <- gsub("\r", " ", query)
query <- gsub("\n", " ", query)
query <- gsub("", " ", query)

recordSet <- dbSendQuery(conn, query)
rate <- fetch(recordSet, n = -1)
print(rate)

disconnectDb(conn)

And I am not getting anything back in this case. What can I try?

  • If you manually do dbSendQuery(conn, "SELECT Emp..."), are any rows returned there? Are you confident that this is a problem with the read-from-file part and not a data-is-not-present problem? Commented Jun 30, 2017 at 19:21
  • Yes, if I do that then data is returned. I am confident that I just don't know how to get it to read from a .sql file. Commented Jun 30, 2017 at 19:29
  • I've tested this process on both SQL Server and Postgres with no problems (I don't have mysql available). I've also used paste(readLines(...), collapse = " ") with the same success. Since a .sql file is merely a text file with a different file extension, there's nothing cosmic. Is it something to do with your use of query versus query2? You don't show how you got to that second variable. Commented Jun 30, 2017 at 19:38
  • I removed the query2 as it was a mistake, I changed it to just query. Can you suggest a way to make the file be read as UTF-8? I am getting some non UTF-8 symbols. Commented Jun 30, 2017 at 20:10
  • Have you tried readLines(..., encoding="UTF-8")? You may also want to look at iconv(). Commented Jun 30, 2017 at 20:14
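A minimal sketch combining those two suggestions, using the file path from the question (the encoding arguments are illustrative, not tested against the OP's file):

fileName <- "C:\\Users\\me\\Desktop\\my_script.sql"
# Declare the encoding so readLines() marks the strings as UTF-8
lines <- readLines(fileName, encoding = "UTF-8")
query <- paste(lines, collapse = " ")
# Or convert explicitly, substituting a space for any unconvertible bytes
query <- iconv(query, to = "UTF-8", sub = " ")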

8 Answers

64

I've had trouble reading SQL files myself, and have found that oftentimes the syntax gets broken if there are any single-line comments in the SQL. Since in R you store the SQL statement as a single-line string, any double dash in the SQL will effectively comment out all code after it.

This is a function that I typically use whenever I am reading in a .sql file to be used in R.

getSQL <- function(filepath){
  con = file(filepath, "r")
  sql.string <- ""

  while (TRUE){
    # Read one line at a time; readLines() returns character(0) at EOF
    line <- readLines(con, n = 1)

    if ( length(line) == 0 ){
      break
    }

    # Replace tabs with spaces
    line <- gsub("\\t", " ", line)

    # Convert a single-line comment to a block comment so it survives
    # being collapsed into a one-line string
    if(grepl("--", line) == TRUE){
      line <- paste(sub("--", "/*", line), "*/")
    }

    sql.string <- paste(sql.string, line)
  }

  close(con)
  return(sql.string)
}
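A typical call then mirrors the code in the question, with conn being the connection object defined there:

query <- getSQL("C:\\Users\\me\\Desktop\\my_script.sql")
recordSet <- dbSendQuery(conn, query)
rate <- fetch(recordSet, n = -1)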

8 Comments

Thank you. This function works for me. I just want to add a small caveat: sometimes we need to specify the encoding used to read the file, for example: con = file(filepath, "r", encoding = "UTF-16LE")
This is great, thanks. Another point, perhaps obvious (but it tripped me up!): the regex replacing line comments '--' with '/*...*/' will replace ALL occurrences of '--' in the query, regardless of whether they preface a comment or not. I had a query which contained multiple '--' within quotes that should not have been changed, so beware of this kind of thing. I've (lazily) just removed this regex-replace section from the function, and will make sure I take SQL comments out of the file before importing.
Works perfectly - I just replaced "line <- paste(sub("--","/*",line),"*/")" with "next" and added an else case where I inserted "sql.string <- paste(sql.string, line)", as any comments somebody puts in do not matter for me when querying the db in question.
Wouldn't it be useful to have the getSQL function available without defining it every time? So that e.g. you only have to type data <- getSQL('path/data.sql') in a new script without writing the function above?
I agree. One option is to create your own custom package, which you can load like any other library. I do that for any small utility functions like this that I tend to use a lot.
48

I've found that for queries with multiple lines, the read_file() function from the readr package works well. The only thing you have to be mindful of is to avoid single quotes (double quotes are fine). You can even add comments this way.

Example query, saved as query.sql

SELECT COUNT(1) as "my_count"
-- comment goes here
FROM -- tabs work too
    my_table

I can then store the results in a data frame with

df <- dbGetQuery(con, statement = read_file('query.sql')) 

3 Comments

Thank you. Does this hold true for a Postgres query as well?
'statement' in selecting a method for function 'dbGetQuery': Error: could not find function "read_file"
@kRazzyR the answer says read_file() function from the readr package. You will need to install that package and load it with library(readr) to use it.
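That is, something along these lines (a minimal sketch; con is assumed to be an open DBI connection as in the answer):

install.packages("readr") # one-time setup
library(readr)
df <- dbGetQuery(con, statement = read_file('query.sql'))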
17

You can use the read_file() function from the readr package.

fileName = read_file("C:/Users/me/Desktop/my_script.sql") 

You will get a string variable fileName with the desired text.

Note: use / instead of \\ in the file path.
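For example, both of these point at the file from the question, but the first form avoids the escaping entirely:

fileName = read_file("C:/Users/me/Desktop/my_script.sql")      # preferred
fileName = read_file("C:\\Users\\me\\Desktop\\my_script.sql")  # works, but easy to mistype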


6

The answer by Matt Jewett is quite useful, but I wanted to add that I sometimes encounter the following warning when trying to read .sql files generated by SQL Server using that answer:

Warning message: In readLines(con, n = 1) : line 1 appears to contain an embedded nul

The first line returned by readLines is often "ÿþ" in these cases (i.e. the UTF-16 byte order mark) and subsequent lines are not read properly. I solved this by opening the sql file in Microsoft SQL Server Management Studio and selecting

File -> Save As ...

then on the small downarrow next to the save button selecting

Save with Encoding ...

and choosing

Unicode (UTF-8 without signature) - Codepage 65001

from the Encoding dropdown menu.

If you do not have Microsoft SQL Server Management Studio and are using a Windows machine, you could also try opening the file with the default text editor and then selecting

File -> Save As ...

Encoding: UTF-8

to save with a .txt file extension.

Interestingly, changing the file within Microsoft SQL Server Management Studio removes the BOM (byte order mark) altogether, whereas changing it within the text editor converts it to the UTF-8 BOM; either way, the query is then read properly using the referenced answer.
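If re-saving the file is not an option, the encoding can instead be declared when opening the connection in R. A sketch, assuming the file really is UTF-16LE as the ÿþ byte order mark suggests:

con <- file("C:/Users/me/Desktop/my_script.sql", open = "r", encoding = "UTF-16LE")
lines <- readLines(con, warn = FALSE)
close(con)
# Strip the byte order mark if it survives the re-encoding
lines[1] <- sub("^\ufeff", "", lines[1])
query <- paste(lines, collapse = "\n")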


1

The combination of readr and textclean works well without having to create any new functions. read_file() reads the file into a character vector and replace_white() ensures all escape-sequence characters are removed from your .sql file. Note: this does cause problems if you have comments in your SQL string!

library(readr)
library(textclean)

SQL <- replace_white(read_file("file_path"))
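The resulting string can then be passed to the database as in the other answers, assuming con is an open DBI connection:

df <- DBI::dbGetQuery(con, SQL)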


0

Late to the game, but I've been playing around with this today and have written a slightly different approach to gathering SQL code to run within the DBI::dbGetQuery() function:

get_sql_code_from_file <- function(file_path){
  # Get contents of file
  file_lines <- readLines(file_path)

  # Remove single-line comments
  file_lines <- file_lines[grepl(pattern = "^--", file_lines) == FALSE]

  # Note indices of lines where multi-line comments start/end
  multi_line_start_indices <- grep(pattern = "^/\\*", file_lines)
  multi_line_end_indices <- grep(pattern = "\\*/", file_lines)

  # Check if any multi-line comments were found
  if (length(multi_line_end_indices) != 0) {
    # Note lines defining and within multi-line comments
    # (unlist() flattens the result in case the comments differ in length)
    multi_line_indices <- unlist(sapply(
      seq_along(multi_line_start_indices),
      FUN = function(index, multi_line_start_indices, multi_line_end_indices){
        return(multi_line_start_indices[index]:multi_line_end_indices[index])
      },
      multi_line_start_indices,
      multi_line_end_indices
    ))

    # Remove multi-line comments
    file_lines <- file_lines[-multi_line_indices]
  }

  # Combine into a single string
  file_string <- paste(file_lines, collapse = "\n")
  return(file_string)
}

This approach differs from the accepted answer as it removes both single and multi-line comments from the file.
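Usage mirrors the other answers; here conn is an assumed open connection and the path is the one from the question:

query <- get_sql_code_from_file("C:/Users/me/Desktop/my_script.sql")
result <- DBI::dbGetQuery(conn, query)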


0

This answer is a bit late for the OP but may be helpful for future readers.

The sqlhelper package provides functions for reading, preparing and executing files of SQL, similar to the discussion in the comments on the accepted answer.

Examples might look like:

# Use this if you need your queries as strings
my_sql_queries <- sqlhelper::read_sql("file.SQL")

# Use this if you want to execute the queries
results <- sqlhelper::run_files(c("file1.SQL", "file2.SQL"))
results[[1]] # result of first query in file1.SQL

The read_sql() function handles single and multi-line comments, including comment delimiters within quotes, so SELECT * FROM tab WHERE var = '--oops' will be executed as is.

(disclosure: I wrote sqlhelper)


0

The glue package is excellent for working with SQL, including parameterized queries.

my_script.sql

/* Notice how comments and additional lines are fine in your sql file */

SELECT EmployeeID, FirstName, LastName, HireDate, City
FROM Employees -- in-line comments are also ok to be left in
WHERE HireDate >= {hire_date} -- dynamic input will be filled in using the glue package

/* Make sure to leave a blank line at the end of the file */

Function to read SQL script

library(glue)

get_sql <- function(con, file) {
  sql_txt <- readLines(file) |> paste(collapse = "\n")
  sql <- glue_sql(sql_txt, .con = con)
  return(sql)
}

Putting it together

conn <- connectDb() # your connection object
sql_file <- "my_script.sql"
hire_date <- "1-july-1993"

qry <- get_sql(conn, sql_file)
print(qry)
<SQL> /* Notice how comments and additional lines are fine in your sql file */
SELECT EmployeeID, FirstName, LastName, HireDate, City
FROM Employees -- in-line comments are also ok to be left in
WHERE HireDate >= '1-july-1993' -- dynamic input will be filled in using the glue package
/* Make sure to leave a blank line at the end of the file */

This is a SQL class object ready to be executed on the database e.g.

res <- DBI::dbGetQuery(conn, qry) 

