I didn't see this question initially and asked a similar question a few days later. I am going to take my previous question down, but I thought I'd add an answer here to explain how I used sqldf() to do this.

There's been a little bit of discussion as to the best way to import 2GB or more of text data into an R data frame. Yesterday I wrote a blog post about using sqldf() to import the data into SQLite as a staging area, and then sucking it from SQLite into R. This works really well for me. I was able to pull in 2GB (3 columns, 40 million rows) of data in < 5 minutes. By contrast, the read.csv command ran all night and never completed.

Here's my test code:

Set up the test data:

bigdf <- data.frame(dim = sample(letters, replace = T, 4e7),
                    fact1 = rnorm(4e7),
                    fact2 = rnorm(4e7, 20, 50))
write.csv(bigdf, 'bigdf.csv', quote = F)

I restarted R before running the following import routine:

library(sqldf)
f <- file("bigdf.csv")
system.time(bigdf <- sqldf("select * from f", dbname = tempfile(),
                           file.format = list(header = T, row.names = F)))

I let the following line run all night but it never completed:

system.time(big.df <- read.csv('bigdf.csv'))
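For readers outside R, the same staging pattern — bulk-load the CSV into a throwaway SQLite database, then query the rows back — can be sketched with Python's standard-library sqlite3 module. This is just an illustration of the idea, not the sqldf() internals; the function and table names here are made up:

```python
import csv
import sqlite3
import tempfile


def stage_csv_in_sqlite(csv_path, query="SELECT * FROM staging"):
    """Bulk-load a CSV into a temporary SQLite file, then run a query on it.

    Mirrors the staging idea above: SQLite's bulk insert handles the heavy
    lifting, and you only pull the queried rows back into memory.
    """
    db_path = tempfile.mktemp(suffix=".db")  # throwaway db, like dbname = tempfile()
    con = sqlite3.connect(db_path)
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row becomes the column names
        cols = ", ".join('"%s"' % c for c in header)
        placeholders = ", ".join("?" for _ in header)
        con.execute("CREATE TABLE staging (%s)" % cols)
        # executemany streams rows from the file into SQLite without
        # materializing the whole CSV in memory first
        con.executemany("INSERT INTO staging VALUES (%s)" % placeholders, reader)
        con.commit()
    rows = con.execute(query).fetchall()
    con.close()
    return rows
```

Because no column types are declared, SQLite stores everything as text; a real pipeline would declare numeric columns or convert after the fact.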

JD Long