I didn't see this question initially and asked a similar question a few days later. I am going to take my previous question down, but I thought I'd add an answer here to explain how I used sqldf() to do this.
There's been a little bit of discussion as to the best way to import 2GB or more of text data into an R data frame. Yesterday I wrote a blog post about using sqldf() to import the data into SQLite as a staging area, and then sucking it from SQLite into R. This works really well for me. I was able to pull in 2GB (3 columns, 40 million rows) of data in under 5 minutes. By contrast, the read.csv command ran all night and never completed.
Here's my test code:
Set up the test data:
    bigdf <- data.frame(dim=sample(letters, replace=T, 4e7),
                        fact1=rnorm(4e7),
                        fact2=rnorm(4e7, 20, 50))
    write.csv(bigdf, 'bigdf.csv', quote = F)

I restarted R before running the following import routine:
    library(sqldf)
    f <- file("bigdf.csv")
    system.time(bigdf <- sqldf("select * from f", dbname = tempfile(),
                               file.format = list(header = T, row.names = F)))

I let the following line run all night but it never completed:
    system.time(big.df <- read.csv('bigdf.csv'))