I have a 14GB data.txt file. I was comparing the speed of fread and read.table by reading the first 1M rows. It looks like fread is much slower although it is not supposed to be. It takes some time until the percentage counts show up.

What could be the reason? I thought it was supposed to be super fast... I am using a Windows OS computer.

  • Define "much slower" - if it's measured in microseconds then I wouldn't be losing sleep. Also, without example code no one can verify what you're doing. Commented Aug 28, 2015 at 5:04
  • @thelatemail: I have a data table with 100M rows and 60 columns, which is 14 GB. When I read the first 1M rows, fread takes 1.5-2 minutes (there is a wait before the percentage count shows), whereas read.table takes less than a minute. Irrespective of this comparison, I have been hearing from others that fread reads their 4 GB table in 40 seconds. There is something wrong that I can't figure out. Commented Aug 28, 2015 at 5:13
  • This is the code I use: data=read.table('data.txt',sep=',',nrow=1000000,header=TRUE,stringsAsFactors=FALSE); data=fread('data.txt',sep=',',nrow=1000000) Commented Aug 28, 2015 at 5:14

1 Answer

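fread memory-maps (mmaps) the file. The mapping covers the whole file and takes some time to set up, which likely accounts for the wait before the percentage counter appears. Once the file is mapped, subsequent "read-ins" will be faster.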

read.table does not mmap the whole file. It can read the file line by line and stop at line 1000000.

For some background on mmap, see "mmap() vs. reading blocks".

The examples in the help for fread highlight this behaviour.
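
To check how this plays out on your own machine, here is a minimal sketch that times a "first N rows" read with both functions and then a full read with fread. The synthetic file, its size, and the column names (id, x, g) are made up for illustration and are far smaller than the 14 GB file in the question. Timings depend on the data.table version, the operating system, and the disk, so treat any difference you see as indicative only.

```r
# Sketch: compare a partial read (first n_head rows) with read.table and fread,
# then time a full read with fread. All sizes and names here are hypothetical.
library(data.table)

set.seed(1)
n_total <- 200000L   # rows in the synthetic file
n_head  <- 1000L     # rows to read back in the partial reads

# Write a small comma-separated test file to a temporary location.
tmp <- tempfile(fileext = ".txt")
dt <- data.table(id = seq_len(n_total),
                 x  = rnorm(n_total),
                 g  = sample(letters, n_total, replace = TRUE))
fwrite(dt, tmp, sep = ",")

# Partial read with base R: stops after n_head data rows.
t_base <- system.time(
  head_base <- read.table(tmp, sep = ",", header = TRUE, nrows = n_head,
                          stringsAsFactors = FALSE)
)

# Partial read with fread.
t_fread_head <- system.time(
  head_fread <- fread(tmp, sep = ",", nrows = n_head)
)

# Full read with fread, the case it is designed for.
t_fread_all <- system.time(
  all_fread <- fread(tmp, sep = ",")
)

# Elapsed wall-clock seconds for each read.
print(c(read.table_head = t_base[["elapsed"]],
        fread_head      = t_fread_head[["elapsed"]],
        fread_all       = t_fread_all[["elapsed"]]))

unlink(tmp)
```

On a file this small the fixed cost of mapping may not be visible; increasing n_total should make the comparison closer to the situation described in the question.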

4 Comments

So if I will read the file only once, can we say that using fread won't give much of an advantage?
@KTY, if you are only trying to read in the first million lines, and only once, then you may have found a case where fread won't give an advantage. If you want to read the whole file, or read the rest of the lines in subsequently, then fread should almost definitely be faster.
Yes, it seems the main difference comes when reading big files. Now that I am reading the whole 14GB file, fread is very fast compared to read.table. Thanks for the information on mmap.
@KTY We could speed up reading the first N rows. Just wasn't a priority as normally you want to read the whole file. I filed a feature request #1300.