How do you read multiple .txt files into R? [duplicate]

Question

I'm using R to visualize some data, all of which is in .txt format. There are a few hundred files in a directory and I want to load them all into one table, in one shot.

Any help?

EDIT:

Listing the files is not a problem, but I am having trouble going from the list to the content. I've tried some of the code from here, but I get an error with this part:

all.the.data <- lapply( all.the.files, txt , header=TRUE)

saying

 Error in match.fun(FUN) : object 'txt' not found

Any snippets of code that would clarify this problem would be greatly appreciated.

The problem is that txt is not a function. The link you pointed to is about the read.csv function.
– Wok, Commented Aug 3, 2010 at 17:56

Greg · Accepted Answer · 2010-08-03 16:24:13Z

43

You can try this:

filelist = list.files(pattern = "\\.txt$")  # only names that end in .txt
# assuming tab separated values with a header
datalist = lapply(filelist, function(x) read.table(x, header = TRUE))
# assuming the same header/columns for all files
datafr = do.call("rbind", datalist)


3 Comments

RockScience Over a year ago

slightly cleaner: lapply(filelist, FUN=read.table, header=TRUE)

joffie Over a year ago

Is there a way to add the filenames using this approach? So that the column headers of each dataframe start with (a part of) the filename?

nouse Over a year ago

Yes, this method has the problem that the names of datalist remain empty.
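One base R way to address the two comments above is to record the file name in a column while reading and to name the list elements before binding. This is only a minimal sketch: the helper read_with_source, the column name source_file, and the two demo files are made up for illustration, and it assumes every file has the same columns.

```r
# Demo setup (assumption): two small tab-separated files in a temp directory.
demo_dir <- file.path(tempdir(), "txt_names_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(x = 1:2, y = c("a", "b")),
            file.path(demo_dir, "one.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(x = 3:4, y = c("c", "d")),
            file.path(demo_dir, "two.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# Hypothetical helper: read one file and remember where its rows came from.
read_with_source <- function(path) {
  df <- read.table(path, header = TRUE)
  df$source_file <- basename(path)  # illustrative column name
  df
}

filelist <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
datalist <- lapply(filelist, read_with_source)
names(datalist) <- basename(filelist)  # the list elements are no longer unnamed
datafr <- do.call("rbind", datalist)
```

Keeping the file name in a column rather than in the column headers means the data frames still share identical columns, so rbind keeps working.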

Tung · Accepted Answer · 2021-07-12 21:52:02Z

There are three fast ways to read multiple files and put them into a single data frame or data table.

First get the list of all txt files (including those in sub-folders)

list_of_files <- list.files(path = ".", recursive = TRUE, pattern = "\\.txt$", full.names = TRUE)

1) Use fread() w/ rbindlist() from the data.table package

#install.packages("data.table", repos = "https://cran.rstudio.com")
library(data.table)

# Read all the files and create a FileName column to store filenames
DT <- rbindlist(sapply(list_of_files, fread, simplify = FALSE),
                use.names = TRUE, idcol = "FileName")

2) Use readr::read_table2() w/ purrr::map_df() from the tidyverse framework:

#install.packages("tidyverse",
#                 dependencies = TRUE, repos = "https://cran.rstudio.com")
library(tidyverse)

# Read all the files and create a FileName column to store filenames
df <- list_of_files %>%
  set_names(.) %>%
  map_df(read_table2, .id = "FileName")

3) (Probably the fastest out of the three) Use vroom::vroom():

#install.packages("vroom",
#                 dependencies = TRUE, repos = "https://cran.rstudio.com")
library(vroom)

# Read all the files and create a FileName column to store filenames
df <- vroom(list_of_files, .id = "FileName")

Note: to clean up file names, use basename or gsub functions
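As a rough illustration of that note, the sketch below cleans a couple of made-up paths (the paths and the run_ prefix are assumptions, not taken from the question):

```r
# Made-up example paths, as list.files(full.names = TRUE) might return them.
paths <- c("./data/run_01.txt", "./data/sub/run_02.txt")

# basename() drops the directory part.
file_names <- basename(paths)

# tools::file_path_sans_ext() additionally drops the extension.
stems <- tools::file_path_sans_ext(file_names)

# gsub() with a regular expression can strip a prefix and the extension at once.
ids <- gsub("^run_|\\.txt$", "", file_names)
```

Any of these vectors can be used as the names passed to set_names() or as replacement values for the FileName column.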

Benchmark: readr vs data.table vs vroom for big data

Edit 1: to read multiple csv files and skip the header using readr::read_csv

list_of_files <- list.files(path = ".", recursive = TRUE,
                            pattern = "\\.csv$", full.names = TRUE)

df <- list_of_files %>%
  purrr::set_names(nm = (basename(.) %>% tools::file_path_sans_ext())) %>%
  purrr::map_df(read_csv, col_names = FALSE, skip = 1, .id = "FileName")

Edit 2: to convert a pattern including a wildcard into the equivalent regular expression, use glob2rx()
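For instance, a shell-style wildcard such as *.txt can be translated and then passed to list.files(); this is a small sketch, and the candidate file names are invented:

```r
# Translate a glob into the regular expression that list.files() expects.
rx <- glob2rx("*.txt")

# Invented file names to show what the resulting pattern accepts.
candidates <- c("a.txt", "a.txt.bak", "notes.csv")
matches <- grepl(rx, candidates)

# The same pattern used for the actual listing (current directory).
list_of_files <- list.files(path = ".", pattern = rx, full.names = TRUE)
```

Because glob2rx() anchors the expression at both ends, a name like a.txt.bak is not matched.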

How can I select only first three variables/columns of the list_of_files?
If you use fread: use select = c(1:3) or select = c("colname 1", "colname 2", "colname 3"). If you use read_table2, check the argument col_types = cols_only(colname1 = "i", colname2 = "d") where i is integer and d is double. HTH
See my recent answer for more options for cleaning up filenames stackoverflow.com/a/49546846/786542
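The fread suggestion in the comment above (select) might look like the sketch below. It assumes the data.table package is installed; the demo files and their column names a to d are invented for illustration.

```r
library(data.table)  # assumes data.table is installed

# Demo setup (assumption): two tab-separated files with four columns each.
demo_dir <- file.path(tempdir(), "select_demo")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.table(a = 1:2, b = 3:4, c = 5:6, d = 7:8),
       file.path(demo_dir, "f1.txt"), sep = "\t")
fwrite(data.table(a = 9:10, b = 11:12, c = 13:14, d = 15:16),
       file.path(demo_dir, "f2.txt"), sep = "\t")

files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read only the first three columns of every file; the list names
# (the bare file names) end up in the FileName column.
DT <- rbindlist(lapply(setNames(files, basename(files)), fread, select = 1:3),
                use.names = TRUE, idcol = "FileName")
```

Selecting columns at read time avoids loading data that would be dropped straight away, which matters with a few hundred files.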

Ken Benoit · Accepted Answer · 2017-07-28 15:42:05Z

11

There is a really, really easy way to do this now: the readtext package.

readtext::readtext("path_to/your_files/*.txt")

It really is that easy.


2 Comments

EcologyTom Over a year ago

This is a nice function, but readtext will just import all of the text into a single column. In most cases there will be additional manipulation required after this to make the data usable.

Ken Benoit Over a year ago

True, that's what the quanteda package is for.

Dirk is no longer here · Accepted Answer · 2010-08-03 15:14:39Z

Look at the help for the functions dir() aka list.files(). This allows you to get a list of files, possibly filtered by regular expressions, over which you could loop.

If you want to read them all at once, you first have to have the content in one file. One option would be to use cat to write all files to stdout and read that through a pipe() connection (R's counterpart to popen()). See help(connections) for more.
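A sketch of the cat idea, under the assumptions that the system is Unix-like (so the cat command exists) and that the files have no header line, since a header would otherwise be repeated in the middle of the stream. The demo files are invented.

```r
# Demo setup (assumption): two header-less, space-separated files.
demo_dir <- file.path(tempdir(), "pipe_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("1 2", "3 4"), file.path(demo_dir, "a.txt"))
writeLines("5 6", file.path(demo_dir, "b.txt"))

files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Build a shell command that concatenates all files to stdout;
# shQuote() protects paths that contain spaces.
cmd <- paste("cat", paste(shQuote(files), collapse = " "))

# read.table() opens the pipe connection, reads it, and closes it again.
all_rows <- read.table(pipe(cmd), header = FALSE)
```

With several hundred files the command line can become long, so looping over list.files() is usually the more portable choice.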

Eric Brotto · Accepted Answer · 2010-08-03 20:01:41Z

Thanks for all the answers!

In the meantime, I also hacked together a method of my own. Let me know if it is of any use:

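setwd("/path/to/directory")
files <- list.files()
data <- character(0)  # start empty so no placeholder value ends up in the result
for (f in files) {
  # read each file as a vector of whitespace-separated strings
  tempData <- scan(f, what = "character")
  data <- c(data, tempData)
}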
