
I am looking at parallel processing in R and was wondering whether I can read in multiple .txt files in parallel rather than sequentially. I have a Shiny application and want to cut down its loading time, a large chunk of which comes from loading the files.

Current situation:

Shipments_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_month.txt', fill = TRUE)
ShipmentsYear_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_year.txt', fill = TRUE)
Open_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_wip.txt', fill = TRUE)
WIP_Short_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_short.txt', fill = TRUE)
WIP_RTQT_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_sno_tasks_year.txt', fill = TRUE)
Invoiced_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_inv.txt', fill = TRUE)

I have seen examples of running in parallel but they all end with combining all of the files. Each file I import, I want as a separate dataframe.

Here are some examples:

How do you read in multiple .txt files into R?

https://www.r-bloggers.com/import-all-text-files-in-a-folder-with-parallel-execution/

Ideal situation (although I know this isn't the code):

RunParallel {
  Shipments_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_month.txt', fill = TRUE)
  ShipmentsYear_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_year.txt', fill = TRUE)
  Open_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_wip.txt', fill = TRUE)
  WIP_Short_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_short.txt', fill = TRUE)
  WIP_RTQT_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_sno_tasks_year.txt', fill = TRUE)
  Invoiced_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_inv.txt', fill = TRUE)
}

Update, after the comment below:

# Sequential read with read.delim
tic <- Sys.time()
Shipments_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_month.txt', fill = TRUE)
ShipmentsYear_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_year.txt', fill = TRUE)
Open_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_wip.txt', fill = TRUE)
WIP_Short_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_short.txt', fill = TRUE)
WIP_RTQT_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_sno_tasks_year.txt', fill = TRUE)
Invoiced_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_inv.txt', fill = TRUE)
toc <- Sys.time()
Sequential <- toc - tic

# lapply combined with data.table::fread
tic <- Sys.time()
file <- c("/srv/samba/share/SAP data//_zmrosales_ship_month.txt",
          "/srv/samba/share/SAP data//_zmrosales_ship_year.txt",
          "/srv/samba/share/SAP data//_zmrosales_inv.txt",
          "/srv/samba/share/SAP data//_zmrosales_wip.txt",
          "/srv/samba/share/SAP data//_zmro_short.txt",
          "/srv/samba/share/SAP data//_zmro_sno_tasks_year.txt")
x2 <- lapply(file, data.table::fread)
# [[i]] extracts the i-th table itself; [i] would return a one-element list
Shipments_Raw <- as.data.frame(x2[[1]])
ShipmentsYear_Raw <- as.data.frame(x2[[2]])
Invoiced_Raw <- as.data.frame(x2[[3]])
Open_Raw <- as.data.frame(x2[[4]])
WIP_Short_Raw <- as.data.frame(x2[[5]])
WIP_RTQT_Raw <- as.data.frame(x2[[6]])
toc <- Sys.time()
Lapply <- toc - tic

Sequential
Lapply

Difference in time:

> Sequential
Time difference of 6.011156 secs
> Lapply
Time difference of 0.8015034 secs
  • lapply(files, data.table::fread) Commented Apr 4, 2018 at 14:20
  • Awesome! I didn't realize that returns the data sets as separate list elements instead of merging them. If you post that as an answer, I'll mark it as correct. Commented Apr 4, 2018 at 15:50

1 Answer

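Just use lapply in combination with data.table's very fast fread. lapply still reads the files one after another, so the speedup comes from fread being much faster than read.delim, and the result is a list with one element per file, so nothing gets merged: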

lapply(files, data.table::fread) 
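
To keep each file as its own data frame, one option is to name the paths and pull the elements out of the list by name. The following is only a minimal sketch under some assumptions: the two temporary files and their contents are hypothetical stand-ins for the SAP exports in the question, `read_one` is an illustrative helper name, and the fallback to read.delim is there only so the sketch runs when data.table is not installed.

```r
# Hypothetical stand-in files so the sketch runs anywhere; in the question
# these would be the paths under /srv/samba/share/SAP data/.
dir <- tempfile("sap_")
dir.create(dir)
paths <- c(
  Shipments_Raw = file.path(dir, "ship_month.txt"),
  Invoiced_Raw  = file.path(dir, "inv.txt")
)
for (p in paths) {
  write.table(head(mtcars, 3), p, sep = "\t", quote = FALSE, row.names = FALSE)
}

# Illustrative helper: use fread when data.table is installed,
# otherwise fall back to base read.delim.
read_one <- if (requireNamespace("data.table", quietly = TRUE)) {
  # data.table = FALSE makes fread return a plain data.frame
  function(path) data.table::fread(path, data.table = FALSE)
} else {
  function(path) read.delim(path, fill = TRUE)
}

# lapply keeps the names of `paths`, so every file stays a separate,
# named element of the list.
raw <- lapply(paths, read_one)

# [[ ]] or $ extracts the data frame itself rather than a one-element list.
Shipments_Raw <- raw[["Shipments_Raw"]]
Invoiced_Raw  <- raw$Invoiced_Raw
```

If separate variables are preferred over a list, `list2env(raw, envir = environment())` should create one variable per named element. For reads that really run in parallel, `parallel::mclapply(paths, read_one, mc.cores = 2)` is a near drop-in replacement for the lapply call on Unix-like systems, although with files this small the process overhead may outweigh the gain.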