Read columns of a csv file using shell or pipe inside R - Windows

Question

I'm looking for a way of reading only a few columns from a csv file into R using shell() or pipe. I found this thread that explains how to accomplish that on Linux: Quicker way to read single column of CSV file

On Linux this works once the what argument is added:

a <- as.data.frame(scan(pipe("cut -f1,2 -d, Main.csv"), what = list("character", "character"), sep = ","))
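In that call, scan() treats what as a template: there is one list element per field, and only the type of each element matters (not its value), so list("character", "character") and list("", "") both read two character columns. Below is a minimal sketch in which a few hard-coded lines stand in for what cut -f1,2 -d, would print. scan()'s text= argument replaces pipe() so the sketch runs without cut. The data and the column names x and y are illustrative only.

```r
# Stand-in for the output of `cut -f1,2 -d, Main.csv` (illustrative data)
cut_output <- c("a,b", "1,2", "4,5")

# Named template list: two character fields per record; skip = 1 drops the header line
cols <- scan(text = cut_output, what = list(x = "", y = ""),
             sep = ",", skip = 1, quiet = TRUE)

# scan() returns a list of column vectors, which converts directly to a data frame
df <- as.data.frame(cols, stringsAsFactors = FALSE)
print(df)
```

Naming the template elements gives the data frame readable column names. With an unnamed list, as.data.frame() invents its own.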

However, this doesn't seem to work on Windows. With pipe("cut -f1 -d, Main.csv") the connection opens, but it returns nothing.

What functions or syntax do I need to make this work on Windows?

Is it possible to accomplish this by using shell()?

G. Grothendieck · Accepted Answer · 2022-03-02 00:06:23Z

7

Make sure that cut is on your PATH; it is included in Rtools. This works for me:

# check that cut is available
Sys.which("cut")
# create test data
Lines <- "a,b,c
1,2,3
4,5,6"
cat(Lines, file = "in.csv")
# read it
DF <- read.csv(pipe("cut -f1,2 -d, in.csv"))
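The question also asks about shell(). On Windows, shell(cmd, intern = TRUE) returns the command's output as a character vector, and read.csv(text = ...) can parse that vector. The sketch below uses system(..., intern = TRUE) instead, because shell() exists only on Windows. It assumes cut is on the PATH, and it recreates the small in.csv test file so that it runs on its own.

```r
# Recreate the test file from the answer above
cat("a,b,c\n1,2,3\n4,5,6\n", file = "in.csv")

# Stop early with a clear message if cut cannot be found
if (!nzchar(Sys.which("cut"))) stop("cut not found; install Rtools or add it to the PATH")

# Capture the command's stdout as one string per line
# (on Windows, shell("cut -f1,2 -d, in.csv", intern = TRUE) should behave similarly)
out <- system("cut -f1,2 -d, in.csv", intern = TRUE)

# Parse the captured lines as CSV
DF <- read.csv(text = out)
print(DF)
```

Capturing the output first makes it easy to inspect (for example with head(out)) before parsing, at the cost of holding the whole output in memory as text.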

Added

Rtools is now Rtools40, and cut is at C:\Rtools40\usr\bin\cut.exe.
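When cut is not on the PATH, one option is to fall back to its Rtools40 location. The sketch below assumes the default C:\Rtools40 install directory mentioned above. find_cut() is a hypothetical helper, not part of any package. shQuote() protects the executable path in case it contains spaces.

```r
# Hypothetical helper: prefer cut on the PATH, else try the default Rtools40 location
find_cut <- function(fallback = "C:/Rtools40/usr/bin/cut.exe") {
  exe <- Sys.which("cut")
  if (nzchar(exe)) return(unname(exe))
  if (file.exists(fallback)) return(fallback)
  stop("cut not found on the PATH or at ", fallback)
}

# Small test file
cat("a,b,c\n1,2,3\n4,5,6\n", file = "in.csv")

# Build the command with a quoted executable path and read the first two columns
cmd <- paste(shQuote(find_cut()), "-f1,2 -d, in.csv")
DF <- read.csv(pipe(cmd))
print(DF)
```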

edited Mar 2, 2022 at 0:06

answered May 30, 2014 at 21:18


5 Comments

Diego Over a year ago

Thank you, G. I don't think I have Rtools installed on my Windows machine, because it uses Cygwin and I've had trouble in the past with Ruby's ActiveRecord library. I'll double-check when I get back to my Windows machine, but if it works for you, that's probably the problem. Thanks again.

G. Grothendieck Over a year ago

You can install Rtools without putting it on your PATH. In that case, conflicts are unlikely. Once it's installed that way, try: DF <- read.csv(pipe("\\Rtools\\bin\\cut -f1,2 -d, in.csv"))

Diego Over a year ago

Nice, G. It works well. Below are the results of my tests. For a reason I don't understand yet, most of the implementations return the wrong number of rows. Only fread() returned the correct count, and it was also faster than the default read.csv(). Do you know the reason, or how I can fix that?

G. Grothendieck Over a year ago

Try cutting down the example data until it still produces the wrong number of rows with only a few rows of input.
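One likely cause of the short row counts is that cut -d, splits on every comma, including commas inside quoted fields. This is consistent with the "EOF within quoted string" warnings. Splitting this way can leave an unmatched quote, which then swallows later lines. The sketch below builds a cut-down input of that kind, assuming the file contains a quoted field with a comma in it. cut is emulated in R so the sketch runs anywhere.

```r
# Small input with a quoted field that contains a comma
lines <- c('id,name,score',
           '1,"Smith, John",90',
           '2,"Doe, Jane",85')

# Emulate `cut -f1,2 -d,`: keep the first two comma-separated pieces, ignoring quotes
cut_lines <- vapply(strsplit(lines, ",", fixed = TRUE),
                    function(f) paste(f[1:2], collapse = ","), character(1))
print(cut_lines)  # data lines now end in an unmatched quote, e.g. 1,"Smith

# An odd number of quote characters on a line signals a broken quoted field
quote_counts <- lengths(regmatches(cut_lines, gregexpr('"', cut_lines, fixed = TRUE)))
print(quote_counts)

# A quote-aware parser keeps the field intact; subset columns after parsing
full <- read.csv(text = lines, stringsAsFactors = FALSE)
print(full[, 1:2])
```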

Diego Over a year ago

Thank you, G. I'm not sure I'll explore this, but if I do, I'll post back. Thanks again.

Diego · Accepted Answer · 2014-05-31 11:19:29Z

> system.time(a <- read.csv("in.csv"))
   user  system elapsed
   1.24    0.04    1.26
> dim(a)
[1] 4706   46
> system.time(b <-read.csv(pipe("C:/Rtools/bin/cut -f1,2 -d, in.csv")))
   user  system elapsed
   0.22    1.27    2.37
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  EOF within quoted string
> dim(b)
[1] 2726    2
> system.time(d <-as.data.frame(scan(pipe("C:/Rtools/bin/cut -f1,2 -d, in.csv"),
+ what=list("character","character"),sep= ",")))
Read 1715 records
   user  system elapsed
   0.31    1.19    2.47
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  EOF within quoted string
> dim(d)
[1] 1715    2
> library(data.table)
data.table 1.9.2  For help type: help("data.table")
Warning message:
closing unused connection 3 (C:\Windows\system32\cmd.exe /c C:/Rtools/bin/cut -f1,2 -d, in.csv)
> system.time(e <-fread("C:/Rtools/bin/cut -f1,2 -d, in.csv"))
   user  system elapsed
   0.02    0.01    0.80
> dim(e)
[1] 4706    2
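For comparison with the timings above, base R can also skip unwanted columns while still parsing quotes correctly: colClasses = "NULL" tells read.csv() to drop a column. The sketch below assumes you want the first two columns. It reads the total column count from the header line with count.fields(), which respects quotes. The small file is illustrative, not the 46-column file from the test.

```r
# Illustrative file containing a quoted comma (a naive `cut -d,` would split it)
writeLines(c('id,name,score',
             '1,"Smith, John",90',
             '2,"Doe, Jane",85'), "in.csv")

# Quote-aware column count, taken from the header line
n_cols <- count.fields("in.csv", sep = ",", quote = "\"")[1]

# NA = let read.csv guess the type; "NULL" = skip the column entirely
keep <- c(NA, NA, rep("NULL", n_cols - 2))
DF <- read.csv("in.csv", colClasses = keep, stringsAsFactors = FALSE)
print(DF)
```

data.table users can likely get the same effect in one call with fread("in.csv", select = 1:2). Newer data.table releases also offer a cmd = argument for shell commands such as the cut call above.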

It is cleaner to use library(rbenchmark), because it lets you compare all the functions at once. Example: benchmark(f1(x), f2(x), f3(x), f4(x), columns = c("test", "replications", "elapsed", "relative"), order = "relative", relative = "elapsed", replications = 10). In this case, substitute fread("C:/Rtools/bin/cut -f1,2 -d, in.csv") for f1(x), and so on. Anyway, data.table wins. +1
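Here is a runnable version of that benchmark() call, using the same columns, order, relative and replications arguments as the comment. It assumes the rbenchmark package is installed and skips the comparison with a message otherwise. The test file is randomly generated for illustration. The two labelled expressions (full and cols_1_2) are example candidates and do not reproduce Diego's results.

```r
# Illustrative test file: 1000 rows x 10 numeric columns (not Diego's data)
set.seed(1)
write.csv(as.data.frame(matrix(round(runif(10000), 3), ncol = 10)),
          "in.csv", row.names = FALSE)

res <- NULL
if (requireNamespace("rbenchmark", quietly = TRUE)) {
  # Each named expression becomes one row of the results table
  res <- rbenchmark::benchmark(
    full     = read.csv("in.csv"),
    cols_1_2 = read.csv("in.csv", colClasses = c(NA, NA, rep("NULL", 8))),
    columns = c("test", "replications", "elapsed", "relative"),
    order = "relative", relative = "elapsed", replications = 10
  )
  print(res)
} else {
  message("install.packages(\"rbenchmark\") to run this comparison")
}
```

Because each expression runs 10 times, one-off effects such as disk caching on the first read affect the relative column less.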
