Running R scripts
I've written an R script that takes a file name as an argument and sums the numbers on its lines.
#!/usr/local/bin/Rscript
file <- commandArgs(trailingOnly = TRUE)[1]
sum(as.numeric(readLines(file)))
This can be sped up with the "data.table" or "vroom" package as follows:
#!/usr/local/bin/Rscript
file <- commandArgs(trailingOnly = TRUE)[1]
sum(data.table::fread(file))

#!/usr/local/bin/Rscript
file <- commandArgs(trailingOnly = TRUE)[1]
sum(vroom::vroom(file))
Benchmarking
Using the same benchmarking data as @glenn jackman:
for ((i=0; i<1000000; i++)) ; do echo $RANDOM; done > random_numbers
Running R 3.5.0 as a script is comparable to the other methods below (all timed on the same Debian Linux server). First, calling R directly:
$ time R -e 'sum(scan("random_numbers"))'
0.37s user 0.04s system 86% cpu 0.478 total
R script with readLines
$ time Rscript sum.R random_numbers
0.53s user 0.04s system 84% cpu 0.679 total
R script with data.table
$ time Rscript sum.R random_numbers
0.30s user 0.05s system 77% cpu 0.453 total
R script with vroom
$ time Rscript sum.R random_numbers
0.54s user 0.11s system 93% cpu 0.696 total
Comparison with other languages
For reference, here are some other suggested methods run on the same hardware.
Python 2 (2.7.13)
$ time python2 -c "import sys; print sum((float(l) for l in sys.stdin))" < random_numbers
0.27s user 0.00s system 89% cpu 0.298 total
Python 3 (3.6.8)
$ time python3 -c "import sys; print(sum((float(l) for l in sys.stdin)))" < random_numbers
0.37s user 0.02s system 98% cpu 0.393 total
Ruby (2.3.3)
$ time ruby -e 'sum = 0; File.foreach(ARGV.shift) {|line| sum+=line.to_i}; puts sum' random_numbers
0.42s user 0.03s system 72% cpu 0.625 total
Perl (5.24.1)
$ time perl -nle '$sum += $_ } END { print $sum' random_numbers
0.24s user 0.01s system 99% cpu 0.249 total
Awk (4.1.4)
$ time awk '{ sum += $0 } END { print sum }' random_numbers
0.26s user 0.01s system 99% cpu 0.265 total

$ time awk '{ sum += $1 } END { print sum }' random_numbers
0.34s user 0.01s system 99% cpu 0.354 total
C (clang version 3.3; gcc (Debian 6.3.0-18) 6.3.0)
$ gcc sum.c -o sum && time ./sum < random_numbers
0.10s user 0.00s system 96% cpu 0.108 total
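The source of sum.c isn't shown above; a minimal sketch consistent with the invocation (reading numbers from stdin and printing their sum) could look like the following. The choice of double and the output format are assumptions, since the original file wasn't included.

/* sum.c -- a minimal sketch; the original source was not shown.
   Reads whitespace-separated numbers from stdin and prints their sum. */
#include <stdio.h>

int main(void) {
    double sum = 0.0, value;
    /* scanf skips whitespace, including newlines, between numbers */
    while (scanf("%lf", &value) == 1)
        sum += value;
    printf("%.0f\n", sum);
    return 0;
}

A sketch like this compiles and runs with the same command used in the timing above.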
Update with additional languages
Lua (5.3.5)
$ time lua -e 'sum=0; for line in io.lines() do sum=sum+line end; print(sum)' < random_numbers
0.30s user 0.01s system 98% cpu 0.312 total
tr (8.26), which must be timed in bash (the construct is not compatible with zsh)
$ time { { tr "\n" + < random_numbers ; echo 0; } | bc; }
real    0m0.494s
user    0m0.488s
sys     0m0.044s
sed (4.4), which must be timed in bash (the construct is not compatible with zsh)
$ time { head -n 10000 random_numbers | sed ':a;N;s/\n/+/;ta' | bc; }
real    0m0.631s
user    0m0.628s
sys     0m0.008s

$ time { head -n 100000 random_numbers | sed ':a;N;s/\n/+/;ta' | bc; }
real    1m2.593s
user    1m2.588s
sys     0m0.012s
Note: sed calls seem to run faster on systems with more memory available (note the smaller datasets used for benchmarking sed above).
Julia (0.5.0)
$ time julia -e 'print(sum(readdlm("random_numbers")))'
3.00s user 1.39s system 136% cpu 3.204 total

$ time julia -e 'print(sum(readtable("random_numbers")))'
0.63s user 0.96s system 248% cpu 0.638 total
Notice that, as in R, different file I/O methods give different performance.