273

I have a file which contains several thousand numbers, each on its own line:

34
42
11
6
2
99
...

I'm looking to write a script which will print the sum of all numbers in the file. I've got a solution, but it's not very efficient. (It takes several minutes to run.) I'm looking for a more efficient solution. Any suggestions?

6
  • 6
    What was your slow solution? Maybe we can help you figure out what was slow about it. :) Commented Apr 23, 2010 at 23:59
  • 4
    @brian d foy, I'm too embarrassed to post it. I know why it's slow. It's because I call "cat filename | head -n 1" to get the top number, add it to a running total, and call "cat filename | tail..." to remove the top line for the next iteration... I have a lot to learn about programming!!! Commented Apr 24, 2010 at 1:22
  • 9
    That's...very systematic. Very clear and straight forward, and I love it for all that it is a horrible abomination. Built, I assume, out of the tools that you knew when you started, right? Commented Apr 24, 2010 at 2:43
  • 4
    full duplicate: stackoverflow.com/questions/450799/… Commented Apr 26, 2010 at 11:39
  • @MarkRoberts It must have taken you a long while to work that out. It's a very clever problem solving technique, and oh so wrong. It looks like a classic case of overthinking. Glenn Jackman posted several shell scripting solutions (and two are pure shell that don't use things like awk and bc). These all finished adding a million numbers up in less than 10 seconds. Take a look at those and see how it can be done in pure shell. Commented Aug 22, 2013 at 14:24
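For the curious, the head/tail approach described in the comments above can be sketched roughly as below. It re-reads the remaining file on every iteration, which makes it quadratic in the file size (file names here are just for illustration):

```shell
# Hypothetical reconstruction of the slow approach: every iteration
# re-reads the file twice (head and tail), so n numbers cost ~n full passes.
seq 100 > work.txt                  # sample input: 1..100
sum=0
while [ -s work.txt ]; do
    top=$(head -n 1 work.txt)       # fetch the first number
    sum=$((sum + top))              # add it to the running total
    tail -n +2 work.txt > rest.txt && mv rest.txt work.txt   # drop that line
done
echo "$sum"                         # 5050 for 1..100
```

The single-pass solutions in the answers touch each line exactly once instead, which is why they finish in well under a second.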

34 Answers

480

You can use awk:

awk '{ sum += $1 } END { print sum }' file 

6 Comments

program exceeded: maximum number of field sizes: 32767
With the -F '\t' option if your fields contain spaces and are separated by tabs.
Please mark this as the best answer. It also works if you want to sum the first value in each row, inside a TSV (tab-separated value) file.
If you have big numbers: awk 'BEGIN {OFMT = "%.0f"} { sum += $1 } END { print sum }' filename
@EthanFurman I actually have a tab delimited file as you explained but not able to make -F '\t' do the magic. Where exactly is the option meant to be inserted? I have it like this awk -F '\t' '{ sum += $0 } END { print sum }' file
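To illustrate where the -F '\t' option goes (sample data made up for the sketch), here is a sum over the second column of tab-separated input:

```shell
# -F '\t' precedes the program text; fields are then split on tabs
printf 'a\t1\nb\t2\nc\t3\n' | awk -F '\t' '{ sum += $2 } END { print sum }'
# prints 6
```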
147

None of the solutions thus far use paste. Here's one:

paste -sd+ filename | bc 

If the file ends with a blank line, the output gains a trailing +, which incurs a bc syntax error. Fix the error by removing the trailing +:

paste -sd+ filename | sed 's/+$//g' | bc

As an example, calculate Σn where 1<=n<=100000:

$ seq 100000 | paste -sd+ | bc -l
5000050000

(For the curious, seq n would print a sequence of numbers from 1 to n given a positive number n.)

2 Comments

seq 100000 | paste -sd+ - | bc -l on Mac OS X Bash shell. And this is by far the sweetest and the unixest solution!
@SimoA. I vote that we use the term unixiest in place of unixest, because the sexiest solution is always the unixiest ;)
122

For a Perl one-liner, it's basically the same thing as the awk solution in Ayman Hourieh's answer:

 % perl -nle '$sum += $_ } END { print $sum' 

If you're curious what Perl one-liners do, you can deparse them:

 % perl -MO=Deparse -nle '$sum += $_ } END { print $sum' 

The result is a more verbose version of the program, in a form that no one would ever write on their own:

BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    $sum += $_;
}
sub END {
    print $sum;
}
-e syntax OK

Just for giggles, I tried this with a file containing 1,000,000 numbers (in the range 0 - 9,999). On my Mac Pro, it returns virtually instantaneously. That's too bad, because I was hoping using mmap would be really fast, but it's just the same time:

use 5.010;
use File::Map qw(map_file);

map_file my $map, $ARGV[0];
$sum += $1 while $map =~ m/(\d+)/g;
say $sum;

6 Comments

Wow, that shows a deep understanding on what code -nle actually wraps around the string you give it. My initial thought was that you shouldn't post while intoxicated but then I noticed who you were and remembered some of your other Perl answers :-)
-n and -p just put characters around the argument to -e, so you can use those characters for whatever you want. We have a lot of one-liners that do interesting things with that in Effective Perl Programming (which is about to hit the shelves).
Nice, what are these non-matching curly braces about?
-n adds the while { } loop around your program. If you put } ... { inside, then you have while { } ... { }. Evil? Slightly.
Big bonus for highlighting the -MO=Deparse option! Even though on a separate topic.
103

Just for fun, let's benchmark it:

$ for ((i=0; i<1000000; i++)) ; do echo $RANDOM; done > random_numbers

$ time perl -nle '$sum += $_ } END { print $sum' random_numbers
16379866392

real    0m0.226s
user    0m0.219s
sys     0m0.002s

$ time awk '{ sum += $1 } END { print sum }' random_numbers
16379866392

real    0m0.311s
user    0m0.304s
sys     0m0.005s

$ time { { tr "\n" + < random_numbers ; echo 0; } | bc; }
16379866392

real    0m0.445s
user    0m0.438s
sys     0m0.024s

$ time { s=0; while read l; do s=$((s+$l)); done < random_numbers; echo $s; }
16379866392

real    0m9.309s
user    0m8.404s
sys     0m0.887s

$ time { s=0; while read l; do ((s+=l)); done < random_numbers; echo $s; }
16379866392

real    0m7.191s
user    0m6.402s
sys     0m0.776s

$ time { sed ':a;N;s/\n/+/;ta' random_numbers | bc; }
^C

real    4m53.413s
user    4m52.584s
sys     0m0.052s

I aborted the sed run after 5 minutes.


I've been diving into Lua, and it is speedy:

$ time lua -e 'sum=0; for line in io.lines() do sum=sum+line end; print(sum)' < random_numbers
16388542582.0

real    0m0.362s
user    0m0.313s
sys     0m0.063s

and while I'm updating this, ruby:

$ time ruby -e 'sum = 0; File.foreach(ARGV.shift) {|line| sum+=line.to_i}; puts sum' random_numbers
16388542582

real    0m0.378s
user    0m0.297s
sys     0m0.078s

Heed Ed Morton's advice: using $1

$ time awk '{ sum += $1 } END { print sum }' random_numbers
16388542582

real    0m0.421s
user    0m0.359s
sys     0m0.063s

vs using $0

$ time awk '{ sum += $0 } END { print sum }' random_numbers
16388542582

real    0m0.302s
user    0m0.234s
sys     0m0.063s

4 Comments

+1: For coming up with a bunch of solutions, and benchmarking them.
time cat random_numbers|paste -sd+|bc -l real 0m0.317s user 0m0.310s sys 0m0.013s
that should be just about identical to the tr solution.
Your awk script should execute a bit faster if you use $0 instead of $1 since awk does field splitting (which obviously takes time) if any field is specifically mentioned in the script but doesn't otherwise.
42

Another option is to use jq:

$ seq 10 | jq -s add
55

-s (--slurp) reads the input lines into an array.
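A quick sketch of what slurping does, assuming jq is available: the input lines become a single JSON array, which add then sums:

```shell
seq 3 | jq -sc .     # slurped into one array: [1,2,3]
seq 3 | jq -s add    # add sums that array: 6
```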

2 Comments

Wonderful solution. I had a tab delimited file where I wanted to sum column 6. Did that with the following command: awk '{ print $6 }' myfile.log | jq -s add
This is amazing. Never knew jq could do this. Glad this is a thing. And the benchmark that @glenn-jackman left in his answer also scores sub 0.5s real time, so the difference in performance between awk and jq is actually insignificant.
11

This is straight Bash:

sum=0
while read -r line
do
    (( sum += line ))
done < file
echo $sum

1 Comment

And it's probably one of the slowest solutions and therefore not so suitable for large amounts of numbers.
11

I prefer to use GNU datamash for such tasks because it's more succinct and legible than perl or awk. For example:

datamash sum 1 < myfile 

where 1 denotes the first column of data.

2 Comments

This does not appear to be a standard component as I do not see it in my Ubuntu installation. Would like to see it benchmarked, though.
It seems by far the fastest of the general-purpose programs to me! For seq 10000000, awk with $0 takes 2.1 sec, python 1.9 sec, perl 1.5 sec, but datamash an amazing 0.9 sec. Only the custom-written C answer was better, at 0.8 sec.
9

Raku

say sum lines 
~$ raku -e '.say for 0..1000000' > test.in
~$ raku -e 'say sum lines' < test.in
500000500000

The way this works is that lines produces a sequence of strings which are the input lines.
sum takes that sequence, turns each line into a number and adds them together.
All that is left is for say to print out that value followed by a newline. (It could have been print or put, but say is more alliterative.)

Comments

8

I prefer to use R for this:

$ R -e 'sum(scan("filename"))' 

1 Comment

I'm a fan of R for other applications but it's not good for performance in this way. File I/O is a major issue. I've tested passing args to a script which can be sped up using the vroom package. I'll post more details when I've benchmarked some other scripts on the same server.
7

Here's another one-liner

( echo 0 ; sed 's/$/ +/' foo ; echo p ) | dc 

This assumes the numbers are integers. If you need decimals, try

( echo 0 2k ; sed 's/$/ +/' foo ; echo p ) | dc 

Adjust 2 to the number of decimals needed.

Comments

7

C always wins for speed:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    ssize_t read;
    char *line = NULL;
    size_t len = 0;
    double sum = 0.0;

    /* parentheses around the assignment matter: != binds tighter than = */
    while ((read = getline(&line, &len, stdin)) != -1) {
        sum += atof(line);
    }

    printf("%f\n", sum);
    return 0;
}

Timing for 1M numbers (same machine/input as my python answer):

$ gcc sum.c -o sum && time ./sum < numbers
5003371677.000000

real    0m0.188s
user    0m0.180s
sys     0m0.000s

2 Comments

Best answer! Best speed)
Using sum.c and GNU Parallel: seq 1077139031 > 10gb; time parallel --pipepart --block -1 -a 10gb sum | sum = 10 secs or ~100M numbers/sec on a 64 core machine.
6
$ perl -MList::Util=sum -le 'print sum <>' nums.txt 

Comments

5

More succinct:

# Ruby
ruby -e 'puts open("random_numbers").map(&:to_i).reduce(:+)'

# Python
python -c 'print(sum(int(l) for l in open("random_numbers")))'

1 Comment

Converting to float seems to be about twice as fast on my system (320 vs 640 ms). time python -c "print(sum([float(s) for s in open('random_numbers','r')]))"
5

I couldn't just pass by... Here's my Haskell one-liner. It's actually quite readable:

sum <$> (read <$>) <$> lines <$> getContents 

Unfortunately there's no ghci -e to just run it, so it needs the main function, print and compilation.

main = (sum <$> (read <$>) <$> lines <$> getContents) >>= print 

To clarify, we read the entire input (getContents), split it into lines, read each as a number, and sum. <$> is the fmap operator - we use it instead of the usual function application because this all happens in IO. read needs an additional fmap, because it is applied inside the list.

$ ghc sum.hs
[1 of 1] Compiling Main             ( sum.hs, sum.o )
Linking sum ...
$ ./sum
1
2
4
^D
7

Here's a strange upgrade to make it work with floats:

main = ((0.0 + ) <$> sum <$> (read <$>) <$> lines <$> getContents) >>= print 
$ ./sum
1.3
2.1
4.2
^D
7.6000000000000005

Comments

4
cat nums | perl -ne '$sum += $_ } { print $sum' 

(same as brian d foy's answer, without 'END')

2 Comments

I like this, but could you explain the curly brackets? It's weird to see } without { and vice versa.
@drumfire see @brian d foy's answer above with perl -MO=Deparse to see how perl parses the program. or the docs for perlrun: perldoc.perl.org/perlrun.html (search for -n). perl wraps your code with { } if you use -n so it becomes a complete program.
4

Just for fun, let's do it with PDL, Perl's array math engine!

perl -MPDL -E 'say rcols(shift)->sum' datafile 

rcols reads columns into a matrix (1D in this case) and sum (surprise) sums all the elements of the matrix.

2 Comments

How fix Can't locate PDL.pm in @INC (you may need to install the PDL module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 ?)) for fun of course=)
You have to install PDL first, it isn't a Perl native module.
4

C++ "one-liner":

#include <iostream>
#include <iterator>
#include <numeric>
using namespace std;

int main() {
    cout << accumulate(istream_iterator<int>(cin), istream_iterator<int>(), 0) << endl;
}

Comments

3

Here is a solution using python with a generator expression. Tested with a million numbers on my old cruddy laptop.

time python -c "import sys; print sum((float(l) for l in sys.stdin))" < file

real    0m0.619s
user    0m0.512s
sys     0m0.028s

1 Comment

A simple list comprehension with a named function is a nice use-case for map(): map(float, sys.stdin)
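A runnable sketch of that map() suggestion, using Python 3 syntax (the answer above uses Python 2's print statement):

```shell
seq 5 | python3 -c "import sys; print(sum(map(float, sys.stdin)))"
# prints 15.0
```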
2
sed ':a;N;s/\n/+/;ta' file|bc 

Comments

2

Running R scripts

I've written an R script to take arguments of a file name and sum the lines.

#! /usr/local/bin/R
file=commandArgs(trailingOnly=TRUE)[1]
sum(as.numeric(readLines(file)))

This can be sped up with the "data.table" or "vroom" package as follows:

#! /usr/local/bin/R
file=commandArgs(trailingOnly=TRUE)[1]
sum(data.table::fread(file))
#! /usr/local/bin/R
file=commandArgs(trailingOnly=TRUE)[1]
sum(vroom::vroom(file))

Benchmarking

Same benchmarking data as @glenn jackman.

for ((i=0; i<1000000; i++)) ; do echo $RANDOM; done > random_numbers 

In comparison to the R call above, running R 3.5.0 as a script is comparable to other methods (on the same Linux Debian server).

$ time R -e 'sum(scan("random_numbers"))'
0.37s user 0.04s system 86% cpu 0.478 total

R script with readLines

$ time Rscript sum.R random_numbers
0.53s user 0.04s system 84% cpu 0.679 total

R script with data.table

$ time Rscript sum.R random_numbers
0.30s user 0.05s system 77% cpu 0.453 total

R script with vroom

$ time Rscript sum.R random_numbers
0.54s user 0.11s system 93% cpu 0.696 total

Comparison with other languages

For reference, here are some other suggested methods run on the same hardware:

Python 2 (2.7.13)

$ time python2 -c "import sys; print sum((float(l) for l in sys.stdin))" < random_numbers
0.27s user 0.00s system 89% cpu 0.298 total

Python 3 (3.6.8)

$ time python3 -c "import sys; print(sum((float(l) for l in sys.stdin)))" < random_numbers
0.37s user 0.02s system 98% cpu 0.393 total

Ruby (2.3.3)

$ time ruby -e 'sum = 0; File.foreach(ARGV.shift) {|line| sum+=line.to_i}; puts sum' random_numbers
0.42s user 0.03s system 72% cpu 0.625 total

Perl (5.24.1)

$ time perl -nle '$sum += $_ } END { print $sum' random_numbers
0.24s user 0.01s system 99% cpu 0.249 total

Awk (4.1.4)

$ time awk '{ sum += $0 } END { print sum }' random_numbers
0.26s user 0.01s system 99% cpu 0.265 total

$ time awk '{ sum += $1 } END { print sum }' random_numbers
0.34s user 0.01s system 99% cpu 0.354 total

C (clang version 3.3; gcc (Debian 6.3.0-18) 6.3.0 )

$ gcc sum.c -o sum && time ./sum < random_numbers
0.10s user 0.00s system 96% cpu 0.108 total

Update with additional languages

Lua (5.3.5)

$ time lua -e 'sum=0; for line in io.lines() do sum=sum+line end; print(sum)' < random_numbers
0.30s user 0.01s system 98% cpu 0.312 total

tr (8.26) must be timed in bash, not compatible with zsh

$ time { { tr "\n" + < random_numbers ; echo 0; } | bc; }

real    0m0.494s
user    0m0.488s
sys     0m0.044s

sed (4.4) must be timed in bash, not compatible with zsh

$ time { head -n 10000 random_numbers | sed ':a;N;s/\n/+/;ta' | bc; }

real    0m0.631s
user    0m0.628s
sys     0m0.008s

$ time { head -n 100000 random_numbers | sed ':a;N;s/\n/+/;ta' | bc; }

real    1m2.593s
user    1m2.588s
sys     0m0.012s

note: sed calls seem to work faster on systems with more memory available (note smaller datasets used for benchmarking sed)

Julia (0.5.0)

$ time julia -e 'print(sum(readdlm("random_numbers")))'
3.00s user 1.39s system 136% cpu 3.204 total

$ time julia -e 'print(sum(readtable("random_numbers")))'
0.63s user 0.96s system 248% cpu 0.638 total

Notice that as in R, file I/O methods have different performance.

Comments

2

In Go:

package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
)

func main() {
    scanner := bufio.NewScanner(os.Stdin)
    sum := int64(0)
    for scanner.Scan() {
        v, err := strconv.ParseInt(scanner.Text(), 10, 64)
        if err != nil {
            fmt.Fprintf(os.Stderr, "Not an integer: '%s'\n", scanner.Text())
            os.Exit(1)
        }
        sum += v
    }
    fmt.Println(sum)
}

2 Comments

What is "64"? "10" I suppose is base?
Yes, 10 is the base. 64 is the number of bits, if the resulting int can't be represented with that many bits then an error is returned. See golang.org/pkg/strconv/#ParseInt
2

Bash variant

raw=$(cat file)
echo $(( ${raw//$'\n'/+} ))

$ wc -l file
10000 file

$ time ./test
323390

real    0m3,096s
user    0m3,095s
sys     0m0,000s

What is happening here? We read the content of the file into the $raw variable, then build a math statement from that variable by changing all newlines into '+'.
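The substitution itself can be sketched with inline data instead of a file:

```shell
raw=$'1\n2\n3'                # what raw=$(cat file) would hold
echo "${raw//$'\n'/+}"        # every newline becomes '+': 1+2+3
echo $(( ${raw//$'\n'/+} ))   # arithmetic expansion evaluates it: 6
```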

Comments

2

As long as there are only integer numbers, I basically translate the file into a bash math expression and execute it. It is similar to the solution with 'bc' from further above, but faster. Observe that the zero at the end of the inner expression is needed as an argument for the trailing '+' left over from the final line. I have tested it with 475,000 lines and it took less than a second.

echo $(($(cat filename | tr '\n' '+')0)) 

1 Comment

There is no "above" or "below"; the answers are sorted according to each visitor's personal preference.
1

Another for fun

sum=0;for i in $(cat file);do sum=$((sum+$i));done;echo $sum 

or another bash only

s=0;while read l; do s=$((s+$l));done<file;echo $s 

But awk solution is probably best as it's most compact.

Comments

1

With Ruby:

ruby -e "puts File.read('file.txt').split.inject(0){|mem, obj| mem += obj.to_f}"

1 Comment

Another option (when input is from STDIN) is ruby -e'p readlines.map(&:to_f).reduce(:+)'.
0

I don't know if you can get a lot better than this, considering you need to read through the whole file.

$sum = 0;
while (<>) {
    $sum += $_;
}
print $sum;

5 Comments

Very readable. For perl. But yeah, it's going to have to be something like that...
$_ is the default variable. The line input operator, <>, puts its result in there by default when you use <> in while.
@Mark, $_ is the topic variable--it works like the 'it'. In this case <> assigns each line to it. It gets used in a number of places to reduce code clutter and help with writing one-liners. The script says "Set the sum to 0, read each line and add it to the sum, then print the sum."
@Stefan, with warnings and strictures off, you can skip declaring and initializing $sum. Since this is so simple, you can even use a statement modifier while: $sum += $_ while <>; print $sum;
for the rest of us who can't easily, how about you indicate which language this is in? PHP? Perl?
0

I have not tested this but it should work:

cat f | tr "\n" "+" | sed 's/+$/\n/' | bc 

You might have to add "\n" to the string before bc (like via echo) if bc doesn't treat EOF and EOL...

2 Comments

It doesn't work. bc issues a syntax error because of the trailing "+" and lack of newline at the end. This will work and it eliminates a useless use of cat: { tr "\n" "+" | sed 's/+$/\n/'| bc; } < numbers2.txt or <numbers2.txt tr "\n" "+" | sed 's/+$/\n/'| bc
tr "\n" "+" <file | sed 's/+$/\n/' | bc
0

Here's another:

open(FIL, "a.txt");
my $sum = 0;
foreach( <FIL> ) {
    chomp;
    $sum += $_;
}
close(FIL);
print "Sum = $sum\n";

Comments

0

You can do it with Alacon, a command-line utility for the Alasql database.

It works with Node.js, so you need to install Node.js and then the Alasql package.

To calculate sum from TXT file you can use the following command:

> node alacon "SELECT VALUE SUM([0]) FROM TXT('mydata.txt')" 

Comments

0

Is it not easier to replace all newlines with +, append a 0, and send it to the Ruby interpreter?

(sed -e "s/$/+/" file; echo 0)|irb 

If you do not have irb, you can send it to bc, but then you have to remove all newlines except the last one (from echo). It is better to use tr for this, unless you have a PhD in sed.

(sed -e "s/$/+/" file|tr -d "\n"; echo 0)|bc 

Comments
