Kusalananda

Here are a couple of utilities for you. The first one calculates the average of the numbers given to it (one number per line). The second one uses the first to calculate the sample standard deviation of the numbers in a file.


The executable file average:

    #!/usr/bin/awk -f

    /^[0-9.+-]/ { sum += $0; ++n }
    END         { print sum / n }

This awk script will read input from a file or from standard input and compute the average of the numbers therein. It expects one number per line.
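Note that the script divides by n without checking it, so empty input would trigger a division by zero. A minimal sketch of a guarded variant, written here as a shell function so that it can be tried without creating a file (the function form and the error message are my own assumptions, not part of the script above):

```shell
# Sketch: same averaging logic as the average script, wrapped in a
# shell function, with a guard for input that contains no numbers.
average () {
    awk '/^[0-9.+-]/ { sum += $0; ++n }
         END {
             if (n == 0) {
                 # Write the message to standard error in a portable way.
                 print "average: no numbers found" | "cat 1>&2"
                 exit 1
             }
             print sum / n
         }' "$@"
}

# Reads from standard input here; a filename argument works as well.
printf '%s\n' 4 2 3 | average
```

With the three values 4, 2 and 3, this prints 3.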


The executable file stdev:

    #!/bin/sh

    awk -v avg="$( ./average "$1" )" \
        '/^[0-9.+-]/ { sum += ($0 - avg)^2; ++n }
         END         { print sqrt(sum / (n - 1)) }' "$1"

This shell script will first use the above average script to compute the average of the data in the file given on the command line. This number is assigned to the awk variable avg. It then uses the same kind of number detection as the average script to calculate the standard deviation.

As this script is written right now, it requires data from a file, not on standard input, because the data is read twice: once by average and once by the awk program in stdev.
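If reading from standard input matters, one option is to compute everything in a single pass using the sum and the sum of squares. This is a different technique from the two-pass approach in stdev, and it can lose precision when the values are large relative to their spread, so treat it as a sketch. The function name stdev_stream and the error message are made up for illustration:

```shell
# Sketch: single-pass sample standard deviation, usable on standard
# input as well as on files.
# Uses the identity: sum((x - avg)^2) = sum(x^2) - (sum(x))^2 / n
stdev_stream () {
    awk '/^[0-9.+-]/ { sum += $0; sumsq += $0 * $0; ++n }
         END {
             if (n < 2) {
                 print "stdev_stream: need at least two numbers" | "cat 1>&2"
                 exit 1
             }
             var = (sumsq - sum * sum / n) / (n - 1)
             if (var < 0) var = 0    # rounding may give a tiny negative value
             print sqrt(var)
         }' "$@"
}

printf '%s\n' 4 2 3 | stdev_stream
```

For the values 4, 2 and 3 this prints 1, the same result as the two-pass script gives.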


One way of using this on your data:

sed -n '/^<Score>/s///p' input.dat >output.dat 

With the given data, this will generate a file called output.dat containing the following:

    4
    2
    3

Using the stdev script above on this file:

    $ ./stdev output.dat
    1

This is correct: the average is 3, the squared deviations sum to 2, and sqrt(2 / (3 - 1)) = 1.


Of course, you may do it directly in one awk call as well, without building any type of reusable tools:

    awk -F '>' '
        /^<Score>/ { v[++n] = $2; s += $2 }
        END {
            avg = s / n
            for (i = 1; i <= n; ++i) {
                std += (v[i] - avg)^2
            }
            print sqrt(std / (n - 1))
        }' input.dat